argv_ref patch 27: allow NUL through more builtins
From: Eric Blake
Subject: argv_ref patch 27: allow NUL through more builtins
Date: Wed, 03 Dec 2008 21:29:21 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.18) Gecko/20081105 Thunderbird/2.0.0.18 Mnenhy/0.7.5.666
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
It's been a while since I've worked on porting the argv_ref branch to
master, but I finally got one done. This fixes regular expressions,
format, and translit to transparently handle NUL; mostly a matter of
passing lengths around rather than NUL-termination (thank heavens that
glibc, and thus gnulib, already provide a regex interface that handles
NUL). No big change in speed or memory usage. And as long as I was
editing format, I made it better able to detect excess or missing
arguments. Meanwhile, I didn't see a way to use NUL in eval, so that now
warns.
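To illustrate what "passing lengths around" means in practice, here is a
minimal standalone sketch of a length-aware translit. It is not the m4
source: translit_mem and its flat output buffer are hypothetical names
made up for this example, and range expansion is left out. It only shows
the three-state lookup (keep, replace, delete) that lets a NUL byte act
as an ordinary character on either side of the mapping.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum { ASIS, REPLACE, DELETE };

/* Hypothetical helper, not part of m4.  Translate the DATA_LEN bytes
   of DATA, mapping each byte of FROM to the byte at the same position
   in TO, and deleting bytes of FROM that have no counterpart in TO.
   OUT must have room for DATA_LEN bytes.  Return the number of bytes
   written.  No argument is treated as NUL-terminated.  */
size_t
translit_mem (const char *data, size_t data_len,
              const char *from, size_t from_len,
              const char *to, size_t to_len, char *out)
{
  unsigned char map[UCHAR_MAX + 1] = {0};
  unsigned char found[UCHAR_MAX + 1] = {0};	/* All ASIS to start.  */
  size_t n = 0;

  assert (data && from && to && out);

  /* One pass over FROM builds the map; only the first instance of a
     byte in FROM counts.  */
  while (from_len--)
    {
      unsigned char ch = (unsigned char) *from++;
      if (found[ch] == ASIS)
        {
          if (to_len)
            {
              found[ch] = REPLACE;
              map[ch] = (unsigned char) *to;
            }
          else
            found[ch] = DELETE;
        }
      if (to_len)
        {
          to++;
          to_len--;
        }
    }

  /* One pass over DATA applies the map.  */
  while (data_len--)
    {
      unsigned char ch = (unsigned char) *data++;
      if (found[ch] == ASIS)
        out[n++] = (char) ch;
      else if (found[ch] == REPLACE)
        out[n++] = (char) map[ch];
      /* DELETE: emit nothing.  */
    }
  return n;
}
```

The separate found[] state is needed because a map entry of 0 can no
longer double as "delete this byte" once NUL is a legal replacement.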
Stage 27: Allow embedded NUL in text processing macros.
Pass NUL through regular expressions, format, and translit, and
diagnose it in eval. Improve warning capabilities of format.
Memory impact: none.
Speed impact: none noticed.
* src/m4.h (evaluate): Add parameter.
* src/builtin.c (compile_pattern) [DEBUG_REGEX]: Support NUL in
output messages.
(set_macro_sequence): Likewise.
(m4_eval): Normalize messages, and adjust caller.
(expand_ranges, substitute): Support NUL in macro expansion.
(m4_translit, m4_regexp, m4_patsubst): Adjust callers, to manage
NUL bytes.
* src/format.c (expand_format): Manage NUL bytes.
* src/eval.c (eval_error): Add EMPTY_ARGUMENT.
(end_text): New variable.
(eval_init_lex): Add parameter.
(eval_lex, evaluate): Detect NUL in macro expansion.
* doc/m4.texinfo (Format): Update to cover new behavior.
(Eval): Mention that result is unquoted.
* examples/null.m4: Enhance test.
* examples/null.err: Update expected output.
* examples/null.out: Likewise.
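For the eval entries above, the idea behind end_text can be sketched as
follows. This is a simplified standalone example rather than the code
from either branch: struct lex_state, lex_init, and lex_peek are
hypothetical names, and the real lexer keeps its state in file-scope
variables. The point is that end of input is detected by comparing
pointers, so a NUL byte in the middle of the text is seen as a byte the
caller can diagnose rather than as the end of the expression.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the lexer state.  */
struct lex_state
{
  const char *cur;	/* Next byte to examine.  */
  const char *end;	/* One past the last byte of the text.  */
};

/* Prime the lexer at the start of TEXT, with length LEN.  */
void
lex_init (struct lex_state *st, const char *text, size_t len)
{
  assert (st && text);
  st->cur = text;
  st->end = text + len;
}

/* Skip leading whitespace.  Return -1 at end of text, otherwise the
   next byte as a value in the range 0 through UCHAR_MAX.  A return of
   0 therefore means an embedded NUL, not end of input.  */
int
lex_peek (struct lex_state *st)
{
  while (st->cur != st->end && isspace ((unsigned char) *st->cur))
    st->cur++;
  if (st->cur == st->end)
    return -1;
  return (unsigned char) *st->cur;
}
```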
- --
Don't work too hard, make some time for fun as well!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkk3XKEACgkQ84KuGfSFAYCAvACeMDWbgh+2qAadcmFkWGzTUh83
RfkAoKzg1pjjyqRPgXeD6fvTEqXbvi1V
=8FT2
-----END PGP SIGNATURE-----
From 41d0c77062c8046730101d82b56f682edc70957e Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 2 Dec 2008 22:51:14 -0700
Subject: [PATCH] Stage 27: Allow embedded NUL in text processing macros.
* modules/m4.c (m4_expand_ranges): Don't append extra bytes.
(translit): Manage NUL bytes.
* modules/format.c (format): Likewise.
* modules/gnu.c (substitute, regexp_substitute): Likewise.
(m4_resyntax_encode_safe): Add parameter.
(regexp, patsubst, renamesyms): Update callers.
(regexp_compile): Adjust error message.
* modules/evalparse.c (m4_evaluate): Use consistent message.
(end_text): New variable.
(eval_init_lex): Add parameter.
(eval_lex): Detect embedded NUL.
* src/freeze.c (reload_frozen_state): Likewise.
* doc/m4.texinfo (Format): Update to cover new behavior.
(Eval): Mention that result is unquoted.
* tests/freeze.at (reloading nul): Enhance test.
* tests/null.m4: Likewise.
* tests/null.err: Update expected output.
* tests/null.out: Likewise.
* tests/options.at (--regexp-syntax): Likewise.
Signed-off-by: Eric Blake <address@hidden>
---
ChangeLog | 28 +++++++++++
doc/m4.texinfo | 13 ++++-
modules/evalparse.c | 15 ++++--
modules/format.c | 43 +++++++++++------
modules/gnu.c | 127 ++++++++++++++++++++++++++++++---------------------
modules/m4.c | 57 ++++++++++++++++-------
src/freeze.c | 7 ++-
tests/freeze.at | 6 ++
tests/null.err | 24 ++++++++--
tests/null.m4 | 52 +++++++++++++++------
tests/null.out | 7 ++-
tests/options.at | 8 ++--
12 files changed, 265 insertions(+), 122 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index b350524..69849a9 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,31 @@
+2008-12-02 Eric Blake <address@hidden>
+
+ Stage 27: Allow embedded NUL in text processing macros.
+ Pass NUL through regular expressions, format, and translit, and
+ diagnose it in eval and changeresyntax. Improve warning
+ capabilities of format.
+ Memory impact: none.
+ Speed impact: none noticed.
+ * modules/m4.c (m4_expand_ranges): Don't append extra bytes.
+ (translit): Manage NUL bytes.
+ * modules/format.c (format): Likewise.
+ * modules/gnu.c (substitute, regexp_substitute): Likewise.
+ (m4_resyntax_encode_safe): Add parameter.
+ (regexp, patsubst, renamesyms): Update callers.
+ (regexp_compile): Adjust error message.
+ * modules/evalparse.c (m4_evaluate): Use consistent message.
+ (end_text): New variable.
+ (eval_init_lex): Add parameter.
+ (eval_lex): Detect embedded NUL.
+ * src/freeze.c (reload_frozen_state): Likewise.
+ * doc/m4.texinfo (Format): Update to cover new behavior.
+ (Eval): Mention that result is unquoted.
+ * tests/freeze.at (reloading nul): Enhance test.
+ * tests/null.m4: Likewise.
+ * tests/null.err: Update expected output.
+ * tests/null.out: Likewise.
+ * tests/options.at (--regexp-syntax): Likewise.
+
2008-11-28 Eric Blake <address@hidden>
Resync NEWS with branches.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 0287a60..bee9aec 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -7288,7 +7288,7 @@ Format
@example
format(`%p', `0')
@error{}m4:stdin:1: Warning: format: unrecognized specifier in `%p'
address@hidden
address@hidden
format(`%*d', `')
@error{}m4:stdin:2: Warning: format: empty string treated as 0
@error{}m4:stdin:2: Warning: format: too few arguments: 2 < 3
@@ -7605,7 +7605,9 @@ Eval
@var{radix} is the empty string. A warning results if the radix is
outside the range of 1 through 36, inclusive. The result of @code{eval}
is always taken to be signed. No radix prefix is output, and for
-radices greater than 10, the digits are lower case. The @var{width}
+radices greater than 10, the digits are lower case (although some
+other implementations use upper case). The output is unquoted, and
+subject to further macro expansion. The @var{width}
argument specifies the minimum output width, excluding any negative
sign. The result is zero-padded to extend the expansion to the
requested width. A warning results if the width is negative. If
@@ -7636,8 +7638,13 @@ Eval
@error{}m4:stdin:10: Warning: eval: negative width: -1
@result{}
eval()
address@hidden:stdin:11: Warning: eval: empty string treated as zero
address@hidden:stdin:11: Warning: eval: empty string treated as 0
address@hidden
+eval(` ')
address@hidden:stdin:12: Warning: eval: empty string treated as 0
@result{}0
+define(`a', `hi')eval(` 10 ', `16')
address@hidden
@end example
@node Mpeval
diff --git a/modules/evalparse.c b/modules/evalparse.c
index 8ad7182..9927e13 100644
--- a/modules/evalparse.c
+++ b/modules/evalparse.c
@@ -99,10 +99,15 @@ static const char *eval_text;
can back up, if we have read too much. */
static const char *last_text;
+/* Detect when to end parsing. */
+static const char *end_text;
+
+/* Prime the lexer at the start of TEXT, with length LEN. */
static void
-eval_init_lex (const char *text)
+eval_init_lex (const char *text, size_t len)
{
eval_text = text;
+ end_text = text + len;
last_text = NULL;
}
@@ -119,12 +124,12 @@ eval_undo (void)
static eval_token
eval_lex (number *val)
{
- while (isspace (to_uchar (*eval_text)))
+ while (eval_text != end_text && isspace (to_uchar (*eval_text)))
eval_text++;
last_text = eval_text;
- if (*eval_text == '\0')
+ if (eval_text == end_text)
return EOTEXT;
if (isdigit (to_uchar (*eval_text)))
@@ -915,13 +920,13 @@ m4_evaluate (m4 *context, m4_obstack *obs, size_t argc, m4_macro_args *argv)
}
numb_initialise ();
- eval_init_lex (str);
+ eval_init_lex (str, M4ARGLEN (1));
numb_init (val);
et = eval_lex (&val);
if (et == EOTEXT)
{
- m4_warn (context, 0, me, _("empty string treated as zero"));
+ m4_warn (context, 0, me, _("empty string treated as 0"));
numb_set (val, numb_ZERO);
}
else
diff --git a/modules/format.c b/modules/format.c
index e2a1a42..af983cd 100644
--- a/modules/format.c
+++ b/modules/format.c
@@ -123,11 +123,12 @@ format (m4 *context, m4_obstack *obs, int argc, m4_macro_args *argv)
{
const m4_call_info *me = m4_arg_info (argv);
const char *f; /* Format control string. */
+ size_t f_len; /* Length of f. */
const char *fmt; /* Position within f. */
char fstart[] = "%'+- 0#*.*hhd"; /* Current format spec. */
char *p; /* Position within fstart. */
unsigned char c; /* A simple character. */
- int i = 0; /* Index within argc used so far. */
+ int i = 1; /* Index within argc used so far. */
bool valid_format = true; /* True if entire format string ok. */
/* Flags. */
@@ -156,25 +157,24 @@ format (m4 *context, m4_obstack *obs, int argc, m4_macro_args *argv)
int result = 0;
enum {CHAR, INT, LONG, DOUBLE, STR} datatype;
- f = fmt = ARG_STR (i, argc, argv);
+ f = fmt = M4ARG (1);
+ f_len = M4ARGLEN (1);
+ assert (!f[f_len]); /* Requiring a terminating NUL makes parsing simpler. */
memset (ok, 0, sizeof ok);
- while (true)
+ while (f_len--)
{
- while ((c = *fmt++) != '%')
+ c = *fmt++;
+ if (c != '%')
{
- if (c == '\0')
- {
- if (valid_format)
- m4_bad_argc (context, argc, me, i, i, true);
- return;
- }
obstack_1grow (obs, c);
+ continue;
}
if (*fmt == '%')
{
obstack_1grow (obs, '%');
fmt++;
+ f_len--;
continue;
}
@@ -225,7 +225,7 @@ format (m4 *context, m4_obstack *obs, int argc, m4_macro_args *argv)
break;
}
}
- while (!(flags & DONE) && fmt++);
+ while (!(flags & DONE) && (f_len--, fmt++));
if (flags & THOUSANDS)
*p++ = '\'';
if (flags & PLUS)
@@ -247,12 +247,14 @@ format (m4 *context, m4_obstack *obs, int argc, m4_macro_args *argv)
{
width = ARG_INT (i, argc, argv);
fmt++;
+ f_len--;
}
else
while (isdigit ((unsigned char) *fmt))
{
width = 10 * width + *fmt - '0';
fmt++;
+ f_len--;
}
/* Maximum precision; an explicit negative precision is the same
@@ -263,10 +265,12 @@ format (m4 *context, m4_obstack *obs, int argc, m4_macro_args *argv)
if (*fmt == '.')
{
ok['c'] = 0;
+ f_len--;
if (*(++fmt) == '*')
{
prec = ARG_INT (i, argc, argv);
++fmt;
+ f_len--;
}
else
{
@@ -275,6 +279,7 @@ format (m4 *context, m4_obstack *obs, int argc, m4_macro_args *argv)
{
prec = 10 * prec + *fmt - '0';
fmt++;
+ f_len--;
}
}
}
@@ -285,30 +290,34 @@ format (m4 *context, m4_obstack *obs, int argc, m4_macro_args *argv)
*p++ = 'l';
lflag = 1;
fmt++;
+ f_len--;
ok['c'] = ok['s'] = 0;
}
else if (*fmt == 'h')
{
*p++ = 'h';
fmt++;
+ f_len--;
if (*fmt == 'h')
{
*p++ = 'h';
fmt++;
+ f_len--;
}
ok['a'] = ok['A'] = ok['c'] = ok['e'] = ok['E'] = ok['f'] = ok['F']
= ok['g'] = ok['G'] = ok['s'] = 0;
}
- c = *fmt++;
- if (c > sizeof ok || !ok[c])
+ c = *fmt;
+ if (c > sizeof ok || !ok[c] || !f_len)
{
- m4_warn (context, 0, me, _("unrecognized specifier in `%s'"), f);
+ m4_warn (context, 0, me, _("unrecognized specifier in %s"),
+ quotearg_style_mem (locale_quoting_style, f, M4ARGLEN (1)));
valid_format = false;
- if (c == '\0')
- fmt--;
continue;
}
+ fmt++;
+ f_len--;
/* Specifiers. We don't yet recognize C, S, n, or p. */
switch (c)
@@ -382,4 +391,6 @@ format (m4 *context, m4_obstack *obs, int argc, m4_macro_args *argv)
we constructed fstart, the result should not be negative. */
assert (0 <= result);
}
+ if (valid_format)
+ m4_bad_argc (context, argc, me, i, i, true);
}
diff --git a/modules/gnu.c b/modules/gnu.c
index fd557eb..8ad1722 100644
--- a/modules/gnu.c
+++ b/modules/gnu.c
@@ -167,8 +167,8 @@ regexp_compile (m4 *context, const m4_call_info *caller, const char *regexp,
if (msg != NULL)
{
- m4_error (context, 0, 0, caller, _("bad regular expression `%s': %s"),
- regexp, msg);
+ m4_warn (context, 0, caller, _("bad regular expression %s: %s"),
+ quotearg_style_mem (locale_quoting_style, regexp, len), msg);
regfree (pat);
free (pat);
return NULL;
@@ -225,28 +225,38 @@ regexp_search (m4_pattern_buffer *buf, const char *string, const int size,
/* Function to perform substitution by regular expressions. Used by
the builtins regexp, patsubst and renamesyms. The changed text is
- placed on the obstack OBS. The substitution is REPL, with \&
- substituted by this part of VICTIM matched by the last whole
- regular expression, and \N substituted by the text matched by the
- Nth parenthesized sub-expression in BUF. Any warnings are issued
- on behalf of CALLER. BUF may be NULL for the empty regex. */
+ placed on the obstack OBS. The substitution is REPL of length
+ REPL_LEN, with \& substituted by this part of VICTIM matched by the
+ last whole regular expression, and \N substituted by the text
+ matched by the Nth parenthesized sub-expression in BUF. Any
+ warnings are issued on behalf of CALLER. BUF may be NULL for the
+ empty regex. */
static void
substitute (m4 *context, m4_obstack *obs, const m4_call_info *caller,
- const char *victim, const char *repl, m4_pattern_buffer *buf)
+ const char *victim, const char *repl, size_t repl_len,
+ m4_pattern_buffer *buf)
{
int ch;
- for (;;)
+ while (repl_len--)
{
- while ((ch = *repl++) != '\\')
+ ch = *repl++;
+ if (ch != '\\')
{
- if (ch == '\0')
- return;
obstack_1grow (obs, ch);
+ continue;
+ }
+ if (!repl_len)
+ {
+ m4_warn (context, 0, caller,
+ _("trailing \\ ignored in replacement"));
+ return;
}
- switch ((ch = *repl++))
+ ch = *repl++;
+ repl_len--;
+ switch (ch)
{
case '&':
if (buf)
@@ -265,11 +275,6 @@ substitute (m4 *context, m4_obstack *obs, const m4_call_info *caller,
buf->regs.end[ch] - buf->regs.start[ch]);
break;
- case '\0':
- m4_warn (context, 0, caller,
- _("trailing \\ ignored in replacement"));
- return;
-
default:
obstack_1grow (obs, ch);
break;
@@ -278,18 +283,19 @@ substitute (m4 *context, m4_obstack *obs, const m4_call_info *caller,
}
-/* For each match against compiled REGEXP (held in BUF -- as returned
- by regexp_compile) in VICTIM, substitute REPLACE. Non-matching
- characters are copied verbatim, and the result copied to the
- obstack. Errors are reported on behalf of CALLER. Return true if
- a substitution was made. If OPTIMIZE is set, don't worry about
- copying the input if no changes are made. */
+/* For each match against REGEXP of length REGEXP_LEN (precompiled in
+ BUF as returned by regexp_compile) in VICTIM of length LEN,
+ substitute REPLACE of length REPL_LEN. Non-matching characters are
+ copied verbatim, and the result copied to the obstack. Errors are
+ reported on behalf of CALLER. Return true if a substitution was
+ made. If OPTIMIZE is set, don't worry about copying the input if
+ no changes are made. */
static bool
regexp_substitute (m4 *context, m4_obstack *obs, const m4_call_info *caller,
const char *victim, size_t len, const char *regexp,
- m4_pattern_buffer *buf, const char *replace,
- bool optimize)
+ size_t regexp_len, m4_pattern_buffer *buf,
+ const char *replace, size_t repl_len, bool optimize)
{
regoff_t matchpos = 0; /* start position of match */
size_t offset = 0; /* current match offset */
@@ -309,7 +315,9 @@ regexp_substitute (m4 *context, m4_obstack *obs, const m4_call_info *caller,
if (matchpos == -2)
m4_error (context, 0, 0, caller,
- _("error matching regular expression `%s'"), regexp);
+ _("problem matching regular expression %s"),
+ quotearg_style_mem (locale_quoting_style, regexp,
+ regexp_len));
else if (offset < len && subst)
obstack_grow (obs, victim + offset, len - offset);
break;
@@ -322,7 +330,7 @@ regexp_substitute (m4 *context, m4_obstack *obs, const m4_call_info *caller,
/* Handle the part of the string that was covered by the match. */
- substitute (context, obs, caller, victim, replace, buf);
+ substitute (context, obs, caller, victim, replace, repl_len, buf);
subst = true;
/* Update the offset to the end of the match. If the regexp
@@ -465,18 +473,24 @@ M4BUILTIN_HANDLER (builtin)
}
-/* Change the current regexp syntax to SPEC, or report failure on
- behalf of CALLER. Currently this affects the builtins: `patsubst',
- `regexp' and `renamesyms'. */
+/* Change the current regexp syntax to SPEC of length LEN, or report
+ failure on behalf of CALLER. Currently this affects the builtins:
+ `patsubst', `regexp' and `renamesyms'. */
static int
m4_resyntax_encode_safe (m4 *context, const m4_call_info *caller,
- const char *spec)
+ const char *spec, size_t len)
{
- int resyntax = m4_regexp_syntax_encode (spec);
+ int resyntax;
+
+ if (strlen (spec) < len)
+ resyntax = -1;
+ else
+ resyntax = m4_regexp_syntax_encode (spec);
if (resyntax < 0)
- m4_warn (context, 0, caller, _("bad syntax-spec: `%s'"), spec);
+ m4_warn (context, 0, caller, _("bad syntax-spec: %s"),
+ quotearg_style_mem (locale_quoting_style, spec, len));
return resyntax;
}
@@ -488,7 +502,7 @@ m4_resyntax_encode_safe (m4 *context, const m4_call_info *caller,
M4BUILTIN_HANDLER (changeresyntax)
{
int resyntax = m4_resyntax_encode_safe (context, m4_arg_info (argv),
- M4ARG (1));
+ M4ARG (1), M4ARGLEN (1));
if (resyntax >= 0)
m4_set_regexp_syntax_opt (context, resyntax);
@@ -749,31 +763,32 @@ M4BUILTIN_HANDLER (patsubst)
m4_pattern_buffer *buf; /* compiled regular expression */
int resyntax;
- pattern = M4ARG (2);
- replace = M4ARG (3);
-
resyntax = m4_get_regexp_syntax_opt (context);
if (argc >= 5) /* additional args ignored */
{
- resyntax = m4_resyntax_encode_safe (context, me, M4ARG (4));
+ resyntax = m4_resyntax_encode_safe (context, me, M4ARG (4),
+ M4ARGLEN (4));
if (resyntax < 0)
return;
}
/* The empty regex matches everywhere, but if there is no
replacement, we need not waste time with it. */
- if (!*pattern && !*replace)
+ if (m4_arg_empty (argv, 2) && m4_arg_empty (argv, 3))
{
m4_push_arg (context, obs, argv, 1);
return;
}
+ pattern = M4ARG (2);
+ replace = M4ARG (3);
+
buf = regexp_compile (context, me, pattern, M4ARGLEN (2), resyntax);
if (!buf)
return;
- regexp_substitute (context, obs, me, M4ARG (1), M4ARGLEN (1),
- pattern, buf, replace, false);
+ regexp_substitute (context, obs, me, M4ARG (1), M4ARGLEN (1), pattern,
+ M4ARGLEN (2), buf, replace, M4ARGLEN (3), false);
}
@@ -810,7 +825,7 @@ M4BUILTIN_HANDLER (regexp)
is a valid RESYNTAX, yet we want `regexp(aab, a*, )' to return
an empty string as per M4 1.4.x. */
- if ((*replace == '\0') || (resyntax < 0))
+ if (m4_arg_empty (argv, 3) || (resyntax < 0))
/* regexp(VICTIM, REGEXP, REPLACEMENT) */
resyntax = m4_get_regexp_syntax_opt (context);
else
@@ -820,7 +835,8 @@ M4BUILTIN_HANDLER (regexp)
else if (argc >= 5)
{
/* regexp(VICTIM, REGEXP, REPLACEMENT, RESYNTAX) */
- resyntax = m4_resyntax_encode_safe (context, me, M4ARG (4));
+ resyntax = m4_resyntax_encode_safe (context, me, M4ARG (4),
+ M4ARGLEN (4));
if (resyntax < 0)
return;
}
@@ -828,11 +844,11 @@ M4BUILTIN_HANDLER (regexp)
/* regexp(VICTIM, REGEXP) */
replace = NULL;
- if (!*pattern)
+ if (m4_arg_empty (argv, 2))
{
/* The empty regex matches everything. */
if (replace)
- substitute (context, obs, me, M4ARG (1), replace, NULL);
+ substitute (context, obs, me, M4ARG (1), replace, M4ARGLEN (3), NULL);
else
m4_shipout_int (obs, 0);
return;
@@ -848,15 +864,16 @@ M4BUILTIN_HANDLER (regexp)
if (startpos == -2)
{
- m4_error (context, 0, 0, me, _("error matching regular expression `%s'"),
- pattern);
+ m4_error (context, 0, 0, me, _("problem matching regular expression %s"),
+ quotearg_style_mem (locale_quoting_style, pattern,
+ M4ARGLEN (2)));
return;
}
if (replace == NULL)
m4_shipout_int (obs, startpos);
else if (startpos >= 0)
- substitute (context, obs, me, victim, replace, buf);
+ substitute (context, obs, me, victim, replace, M4ARGLEN (3), buf);
}
@@ -874,7 +891,9 @@ M4BUILTIN_HANDLER (renamesyms)
{
const m4_call_info *me = m4_arg_info (argv);
const char *regexp; /* regular expression string */
+ size_t regexp_len;
const char *replace; /* replacement expression string */
+ size_t replace_len;
m4_pattern_buffer *buf; /* compiled regular expression */
@@ -883,17 +902,20 @@ M4BUILTIN_HANDLER (renamesyms)
int resyntax;
regexp = M4ARG (1);
+ regexp_len = M4ARGLEN (1);
replace = M4ARG (2);
+ replace_len = M4ARGLEN (2);
resyntax = m4_get_regexp_syntax_opt (context);
if (argc >= 4)
{
- resyntax = m4_resyntax_encode_safe (context, me, M4ARG (3));
+ resyntax = m4_resyntax_encode_safe (context, me, M4ARG (3),
+ M4ARGLEN (3));
if (resyntax < 0)
return;
}
- buf = regexp_compile (context, me, regexp, M4ARGLEN (1), resyntax);
+ buf = regexp_compile (context, me, regexp, regexp_len, resyntax);
if (!buf)
return;
@@ -905,7 +927,8 @@ M4BUILTIN_HANDLER (renamesyms)
const m4_string *key = &data.base[0];
if (regexp_substitute (context, data.obs, me, key->str, key->len,
- regexp, buf, replace, true))
+ regexp, regexp_len, buf, replace, replace_len,
+ true))
{
size_t newlen = obstack_object_size (data.obs);
m4_symbol_rename (M4SYMTAB, key->str, key->len,
diff --git a/modules/m4.c b/modules/m4.c
index e9695a3..f78a177 100644
--- a/modules/m4.c
+++ b/modules/m4.c
@@ -998,8 +998,7 @@ m4_expand_ranges (const char *s, size_t *len, m4_obstack *obs)
obstack_1grow (obs, *s);
}
*len = obstack_object_size (obs);
- /* FIXME - use obstack_finish once translit is updated. */
- return (char *) obstack_copy0 (obs, "", 0);
+ return (char *) obstack_finish (obs);
}
/* The macro "translit" translates all characters in the first
@@ -1018,7 +1017,9 @@ M4BUILTIN_HANDLER (translit)
char found[UCHAR_MAX + 1] = {0};
unsigned char ch;
- if (argc <= 2)
+ enum { ASIS, REPLACE, DELETE };
+
+ if (m4_arg_empty (argv, 1) || m4_arg_empty (argv, 2))
{
m4_push_arg (context, obs, argv, 1);
return;
@@ -1026,7 +1027,7 @@ M4BUILTIN_HANDLER (translit)
from = M4ARG (2);
from_len = M4ARGLEN (2);
- if (strchr (from, '-') != NULL)
+ if (memchr (from, '-', from_len) != NULL)
{
from = m4_expand_ranges (from, &from_len, m4_arg_scratch (context));
assert (from);
@@ -1034,35 +1035,57 @@ M4BUILTIN_HANDLER (translit)
to = M4ARG (3);
to_len = M4ARGLEN (3);
- if (strchr (to, '-') != NULL)
+ if (memchr (to, '-', to_len) != NULL)
{
to = m4_expand_ranges (to, &to_len, m4_arg_scratch (context));
assert (to);
}
- /* Calling strchr(from) for each character in data is quadratic,
+ /* Calling memchr(from) for each character in data is quadratic,
since both strings can be arbitrarily long. Instead, create a
from-to mapping in one pass of from, then use that map in one
pass of data, for linear behavior. Traditional behavior is that
only the first instance of a character in from is consulted,
hence the found map. */
- for ( ; (ch = *from) != '\0'; from++)
+ while (from_len--)
{
- if (!found[ch])
+ ch = *from++;
+ if (found[ch] == ASIS)
+ {
+ if (to_len)
+ {
+ found[ch] = REPLACE;
+ map[ch] = *to;
+ }
+ else
+ found[ch] = DELETE;
+ }
+ if (to_len)
{
- found[ch] = 1;
- map[ch] = *to;
+ to++;
+ to_len--;
}
- if (*to != '\0')
- to++;
}
- for (data = M4ARG (1); (ch = *data) != '\0'; data++)
+ data = M4ARG (1);
+ from_len = M4ARGLEN (1);
+ while (from_len--)
{
- if (!found[ch])
- obstack_1grow (obs, ch);
- else if (map[ch])
- obstack_1grow (obs, map[ch]);
+ ch = *data++;
+ switch (found[ch])
+ {
+ case ASIS:
+ obstack_1grow (obs, ch);
+ break;
+ case REPLACE:
+ obstack_1grow (obs, map[ch]);
+ break;
+ case DELETE:
+ break;
+ default:
+ assert (!"translit");
+ abort ();
+ }
}
}
diff --git a/src/freeze.c b/src/freeze.c
index 5d5b4ee..3008f27 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -634,7 +634,7 @@ ill-formed frozen file, version 2 directive `%c' encountered"), 'd');
if (m4_debug_decode (context, string[0]) < 0)
m4_error (context, EXIT_FAILURE, 0, NULL,
- _("unknown debug mode `%s'"),
+ _("unknown debug mode %s"),
quotearg_style_mem (locale_quoting_style, string[0],
number[0]));
break;
@@ -751,10 +751,11 @@ ill-formed frozen file, version 2 directive `%c' encountered"), 'R');
m4_set_regexp_syntax_opt (context,
m4_regexp_syntax_encode (string[0]));
- if (m4_get_regexp_syntax_opt (context) < 0)
+ if (m4_get_regexp_syntax_opt (context) < 0
+ || strlen (string[0]) < number[0])
{
m4_error (context, EXIT_FAILURE, 0, NULL,
- _("unknown regexp syntax code `%s'"),
+ _("bad syntax-spec %s"),
quotearg_style_mem (locale_quoting_style, string[0],
number[0]));
}
diff --git a/tests/freeze.at b/tests/freeze.at
index 9b8c946..693ae54 100644
--- a/tests/freeze.at
+++ b/tests/freeze.at
@@ -409,6 +409,12 @@ AT_CHECK_M4([-R frozen.m4f unfrozen.m4], [0], [stdout], [experr], [], [ ])
AT_CHECK([cat out1 stdout], [0], [expout])
+dnl Check that unexpected embedded NULs are recognized.
+printf '# bogus frozen file\nV2\nR4\ngnu\0\n' > bogus.m4f
+AT_CHECK_M4([-R bogus.m4f], [1], [],
+[[m4:bogus.m4f:4: bad syntax-spec `gnu\0'
+]])
+
AT_CLEANUP
])
diff --git a/tests/null.err b/tests/null.err
index 74ec09d..7b9f798 100644
--- a/tests/null.err
+++ b/tests/null.err
@@ -3,12 +3,14 @@ m4:null.m4:21: Warning: builtin: undefined builtin `-\0-'
changequote:
echo: address@hidden/
m4trace: -1- dumpdef( echo/) -> /
+changeresyntax:
+m4:null.m4:39: Warning: changeresyntax: bad syntax-spec: `\0'
changesyntax:
-m4:null.m4:46: Warning: changesyntax: undefined syntax code: `\0'
+m4:null.m4:48: Warning: changesyntax: undefined syntax code: `\0'
defn:
-m4:null.m4:55: Warning: defn: undefined macro `\0-\0'
+m4:null.m4:57: Warning: defn: undefined macro `\0-\0'
dumpdef:
-m4:null.m4:68: Warning: dumpdef: undefined macro `\0-\0'
+m4:null.m4:70: Warning: dumpdef: undefined macro `\0-\0'
: `empty'
-: `dash'
- -: ``$0': $1'
@@ -16,9 +18,21 @@ m4:null.m4:68: Warning: dumpdef: undefined macro `\0-\0'
--: `dashes'
body: `- -'
errprint: - - - -
+format:
+m4:null.m4:87: Warning: format: unrecognized specifier in `%\0%'
+m4:null.m4:87: Warning: format: unrecognized specifier in `%\0%'
indir:
-m4:null.m4:99: Warning: indir: undefined macro `\0-\0'
-m4:null.m4:101: Warning: \0\0%%: extra arguments ignored: 1 > 0
+m4:null.m4:104: Warning: indir: undefined macro `\0-\0'
+m4:null.m4:106: Warning: \0\0%%: extra arguments ignored: 1 > 0
+patsubst:
+m4:null.m4:124: Warning: patsubst: bad regular expression `\\\0\\': Trailing backslash
+m4:null.m4:134: Warning: patsubst: bad syntax-spec: `\0'
+regexp:
+m4:null.m4:146: Warning: regexp: bad regular expression `\\\0\\': Trailing backslash
+m4:null.m4:156: Warning: regexp: bad syntax-spec: `\0'
+renamesyms:
+m4:null.m4:161: Warning: renamesyms: bad regular expression `\\\0\\': Trailing backslash
+m4:null.m4:167: Warning: renamesyms: bad syntax-spec: `\0'
traceon:
m4trace: -1- - -(`- -') -> `strange: - -'
m4trace: -1- body -> ` - '
diff --git a/tests/null.m4 b/tests/null.m4
index 77b6e67..f7a1587 100644
--- a/tests/null.m4
+++ b/tests/null.m4
@@ -34,7 +34,9 @@ dnl Quotes in trace and dump output:
errprint(`changequote:
')traceon(`dumpdef')dumpdef(`echo'changequote( ,/))changequote`'dnl
traceoff(`dumpdef')dnl
-dnl Warning from changeresyntax: not tested yet. No resyntax includes NUL, needs to warn
+dnl Warning from changeresyntax:
+errprint(`changeresyntax:
+')changeresyntax(` ')dnl
dnl Macro name in changesyntax:
`changesyntax:' changesyntax(`W+ -')- - - -(-)`'changesyntax()dnl
dnl Escape in changesyntax:
@@ -78,8 +80,11 @@ dnl Generated from esyscmd:
`esyscmd:' esyscmd(__program__` -DNUL '__file__) sysval
dnl First argument of eval: not tested yet. NUL not a number, needs to warn
dnl Other arguments of eval: not tested yet. NUL not a number, needs to warn
-dnl First argument to format: not tested yet
-dnl Invalid specifier in format: not tested yet, needs to warn
+dnl First argument to format:
+`format:' format(`%s %s', `-', `-')dnl
+dnl Invalid specifier in format:
+errprint(`format:
+') format(`% %')
dnl Numeric and string arguments to format: not tested yet, needs to warn
dnl Character argument to format: not tested yet, %c semantics needed
dnl Macro name in ifdef, passed through ifdef:
@@ -114,15 +119,19 @@ m4wrap(``m4wrap:' - -
dnl Warning from maketemp: not tested yet. No file name includes NUL, needs to warn
dnl Warning from mkdtemp: not tested yet. No file name includes NUL, needs to warn
dnl Warning from mkstemp: not tested yet. No file name includes NUL, needs to warn
-dnl Bad regex in patsubst: not tested yet
+dnl Bad regex in patsubst:
+errprint(`patsubst:
+')patsubst(`a', `\ \')dnl
dnl First argument of patsubst:
`patsubst:' patsubst(`- -', `-', `.')dnl
dnl Matching via meta-character in patsubst:
patsubst(`- -', `[^-]')dnl
dnl Second argument of patsubst:
patsubst(`abc', ` b', `-') patsubst(`- -', ` ', `!')dnl
-dnl Third argument of patsubst: not tested yet
-dnl Syntax argument of patsubst: not tested yet, needs to warn
+dnl Third argument of patsubst:
+ patsubst(`-!-', `!', ` ')dnl
+dnl Syntax argument of patsubst:
+patsubst(`a', `a', `b', ` ')dnl
dnl Replacement via reference in patsubst:
patsubst(`-- --', `-\(.\)-', `\1-\1')
dnl Defined argument of popdef:
@@ -132,20 +141,30 @@ dnl Macro name of pushdef:
`pushdef:' pushdef(`- -', `strange: $1')ifdef(`- -', `ok', `oops')`'dnl
dnl Definition of pushdef:
pushdef(`body', ` - ')body
-dnl Bad regex in regexp: not tested yet
+dnl Bad regex in regexp:
+errprint(`regexp:
+')regexp(`a', `\ \')dnl
dnl First argument of regexp:
`regexp:' regexp(`a b', `b')dnl
dnl Matching via meta-character in regexp:
regexp(`- -', `[^-]', `!')dnl
dnl Second argument of regexp:
regexp(`- -', ` ')dnl
-dnl Third argument of regexp: not tested yet
-dnl Syntax argument of patsubst: not tested yet, needs to warn
+dnl Third argument of regexp:
+ regexp(`!', `!', `- -')dnl
+dnl Syntax argument of patsubst:
+regexp(`a', `a', `b', ` ')dnl
dnl Replacement via reference in regexp:
regexp(`-- --', `-\(.\)-', `\1-\1')
-dnl Bad regex in renamesyms: not tested yet
-dnl Direct rename via renamesyms: not tested yet
-dnl Meta-character rename via renamesyms: not tested yet
+dnl Bad regex in renamesyms:
+errprint(`renamesyms:
+')renamesyms(`\ \', `-')dnl
+dnl Direct rename via renamesyms:
+`renamesyms:' renamesyms(` %%', `--%%')indir(`--%%')dnl
+dnl Meta-character rename via renamesyms:
+ renamesyms(`..\(%%\)', ` \1')indir(` %%')
+dnl Syntax argument of renamesyms:
+renamesyms(`a', `b', ` ')dnl
dnl Passed through shift:
`shift:' shift(`hi', `- -', - -)
dnl Warning from sinclude: not tested yet. No file name includes NUL, needs to warn
@@ -162,9 +181,12 @@ dnl Macro name and arguments of traceon:
')traceon(`- -')indir(`- -', `- -')dnl
dnl Defined text of traceon:
traceon(`body')body
-dnl First argument of translit: not tested yet
-dnl Single character in other arguments of translit: not tested yet
-dnl Character ranges of translit: not tested yet
+dnl First argument of translit:
+`translit:' translit(`. .', `.', `-')dnl
+dnl Single character in other arguments of translit:
+ translit(` . ', `. ', ` .')dnl
+dnl Character ranges of translit:
+ translit(`abcd', ` -b')
dnl Defined argument of undefine:
`undefine:' undefine(`- -')ifdef(`- -', `oops', `ok')
dnl Undefined argument of undefine: not tested yet. Should it warn?
diff --git a/tests/null.out b/tests/null.out
index 5f6df39..97f80dd 100644
--- a/tests/null.out
+++ b/tests/null.out
@@ -11,18 +11,21 @@ define: - -
defn: `$0': $1 - -
divert: - -
esyscmd: [ ] 0
+format: - -
ifdef: yes: - - no: - -
ifelse: yes: - -
index: 2 -1 -1 8
indir: - -: 1 1 0 3
len: 1 3
m4symbols: - -
-patsubst: . . -- abc -!- - - -
+patsubst: . . -- abc -!- - - - - -
popdef: ok
pushdef: ok -
-regexp: 2 ! 0 -
+regexp: 2 ! 1 - - -
+renamesyms: 0 0
shift: - -,- -
substr: - -
traceon: strange: - - -
+translit: - - . . cd
undefine: ok
m4wrap: - -
diff --git a/tests/options.at b/tests/options.at
index 9331a21..dce43f8 100644
--- a/tests/options.at
+++ b/tests/options.at
@@ -714,8 +714,8 @@ AT_CHECK_M4([--regexp-syntax=unknown in], [1], [],
AT_CHECK_M4([--regexp-syntax= in], [0], [[0
]])
-AT_CHECK_M4([-rEXTENDED in], [1], [[
-]], [[m4:in:1: regexp: bad regular expression `(': Unmatched ( or \(
+AT_CHECK_M4([-rEXTENDED in], [0], [[
+]], [[m4:in:1: Warning: regexp: bad regular expression `(': Unmatched ( or \(
]])
AT_CHECK_M4([-rgnu-m4 in], [0], [[0
@@ -725,9 +725,9 @@ AT_CHECK_M4([-r"gnu M4" in], [0], [[0
]])
dnl Test behavior of -r intermixed with files
-AT_CHECK_M4([-rEXTENDED in --regexp-syntax in], [1], [[
+AT_CHECK_M4([-rEXTENDED in --regexp-syntax in], [0], [[
0
-]], [[m4:in:1: regexp: bad regular expression `(': Unmatched ( or \(
+]], [[m4:in:1: Warning: regexp: bad regular expression `(': Unmatched ( or \(
]])
AT_CLEANUP
--
1.6.0.4
From 715c42128d8d357e3e751ec605069137d693c757 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Thu, 17 Jan 2008 14:34:36 -0700
Subject: [PATCH] Stage 27: Allow embedded NUL in text processing macros.
* src/m4.h (evaluate): Add parameter.
* src/builtin.c (compile_pattern) [DEBUG_REGEX]: Support NUL in
output messages.
(set_macro_sequence): Likewise.
(m4_eval): Normalize messages, and adjust caller.
(expand_ranges, substitute): Support NUL in macro expansion.
(m4_translit, m4_regexp, m4_patsubst): Adjust callers, to manage
NUL bytes.
* src/format.c (expand_format): Manage NUL bytes.
* src/eval.c (eval_error): Add EMPTY_ARGUMENT.
(end_text): New variable.
(eval_init_lex): Add parameter.
(eval_lex, evaluate): Detect NUL in macro expansion.
* doc/m4.texinfo (Format): Update to cover new behavior.
(Eval): Mention that result is unquoted.
* examples/null.m4: Enhance test.
* examples/null.err: Update expected output.
* examples/null.out: Likewise.
Signed-off-by: Eric Blake <address@hidden>
(cherry picked from commit 948d1ed0ca4089c2db579fe3d8b3ce172b3e616f)
---
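Note, not part of the patch: the eval.c hunks below replace the NUL
sentinel with an explicit end pointer, so an embedded NUL is seen as an
ordinary unparsed byte rather than as the end of input. A minimal
stand-alone sketch of that idea follows. The helper name parse_decimal
and its return codes are assumptions made for illustration and do not
exist in m4; it only handles unsigned decimal input and does not guard
against overflow.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of m4.  Parse optional whitespace,
   decimal digits, and optional trailing whitespace from TEXT, which
   is LEN bytes long and need not be NUL-terminated.  Return 0 and
   store the value in *VAL on success, -1 if TEXT is empty or blank,
   and -2 if any other byte (including an embedded NUL) is left
   unparsed.  */
static int
parse_decimal (const char *text, size_t len, long *val)
{
  const char *p = text;
  const char *end = text + len;   /* Replaces the '\0' sentinel.  */
  long v = 0;

  assert (text != NULL && val != NULL);
  while (p != end && isspace ((unsigned char) *p))
    p++;
  if (p == end)
    return -1;
  if (!isdigit ((unsigned char) *p))
    return -2;
  while (p != end && isdigit ((unsigned char) *p))
    v = 10 * v + (*p++ - '0');
  while (p != end && isspace ((unsigned char) *p))
    p++;
  if (p != end)
    return -2;
  *val = v;
  return 0;
}
```

With this shape, a call such as parse_decimal (buf, buf_len, &v) never
reads past buf + buf_len, and a blank buffer is reported separately
from a malformed one, loosely mirroring the EMPTY_ARGUMENT case added
to eval_error.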
ChangeLog | 26 ++++++
doc/m4.texinfo | 15 +++-
examples/null.err | 11 ++-
examples/null.m4 | 30 +++++--
examples/null.out | 6 +-
src/builtin.c | 232 ++++++++++++++++++++++++++++++++++------------------
src/eval.c | 33 ++++++--
src/format.c | 43 ++++++----
src/m4.h | 2 +-
9 files changed, 275 insertions(+), 123 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 2085dea..e991a8c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,29 @@
+2008-12-03 Eric Blake <address@hidden>
+
+ Stage 27: Allow embedded NUL in text processing macros.
+ Pass NUL through regular expressions, format, and translit, and
+ diagnose it in eval. Improve warning capabilities of format.
+ Memory impact: none.
+ Speed impact: none noticed.
+ * src/m4.h (evaluate): Add parameter.
+ * src/builtin.c (compile_pattern) [DEBUG_REGEX]: Support NUL in
+ output messages.
+ (set_macro_sequence): Likewise.
+ (m4_eval): Normalize messages, and adjust caller.
+ (expand_ranges, substitute): Support NUL in macro expansion.
+ (m4_translit, m4_regexp, m4_patsubst): Adjust callers, to manage
+ NUL bytes.
+ * src/format.c (expand_format): Manage NUL bytes.
+ * src/eval.c (eval_error): Add EMPTY_ARGUMENT.
+ (end_text): New variable.
+ (eval_init_lex): Add parameter.
+ (eval_lex, evaluate): Detect NUL in macro expansion.
+ * doc/m4.texinfo (Format): Update to cover new behavior.
+ (Eval): Mention that result is unquoted.
+ * examples/null.m4: Enhance test.
+ * examples/null.err: Update expected output.
+ * examples/null.out: Likewise.
+
2008-11-28 Eric Blake <address@hidden>
Add extension to divert builtin.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 8301bb7..2fb676d 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -6448,7 +6448,7 @@ Format
@example
format(`%p', `0')
@error{}m4:stdin:1: Warning: format: unrecognized specifier in `%p'
address@hidden
address@hidden
format(`%*d', `')
@error{}m4:stdin:2: Warning: format: empty string treated as 0
@error{}m4:stdin:2: Warning: format: too few arguments: 2 < 3
@@ -6734,7 +6734,9 @@ Eval
@var{radix} is the empty string. A warning results if the radix is
outside the range of 1 through 36, inclusive. The result of @code{eval}
is always taken to be signed. No radix prefix is output, and for
-radices greater than 10, the digits are lower case. The @var{width}
+radices greater than 10, the digits are lower case (although some
+other implementations use upper case). The output is unquoted, and
+subject to further macro expansion. The @var{width}
argument specifies the minimum output width, excluding any negative
sign. The result is zero-padded to extend the expansion to the
requested width. A warning results if the width is negative. If
@@ -6759,14 +6761,19 @@ Eval
eval(`10', `16')
@result{}a
eval(`1', `37')
-@error{}m4:stdin:9: Warning: eval: radix 37 out of range
+@error{}m4:stdin:9: Warning: eval: radix out of range: 37
@result{}
eval(`1', , `-1')
-@error{}m4:stdin:10: Warning: eval: negative width
+@error{}m4:stdin:10: Warning: eval: negative width: -1
@result{}
eval()
@error{}m4:stdin:11: Warning: eval: empty string treated as 0
@result{}0
+eval(` ')
+@error{}m4:stdin:12: Warning: eval: empty string treated as 0
address@hidden
+define(`a', `hi')eval(` 10 ', `16')
address@hidden
@end example
@node Shell commands
diff --git a/examples/null.err b/examples/null.err
index 897ce34..977b3b7 100644
--- a/examples/null.err
+++ b/examples/null.err
@@ -16,9 +16,16 @@ m4:examples/null.m4:67: Warning: dumpdef: undefined macro
`\0-\0'
--: `dashes'
body: `- -'
errprint: - - - -
+format:
+m4:examples/null.m4:84: Warning: format: unrecognized specifier in `%\0%'
+m4:examples/null.m4:84: Warning: format: unrecognized specifier in `%\0%'
indir:
-m4:examples/null.m4:98: Warning: indir: undefined macro `\0-\0'
-m4:examples/null.m4:100: Warning: \0\0%%: extra arguments ignored: 1 > 0
+m4:examples/null.m4:101: Warning: indir: undefined macro `\0-\0'
+m4:examples/null.m4:103: Warning: \0\0%%: extra arguments ignored: 1 > 0
+patsubst:
+m4:examples/null.m4:116: Warning: patsubst: bad regular expression `\\\0\\': Trailing backslash
+regexp:
+m4:examples/null.m4:136: Warning: regexp: bad regular expression `\\\0\\': Trailing backslash
traceon:
m4trace: -1- - -(`- -') -> `strange: - -'
m4trace: -1- body -> ` - '
diff --git a/examples/null.m4 b/examples/null.m4
index 1823073..e60aec5 100644
--- a/examples/null.m4
+++ b/examples/null.m4
@@ -77,8 +77,11 @@ dnl Generated from esyscmd:
`esyscmd:' esyscmd(__program__` -DNUL '__file__) sysval
dnl First argument of eval: not tested yet. NUL not a number, needs to warn
dnl Other arguments of eval: not tested yet, needs to warn
-dnl First argument to format: not tested yet
-dnl Invalid specifier in format: not tested yet, needs to warn
+dnl First argument to format:
+`format:' format(`%s %s', `-', `-')dnl
+dnl Invalid specifier in format:
+errprint(`format:
+') format(`% %')
dnl Numeric and string arguments to format: not tested yet, needs to warn
dnl Character argument to format: not tested yet, %c semantics needed
dnl Macro name in ifdef, passed through ifdef:
@@ -108,14 +111,17 @@ m4wrap(``m4wrap:' - -
')dnl
dnl Warning from maketemp: not tested yet. No file name includes NUL, needs to warn
dnl Warning from mkstemp: not tested yet. No file name includes NUL, needs to warn
-dnl Bad regex in patsubst: not tested yet
+dnl Bad regex in patsubst:
+errprint(`patsubst:
+')patsubst(`a', `\ \')dnl
dnl First argument of patsubst:
`patsubst:' patsubst(`- -', `-', `.')dnl
dnl Matching via meta-character in patsubst:
patsubst(`- -', `[^-]')dnl
dnl Second argument of patsubst:
patsubst(`abc', ` b', `-') patsubst(`- -', ` ', `!')dnl
-dnl Third argument of patsubst: not tested yet
+dnl Third argument of patsubst:
+ patsubst(`-!-', `!', ` ')dnl
dnl Replacement via reference in patsubst:
patsubst(`-- --', `-\(.\)-', `\1-\1')
dnl Defined argument of popdef:
@@ -125,14 +131,17 @@ dnl Macro name of pushdef:
`pushdef:' pushdef(`- -', `strange: $1')ifdef(`- -', `ok', `oops')`'dnl
dnl Definition of pushdef:
pushdef(`body', ` - ')body
-dnl Bad regex in regexp: not tested yet
+dnl Bad regex in regexp:
+errprint(`regexp:
+')regexp(`a', `\ \')dnl
dnl First argument of regexp:
`regexp:' regexp(`a b', `b')dnl
dnl Matching via meta-character in regexp:
regexp(`- -', `[^-]', `!')dnl
dnl Second argument of regexp:
regexp(`- -', ` ')dnl
-dnl Third argument of regexp: not tested yet
+dnl Third argument of regexp:
+ regexp(`!', `!', `- -')dnl
dnl Replacement via reference in regexp:
regexp(`-- --', `-\(.\)-', `\1-\1')
dnl Passed through shift:
@@ -150,9 +159,12 @@ dnl Macro name and arguments of traceon:
')traceon(`- -')indir(`- -', `- -')dnl
dnl Defined text of traceon:
traceon(`body')body
-dnl First argument of translit: not tested yet
-dnl Single character in other arguments of translit: not tested yet
-dnl Character ranges of translit: not tested yet
+dnl First argument of translit:
+`translit:' translit(`. .', `.', `-')dnl
+dnl Single character in other arguments of translit:
+ translit(` . ', `. ', ` .')dnl
+dnl Character ranges of translit:
+ translit(`abcd', ` -b')
dnl Defined argument of undefine:
`undefine:' undefine(`- -')ifdef(`- -', `oops', `ok')
dnl Undefined argument of undefine: not tested yet. Should it warn?
diff --git a/examples/null.out b/examples/null.out
index dd83416..c2c1cb9 100644
--- a/examples/null.out
+++ b/examples/null.out
@@ -11,17 +11,19 @@ define: - -
defn: `$0': $1 - -
divert: - -
esyscmd: [ ] 0
+format: - -
ifdef: yes: - - no: - -
ifelse: yes: - -
index: 2 -1 -1 8
indir: - -: 1 1 0 3
len: 1 3
-patsubst: . . -- abc -!- - - -
+patsubst: . . -- abc -!- - - - - -
popdef: ok
pushdef: ok -
-regexp: 2 ! 0 -
+regexp: 2 ! 1 - - -
shift: - -,- -
substr: - -
traceon: strange: - - -
+translit: - - . . cd
undefine: ok
m4wrap: - -
diff --git a/src/builtin.c b/src/builtin.c
index 24f2df6..613e1d2 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -311,7 +311,11 @@ compile_pattern (const char *str, size_t len, struct re_pattern_buffer **buf,
regex_cache[i].count++;
#ifdef DEBUG_REGEX
if (trace_file)
- xfprintf (trace_file, "cached:{%s}\n", str);
+ {
+ fputs ("cached:{", trace_file);
+ fwrite (str, 1, len, trace_file);
+ fputs ("}\n", trace_file);
+ }
#endif /* DEBUG_REGEX */
return NULL;
}
@@ -321,7 +325,11 @@ compile_pattern (const char *str, size_t len, struct re_pattern_buffer **buf,
msg = re_compile_pattern (str, len, new_buf);
#ifdef DEBUG_REGEX
if (trace_file)
- xfprintf (trace_file, "compile:{%s}\n", str);
+ {
+ fputs ("compile:{", trace_file);
+ fwrite (str, 1, len, trace_file);
+ fputs ("}\n", trace_file);
+ }
#endif /* DEBUG_REGEX */
if (msg)
{
@@ -356,7 +364,11 @@ compile_pattern (const char *str, size_t len, struct re_pattern_buffer **buf,
{
#ifdef DEBUG_REGEX
if (trace_file)
- xfprintf (trace_file, "flush:{%s}\n", victim->str);
+ {
+ fputs ("flush:{", trace_file);
+ fwrite (victim->str, 1, victim->len, trace_file);
+ fputs ("}\n", trace_file);
+ }
#endif /* DEBUG_REGEX */
free (victim->str);
regfree (victim->buf);
@@ -404,8 +416,8 @@ set_macro_sequence (const char *regexp)
msg = re_compile_pattern (regexp, strlen (regexp), &macro_sequence_buf);
if (msg != NULL)
m4_error (EXIT_FAILURE, 0, NULL,
- _("--warn-macro-sequence: bad regular expression `%s': %s"),
- regexp, msg);
+ _("--warn-macro-sequence: bad regular expression %s: %s"),
+ quotearg_style (locale_quoting_style, regexp), msg);
re_set_registers (&macro_sequence_buf, &macro_sequence_regs,
macro_sequence_regs.num_regs,
macro_sequence_regs.start, macro_sequence_regs.end);
@@ -1208,7 +1220,7 @@ m4_eval (struct obstack *obs, int argc, macro_arguments *argv)
if (radix < 1 || radix > 36)
{
- m4_warn (0, me, _("radix %d out of range"), radix);
+ m4_warn (0, me, _("radix out of range: %d"), radix);
return;
}
@@ -1216,13 +1228,11 @@ m4_eval (struct obstack *obs, int argc, macro_arguments *argv)
return;
if (min < 0)
{
- m4_warn (0, me, _("negative width"));
+ m4_warn (0, me, _("negative width: %d"), min);
return;
}
- if (arg_empty (argv, 1))
- m4_warn (0, me, _("empty string treated as 0"));
- else if (evaluate (me, ARG (1), &value))
+ if (evaluate (me, ARG (1), ARG_LEN (1), &value))
return;
if (radix == 1)
@@ -1887,34 +1897,42 @@ m4_substr (struct obstack *obs, int argc, macro_arguments *argv)
obstack_grow (obs, ARG (1) + start, length);
}
-/*------------------------------------------------------------------------.
-| For "translit", ranges are allowed in the second and third argument. |
-| They are expanded in the following function, and the expanded strings, |
-| without any ranges left, are used to translate the characters of the |
-| first argument. A single - (dash) can be included in the strings by |
-| being the first or the last character in the string. If the first |
-| character in a range is after the first in the character set, the range |
-| is made backwards, thus 9-0 is the string 9876543210. |
-`------------------------------------------------------------------------*/
+/*------------------------------------------------------------------.
+| For "translit", ranges are allowed in the second and third |
+| argument. They are expanded in the following function, and the |
+| expanded strings, without any ranges left, are used to translate |
+| the characters of the first argument. A single - (dash) can be |
+| included in the strings by being the first or the last character |
+| in the string. If the first character in a range is after the |
+| first in the character set, the range is made backwards, thus 9-0 |
+| is the string 9876543210. This function expands S of length *LEN |
+| using OBS for the expansion, sets *LEN to the new length, and |
+| returns the expansion. |
+`------------------------------------------------------------------*/
static const char *
-expand_ranges (const char *s, struct obstack *obs)
+expand_ranges (const char *s, size_t *len, struct obstack *obs)
{
unsigned char from;
unsigned char to;
+ const char *end = s + *len;
+
+ assert (s != end);
+ from = *s++;
+ obstack_1grow (obs, from);
- for (from = '\0'; *s != '\0'; from = to_uchar (*s++))
+ for ( ; s != end; from = *s++)
{
- if (*s == '-' && from != '\0')
+ if (*s == '-')
{
- to = to_uchar (*++s);
- if (to == '\0')
+ if (++s == end)
{
/* trailing dash */
obstack_1grow (obs, '-');
break;
}
- else if (from <= to)
+ to = *s;
+ if (from <= to)
{
while (from++ < to)
obstack_1grow (obs, from);
@@ -1928,7 +1946,7 @@ expand_ranges (const char *s, struct obstack *obs)
else
obstack_1grow (obs, *s);
}
- obstack_1grow (obs, '\0');
+ *len = obstack_object_size (obs);
return (char *) obstack_finish (obs);
}
@@ -1946,25 +1964,32 @@ m4_translit (struct obstack *obs, int argc, macro_arguments *argv)
const char *data;
const char *from;
const char *to;
+ size_t from_len;
+ size_t to_len;
char map[UCHAR_MAX + 1] = {0};
char found[UCHAR_MAX + 1] = {0};
unsigned char ch;
- if (bad_argc (arg_info (argv), argc, 2, 3))
+ enum { ASIS, REPLACE, DELETE };
+
+ if (bad_argc (arg_info (argv), argc, 2, 3) || arg_empty (argv, 1)
+ || arg_empty (argv, 2))
{
/* builtin(`translit') is blank, but translit(`abc') is abc. */
- if (argc == 2)
+ if (argc >= 2)
push_arg (obs, argv, 1);
return;
}
from = ARG (2);
- if (strchr (from, '-') != NULL)
- from = expand_ranges (from, arg_scratch ());
+ from_len = ARG_LEN (2);
+ if (memchr (from, '-', from_len) != NULL)
+ from = expand_ranges (from, &from_len, arg_scratch ());
to = ARG (3);
- if (strchr (to, '-') != NULL)
- to = expand_ranges (to, arg_scratch ());
+ to_len = ARG_LEN (3);
+ if (memchr (to, '-', to_len) != NULL)
+ to = expand_ranges (to, &to_len, arg_scratch ());
assert (from && to);
@@ -1974,23 +1999,45 @@ m4_translit (struct obstack *obs, int argc, macro_arguments *argv)
pass of data, for linear behavior. Traditional behavior is that
only the first instance of a character in from is consulted,
hence the found map. */
- for ( ; (ch = *from) != '\0'; from++)
+ while (from_len--)
{
- if (!found[ch])
+ ch = *from++;
+ if (found[ch] == ASIS)
+ {
+ if (to_len)
+ {
+ found[ch] = REPLACE;
+ map[ch] = *to;
+ }
+ else
+ found[ch] = DELETE;
+ }
+ if (to_len)
{
- found[ch] = 1;
- map[ch] = *to;
+ to++;
+ to_len--;
}
- if (*to != '\0')
- to++;
}
- for (data = ARG (1); (ch = *data) != '\0'; data++)
+ data = ARG (1);
+ from_len = ARG_LEN (1);
+ while (from_len--)
{
- if (!found[ch])
- obstack_1grow (obs, ch);
- else if (map[ch])
- obstack_1grow (obs, map[ch]);
+ ch = *data++;
+ switch (found[ch])
+ {
+ case ASIS:
+ obstack_1grow (obs, ch);
+ break;
+ case REPLACE:
+ obstack_1grow (obs, map[ch]);
+ break;
+ case DELETE:
+ break;
+ default:
+ assert (!"m4_translit");
+ abort ();
+ }
}
}
@@ -2020,20 +2067,27 @@ static int substitute_warned = 0;
static void
substitute (struct obstack *obs, const call_info *me, const char *victim,
- const char *repl, struct re_registers *regs)
+ const char *repl, size_t repl_len, struct re_registers *regs)
{
int ch;
- for (;;)
+ while (repl_len--)
{
- while ((ch = *repl++) != '\\')
+ ch = *repl++;
+ if (ch != '\\')
{
- if (ch == '\0')
- return;
obstack_1grow (obs, ch);
+ continue;
+ }
+ if (!repl_len)
+ {
+ m4_warn (0, me, _("trailing \\ ignored in replacement"));
+ return;
}
- switch ((ch = *repl++))
+ ch = *repl++;
+ repl_len--;
+ switch (ch)
{
case '0':
if (!substitute_warned)
@@ -2060,10 +2114,6 @@ substitute (struct obstack *obs, const call_info *me, const char *victim,
regs->end[ch] - regs->start[ch]);
break;
- case '\0':
- m4_warn (0, me, _("trailing \\ ignored in replacement"));
- return;
-
default:
obstack_1grow (obs, ch);
break;
@@ -2122,26 +2172,36 @@ m4_regexp (struct obstack *obs, int argc, macro_arguments *argv)
regexp = ARG (2);
repl = ARG (3);
- if (!*regexp)
+ if (arg_empty (argv, 2))
{
/* The empty regex matches everything! */
if (argc == 3)
shipout_int (obs, 0);
else
- substitute (obs, me, victim, repl, NULL);
+ substitute (obs, me, victim, repl, ARG_LEN (3), NULL);
return;
}
#ifdef DEBUG_REGEX
if (trace_file)
- xfprintf (trace_file, "r:{%s}:%s%s%s\n", regexp,
- argc == 3 ? "" : "{", repl, argc == 3 ? "" : "}");
+ {
+ fputs ("r:{", trace_file);
+ fwrite (regexp, 1, ARG_LEN (2), trace_file);
+ if (argc > 3)
+ {
+ fputs ("}:{", trace_file);
+ fwrite (repl, 1, ARG_LEN (3), trace_file);
+ }
+ fputs ("}\n", trace_file);
+ }
#endif /* DEBUG_REGEX */
msg = compile_pattern (regexp, ARG_LEN (2), &buf, &regs);
if (msg != NULL)
{
- m4_warn (0, me, _("bad regular expression: `%s': %s"), regexp, msg);
+ m4_warn (0, me, _("bad regular expression %s: %s"),
+ quotearg_style_mem (locale_quoting_style, regexp, ARG_LEN (2)),
+ msg);
return;
}
@@ -2151,11 +2211,12 @@ m4_regexp (struct obstack *obs, int argc, macro_arguments *argv)
argc == 3 ? NULL : regs);
if (startpos == -2)
- m4_warn (0, me, _("problem matching regular expression `%s'"), regexp);
+ m4_warn (0, me, _("problem matching regular expression %s"),
+ quotearg_style_mem (locale_quoting_style, regexp, ARG_LEN (2)));
else if (argc == 3)
shipout_int (obs, startpos);
else if (startpos >= 0)
- substitute (obs, me, victim, repl, regs);
+ substitute (obs, me, victim, repl, ARG_LEN (3), regs);
}
/*------------------------------------------------------------------.
@@ -2170,16 +2231,17 @@ static void
m4_patsubst (struct obstack *obs, int argc, macro_arguments *argv)
{
const call_info *me = arg_info (argv);
- const char *victim; /* first argument */
- const char *regexp; /* regular expression */
- const char *repl;
-
- struct re_pattern_buffer *buf;/* compiled regular expression */
- struct re_registers *regs; /* for subexpression matches */
- const char *msg; /* error message from re_compile_pattern */
- int matchpos; /* start position of match */
- int offset; /* current match offset */
- int length; /* length of first argument */
+ const char *victim; /* First argument. */
+ const char *regexp; /* Regular expression. */
+ const char *repl; /* Replacement text. */
+
+ struct re_pattern_buffer *buf;/* Compiled regular expression. */
+ struct re_registers *regs; /* For subexpression matches. */
+ const char *msg; /* Error message from re_compile_pattern. */
+ int matchpos; /* Start position of match. */
+ int offset; /* Current match offset. */
+ int length; /* Length of first argument. */
+ size_t repl_len; /* Length of replacement. */
if (bad_argc (me, argc, 2, 3))
{
@@ -2189,27 +2251,36 @@ m4_patsubst (struct obstack *obs, int argc, macro_arguments *argv)
return;
}
- victim = ARG (1);
- regexp = ARG (2);
- repl = ARG (3);
-
/* The empty regex matches everywhere, but if there is no
replacement, we need not waste time with it. */
- if (!*regexp && !*repl)
+ if (arg_empty (argv, 2) && arg_empty (argv, 3))
{
push_arg (obs, argv, 1);
return;
}
+ victim = ARG (1);
+ regexp = ARG (2);
+ repl = ARG (3);
+ repl_len = ARG_LEN (3);
+
#ifdef DEBUG_REGEX
if (trace_file)
- xfprintf (trace_file, "p:{%s}:{%s}\n", regexp, repl);
+ {
+ fputs ("p:{", trace_file);
+ fwrite (regexp, 1, ARG_LEN (2), trace_file);
+ fputs ("}:{", trace_file);
+ fwrite (repl, 1, repl_len, trace_file);
+ fputs ("}\n", trace_file);
+ }
#endif /* DEBUG_REGEX */
msg = compile_pattern (regexp, ARG_LEN (2), &buf, &regs);
if (msg != NULL)
{
- m4_warn (0, me, _("bad regular expression `%s': %s"), regexp, msg);
+ m4_warn (0, me, _("bad regular expression %s: %s"),
+ quotearg_style_mem (locale_quoting_style, regexp, ARG_LEN (2)),
+ msg);
return;
}
@@ -2229,8 +2300,9 @@ m4_patsubst (struct obstack *obs, int argc, macro_arguments *argv)
copied verbatim. */
if (matchpos == -2)
- m4_warn (0, me, _("problem matching regular expression `%s'"),
- regexp);
+ m4_warn (0, me, _("problem matching regular expression %s"),
+ quotearg_style_mem (locale_quoting_style, regexp,
+ ARG_LEN (2)));
else if (offset < length)
obstack_grow (obs, victim + offset, length - offset);
break;
@@ -2243,7 +2315,7 @@ m4_patsubst (struct obstack *obs, int argc, macro_arguments *argv)
/* Handle the part of the string that was covered by the match. */
- substitute (obs, me, victim, repl, regs);
+ substitute (obs, me, victim, repl, repl_len, regs);
/* Update the offset to the end of the match. If the regexp
matched a null string, advance offset one more, to avoid
diff --git a/src/eval.c b/src/eval.c
index e2e600b..1b617ed 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -58,7 +58,8 @@ typedef enum eval_error
MISSING_RIGHT,
UNKNOWN_INPUT,
EXCESS_INPUT,
- INVALID_OPERATOR
+ INVALID_OPERATOR,
+ EMPTY_ARGUMENT
}
eval_error;
@@ -87,10 +88,15 @@ static const char *eval_text;
can back up, if we have read too much. */
static const char *last_text;
+/* Detect when to end parsing. */
+static const char *end_text;
+
+/* Prime the lexer at the start of TEXT, with length LEN. */
static void
-eval_init_lex (const char *text)
+eval_init_lex (const char *text, size_t len)
{
eval_text = text;
+ end_text = text + len;
last_text = NULL;
}
@@ -105,12 +111,12 @@ eval_undo (void)
static eval_token
eval_lex (int32_t *val)
{
- while (isspace (to_uchar (*eval_text)))
+ while (eval_text != end_text && isspace (to_uchar (*eval_text)))
eval_text++;
last_text = eval_text;
- if (*eval_text == '\0')
+ if (eval_text == end_text)
return EOTEXT;
if (isdigit (to_uchar (*eval_text)))
@@ -287,14 +293,17 @@ eval_lex (int32_t *val)
`---------------------------------------*/
bool
-evaluate (const call_info *me, const char *expr, int32_t *val)
+evaluate (const call_info *me, const char *expr, size_t len, int32_t *val)
{
eval_token et;
eval_error err;
- eval_init_lex (expr);
+ eval_init_lex (expr, len);
et = eval_lex (val);
- err = logical_or_term (me, et, val);
+ if (et == EOTEXT)
+ err = EMPTY_ARGUMENT;
+ else
+ err = logical_or_term (me, et, val);
if (err == NO_ERROR && *eval_text != '\0')
{
@@ -306,9 +315,15 @@ evaluate (const call_info *me, const char *expr, int32_t *val)
switch (err)
{
+ /* Cases where result is printed. */
case NO_ERROR:
- break;
+ return false;
+
+ case EMPTY_ARGUMENT:
+ m4_warn (0, me, _("empty string treated as 0"));
+ return false;
+ /* Cases where error makes result meaningless. */
case MISSING_RIGHT:
m4_warn (0, me, _("bad expression (missing right parenthesis): %s"),
expr);
@@ -347,7 +362,7 @@ evaluate (const call_info *me, const char *expr, int32_t *val)
abort ();
}
- return err != NO_ERROR;
+ return true;
}
/*---------------------------.
diff --git a/src/format.c b/src/format.c
index 3325853..8b2b11a 100644
--- a/src/format.c
+++ b/src/format.c
@@ -126,11 +126,12 @@ expand_format (struct obstack *obs, int argc, macro_arguments *argv)
{
const call_info *me = arg_info (argv);/* Macro name. */
const char *f; /* Format control string. */
+ size_t f_len; /* Length of f. */
const char *fmt; /* Position within f. */
char fstart[] = "%'+- 0#*.*hhd"; /* Current format spec. */
char *p; /* Position within fstart. */
unsigned char c; /* A simple character. */
- int i = 0; /* Index within argc used so far. */
+ int i = 1; /* Index within argc used so far. */
bool valid_format = true; /* True if entire format string ok. */
/* Flags. */
@@ -159,25 +160,24 @@ expand_format (struct obstack *obs, int argc, macro_arguments *argv)
int result = 0;
enum {CHAR, INT, LONG, DOUBLE, STR} datatype;
- f = fmt = ARG_STR (i, argc, argv);
+ f = fmt = ARG (1);
+ f_len = ARG_LEN (1);
+ assert (!f[f_len]); /* Requiring a terminating NUL makes parsing simpler. */
memset (ok, 0, sizeof ok);
- while (true)
+ while (f_len--)
{
- while ((c = *fmt++) != '%')
+ c = *fmt++;
+ if (c != '%')
{
- if (c == '\0')
- {
- if (valid_format)
- bad_argc (me, argc, i, i);
- return;
- }
obstack_1grow (obs, c);
+ continue;
}
if (*fmt == '%')
{
obstack_1grow (obs, '%');
fmt++;
+ f_len--;
continue;
}
@@ -228,7 +228,7 @@ expand_format (struct obstack *obs, int argc, macro_arguments *argv)
break;
}
}
- while (!(flags & DONE) && fmt++);
+ while (!(flags & DONE) && (f_len--, fmt++));
if (flags & THOUSANDS)
*p++ = '\'';
if (flags & PLUS)
@@ -250,12 +250,14 @@ expand_format (struct obstack *obs, int argc, macro_arguments *argv)
{
width = ARG_INT (i, argc, argv);
fmt++;
+ f_len--;
}
else
while (isdigit (to_uchar (*fmt)))
{
width = 10 * width + *fmt - '0';
fmt++;
+ f_len--;
}
/* Maximum precision; an explicit negative precision is the same
@@ -266,10 +268,12 @@ expand_format (struct obstack *obs, int argc, macro_arguments *argv)
if (*fmt == '.')
{
ok['c'] = 0;
+ f_len--;
if (*(++fmt) == '*')
{
prec = ARG_INT (i, argc, argv);
++fmt;
+ f_len--;
}
else
{
@@ -278,6 +282,7 @@ expand_format (struct obstack *obs, int argc, macro_arguments *argv)
{
prec = 10 * prec + *fmt - '0';
fmt++;
+ f_len--;
}
}
}
@@ -288,30 +293,34 @@ expand_format (struct obstack *obs, int argc, macro_arguments *argv)
*p++ = 'l';
lflag = 1;
fmt++;
+ f_len--;
ok['c'] = ok['s'] = 0;
}
else if (*fmt == 'h')
{
*p++ = 'h';
fmt++;
+ f_len--;
if (*fmt == 'h')
{
*p++ = 'h';
fmt++;
+ f_len--;
}
ok['a'] = ok['A'] = ok['c'] = ok['e'] = ok['E'] = ok['f'] = ok['F']
= ok['g'] = ok['G'] = ok['s'] = 0;
}
- c = *fmt++;
- if (c > sizeof ok || !ok[c])
+ c = *fmt;
+ if (c > sizeof ok || !ok[c] || !f_len)
{
- m4_warn (0, me, _("unrecognized specifier in `%s'"), f);
+ m4_warn (0, me, _("unrecognized specifier in %s"),
+ quotearg_style_mem (locale_quoting_style, f, ARG_LEN (1)));
valid_format = false;
- if (c == '\0')
- fmt--;
continue;
}
+ fmt++;
+ f_len--;
/* Specifiers. We don't yet recognize C, S, n, or p. */
switch (c)
@@ -385,4 +394,6 @@ expand_format (struct obstack *obs, int argc, macro_arguments *argv)
we constructed fstart, the result should not be negative. */
assert (0 <= result);
}
+ if (valid_format)
+ bad_argc (me, argc, i, i);
}
diff --git a/src/m4.h b/src/m4.h
index f643e49..76c697b 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -549,7 +549,7 @@ FILE *m4_path_search (const char *, char **);
/* File: eval.c --- expression evaluation. */
-bool evaluate (const call_info *, const char *, int32_t *);
+bool evaluate (const call_info *, const char *, size_t, int32_t *);
/* File: format.c --- printf like formatting. */
--
1.6.0.4
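
Note, not part of the patch: the m4_translit hunk above switches from a
boolean found[] table plus a NUL check on map[] to a three-state table
(ASIS, REPLACE, DELETE), which is what lets a NUL byte be a legitimate
replacement character. The following is a minimal stand-alone sketch of
that technique under stated assumptions: translit_bytes is a
hypothetical helper that writes to a caller-supplied buffer instead of
an obstack, and it expects ranges to have been expanded already.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

enum action { ASIS, REPLACE, DELETE };

/* Hypothetical helper, not part of m4.  Transliterate DATA_LEN bytes
   of DATA into OUT, which must have room for DATA_LEN bytes.  Byte i
   of FROM maps to byte i of TO; bytes of FROM beyond TO_LEN are
   deleted.  Only the first occurrence of a byte in FROM is honored.
   Return the number of bytes stored in OUT.  */
static size_t
translit_bytes (const char *data, size_t data_len,
                const char *from, size_t from_len,
                const char *to, size_t to_len, char *out)
{
  unsigned char action[UCHAR_MAX + 1] = {0};   /* All ASIS.  */
  char map[UCHAR_MAX + 1] = {0};
  size_t n = 0;
  size_t i;

  for (i = 0; i < from_len; i++)
    {
      unsigned char ch = (unsigned char) from[i];
      if (action[ch] != ASIS)
        continue;               /* Earlier occurrence wins.  */
      if (i < to_len)
        {
          action[ch] = REPLACE;
          map[ch] = to[i];      /* May itself be NUL.  */
        }
      else
        action[ch] = DELETE;
    }

  for (i = 0; i < data_len; i++)
    {
      unsigned char ch = (unsigned char) data[i];
      if (action[ch] == ASIS)
        out[n++] = (char) ch;
      else if (action[ch] == REPLACE)
        out[n++] = map[ch];
      /* DELETE: emit nothing.  */
    }
  return n;
}
```

Keeping the action separate from the replacement byte is the design
point: with a single map[] table, a replacement of NUL would be
indistinguishable from "delete this byte".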