[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
branch-1_4 patsubst bug, and missing arguments
From: |
Eric Blake |
Subject: |
branch-1_4 patsubst bug, and missing arguments |
Date: |
Thu, 24 Aug 2006 08:24:14 -0600 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Thunderbird/1.5.0.5 Mnenhy/0.7.4.666 |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
First, I noticed that Solaris produces output for eval, index, incr, decr,
and len, even when given zero arguments. And for index, substr, and
translit, it gives output even with one argument:
$ /usr/xpg/bin/m4
eval index incr decr len
0 0 1 -1 0
index(abc) substr(abc) translit(abc)
0 abc abc
$
The documentation on the branch already states that the first five macros
are blind, and that builtin/indir just warn and expand to the empty string
when invoking a blind macro without arguments, so I didn't think that
worth changing on the branch. I would like to do that on head, though, so
that 'm4 -G -Q' can give the same results as Solaris. But for the
one-argument case, we were issuing a warning about a missing argument (or
not even that, with -Q), but expanding to nothing. So I fixed those
macros, along with regexp and patsubst for consistency, to still issue the
warning but issue output like Solaris.
Second, while testing, I noticed that patsubst has had a bug, and our
testsuite had a flaw, because it had a test that expected the buggy output
from patsubst. Here's the bug:
$ m4
patsubst(abc, ^\|$, -)
- -abc-
patsubst(a, ^\|$, -)
- -a
$
Oops, we were not detecting an empty match at the end of a string if there
was previously an empty match at the preceding character. Classic
off-by-one bug.
Head has similar issues in both cases, but I would like to clean up
argument handling there first - it has always bothered me that min_args
and max_args are off by 1 (ie. m4_bad_argc(..., 2, 2) means a macro takes
exactly one argument); the confusion shows in modules/m4.c where some
builtins had 0 and others 1 for min_arg when a macro is not blind. In the
process of fixing that, I can consolidate the groks_macro_args and
blind_if_no_args bools into a bitfield, making it more extensible to add
other bools to each builtin. I envision adding a bool of whether having
too few args results in an empty expansion or not, so that macros like
regexp can still produce output with too few args without having to also
call m4_bad_argc (also fixing my current hack to make syscmd/esyscmd still
change sysval when there are too few args). This would be a
binary-incompatible change with any existing modules built against
m4-1.4o, but I believe it is okay to change the interface right up until
2.0 is released, as we have documented all along that beta releases are
subject to change.
I am declaring this to be my last non-maintenance fix prior to 1.4.6 - I
am starting the release process after this checkin.
2006-08-24 Eric Blake <address@hidden>
* src/builtin.c (m4_index, m4_substr, m4_translit): Similar to
Solaris, produce output on just one argument.
(m4_regexp, m4_patsubst): For consistency, do likewise.
(m4_patsubst): Allow zero-length match at end of string.
* doc/m4.texinfo (Sysval): Fix overfull hbox.
(Bugs, Macro Arguments): Minor fixes.
(Other tokens): Rearrange node order.
(Index macro, Substr, Translit, Regexp, Patsubst): Add tests.
* NEWS: Document these fixes.
- --
Life is short - so eat dessert first!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFE7baN84KuGfSFAYARAugEAJ95NKxBNUibmG6jFPDv8uA8/OX8qQCePtNa
KtcuIRbKE3mt04rZi5nVknU=
=SAV1
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.55
diff -u -p -r1.1.1.1.2.55 NEWS
--- NEWS 18 Aug 2006 23:11:36 -0000 1.1.1.1.2.55
+++ NEWS 24 Aug 2006 14:20:20 -0000
@@ -34,6 +34,11 @@ Version 1.4.6 - ?? 2006, by ?? (CVS ver
messages have been reformatted to comply with GNU Coding Standards.
* The errprint, m4wrap, and shift macros are now recognized only with
arguments.
+* The index, substr, translit, regexp, and patsubst macros now produce
+ output when given only one argument, but still warn about a missing
+ second argument.
+* The patsubst macro now reliably finds zero-length matches at the end
+ of a string.
Version 1.4.5 - 15 July 2006, by Eric Blake (CVS version 1.4.4c)
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.69
diff -u -p -r1.1.1.1.2.69 m4.texinfo
--- doc/m4.texinfo 22 Aug 2006 19:56:09 -0000 1.1.1.1.2.69
+++ doc/m4.texinfo 24 Aug 2006 14:20:21 -0000
@@ -151,8 +151,8 @@ Lexical and syntactic conventions
* Names:: Macro names
* Quoted strings:: Quoting input to m4
-* Other tokens:: Other kinds of input tokens
* Comments:: Comments in m4 input
+* Other tokens:: Other kinds of input tokens
* Input processing:: How m4 copies input to output
How to invoke macros
@@ -604,7 +604,7 @@ options.
@node Bugs
@section Problems and bugs
-If you have problems with @acronym{GNU} @code{m4} or think you've found a bug,
+If you have problems with @acronym{GNU} M4 or think you've found a bug,
please report it. Before reporting a bug, make sure you've actually
found a real bug. Carefully reread the documentation and see if it
really says you can do what you're trying to do. If it's not clear
@@ -713,8 +713,8 @@ exception of the @sc{nul} character (the
@menu
* Names:: Macro names
* Quoted strings:: Quoting input to m4
-* Other tokens:: Other kinds of input tokens
* Comments:: Comments in m4 input
+* Other tokens:: Other kinds of input tokens
* Input processing:: How m4 copies input to output
@end menu
@@ -734,7 +734,8 @@ Examples of legal names are: @samp{foo},
@section Quoted strings
@cindex quoted string
-A quoted string is a sequence of characters surrounded by the quotes
+A quoted string is a sequence of characters surrounded by quote
+strings, defaulting to
@kbd{`} and @kbd{'}, where the nested begin and end quotes within the
string are balanced. The value of a string token is the text, with one
level of quotes stripped off. Thus
@@ -757,12 +758,6 @@ is the empty string, and double-quoting
The quote characters can be changed at any time, using the builtin macro
@code{changequote}. @xref{Changequote}, for more information.
address@hidden Other tokens
address@hidden Other tokens
-
-Any character, that is neither a part of a name, nor of a quoted string,
-is a token by itself.
-
@node Comments
@section Comments
@@ -787,6 +782,17 @@ The comment delimiters can be changed to
the builtin macro @code{changecom}. @xref{Changecom}, for more
information.
address@hidden Other tokens
address@hidden Other tokens
+
+Any character, that is neither a part of a name, nor of a quoted string,
+nor a comment, is a token by itself. When not in the context of macro
+expansion, all of these tokens are just copied to output. However,
+during macro expansion, whitespace characters (space, tab, newline,
+formfeed, carriage return, vertical tab), parentheses (@samp{(} and
address@hidden)}), comma (@samp{,}), and dollar (@samp{$}) have additional
+roles, explained later.
+
@node Input processing
@section Input Processing
@@ -801,7 +807,8 @@ call will be read and parsed into tokens
@code{m4} expands a macro as soon as possible. If it finds a macro call
when collecting the arguments to another, it will expand the second
-call first. If the input is
+call first. For a running example, examine how @code{m4} handles this
+input:
@comment ignore
@example
@@ -809,8 +816,16 @@ format(`Result is %d', eval(`2**15'))
@end example
@noindent
address@hidden will first expand @samp{eval(2**15)} to @samp{32768}, and only
-then expand the resulting call
+First, @code{m4} sees that the token @samp{format} is a macro name, so
+it collects the tokens @samp{(}, @samp{`Result is %d'}, @samp{,},
+and @address@hidden }}, before encountering another potential macro. Sure
+enough, @samp{eval} is a macro name, so the nested argument collection
+picks up @samp{(}, @samp{`2**15'}, and @samp{)}, invoking the eval macro
+with the lone argument of @samp{2**15}. The expansion of
address@hidden(2**15)} is @samp{32768}, which is then rescanned as the five
+tokens @samp{3}, @samp{2}, @samp{7}, @samp{6}, and @samp{8}; and
+combined with the next @samp{)}, the format macro now has all its
+arguments, as if the user had typed:
@comment ignore
@example
@@ -818,11 +833,14 @@ format(`Result is %d', 32768)
@end example
@noindent
-which will give the output
+The format macro expands to @samp{Result is 32768}, and we have another
+round of scanning for the tokens @samp{Result}, @address@hidden }},
address@hidden, @address@hidden }}, @samp{3}, @samp{2}, @samp{7}, @samp{6}, and
address@hidden None of these are macros, so the final output is
@comment ignore
@example
-Result is 32768
address@hidden is 32768
@end example
The order in which @code{m4} expands the macros can be explored using
@@ -911,6 +929,9 @@ by @samp{m4_} for them to be recognized.
whatsoever on user defined macros. For example, with this option,
one has to write @code{m4_dnl} and even @code{m4_m4exit}.
+Another alternative is to redefine problematic macros to a name less
+likely to cause conflicts, @xref{Definitions}.
+
If your version of @acronym{GNU} @code{m4} has the @code{changeword} feature
compiled in, it offers far more flexibility in specifying the
syntax of macro names, both builtin or user-defined. @xref{Changeword},
@@ -1010,6 +1031,7 @@ supplied, the missing arguments are take
However, some builtins are documented to behave differently for a
missing optional argument than for an explicit empty string. If
there are too many arguments, the excess arguments are ignored.
+Unquoted leading whitespace is stripped off all arguments.
Normally @code{m4} will issue warnings if a builtin macro is called
with an inappropriate number of arguments, but it can be suppressed with
@@ -3313,6 +3335,14 @@ index(`gnus, gnats, and armadillos', `da
@result{}-1
@end example
+Omitting @var{substring} evokes a warning, but still produces output.
+
address@hidden
+index(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `index'
address@hidden
address@hidden example
+
@node Regexp
@section Searching for regular expressions
@@ -3378,6 +3408,14 @@ regexp(`abc', `\(\(d\)?\)\(c\)', `\1\2\3
@result{}c
@end example
+Omitting @var{regexp} evokes a warning, but still produces output.
+
address@hidden
+regexp(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `regexp'
address@hidden
address@hidden example
+
@node Substr
@section Extracting substrings
@@ -3403,6 +3441,17 @@ substr(`gnus, gnats, and armadillos', `6
@result{}gnats
@end example
+Omitting @var{from} evokes a warning, but still produces output.
+
address@hidden
+substr(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `substr'
address@hidden
+substr(`abc',)
address@hidden:stdin:2: empty string treated as 0 in builtin `substr'
address@hidden
address@hidden example
+
@node Translit
@section Translating characters
@@ -3447,6 +3496,14 @@ lowercase to uppercase, and the third `m
while converting them to lowercase. The two first cases are by far the
most common.
+Omitting @var{chars} evokes a warning, but still produces output.
+
address@hidden
+translit(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `translit'
address@hidden
address@hidden example
+
@node Patsubst
@section Substituting text by regular expression
@@ -3489,7 +3546,7 @@ patsubst(`GNUs not Unix', `^', `OBS: ')
patsubst(`GNUs not Unix', `\<', `OBS: ')
@result{}OBS: GNUs OBS: not OBS: Unix
patsubst(`GNUs not Unix', `\w*', `(\&)')
address@hidden(GNUs)() (not)() (Unix)
address@hidden(GNUs)() (not)() (Unix)()
patsubst(`GNUs not Unix', `\w+', `(\&)')
@result{}(GNUs) (not) (Unix)
patsubst(`GNUs not Unix', `[A-Z][a-z]+')
@@ -3540,6 +3597,14 @@ patreg(`aba abb 121', `\(.\)\(.\)\1', `\
@result{}bab
@end example
+Omitting @var{regexp} evokes a warning, but still produces output.
+
address@hidden
+patsubst(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `patsubst'
address@hidden
address@hidden example
+
@node Format
@section Formatted output
@@ -4020,7 +4085,8 @@ signal number shifted left by eight bits
@example
dnl This test assumes kill is a shell builtin, and that signals are
dnl recognizable.
-ifdef(`__unix__', , `errprint(` skipping: syscmd does not have unix semantics
+ifdef(`__unix__', ,
+ `errprint(` skipping: syscmd does not have unix semantics
')m4exit(`77')')dnl
syscmd(`kill -13 $$')
@result{}
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.36
diff -u -p -r1.1.1.1.2.36 builtin.c
--- src/builtin.c 18 Aug 2006 23:11:36 -0000 1.1.1.1.2.36
+++ src/builtin.c 24 Aug 2006 14:20:21 -0000
@@ -1475,7 +1475,12 @@ m4_index (struct obstack *obs, int argc,
int l1, l2, retval;
if (bad_argc (argv[0], argc, 3, 3))
- return;
+ {
+ /* builtin(`index') is blank, but index(`abc') is 0. */
+ if (argc == 2)
+ shipout_int (obs, 0);
+ return;
+ }
l1 = strlen (ARG (1));
l2 = strlen (ARG (2));
@@ -1506,7 +1511,12 @@ m4_substr (struct obstack *obs, int argc
int length, avail;
if (bad_argc (argv[0], argc, 3, 4))
- return;
+ {
+ /* builtin(`substr') is blank, but substr(`abc') is abc. */
+ if (argc == 2)
+ obstack_grow (obs, ARG (1), strlen (ARG (1)));
+ return;
+ }
length = avail = strlen (ARG (1));
if (!numeric_arg (argv[0], ARG (2), &start))
@@ -1584,7 +1594,12 @@ m4_translit (struct obstack *obs, int ar
int tolen;
if (bad_argc (argv[0], argc, 3, 4))
- return;
+ {
+ /* builtin(`translit') is blank, but translit(`abc') is abc. */
+ if (argc == 2)
+ obstack_grow (obs, ARG (1), strlen (ARG (1)));
+ return;
+ }
from = ARG (2);
if (strchr (from, '-') != NULL)
@@ -1750,7 +1765,12 @@ m4_regexp (struct obstack *obs, int argc
int length; /* length of first argument */
if (bad_argc (argv[0], argc, 3, 4))
- return;
+ {
+ /* builtin(`regexp') is blank, but regexp(`abc') is 0. */
+ if (argc == 2)
+ shipout_int (obs, 0);
+ return;
+ }
victim = TOKEN_DATA_TEXT (argv[1]);
regexp = TOKEN_DATA_TEXT (argv[2]);
@@ -1806,7 +1826,12 @@ m4_patsubst (struct obstack *obs, int ar
int length; /* length of first argument */
if (bad_argc (argv[0], argc, 3, 4))
- return;
+ {
+ /* builtin(`patsubst') is blank, but patsubst(`abc') is abc. */
+ if (argc == 2)
+ obstack_grow (obs, ARG (1), strlen (ARG (1)));
+ return;
+ }
regexp = TOKEN_DATA_TEXT (argv[2]);
@@ -1826,7 +1851,7 @@ m4_patsubst (struct obstack *obs, int ar
offset = 0;
matchpos = 0;
- while (offset < length)
+ while (offset <= length)
{
matchpos = re_search (&buf, victim, length,
offset, length - offset, ®s);
- branch-1_4 patsubst bug, and missing arguments,
Eric Blake <=