Re: M4 syntax $11 vs. ${11}
From: Eric Blake
Subject: Re: M4 syntax $11 vs. ${11}
Date: Sat, 27 Jan 2007 18:53:23 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.9) Gecko/20061207 Thunderbird/1.5.0.9 Mnenhy/0.7.4.666
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Paul Eggert on 1/20/2007 12:43 AM:
> Eric Blake <address@hidden> writes:
>
>> + /* This warning must not kill m4 -E, or it will break autoconf. */
>> + if (text && strstr (text, "${"))
>> + M4ERROR ((0, 0, "Warning: raw `${' in defn of %s will change semantics",
>> + name));
>
> This warning will generate a lot of false positives, right?
> Most of the time, a stray ${ in an M4 file won't be followed
> by a series of digits and then a }. So it will be treated
> as itself (for backward compatibility).
OK, I toned down my patch on the M4 side of things. Originally, the patch
warned for the two-character sequence ${, since I was planning that even
${foo} could have meaning in M4 2.0 (as the current definition of foo),
but we can save that for M4 2.1 and a long transition period. For m4 2.0,
if ${ is followed by a non-digit, then I will be sure to stick with the
old behavior of literal output. This greatly reduces (but does not eliminate)
the number of places in autoconf that need extra quoting; I'll follow up
with a patch to autoconf along those lines.
It is also possible in 2.0 to disable ${} handling, using the changesyntax
builtin to assign { and } back to the ordinary character category, at the
expense of no longer being able to refer to more than 9 arguments to a
macro. My patch to autoconf will include an action along those lines, so
that no matter how fancy M4 2.0 actually becomes when handling ${}, it is
possible for autoconf to ignore that new feature for the sake of the large
existing codebase of macros that use raw ${.
Meanwhile, this particular patch is only for the 1.4.x branch, and I'm
going ahead and committing it. I hope it is the last patch prior to
1.4.9, although this week's changes in gnulib regarding <string.h> need to
stabilize first. It adds the --warn-syntax option (off by default) to
detect two things. The first is the three-character sequence
$<digit><digit>, which will change from a multi-digit argument reference
to the one-digit argument concatenated with the second digit; I doubt much
code tickles this. The second is ${<digit>, which is common when
generating shell or Makefile code; I doubt there are many false positives
where a closing } cannot be found, so the warning is simplified by not
looking for it. I
will be using this patch to find the problem spots in autoconf; I already
know that running autoconf 2.61 with m4 1.4.9 --warn-syntax will trigger the warning (and since
autom4te runs m4 -E, it is fatal to autoconf), so this patch is careful to
document that issue. Hopefully, autoconf 2.62 will be immune from this
warning.
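For illustration only, here is a minimal C sketch (not code from the patch) of what the --warn-syntax check flags. It looks for a `$' followed by either a digit or `{', and then a digit. Instead of the regex `\$[{0-9][0-9]', it scans the characters by hand, and like the patch's re_search loop it resumes three characters after each hit. The name count_syntax_warnings is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count the spots --warn-syntax would flag: '$', then a digit or '{',
   then a digit ("$11", "${1").  After a hit, scanning resumes three
   characters later, mirroring the "offset += 3" in the patch. */
static size_t
count_syntax_warnings (const char *text)
{
  size_t i = 0, len, count = 0;

  assert (text != NULL);
  len = strlen (text);
  while (i + 3 <= len)
    {
      if (text[i] == '$'
          && (text[i + 1] == '{' || isdigit ((unsigned char) text[i + 1]))
          && isdigit ((unsigned char) text[i + 2]))
        {
          count++;
          i += 3;
        }
      else
        i++;
    }
  return count;
}
```

For example, "$12 and ${3}" yields 2 hits. "${foo}" and "$1" yield none, which matches the decision to leave ${ followed by a non-digit alone.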
2007-01-27 Eric Blake <address@hidden>
* src/m4.h (warn_syntax): Declare.
(init_pattern_buffer): Export.
* src/m4.c (warn_syntax, usage, WARN_SYNTAX_OPTION)
(long_options, main): Implement new option.
* src/builtin.c (init_pattern_buffer): Allow NULL regs argument.
(define_user_macro): Warn on $11 and ${1} if requested.
* src/input.c (init_pattern_buffer): Delete duplicate method.
* doc/m4.texinfo (Operation modes): Document it.
(Arguments): Document future direction of ${11} vs. $11.
(Incompatibilities): Fix wording on POSIX limitations.
* checks/get-them: Parse @{ and @} correctly.
* NEWS: Document this change.
- --
Don't work too hard, make some time for fun as well!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFvAIS84KuGfSFAYARAj3PAJwKP/02NcMGixie5CcrW60H7qJigQCg1KzX
4JCSiWLw8Upnu5wY6UNSdjE=
=yJW3
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.90
diff -u -p -r1.1.1.1.2.90 NEWS
--- NEWS 15 Jan 2007 13:51:33 -0000 1.1.1.1.2.90
+++ NEWS 28 Jan 2007 01:50:28 -0000
@@ -15,6 +15,14 @@ Version 1.4.9 - ?? ??? 2007, by ???? (C
of variable assignment as an extension.
* The `include' builtin now affects exit status on failure, as required by
POSIX. Use `sinclude' if you need a successful exit status.
+* A new `--warn-syntax' command-line option allows detection of
+ non-portable syntax that might be broken when upgrading to M4 2.0. For
+ example, POSIX requires a macro definition containing `$11' to expand to
+ the first argument concatenated with 1, rather than the eleventh
+ argument; and allows implementations to choose whether `${11}' is treated
+ as literal text, as in M4 1.4.x, or as the eleventh argument, as in the
+ eventual M4 2.0. Be aware that Autoconf 2.61 will not work with this
+ option enabled.
* Improved portability to platforms such as BSD/OS.
Version 1.4.8 - 20 November 2006, by Eric Blake (CVS version 1.4.7a)
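To make the POSIX wording in the NEWS entry concrete, here is a rough C sketch of the two readings of `$11'. Under POSIX, only one digit names the argument, so the second `1' is literal text. Under the GNU M4 1.4.x extension, all the digits form the argument number. The helper expand_args is hypothetical and rests on simplifying assumptions. It ignores `$#', `$*', `$@' and quoting, and it copies `${' through as literal text, as 1.4.x does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy DEFN to OUT, replacing $N with argv[N].
   argv[0] is the macro name ($0); missing arguments expand to nothing.
   If MULTI_DIGIT is nonzero, every digit after '$' is part of N (GNU M4
   1.4.x); otherwise only the first digit is (POSIX) and later digits are
   ordinary text.  '$' not followed by a digit, including "${", is copied
   literally.  Output is truncated to OUTSIZE - 1 bytes.  Returns the
   number of bytes written, excluding the terminating NUL. */
static size_t
expand_args (const char *defn, int argc, const char *const *argv,
             int multi_digit, char *out, size_t outsize)
{
  size_t n = 0;
  const char *p = defn;

  assert (defn != NULL && argv != NULL && argc >= 1);
  if (outsize == 0)
    return 0;
  while (*p)
    {
      if (p[0] == '$' && isdigit ((unsigned char) p[1]))
        {
          long long index = 0;
          p++;                  /* skip '$' */
          do
            {
              /* Once INDEX reaches ARGC it can only grow, so the argument
                 is missing either way; stop accumulating to avoid overflow. */
              if (index < argc)
                index = index * 10 + (*p - '0');
              p++;
            }
          while (multi_digit && isdigit ((unsigned char) *p));
          if (index < argc)
            {
              const char *arg = argv[index];
              while (*arg && n + 1 < outsize)
                out[n++] = *arg++;
            }
        }
      else
        {
          if (n + 1 < outsize)
            out[n++] = *p;
          p++;
        }
    }
  out[n] = '\0';
  return n;
}
```

Suppose the arguments are foo, a, b, ..., k, so that the eleventh argument is `k'. Expanding "$11" then gives "a1" in POSIX mode and "k" in multi-digit mode. The string "${11}" is copied through unchanged in both modes.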
Index: checks/get-them
===================================================================
RCS file: /sources/m4/m4/checks/Attic/get-them,v
retrieving revision 1.1.1.1.2.8
diff -u -p -r1.1.1.1.2.8 get-them
--- checks/get-them 6 Jan 2007 19:56:11 -0000 1.1.1.1.2.8
+++ checks/get-them 28 Jan 2007 01:50:28 -0000
@@ -73,6 +73,8 @@ BEGIN {
else
prefix = "";
gsub("@@", "@", $0);
+ gsub("@{", "{", $0);
+ gsub("@}", "}", $0);
gsub("@w{ }", " ", $0);
gsub("@tabchar{}", "\t", $0);
printf("%s%s\n", prefix, $0) >> file;
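The get-them change only adds two more gsub calls to the awk script. The following is a hedged C sketch, not part of the patch, and unescape_texinfo is a hypothetical name. It undoes the same set of escapes in a single left-to-right pass, so the result does not depend on the order in which the substitutions are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: undo the Texinfo escapes that checks/get-them
   handles ("@@", "@{", "@}", "@w{ }", "@tabchar{}") in one pass.  Any
   other '@' sequence is copied unchanged.  OUT must hold at least
   strlen (IN) + 1 bytes; no rule makes the text longer. */
static void
unescape_texinfo (const char *in, char *out)
{
  static const struct { const char *from; char to; } rules[] = {
    { "@@", '@' }, { "@{", '{' }, { "@}", '}' },
    { "@w{ }", ' ' }, { "@tabchar{}", '\t' },
  };

  assert (in != NULL && out != NULL);
  while (*in)
    {
      size_t i;
      int matched = 0;

      if (*in == '@')
        for (i = 0; i < sizeof rules / sizeof rules[0]; i++)
          {
            size_t len = strlen (rules[i].from);
            if (strncmp (in, rules[i].from, len) == 0)
              {
                *out++ = rules[i].to;
                in += len;
                matched = 1;
                break;
              }
          }
      if (!matched)
        *out++ = *in++;
    }
  *out = '\0';
}
```

For example, "$@{11@}" becomes "${11}" and "@@@{" becomes "@{". Other commands, such as "@code{m4}", pass through untouched.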
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.108
diff -u -p -r1.1.1.1.2.108 m4.texinfo
--- doc/m4.texinfo 15 Jan 2007 13:51:33 -0000 1.1.1.1.2.108
+++ doc/m4.texinfo 28 Jan 2007 01:50:29 -0000
@@ -577,6 +577,13 @@ is also specified.
Suppress warnings, such as missing or superfluous arguments in macro
calls, or treating the empty string as zero.
address@hidden --warn-syntax
+Issue warnings when syntax is encountered that will change semantics in
address@hidden M4 2.0. For now, the only semantics that will change have
+to do with how more than 9 arguments in a macro definition are handled
+(@pxref{Arguments}). This warning is disabled by default because it
+triggers spurious failures in @acronym{GNU} Autoconf 2.61.
+
@item -W @var{REGEXP}
@itemx address@hidden
Use @var{REGEXP} as an alternative syntax for macro names. This
@@ -1354,8 +1361,8 @@ As a @acronym{GNU} extension, the first
not have to be a simple word.
It can be any text string, even the empty string. A macro with a
non-standard name cannot be invoked in the normal way, as the name is
-not recognized. It can only be referenced by the builtins @code{Indir}
-(@pxref{Indir}) and @code{Defn} (@pxref{Defn}).
+not recognized. It can only be referenced by the builtins @code{indir}
+(@pxref{Indir}) and @code{defn} (@pxref{Defn}).
@cindex arrays
Arrays and associative arrays can be simulated by using this trick.
@@ -1375,7 +1382,7 @@ array(eval(`10 + 7'))
@result{}array element no. 17
@end example
-Change the @code{%d} to @code{%s} and it is an associative array.
+Change the @samp{%d} to @samp{%s} and it is an associative array.
@node Arguments
@section Arguments to macros
@@ -1412,13 +1419,6 @@ macro
(You should try and improve this example so that clients of @code{exch}
do not have to double quote; or @pxref{Improved exch, , Answers}).
address@hidden @acronym{GNU} extensions
address@hidden @code{m4} allows the number following the @samp{$} to
-consist of one
-or more digits, allowing macros to have any number of arguments. This
-is not so in UNIX implementations of @code{m4}, which only recognize
-one digit.
-
As a special case, the zeroth argument, @code{$0}, is always the name
of the macro being expanded.
@@ -1443,6 +1443,51 @@ foo
The @samp{foo} in the expansion text is @emph{not} expanded, since it is
a quoted string, and not a name.
address@hidden @acronym{GNU} extensions
address@hidden nine arguments, more than
address@hidden more than nine arguments
address@hidden arguments, more than nine
address@hidden positional parameters, more than nine
address@hidden @code{m4} allows the number following the @samp{$} to
+consist of one or more digits, allowing macros to have any number of
+arguments. The extension of accepting multiple digits is incompatible
+with @acronym{POSIX}, and is different from traditional implementations
+of @code{m4}, which only recognize one digit. Therefore, future
+versions of @acronym{GNU} M4 will phase out this feature.
address@hidden, for an example of how to portably access the eleventh
+argument.
+
address@hidden also states that @samp{$} followed immediately by
address@hidden@{} in a macro definition is implementation-defined. This version
+of M4 passes the literal characters @address@hidden through unchanged, but M4
+2.0 will implement an optional feature similar to @command{sh}, where
address@hidden@address@hidden expands to the eleventh argument, to replace the current
+recognition of @samp{$11}. Meanwhile, if you want to guarantee that you
+will get a literal @address@hidden in output when expanding a macro, even
+when you upgrade to M4 2.0, you can use nested quoting to your
+advantage:
+
address@hidden
+define(`foo', `single quoted $`'@address@hidden output')
address@hidden
+define(`bar', ``double quoted $'address@hidden@} output'')
address@hidden
+foo(`a', `b')
address@hidden quoted address@hidden@} output
+bar(`a', `b')
address@hidden quoted address@hidden@} output
address@hidden example
+
+To help you detect places in your M4 input files whose behavior might
+change in M4 2.0, you can use the
address@hidden command-line option (@pxref{Operation modes, ,
+Invoking m4}). This issues a warning whenever a macro definition
+includes @samp{$} followed by multiple digits, or by @address@hidden and a
+digit. The warning is not enabled by default, because it triggers a
+number of warnings in Autoconf 2.61 (and Autoconf uses @option{-E} to
+treat warnings as errors), and because it will still be possible to
+restore traditional behavior in M4 2.0.
+
@node Pseudo Arguments
@section Special arguments to macros
@@ -2588,7 +2633,7 @@ foo
@result{}blah
@end example
-Tracing even works on builtins. However, @command{defn} (@pxref{Defn})
+Tracing even works on builtins. However, @code{defn} (@pxref{Defn})
does not transfer tracing status.
@example
@@ -4721,10 +4766,10 @@ There are a few builtin macros in @code{
commands from within @code{m4}.
Note that the definition of a valid shell command is system dependent.
-On UNIX systems, this is the typical @code{/bin/sh}. But on other
+On UNIX systems, this is the typical @command{/bin/sh}. But on other
systems, such as native Windows, the shell has a different syntax of
commands that it understands. Some examples in this chapter assume
address@hidden/bin/sh}, and also demonstrate how to quit early with a known
address@hidden/bin/sh}, and also demonstrate how to quit early with a known
exit value if this is not the case.
@menu
@@ -4934,7 +4979,7 @@ sysval
@result{}0
@end example
address@hidden results in 127 if there was a problem executing the
address@hidden results in 127 if there was a problem executing the
command, for example, if the system-imposed argument length is exceeded,
or if there were not enough resources to fork. It is not possible to
distinguish between failed execution and successful execution that had
@@ -5262,8 +5307,8 @@ which files are listed on each @code{m4}
user's input file, or else each input file uses @code{include}.
Reading the common base of a big application, over and over again, may
-be time consuming. @acronym{GNU} @code{m4} offers some machinery to speed up
-the start of an application using lengthy common bases.
+be time consuming. @acronym{GNU} @code{m4} offers some machinery to
+speed up the start of an application using lengthy common bases.
@menu
* Using frozen files:: Using frozen files
@@ -5311,7 +5356,7 @@ with the varying input. The first call,
option, only reads and executes file @file{base.m4}, defining
various application macros and computing other initializations.
Once the input file @file{base.m4} has been completely processed, @acronym{GNU}
address@hidden produces on @file{base.m4f} a @dfn{frozen} file, that is, a
address@hidden produces in @file{base.m4f} a @dfn{frozen} file, that is, a
file which contains a kind of snapshot of the @code{m4} internal state.
Later calls, containing the @option{-R} option, are able to reload
@@ -5466,7 +5511,7 @@ Invoking m4}), unless overridden by othe
@itemize @bullet
@item
-In the @address@hidden notation for macro arguments, @var{n} can contain
+In the @address@hidden notation for macro arguments, @var{n} can contain
several digits, while the System V @code{m4} only accepts one digit.
This allows macros in @acronym{GNU} @code{m4} to take any number of
arguments, and not only nine (@pxref{Arguments}).
@@ -5623,10 +5668,11 @@ m4wrap(`a`'m4wrap(`c
@end example
@item
address@hidden requires that all builtins that require arguments, but
-are called without arguments, behave as though empty strings had been
-passed. For example, @code{a`'define`'b} would expand to @code{ab}.
-But @acronym{GNU} @code{m4} ignores certain builtins if they have missing
address@hidden states that builtins that require arguments, but are
+called without arguments, have undefined behavior. Traditional
+implementations simply behave as though empty strings had been passed.
+For example, @code{a`'define`'b} would expand to @code{ab}. But
address@hidden @code{m4} ignores certain builtins if they have missing
arguments, giving @code{adefineb} for the above example.
@item
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.55
diff -u -p -r1.1.1.1.2.55 builtin.c
--- src/builtin.c 27 Jan 2007 00:25:33 -0000 1.1.1.1.2.55
+++ src/builtin.c 28 Jan 2007 01:50:30 -0000
@@ -231,6 +231,7 @@ void
define_user_macro (const char *name, const char *text, symbol_lookup mode)
{
symbol *s;
+ size_t len;
s = lookup_symbol (name, mode);
if (SYMBOL_TYPE (s) == TOKEN_TEXT)
@@ -238,6 +239,43 @@ define_user_macro (const char *name, con
SYMBOL_TYPE (s) = TOKEN_TEXT;
SYMBOL_TEXT (s) = xstrdup (text ? text : "");
+
+ /* In M4 2.0, $11 will mean the first argument concatenated with 1,
+ not the eleventh argument. Also, ${1} will mean the first
+ argument, rather than literal text (although for compatibility
+ sake, it will be possible to restore the traditional meaning of
+ ${1} using changesyntax). Needing more than 9 arguments is
+ somewhat rare, but using M4 to process shell code is quite
+ common; either way, warn on usages that will change in
+ semantics. */
+ if (warn_syntax && text && (len = strlen (text)) >= 3)
+ {
+ static struct re_pattern_buffer buf;
+ static bool init = false;
+ regoff_t offset = 0;
+
+ if (! init)
+ {
+ const char *msg = "\\$[{0-9][0-9]";
+ init_pattern_buffer (&buf, NULL);
+ msg = re_compile_pattern (msg, strlen (msg), &buf);
+ if (msg != NULL)
+ {
+ M4ERROR ((EXIT_FAILURE, 0,
+ "unable to check --warn-syntax: %s", msg));
+ }
+ init = true;
+ }
+ while ((offset = re_search (&buf, text, len, offset, len - offset,
+ NULL)) >= 0)
+ {
+ M4ERROR ((warning_status, 0,
+ "Warning: semantics of `$%c%c%s' in `%s' will change",
+ text[offset + 1], text[offset + 2],
+ text[offset + 1] == '{' ? "...}" : "", name));
+ offset += 3;
+ }
+ }
}
/*-----------------------------------------------.
@@ -1828,15 +1866,18 @@ Warning: trailing \\ ignored in replacem
| Initialize regular expression variables. |
`------------------------------------------*/
-static void
+void
init_pattern_buffer (struct re_pattern_buffer *buf, struct re_registers *regs)
{
buf->translate = NULL;
buf->fastmap = NULL;
buf->buffer = NULL;
buf->allocated = 0;
- regs->start = NULL;
- regs->end = NULL;
+ if (regs)
+ {
+ regs->start = NULL;
+ regs->end = NULL;
+ }
}
/*----------------------------------------.
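The define_user_macro hunk compiles its pattern once into a static buffer and then calls re_search repeatedly. The following sketch shows the same compile-once approach, but it swaps in the POSIX regcomp/regexec interface for the GNU re_compile_pattern/re_search calls that the patch uses. The name report_syntax_changes is hypothetical, and the warning text only imitates the patch's.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/* Warn about each "$DD" or "${D" in TEXT, the body of macro NAME.  The
   pattern is compiled on first use only and kept in a static regex_t.
   Returns the number of warnings issued. */
static int
report_syntax_changes (const char *name, const char *text)
{
  static regex_t re;
  static int init = 0;
  regmatch_t m;
  const char *p = text;
  int eflags = 0, count = 0;

  assert (name != NULL && text != NULL);
  if (!init)
    {
      int err = regcomp (&re, "\\$[{0-9][0-9]", REG_EXTENDED);
      if (err != 0)
        {
          char msg[128];
          regerror (err, &re, msg, sizeof msg);
          fprintf (stderr, "unable to check --warn-syntax: %s\n", msg);
          exit (EXIT_FAILURE);
        }
      init = 1;
    }
  while (regexec (&re, p, 1, &m, eflags) == 0)
    {
      const char *hit = p + m.rm_so;
      fprintf (stderr, "Warning: semantics of `$%c%c%s' in `%s' will change\n",
               hit[1], hit[2], hit[1] == '{' ? "...}" : "", name);
      count++;
      p = hit + 3;              /* resume after the three-character match */
      eflags = REG_NOTBOL;      /* P no longer points at the string start */
    }
  return count;
}
```

REG_NOTBOL makes no difference for this unanchored pattern. It keeps the resumed search honest, though, if an anchor is ever added.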
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.32
diff -u -p -r1.1.1.1.2.32 input.c
--- src/input.c 1 Nov 2006 22:29:08 -0000 1.1.1.1.2.32
+++ src/input.c 28 Jan 2007 01:50:30 -0000
@@ -1,6 +1,6 @@
/* GNU m4 -- A simple macro processor
- Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2004, 2005, 2006
+ Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2004, 2005, 2006, 2007
Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
@@ -752,15 +752,6 @@ set_comment (const char *bc, const char
#ifdef ENABLE_CHANGEWORD
-static void
-init_pattern_buffer (struct re_pattern_buffer *buf)
-{
- buf->translate = NULL;
- buf->fastmap = NULL;
- buf->buffer = NULL;
- buf->allocated = 0;
-}
-
void
set_word_regexp (const char *regexp)
{
@@ -776,7 +767,7 @@ set_word_regexp (const char *regexp)
}
/* Dry run to see whether the new expression is compilable. */
- init_pattern_buffer (&new_word_regexp);
+ init_pattern_buffer (&new_word_regexp, NULL);
msg = re_compile_pattern (regexp, strlen (regexp), &new_word_regexp);
regfree (&new_word_regexp);
Index: src/m4.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/m4.c,v
retrieving revision 1.1.1.1.2.41
diff -u -p -r1.1.1.1.2.41 m4.c
--- src/m4.c 5 Jan 2007 02:58:32 -0000 1.1.1.1.2.41
+++ src/m4.c 28 Jan 2007 01:50:30 -0000
@@ -55,6 +55,9 @@ int suppress_warnings = 0;
/* If not zero, then value of exit status for warning diagnostics. */
int warning_status = 0;
+/* If true, then warn about usage of $11 and ${1} in macro definitions. */
+bool warn_syntax = false;
+
/* Artificial limit for expansion_level in macro.c. */
int nesting_limit = 1024;
@@ -142,10 +145,13 @@ for short options too.\n\
Operation modes:\n\
--help display this help and exit\n\
--version output version information and exit\n\
+", stdout);
+ fputs ("\
-E, --fatal-warnings stop execution after first warning\n\
-i, --interactive unbuffer output, ignore interrupts\n\
-P, --prefix-builtins force a `m4_' prefix to all builtins\n\
-Q, --quiet, --silent suppress some warnings for builtins\n\
+ --warn-syntax warn on syntax that will change in future\n\
", stdout);
#ifdef ENABLE_CHANGEWORD
fputs ("\
@@ -221,6 +227,7 @@ enum
{
DEBUGFILE_OPTION = CHAR_MAX + 1, /* no short opt */
DIVERSIONS_OPTION, /* not quite -N, because of message */
+ WARN_SYNTAX_OPTION, /* no short opt */
HELP_OPTION, /* no short opt */
VERSION_OPTION /* no short opt */
@@ -250,6 +257,7 @@ static const struct option long_options[
{"debugfile", required_argument, NULL, DEBUGFILE_OPTION},
{"diversions", required_argument, NULL, DIVERSIONS_OPTION},
+ {"warn-syntax", no_argument, NULL, WARN_SYNTAX_OPTION},
{"help", no_argument, NULL, HELP_OPTION},
{"version", no_argument, NULL, VERSION_OPTION},
@@ -455,6 +463,10 @@ main (int argc, char *const *argv, char
debugfile = optarg;
break;
+ case WARN_SYNTAX_OPTION:
+ warn_syntax = true;
+ break;
+
case VERSION_OPTION:
version_etc (stdout, PACKAGE, PACKAGE_NAME, VERSION, AUTHORS, NULL);
exit (EXIT_SUCCESS);
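The m4.c hunks follow the usual getopt_long idiom for an option with no short form. The option gets a code above CHAR_MAX, so it cannot clash with any short option letter. It is then listed in long_options and handled in the switch. Below is a self-contained sketch with a hypothetical three-option subset. It assumes glibc, where setting optind to 0 restarts scanning; BSD systems use optreset instead.

```c
#include <assert.h>
#include <getopt.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Codes for options without a short form start above CHAR_MAX, so they
   never equal a short option letter returned by getopt_long. */
enum
{
  WARN_SYNTAX_OPTION = CHAR_MAX + 1,
  HELP_OPTION
};

/* Hypothetical subset of m4's option table. */
static const struct option long_options[] = {
  {"fatal-warnings", no_argument, NULL, 'E'},
  {"warn-syntax", no_argument, NULL, WARN_SYNTAX_OPTION},
  {"help", no_argument, NULL, HELP_OPTION},
  {NULL, 0, NULL, 0}
};

struct settings
{
  bool fatal_warnings;
  bool warn_syntax;
  bool help;
};

/* Parse ARGV into *S.  Returns 0 on success, -1 on an unknown option. */
static int
parse_options (int argc, char *const *argv, struct settings *s)
{
  int c;

  assert (argc >= 1 && s != NULL);
  s->fatal_warnings = s->warn_syntax = s->help = false;
  optind = 0;                   /* glibc: 0 fully re-initializes getopt */
  while ((c = getopt_long (argc, argv, "E", long_options, NULL)) != -1)
    switch (c)
      {
      case 'E':
        s->fatal_warnings = true;
        break;
      case WARN_SYNTAX_OPTION:
        s->warn_syntax = true;
        break;
      case HELP_OPTION:
        s->help = true;
        break;
      default:                  /* '?' for an unrecognized option */
        return -1;
      }
  return 0;
}
```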
Index: src/m4.h
===================================================================
RCS file: /sources/m4/m4/src/m4.h,v
retrieving revision 1.1.1.1.2.36
diff -u -p -r1.1.1.1.2.36 m4.h
--- src/m4.h 6 Jan 2007 19:56:11 -0000 1.1.1.1.2.36
+++ src/m4.h 28 Jan 2007 01:50:30 -0000
@@ -110,6 +110,7 @@ extern int max_debug_argument_length; /*
extern int suppress_warnings; /* -Q */
extern int warning_status; /* -E */
extern int nesting_limit; /* -L */
+extern bool warn_syntax; /* --warn-syntax */
#ifdef ENABLE_CHANGEWORD
extern const char *user_word_regexp; /* -W */
#endif
@@ -396,6 +397,8 @@ struct predefined
typedef struct builtin builtin;
typedef struct predefined predefined;
+struct re_pattern_buffer;
+struct re_registers;
void builtin_init (void);
void define_builtin (const char *, const builtin *, symbol_lookup);
@@ -403,6 +406,7 @@ void define_user_macro (const char *, co
void undivert_all (void);
void expand_user_macro (struct obstack *, symbol *, int, token_data **);
void m4_placeholder (struct obstack *, int, token_data **);
+void init_pattern_buffer (struct re_pattern_buffer *, struct re_registers *);
const builtin *find_builtin_by_addr (builtin_func *);
const builtin *find_builtin_by_name (const char *);
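The m4.h hunk declares struct re_pattern_buffer and struct re_registers before the new prototype, presumably so that m4.h need not pull in the regex header. A struct tag that first appears inside a parameter list has prototype scope only, so without those lines the prototype would name a different type from the one used elsewhere. Below is a minimal sketch of the pattern. It uses hypothetical stand-in types, not the GNU regex definitions, and includes the NULL check that init_pattern_buffer now applies to its second argument.

```c
#include <assert.h>
#include <stddef.h>

/* "Header" part: file-scope incomplete types, so the prototype below
   refers to the same structs as the definitions that follow. */
struct pattern_buffer;
struct registers;
void init_buffers (struct pattern_buffer *, struct registers *);

/* "Implementation" part: hypothetical stand-ins for the regex types. */
struct pattern_buffer
{
  void *buffer;
  size_t allocated;
};

struct registers
{
  int *start;
  int *end;
};

void
init_buffers (struct pattern_buffer *buf, struct registers *regs)
{
  assert (buf != NULL);
  buf->buffer = NULL;
  buf->allocated = 0;
  if (regs)                     /* REGS may be NULL, as in the patch */
    {
      regs->start = NULL;
      regs->end = NULL;
    }
}
```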