branch-1_4 EOF issues
From: Eric Blake
Subject: branch-1_4 EOF issues
Date: Wed, 26 Jul 2006 23:21:03 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
Another incompatibility that I'm patching:
We were silently treating EOF as end-of-comment; Solaris treats this as a hard
error:
$ cat <<\EOF > foo
define(hi,HI)dnl
changecom(/*, */)dnl
/* hi
EOF
$ cat <<\EOF > bar
hi */
EOF
$ /usr/xpg4/bin/m4 foo bar
/* hi
/usr/xpg4/bin/m4:foo:3 EOF in comment
$ m4 foo bar
/* hi
HI */
$
Repeating an idea mentioned earlier, it might be slicker if we could allow
comments (and strings and argument collection) to cross command-line file
boundaries (the way they already cross include/sinclude file boundaries),
either with a warning, or only when the next file is not stdin or a tty (on the
grounds that an interactive input should leave the user in a known state,
rather than being at the mercy of the earlier input files). But for now, this
patch fixes the situation to give an error rather than confusingly end the
comment at EOF.
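As a rough illustration of the check (not the actual code in src/input.c, which reads through next_char and an obstack), the idea can be sketched over an in-memory string. The names scan_comment, SCAN_CLOSED and SCAN_EOF are made up for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; not taken from src/input.c.  */
enum scan_result { SCAN_CLOSED, SCAN_EOF };

/* Scan a comment body that starts at text[0], just past the
   start-comment delimiter.  If the end delimiter is found, *consumed
   is the number of bytes up to and including it.  If the input ends
   first, *consumed is the length of the remaining input and SCAN_EOF
   is returned, so the caller can raise an error instead of silently
   closing the comment.  */
enum scan_result
scan_comment (const char *text, const char *end_delim, size_t *consumed)
{
  const char *hit;

  assert (text != NULL && end_delim != NULL && *end_delim != '\0');
  hit = strstr (text, end_delim);
  if (hit == NULL)
    {
      *consumed = strlen (text);
      return SCAN_EOF;
    }
  *consumed = (size_t) (hit - text) + strlen (end_delim);
  return SCAN_CLOSED;
}
```

With the foo example above, the text after `/*` is ` hi` plus a newline and no `*/` follows, so the sketch reports SCAN_EOF; that is the case where the patched m4 now gives an error.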
A similar comparison should be made for dnl treating EOF as a newline
character. Here, POSIX only requires m4 to parse text files, and it requires
text files to end in a newline, so this case is outside the bounds of POSIX.
Solaris silently treated EOF as newline. So I kept our behavior, but now issue
a warning if dnl ends up treating EOF as newline.
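A minimal sketch of that dnl behavior, again over an in-memory string rather than the real input stack; skip_line_sketch and its hit_eof flag are hypothetical names, not the skip_line in src/input.c:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the dnl line skipper.  Returns a pointer
   just past the newline that ends the discarded line.  If the input
   ends first, returns a pointer to the terminating NUL and sets
   *hit_eof to 1 so the caller can issue the warning.  */
const char *
skip_line_sketch (const char *text, int *hit_eof)
{
  const char *nl;

  assert (text != NULL && hit_eof != NULL);
  nl = strchr (text, '\n');
  *hit_eof = (nl == NULL);
  return nl != NULL ? nl + 1 : text + strlen (text);
}
```

The caller keeps scanning from the returned pointer either way; the only difference in the EOF case is the extra warning.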
But because our include/sinclude does no EOF processing, this does mean that
we differ from Solaris:
$ printf 'define(hi,HI)0 hi dnl 1 hi ' > foo
$ echo '2 hi' > bar
$ /usr/xpg4/bin/m4 foo bar
0 HI 2 HI
$ m4 foo bar
0 HI 2 HI
$ echo 'include(foo)include(bar)' | /usr/xpg4/bin/m4
0 HI 2 HI
$ echo 'include(foo)include(bar)' | m4
0 HI $
I don't want to change m4 to error out on include/sinclude, in case someone was
depending on that, but could not find an easy way to make it warn when
including a file left the parser in a non-default state. Yet I don't like how
command line and include/sinclude behave differently. Oh well, for now I just
documented it.
I also discovered that Solaris recognizes macros before quoted strings before
comments, whereas GNU recognizes comments, then macros, then quoted strings.
$ /usr/xpg4/bin/m4
define(hi,HI)changecom(q,Q)dnl
q hi Q hi
q HI Q HI
changecom(<,Q)changequote(`<',>)dnl
< hi Q hi >
hi Q hi >
$ m4
define(hi,HI)changecom(q,Q)dnl
q hi Q hi
q hi Q hi
$
For now, I just documented this, rather than rearranging our parser. POSIX is
silent here; should I go for matching Solaris behavior on the branch, or save
that for head?
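To make the two orders concrete, here is a small sketch that classifies what starts at a given input position under each precedence. It is an assumption-laden model, not either implementation's tokenizer: the enum names, classify, and the rule that a word starts with a letter or underscore (the default without changeword) are all chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token classes and scan orders for this sketch.  */
enum token_kind { TOK_COMMENT, TOK_WORD, TOK_QUOTE, TOK_OTHER };
enum scan_order { ORDER_COMMENT_FIRST, ORDER_MACRO_FIRST };

/* Nonzero if S begins with the non-empty string PREFIX.  */
static int
has_prefix (const char *s, const char *prefix)
{
  return *prefix != '\0' && strncmp (s, prefix, strlen (prefix)) == 0;
}

/* Nonzero if S begins with a character that can start a macro name.  */
static int
starts_word (const char *s)
{
  return isalpha ((unsigned char) *s) || *s == '_';
}

/* Decide what kind of token begins at S, given the start-comment
   string BCOMM and the start-quote string LQUOTE.  */
enum token_kind
classify (const char *s, const char *bcomm, const char *lquote,
          enum scan_order order)
{
  assert (s != NULL && bcomm != NULL && lquote != NULL);
  if (order == ORDER_COMMENT_FIRST)
    {
      /* Order described above for GNU: comment, macro, quote.  */
      if (has_prefix (s, bcomm))
        return TOK_COMMENT;
      if (starts_word (s))
        return TOK_WORD;
      if (has_prefix (s, lquote))
        return TOK_QUOTE;
    }
  else
    {
      /* Order described above for Solaris: macro, quote, comment.  */
      if (starts_word (s))
        return TOK_WORD;
      if (has_prefix (s, lquote))
        return TOK_QUOTE;
      if (has_prefix (s, bcomm))
        return TOK_COMMENT;
    }
  return TOK_OTHER;
}
```

With changecom(q,Q), the input `q hi Q hi` classifies as a comment start under the comment-first order and as a word under the macro-first order, which lines up with the two transcripts above.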
Meanwhile, I encountered a weird Solaris bug (not worth mimicking in GNU m4 :) -
when the start-comment string is a prefix of the start-quote string, the next
character of the comment is doubled:
$ echo 'changecom(<,>)changequote(`<<'\'',>>)define(hi,HI)<hi>' \
| /usr/xpg4/bin/m4
<hhi>
$
2006-07-26 Eric Blake <address@hidden>
* doc/m4.texinfo (Macro Arguments, Changequote, Changecom)
(Dnl, M4wrap, Include): Document EOF issues, and add examples.
(Incompatibilities): Document incompatibility of changecom
vs. macro names, and of EOF in include.
* src/input.c (next_token): Reject unterminated comments at EOF.
(skip_line): Warn on unterminated dnl at EOF.
* NEWS: Document these changes.
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.43
diff -u -r1.1.1.1.2.43 NEWS
--- NEWS 24 Jul 2006 20:02:16 -0000 1.1.1.1.2.43
+++ NEWS 26 Jul 2006 23:17:55 -0000
@@ -12,6 +12,11 @@
* Fix bugs that occurred when invoked with stdout or stderr closed. Detect
write failures to stdout.
* The m4exit macro now converts values outside the range 0-255 to 1.
+* It is now an error if a command-line input file ends in the middle of a
+ comment, matching the behavior of mid-string and mid-argument
+ collection.
+* The dnl macro now warns if end of file is encountered instead of a
+ newline.
Version 1.4.5 - 15 July 2006, by Eric Blake (CVS version 1.4.4c)
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.10
diff -u -r1.1.1.1.2.10 input.c
--- src/input.c 24 Jul 2006 20:02:16 -0000 1.1.1.1.2.10
+++ src/input.c 26 Jul 2006 23:17:55 -0000
@@ -510,6 +510,8 @@
while ((ch = next_char ()) != CHAR_EOF && ch != '\n')
;
+ if (ch == CHAR_EOF)
+ M4ERROR ((warning_status, 0, "Warning: end of file treated as newline"));
}
@@ -755,6 +757,10 @@
obstack_1grow (&token_stack, ch);
if (ch != CHAR_EOF)
obstack_grow (&token_stack, ecomm.string, ecomm.length);
+ else
+ M4ERROR ((EXIT_FAILURE, 0,
+ "ERROR: end of file in comment"));
+
type = TOKEN_STRING;
}
#ifdef ENABLE_CHANGEWORD
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.52
diff -u -r1.1.1.1.2.52 m4.texinfo
--- doc/m4.texinfo 24 Jul 2006 20:02:16 -0000 1.1.1.1.2.52
+++ doc/m4.texinfo 26 Jul 2006 23:17:56 -0000
@@ -1033,6 +1033,15 @@
@result{}2
@end example
+It is an error if the end of file occurs while collecting arguments.
+
+@example
+define(
+^D
+@error{}stdin:0: m4: ERROR: end of file in argument list
+@end example
+
+
@node Quoting Arguments
@section Quoting macro arguments
@@ -2292,6 +2301,20 @@
@result{}See how foo was defined, like this?
@end example
+If the end of file is encountered without a newline character, a
+warning is issued and dnl stops consuming input.
+
+@example
+define(`hi', `HI')
+@result{}
+m4wrap(`m4wrap(`2 hi
+')0 hi dnl 1 hi')
+@result{}
+^D
+@error{}stdin:0: m4: Warning: end of file treated as newline
+@result{}0 HI 2 HI
+@end example
+
@node Changequote
@section Changing the quote characters
@@ -2346,7 +2369,7 @@
@end example
There is no way in @code{m4} to quote a string containing an unmatched
-left quote, except using @code{changequote} to change the current
+start-quote, except using @code{changequote} to change the current
quotes.
If the quotes should be changed from, say, @samp{[} to @samp{[[},
@@ -2354,13 +2377,62 @@
calls of @code{changequote} must be made, one for the temporary quotes
and one for the new quotes.
-Neither quote string should start with a letter or @samp{_} (underscore),
-as they will be confused with names in the input. Doing so disables
-the quoting mechanism.
-
-Changing the quotes to have the same start and end string disables
-nesting of quotes. This makes it impossible to double-quote strings
-across macro expansions, so it is not done very often.
+Macros are recognized in preference to the start-quote string, so if a
+prefix of @var{start} can be recognized as a macro name, the quoting
+mechanism is effectively disabled. Unless you use @code{changeword}
+(@pxref{Changeword}), this means that @var{start} should not begin with
+a letter or @samp{_} (underscore).
+
+@example
+define(`hi', `HI')
+@result{}
+changequote(`q', `Q')
+@result{}
+q hi Q hi
+@result{}q HI Q HI
+changequote
+@result{}
+changequote(`-', `EOF')
+@result{}
+- hi EOF hi
+@result{} hi HI
+@end example
+
+If @var{end} is a prefix of @var{start}, the end-quote will be
+recognized in preference to a nested start-quote. In particular,
+changing the quotes to have the same string for @var{start} and
+@var{end} disables nesting of quotes. When quote nesting is disabled,
+it is impossible to double-quote strings across macro expansions, so
+using the same string is not done very often.
+
+@example
+define(`hi', `HI')
+@result{}
+changequote(`""', `"')
+@result{}
+""hi"""hi"
+@result{}hihi
+""hi" ""hi"
+@result{}hi hi
+""hi"" "hi"
+@result{}hi" "HI"
+changequote
+@result{}
+`hi`hi'hi'
+@result{}hi`hi'hi
+changequote(`"', `"')
+@result{}
+"hi"hi"hi"
+@result{}hiHIhi
+@end example
+
+It is an error if the end of file occurs within a quoted string.
+
+@example
+`dangling quote
+^D
+@error{}stdin:0: m4: ERROR: end of file in string
+@end example
@node Changecom
@section Changing comment delimiters
@@ -2416,6 +2488,31 @@
@result{}# comment again
@end example
+Comments are recognized in preference to macros. However, this is not
+compatible with other implementations, where macros take precedence over
+comments, so it may change in a future release. For portability, this
+means that @var{start} should not have a prefix that begins with a
+letter or @samp{_} (underscore).
+
+@example
+define(`hi', `HI')
+@result{}
+changecom(`q', `Q')
+@result{}
+q hi Q hi
+@result{}q hi Q HI
+@end example
+
+It is an error if the end of file occurs within a comment.
+
+@example
+changecom(`/*', `*/')
+@result{}
+/*dangling comment
+^D
+@error{}stdin:0: m4: ERROR: end of file in comment
+@end example
+
@node Changeword
@section Changing the lexical structure of words
@@ -2619,6 +2716,30 @@
@result{}Answer: 10*9*8*7*6*5*4*3*2*1=3628800
@end example
+Invocations of @code{m4wrap} at the same recursion level are
+concatenated and rescanned as usual:
+
+@example
+define(`aa', `AA
+')
+@result{}
+m4wrap(`a')m4wrap(`a')
+@result{}
+^D
+@result{}AA
+@end example
+
+@noindent
+however, the transition between recursion levels behaves like an end of
+file condition between two input files.
+
+@example
+m4wrap(`m4wrap(`)')len(abc')
+@result{}
+^D
+@error{}stdin:0: m4: ERROR: end of file in argument list
+@end example
+
@node File Inclusion
@chapter File inclusion
@@ -2709,7 +2830,11 @@
This use of @code{include} is not trivial, though, as files can contain
quotes, commas, and parentheses, which can interfere with the way the
-@code{m4} parser works.
+@code{m4} parser works. GNU @code{m4} seamlessly concatenates the file
+contents with the next character, even if the included file ended in
+the middle of a comment, string, or macro call. These conditions are
+only treated as end of file errors if specified as input files on the
+command line.
@node Search Path
@section Searching for include files
@@ -4311,6 +4436,27 @@
changequote with just one argument.
@item
+Some implementations of @code{m4} give macros a higher precedence than
+comments when parsing, meaning that if the start delimiter given to
+@code{changecom} (@pxref{Changecom}) starts with a macro name, comments
+are effectively disabled. @acronym{POSIX} does not specify what the
+precedence is, so the GNU @code{m4} parser recognizes comments, then
+macros, then quoted strings.
+
+@item
+Traditional implementations allow argument collection, but not string
+and comment processing, to span file boundaries. Thus, if @file{a.m4}
+contains @samp{len(}, and @file{b.m4} contains @samp{abc)},
+@kbd{m4 a.m4 b.m4} outputs @samp{3} with traditional @code{m4}, but
+gives an error message that the end of file was encountered inside a
+macro with GNU @code{m4}. On the other hand, traditional
+implementations do end of file processing for files included with
+@code{include} or @code{sinclude} (@pxref{Include}), while GNU @code{m4}
+seamlessly integrates the content of those files. Thus
+@samp{include(`a.m4')include(`b.m4')} will output @samp{3} instead of
+giving an error.
+
+@item
Traditional @code{m4} treats @code{traceon} (@pxref{Trace}) without
arguments as a global variable, independent of named macro tracing.
Also, once a macro is undefined, named tracing of that macro is lost.
@@ -4323,13 +4469,6 @@
that is preserved even if the macro is currently undefined.
@item
-Traditional implementations allow argument collection, but not string
-processing, to span file boundaries. Thus, if @file{a.m4} contains
-@samp{len(}, and @file{b.m4} contains @samp{abc)}, @kbd{m4 a.m4 b.m4}
-outputs @samp{3} with traditional @code{m4}, but gives an error message
-that the end of file was encountered inside a macro with GNU @code{m4}.
-
-@item
@acronym{POSIX} requires @code{eval} (@pxref{Eval}) to treat all
operators with the same precedence as C. However, GNU @code{m4}
currently follows the traditional precedence of other @code{m4}