Re: branch-1_4 off-by-one in line reporting
From: Eric Blake
Subject: Re: branch-1_4 off-by-one in line reporting
Date: Tue, 17 Oct 2006 22:47:16 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
Hi, Gary,
Gary V. Vaughan <gary <at> gnu.org> writes:
>
> On 12 Oct 2006, at 16:13, Eric Blake wrote:
> > I still want CVS head to follow Solaris' parsing precedence
> > rules (macros, then quotes, then comments), rather than the current
> > behavior (comments, macros, quotes).
>
> Can you remind me why that is? The first thing that happens in any
> parser I'm familiar with is to discard the comments, why is it a good
> thing for M4 to behave differently? (I think I know an answer, but
> I'm curious to understand your reasoning here)
Most languages have the (rather nice) property that you cannot confuse comments
with other tokens. M4, on the other hand, thanks to changequote and changecom,
can be put into a state where it is ambiguous whether the parser should
recognize the current character as the start of a macro name or the start of a
comment. (Fortunately for changesyntax, we document that syntax designations
are mutually exclusive - you cannot use changesyntax to simultaneously make a
character both a letter and a comment start.) The dilemma is not how a comment
is treated once it is found (m4 copies it to the output without expanding any
macros inside it), so much as recognizing what constitutes a comment in the
first place.
I guess an analog to this dilemma is the C89 vs. C99 parse question:
int i = 1 //*
//*/
-1; /* Is i 0 or -1? */
In C89, there are no // comments, so it parses as 'int i = 1 / <comment> -1;',
giving -1. In C99, the parser sees 'int i = 1 <comment> <comment> -1;', giving
an answer of 0. Because C99 changed the comment syntax to allow an additional
form, it is possible to encounter (admittedly unusual) test cases that can
expose the difference.
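To try the C fragment yourself, it can be wrapped in a function and compiled once in each dialect. This is only a sketch: the function name ambiguous_comment_value is hypothetical, and the doc comment deliberately spells out the comment delimiters in words, because writing them literally inside a block comment would end it early.

```c
/* Hypothetical wrapper around the fragment above.  Compiled as C89, the
   first line reads "1 /" followed by a block comment that runs to the
   star-slash on the second line, so i = 1 / -1.  Compiled as C99 or later,
   both lines end in line comments, so i = 1 - 1. */
int ambiguous_comment_value(void)
{
    int i = 1 //*
    //*/
    -1; /* Is i 0 or -1? */
    return i;
}
```

With GCC, for example, building with -std=c89 (which does not recognize // comments) should make the function return -1, while -std=c99 should make it return 0.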
Now, for a concrete example in m4.
$ /usr/xpg4/bin/m4
define(a,A)define(a1a2a,b)changecom(1,2)a1a2a
b
a 1 a 2 a
A 1 a 2 A
$
Here, both Solaris and GNU agree - once you start parsing a macro name, you
greedily consume as many additional characters as fit in a name, even if you
could otherwise recognize a comment or quote were you to not be greedy.
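The greedy rule can be sketched as a scanner that, once it has seen a name-start character, keeps consuming name characters without ever checking for a comment or quote delimiter. The function name scan_name is hypothetical, and the character classes assume the default m4 syntax (a letter or underscore, followed by letters, digits, and underscores) in the C locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch of greedy macro-name scanning: returns the length of
   the name starting at s, or 0 if s does not start a name.  Nothing here
   looks for comment or quote delimiters, so a delimiter that is also a
   name character (like the "1" in changecom(1,2)) is swallowed. */
size_t scan_name(const char *s)
{
    size_t n = 0;

    if (!(isalpha((unsigned char) s[0]) || s[0] == '_'))
        return 0;
    while (isalnum((unsigned char) s[n]) || s[n] == '_')
        n++;
    return n;
}
```

Under this model, scan_name("a1a2a") consumes all five characters, so the comment start 1 is never considered, whereas in "a 1 a 2 a" it stops after the first a, leaving the 1 after the space free to open a comment.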
$ /usr/xpg4/bin/m4
define(a,A)define(b,B)changequote(`a',c) a b c
A B c
$
Again, both implementations agree - the a is recognized as a macro name and
expanded to A rather than recognized as a quote start, so b also gets expanded
and all three letters are printed.
$ /usr/xpg4/bin/m4
define(a,A)define(b,B)changecom(`a',]) a b ]
A B ]
$ m4
define(a,A)define(b,B)changecom(`a',]) a b ]
a b ]
$
Hmm, now we have a difference. Solaris decided that 'a' matches a macro name
and expanded it to A; since no comment was started, b was expanded too. GNU
1.4.x decided that 'a' matches the comment start string, so it looked for ],
and everything in between, including 'b', was output untouched.
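The two behaviors can be modeled as the same token classifier run with two different check orders. This is a simplified sketch, not code from either implementation: the names classify, ORDER_POSIX and ORDER_GNU_1_4 are hypothetical, and it ignores nested quotes and changesyntax. It only encodes the two orders discussed in this thread (macros, quotes, comments for Solaris; comments, macros, quotes for GNU 1.4.x).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds and precedence orders, for illustration only. */
enum token_kind { TOKEN_NAME, TOKEN_QUOTE, TOKEN_COMMENT, TOKEN_OTHER };
enum precedence { ORDER_POSIX, ORDER_GNU_1_4 };

/* True if s begins with a non-empty prefix (an empty string means the
   delimiter is disabled). */
static int starts_with(const char *s, const char *prefix)
{
    return *prefix != '\0' && strncmp(s, prefix, strlen(prefix)) == 0;
}

/* True if s begins a macro name under the default syntax. */
static int starts_name(const char *s)
{
    return isalpha((unsigned char) *s) || *s == '_';
}

/* Decide what kind of token begins at s.  ORDER_POSIX checks macro names,
   then quote starts, then comment starts (the Solaris order); ORDER_GNU_1_4
   checks comment starts first, then macro names, then quote starts. */
enum token_kind classify(const char *s, const char *lquote,
                         const char *bcomm, enum precedence order)
{
    if (order == ORDER_GNU_1_4 && starts_with(s, bcomm))
        return TOKEN_COMMENT;
    if (starts_name(s))
        return TOKEN_NAME;
    if (starts_with(s, lquote))
        return TOKEN_QUOTE;
    if (order == ORDER_POSIX && starts_with(s, bcomm))
        return TOKEN_COMMENT;
    return TOKEN_OTHER;
}
```

For the changecom(`a',]) session, classify("a b ]", "`", "a", ...) yields a name under ORDER_POSIX and a comment under ORDER_GNU_1_4, matching the two transcripts; for the earlier changequote(`a',c) session both orders yield a name, matching the agreement seen there.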
$ /usr/xpg4/bin/m4
changecom(`[[[',`]]]')changequote(`[[',`]]')define(a,A)
[[a]]
a
[[[a]]]
[a]
changequote changecom changecom(`[[',`]]')changequote(`[[[',`]]]')
[[a]]
[[a]]
[[[a]]]
a
$ m4
changecom(`[[[',`]]]')changequote(`[[',`]]')define(a,A)
[[a]]
a
[[[a]]]
[[[a]]]
changequote changecom changecom(`[[',`]]')changequote(`[[[',`]]]')
[[a]]
[[a]]
[[[a]]]
[[[a]]]
$
Hmm, in Solaris, when the prefix was ambiguous between quote and comment, it
always chose quote when given a chance, even when quote was the shorter
prefix. In GNU, on the other hand, the comment was always recognized first.
If either implementation were a strictly greedy parser, then you would expect
the longer start token to be recognized in preference to the shorter one.
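For contrast, a strictly greedy choice between a quote start and a comment start might look like the sketch below. Neither Solaris nor GNU behaves this way according to the sessions above; the function name longest_delimiter is hypothetical, and breaking ties in favor of the quote is an arbitrary choice made here.

```c
#include <string.h>

/* Hypothetical longest-match choice between a quote start and a comment
   start at s.  Returns 'q' for quote, 'c' for comment, or 0 if neither
   matches.  An empty delimiter is treated as disabled.  When both match
   with equal length, the quote wins (an arbitrary tie-break). */
int longest_delimiter(const char *s, const char *lquote, const char *bcomm)
{
    size_t ql = strlen(lquote), cl = strlen(bcomm);
    int q = ql > 0 && strncmp(s, lquote, ql) == 0;
    int c = cl > 0 && strncmp(s, bcomm, cl) == 0;

    if (q && c)
        return ql >= cl ? 'q' : 'c';
    if (q)
        return 'q';
    if (c)
        return 'c';
    return 0;
}
```

With quote [[ and comment [[[, this picks the comment for [[[a]]]; with the delimiters swapped it picks the quote. Solaris matches it only in the second case and GNU only in the first.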
POSIX does not explicitly document precedence in m4 between the three types of
tokens. However, it does describe them in the order macros, then quotes, then
comments, which is the same precedence that Solaris uses. The order should only
matter when comments and quotes share a common prefix, or when comments and/or
quotes start with a letter or underscore. My reason for proposing to delay the
recognition of comments until after macro names and quote starts have been
checked is to match historical behavior, and so that GNU M4 parsing at least
follows the order in which POSIX mentions the three token types.
--
Eric Blake