branch-1_4 tokens vs. argument collection
From: Eric Blake
Subject: branch-1_4 tokens vs. argument collection
Date: Thu, 3 Aug 2006 03:25:59 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
This patch cleans up token processing so that macro.c never looks at raw
characters, and therefore never consumes an entire quote or comment when it
only meant to consume `(' as part of argument collection.
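To illustrate why typed tokens help here, the following is a minimal sketch, not the code from macro.c, of argument collection that only inspects token kinds. The names `kind`, `token`, `collect_args`, `MAX_ARGS` and `ARG_LEN` are hypothetical, and the sketch skips macro expansion, quote stripping and whitespace handling. A quoted string or comment arrives as a single K_STRING token, so any `(', `,' or `)' inside it cannot be mistaken for argument syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, loosely modelled on the patch's token_type.  */
enum kind { K_STRING, K_WORD, K_OPEN, K_COMMA, K_CLOSE, K_SIMPLE };

struct token
{
  enum kind kind;
  const char *text;
};

#define MAX_ARGS 8
#define ARG_LEN 64

/* Collect arguments from TOKS[0..N), which must begin just after the
   opening parenthesis of a macro call.  Only a K_COMMA or K_CLOSE token
   at parenthesis level 0 ends an argument; every other token, including
   a K_STRING that happens to contain `,' or `)', is appended as text.
   Returns the number of arguments, or -1 on overflow or if the closing
   parenthesis is missing.  */
int
collect_args (const struct token *toks, size_t n,
              char args[MAX_ARGS][ARG_LEN])
{
  int argc = 0;
  int paren_level = 0;
  size_t i;

  assert (toks != NULL && args != NULL);
  args[0][0] = '\0';
  for (i = 0; i < n; i++)
    {
      enum kind k = toks[i].kind;

      if ((k == K_COMMA || k == K_CLOSE) && paren_level == 0)
        {
          argc++;                       /* current argument is finished */
          if (k == K_CLOSE)
            return argc;
          if (argc == MAX_ARGS)
            return -1;
          args[argc][0] = '\0';
          continue;
        }
      if (k == K_OPEN)
        paren_level++;
      else if (k == K_CLOSE)
        paren_level--;
      if (strlen (args[argc]) + strlen (toks[i].text) >= ARG_LEN)
        return -1;
      strcat (args[argc], toks[i].text);
    }
  return -1;                            /* no closing parenthesis seen */
}
```

The design point is that the decision to end an argument depends on the token kind alone, never on the first character of the token text.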
Note that with this patch, if the comment delimiter currently starts with `(',
peek_token reports a comment after the word changecom, so changecom is
processed without arguments; next_token then sees that same input character as
a plain `(', because the tokenization rules changed midstream. The same applies
to a quote delimiter that starts with `(' and a call to changequote.
2006-08-02 Eric Blake <address@hidden>
Don't confuse leading `(' in comment or quote with start of
argument collection.
* src/m4.h (enum token_type): Add TOKEN_OPEN, TOKEN_COMMA,
TOKEN_CLOSE.
(peek_input): Make private to input.c.
(peek_token): New prototype.
* src/input.c (default_word_regexp): Reduce ifdefs.
(peek_input): Make static.
(next_token): Return new token types.
(match_input, MATCH): Add argument consume, which controls
whether match should be pushed back.
(peek_token): New function.
(token_type_string) [DEBUG_INPUT]: New function.
* src/macro.c (expand_token, expand_argument, collect_arguments):
Handle new token types.
* doc/m4.texinfo (Changequote, Changecom): Document this.
* NEWS: Document this.
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.47
diff -u -r1.1.1.1.2.47 NEWS
--- NEWS 1 Aug 2006 13:05:45 -0000 1.1.1.1.2.47
+++ NEWS 2 Aug 2006 23:16:47 -0000
@@ -17,12 +17,14 @@
collection.
* The dnl macro now warns if end of file is encountered instead of a
newline.
-* The error message when end of file is encountered now uses the file where
- the dangling construct started, rather than "NONE:0".
+* The error message when end of file is encountered now uses the file and
+ line where the dangling construct started, rather than `NONE:0'.
* The __file__ macro, and the -s/--synclines option, now show what
directory a file was found in when the -I/--include option or M4PATH
variable had an effect.
-* The changequote and changecom macros now work with 8-bit characters.
+* The changequote and changecom macros now work with 8-bit characters, and
+ quotes and comments that begin with `(' are properly recognized following
+ a word.
Version 1.4.5 - 15 July 2006, by Eric Blake (CVS version 1.4.4c)
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.59
diff -u -r1.1.1.1.2.59 m4.texinfo
--- doc/m4.texinfo 1 Aug 2006 13:05:45 -0000 1.1.1.1.2.59
+++ doc/m4.texinfo 2 Aug 2006 23:16:47 -0000
@@ -2420,6 +2420,37 @@
@result{} hi HI
@end example
+Quotes are recognized in preference to argument collection. In
+particular, if @var{start} is a single @samp{(}, then argument
+collection is effectively disabled. For portability with other
+implementations, it is a good idea to avoid @samp{(}, @samp{,}, and
+@samp{)} as the first character in @var{start}.
+
+@example
+define(`echo', `$#:$@:')
+@result{}
+define(`hi', `HI')
+@result{}
+changequote(`(',`)')
+@result{}
+echo(hi)
+@result{}0::hi
+changequote
+@result{}
+changequote(`((', `))')
+@result{}
+echo(hi)
+@result{}1:HI:
+echo((hi))
+@result{}0::hi
+changequote
+@result{}
+changequote(`,', `)')
+@result{}
+echo(hi,hi)bye)
+@result{}1:HIhibye:
+@end example
+
If @var{end} is a prefix of @var{start}, the end-quote will be
recognized in preference to a nested begin-quote. In particular,
changing the quotes to have the same string for @var{start} and
@@ -2529,10 +2560,11 @@
@end ignore
Comments are recognized in preference to macros. However, this is not
-compatible with other implementations, where macros take precedence over
-comments, so it may change in a future release. For portability, this
-means that @var{start} should not begin with a letter or @samp{_}
-(underscore).
+compatible with other implementations, where macros and even quoting
+take precedence over comments, so it may change in a future release.
+For portability, this means that @var{start} should not begin with a
+letter or @samp{_} (underscore), and that neither the start-quote nor
+the start-comment string should be a prefix of the other.
@example
define(`hi', `HI')
@@ -2543,6 +2575,35 @@
@result{}q hi Q HI
@end example
+Comments are recognized in preference to argument collection. In
+particular, if @var{start} is a single @samp{(}, then argument
+collection is effectively disabled. For portability with other
+implementations, it is a good idea to avoid @samp{(}, @samp{,}, and
+@samp{)} as the first character in @var{start}.
+
+@example
+define(`echo', `$#:$@:')
+@result{}
+define(`hi', `HI')
+@result{}
+changecom(`(',`)')
+@result{}
+echo(hi)
+@result{}0::(hi)
+changecom
+@result{}
+changecom(`((', `))')
+@result{}
+echo(hi)
+@result{}1:HI:
+echo((hi))
+@result{}0::((hi))
+changecom(`,', `)')
+@result{}
+echo(hi,hi)bye)
+@result{}1:HI,hi)bye:
+@end example
+
It is an error if the end of file occurs within a comment.
@example
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.16
diff -u -r1.1.1.1.2.16 input.c
--- src/input.c 2 Aug 2006 15:11:58 -0000 1.1.1.1.2.16
+++ src/input.c 2 Aug 2006 23:16:47 -0000
@@ -140,14 +140,20 @@
#ifdef ENABLE_CHANGEWORD
-#define DEFAULT_WORD_REGEXP "[_a-zA-Z][_a-zA-Z0-9]*"
+# define DEFAULT_WORD_REGEXP "[_a-zA-Z][_a-zA-Z0-9]*"
static char *word_start;
static struct re_pattern_buffer word_regexp;
static int default_word_regexp;
static struct re_registers regs;
-#endif /* ENABLE_CHANGEWORD */
+#else /* ! ENABLE_CHANGEWORD */
+# define default_word_regexp 1
+#endif /* ! ENABLE_CHANGEWORD */
+
+#ifdef DEBUG_INPUT
+static const char *token_type_string (token_type);
+#endif
/*-------------------------------------------------------------------------.
@@ -229,7 +235,7 @@
}
next = (input_block *) obstack_alloc (current_input,
- sizeof (struct input_block));
+ sizeof (struct input_block));
next->type = INPUT_STRING;
return current_input;
}
@@ -278,7 +284,7 @@
{
input_block *i;
i = (input_block *) obstack_alloc (wrapup_stack,
- sizeof (struct input_block));
+ sizeof (struct input_block));
i->prev = wsp;
i->type = INPUT_STRING;
i->u.u_s.string = obstack_copy0 (wrapup_stack, s, strlen (s));
@@ -309,16 +315,16 @@
isp->u.u_f.name, isp->u.u_f.lineno);
if (ferror (isp->u.u_f.file))
- {
- M4ERROR ((warning_status, 0, "read error"));
- fclose (isp->u.u_f.file);
- retcode = EXIT_FAILURE;
- }
+ {
+ M4ERROR ((warning_status, 0, "read error"));
+ fclose (isp->u.u_f.file);
+ retcode = EXIT_FAILURE;
+ }
else if (fclose (isp->u.u_f.file) == EOF)
- {
- M4ERROR ((warning_status, errno, "error reading file"));
- retcode = EXIT_FAILURE;
- }
+ {
+ M4ERROR ((warning_status, errno, "error reading file"));
+ retcode = EXIT_FAILURE;
+ }
current_file = isp->u.u_f.name;
current_line = isp->u.u_f.lineno;
output_current_line = isp->u.u_f.out_lineno;
@@ -409,7 +415,7 @@
| input stack. |
`------------------------------------------------------------------------*/
-int
+static int
peek_input (void)
{
int ch;
@@ -536,36 +542,48 @@
}
-/*----------------------------------------------------------------------.
-| This function is for matching a string against a prefix of the input |
-| stream. If the string matches the input, the input is discarded, |
-| otherwise the characters read are pushed back again. The function is |
-| used only when multicharacter quotes or comment delimiters are used. |
-`----------------------------------------------------------------------*/
+/*------------------------------------------------------------------.
+| This function is for matching a string against a prefix of the |
+| input stream. If the string matches the input and consume is |
+| TRUE, the input is discarded; otherwise any characters read are |
+| pushed back again. The function is used only when multicharacter |
+| quotes or comment delimiters are used. |
+`------------------------------------------------------------------*/
-static int
-match_input (const char *s)
+static boolean
+match_input (const char *s, boolean consume)
{
int n; /* number of characters matched */
int ch; /* input character */
const char *t;
+ boolean result = FALSE;
ch = peek_input ();
if (ch != to_uchar (*s))
- return 0; /* fail */
- (void) next_char ();
+ return FALSE; /* fail */
if (s[1] == '\0')
- return 1; /* short match */
+ {
+ if (consume)
+ (void) next_char ();
+ return TRUE; /* short match */
+ }
- for (n = 1, t = s++; (ch = peek_input ()) == to_uchar (*s++); n++)
+ (void) next_char ();
+ for (n = 1, t = s++; (ch = peek_input ()) == to_uchar (*s++); )
{
(void) next_char ();
+ n++;
if (*s == '\0') /* long match */
- return 1;
+ {
+ if (consume)
+ return TRUE;
+ result = TRUE;
+ break;
+ }
}
- /* Failed, push back input. */
+ /* Failed or shouldn't consume, push back input. */
{
struct obstack *h = push_string_init ();
@@ -573,20 +591,23 @@
obstack_grow (h, t, n);
}
push_string_finish ();
- return 0;
+ return result;
}
-/*------------------------------------------------------------------------.
-| The macro MATCH() is used to match a string against the input. The |
-| first character is handled inline, for speed. Hopefully, this will not |
-| hurt efficiency too much when single character quotes and comment |
-| delimiters are used. |
-`------------------------------------------------------------------------*/
+/*--------------------------------------------------------------------.
+| The macro MATCH() is used to match a string S against the input. |
+| The first character is handled inline, for speed. Hopefully, this |
+| will not hurt efficiency too much when single character quotes and |
+| comment delimiters are used. If CONSUME, then CH is the result of |
+| next_char, and a successful match will discard the matched string. |
+| Otherwise, CH is the result of peek_input, and the input stream is |
+| effectively unchanged. |
+`--------------------------------------------------------------------*/
-#define MATCH(ch, s) \
+#define MATCH(ch, s, consume) \
(to_uchar ((s)[0]) == (ch) \
&& (ch) != '\0' \
- && ((s)[1] == '\0' || (match_input ((s) + 1))))
+ && ((s)[1] == '\0' || (match_input ((s) + (consume), consume))))
/*----------------------------------------------------------.
@@ -770,16 +791,17 @@
(void) next_char ();
#ifdef DEBUG_INPUT
fprintf (stderr, "next_token -> MACDEF (%s)\n",
- find_builtin_by_addr (TOKEN_DATA_FUNC (td))->name);
+ find_builtin_by_addr (TOKEN_DATA_FUNC (td))->name);
#endif
return TOKEN_MACDEF;
}
(void) next_char ();
- if (MATCH (ch, bcomm.string))
+ if (MATCH (ch, bcomm.string, TRUE))
{
obstack_grow (&token_stack, bcomm.string, bcomm.length);
- while ((ch = next_char ()) != CHAR_EOF && !MATCH (ch, ecomm.string))
+ while ((ch = next_char ()) != CHAR_EOF
+ && !MATCH (ch, ecomm.string, TRUE))
obstack_1grow (&token_stack, ch);
if (ch != CHAR_EOF)
obstack_grow (&token_stack, ecomm.string, ecomm.length);
@@ -791,11 +813,7 @@
type = TOKEN_STRING;
}
-#ifdef ENABLE_CHANGEWORD
else if (default_word_regexp && (isalpha (ch) || ch == '_'))
-#else
- else if (isalpha (ch) || ch == '_')
-#endif
{
obstack_1grow (&token_stack, ch);
while ((ch = peek_input ()) != CHAR_EOF && (isalnum (ch) || ch == '_'))
@@ -812,7 +830,7 @@
{
obstack_1grow (&token_stack, ch);
while (1)
- {
+ {
ch = peek_input ();
if (ch == CHAR_EOF)
break;
@@ -844,9 +862,23 @@
#endif /* ENABLE_CHANGEWORD */
- else if (!MATCH (ch, lquote.string))
+ else if (!MATCH (ch, lquote.string, TRUE))
{
- type = TOKEN_SIMPLE;
+ switch (ch)
+ {
+ case '(':
+ type = TOKEN_OPEN;
+ break;
+ case ',':
+ type = TOKEN_COMMA;
+ break;
+ case ')':
+ type = TOKEN_CLOSE;
+ break;
+ default:
+ type = TOKEN_SIMPLE;
+ break;
+ }
obstack_1grow (&token_stack, ch);
}
else
@@ -861,13 +893,13 @@
error_at_line (EXIT_FAILURE, 0, file, line,
"ERROR: end of file in string");
- if (MATCH (ch, rquote.string))
+ if (MATCH (ch, rquote.string, TRUE))
{
if (--quote_level == 0)
break;
obstack_grow (&token_stack, rquote.string, rquote.length);
}
- else if (MATCH (ch, lquote.string))
+ else if (MATCH (ch, lquote.string, TRUE))
{
quote_level++;
obstack_grow (&token_stack, lquote.string, lquote.length);
@@ -888,20 +920,127 @@
TOKEN_DATA_ORIG_TEXT (td) = orig_text;
#endif
#ifdef DEBUG_INPUT
- fprintf (stderr, "next_token -> %d (%s)\n", type, TOKEN_DATA_TEXT (td));
+ fprintf (stderr, "next_token -> %s (%s)\n",
+ token_type_string (type), TOKEN_DATA_TEXT (td));
#endif
return type;
}
+
+/*-----------------------------------------------.
+| Peek at the next token from the input stream. |
+`-----------------------------------------------*/
+
+token_type
+peek_token (void)
+{
+ int ch = peek_input ();
+
+ if (ch == CHAR_EOF)
+ {
+#ifdef DEBUG_INPUT
+ fprintf (stderr, "peek_token -> EOF\n");
+#endif
+ return TOKEN_EOF;
+ }
+ if (ch == CHAR_MACRO)
+ {
+#ifdef DEBUG_INPUT
+ fprintf (stderr, "peek_token -> MACDEF\n");
+#endif
+ return TOKEN_MACDEF;
+ }
+
+ if (MATCH (ch, bcomm.string, FALSE))
+ {
+#ifdef DEBUG_INPUT
+ fprintf (stderr, "peek_token -> COMMENT\n");
+#endif
+ return TOKEN_STRING;
+ }
+
+ if ((default_word_regexp && (isalpha (ch) || ch == '_'))
+#ifdef ENABLE_CHANGEWORD
+ || (! default_word_regexp && strchr (word_start, ch))
+#endif /* ENABLE_CHANGEWORD */
+ )
+ {
+#ifdef DEBUG_INPUT
+ fprintf (stderr, "peek_token -> WORD\n");
+#endif
+ return TOKEN_WORD;
+ }
+
+ if (MATCH (ch, lquote.string, FALSE))
+ {
+#ifdef DEBUG_INPUT
+ fprintf (stderr, "peek_token -> QUOTE\n");
+#endif
+ return TOKEN_STRING;
+ }
+
+ switch (ch)
+ {
+ case '(':
+#ifdef DEBUG_INPUT
+ fprintf (stderr, "peek_token -> OPEN\n");
+#endif
+ return TOKEN_OPEN;
+ case ',':
+#ifdef DEBUG_INPUT
+ fprintf (stderr, "peek_token -> COMMA\n");
+#endif
+ return TOKEN_COMMA;
+ case ')':
+#ifdef DEBUG_INPUT
+ fprintf (stderr, "peek_token -> CLOSE\n");
+#endif
+ return TOKEN_CLOSE;
+ default:
+#ifdef DEBUG_INPUT
+ fprintf (stderr, "peek_token -> SIMPLE\n");
+#endif
+ return TOKEN_SIMPLE;
+ }
+}
#ifdef DEBUG_INPUT
+static const char *
+token_type_string (token_type t)
+{
+ switch (t)
+ { /* TOKSW */
+ case TOKEN_EOF:
+ return "EOF";
+ case TOKEN_STRING:
+ return "STRING";
+ case TOKEN_WORD:
+ return "WORD";
+ case TOKEN_OPEN:
+ return "OPEN";
+ case TOKEN_COMMA:
+ return "COMMA";
+ case TOKEN_CLOSE:
+ return "CLOSE";
+ case TOKEN_SIMPLE:
+ return "SIMPLE";
+ case TOKEN_MACDEF:
+ return "MACDEF";
+ default:
+ abort ();
+ }
+ }
+
static void
print_token (const char *s, token_type t, token_data *td)
{
fprintf (stderr, "%s: ", s);
switch (t)
{ /* TOKSW */
+ case TOKEN_OPEN:
+ case TOKEN_COMMA:
+ case TOKEN_CLOSE:
case TOKEN_SIMPLE:
fprintf (stderr, "char:");
break;
Index: src/m4.h
===================================================================
RCS file: /sources/m4/m4/src/m4.h,v
retrieving revision 1.1.1.1.2.23
diff -u -r1.1.1.1.2.23 m4.h
--- src/m4.h 30 Jul 2006 23:46:51 -0000 1.1.1.1.2.23
+++ src/m4.h 2 Aug 2006 23:16:47 -0000
@@ -219,10 +219,13 @@
enum token_type
{
TOKEN_EOF, /* end of file */
- TOKEN_STRING, /* a quoted string */
+ TOKEN_STRING, /* a quoted string or comment */
TOKEN_WORD, /* an identifier */
- TOKEN_SIMPLE, /* a single character */
- TOKEN_MACDEF /* a macros definition (see "defn") */
+ TOKEN_OPEN, /* ( */
+ TOKEN_COMMA, /* , */
+ TOKEN_CLOSE, /* ) */
+ TOKEN_SIMPLE, /* any other single character */
+ TOKEN_MACDEF /* a macro's definition (see "defn") */
};
/* The data for a token, a macro argument, and a macro definition. */
@@ -262,7 +265,7 @@
typedef enum token_data_type token_data_type;
void input_init (void);
-int peek_input (void);
+token_type peek_token (void);
token_type next_token (token_data *);
void skip_line (void);
Index: src/macro.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/macro.c,v
retrieving revision 1.1.1.1.2.8
diff -u -r1.1.1.1.2.8 macro.c
--- src/macro.c 1 Aug 2006 13:05:45 -0000 1.1.1.1.2.8
+++ src/macro.c 2 Aug 2006 23:16:47 -0000
@@ -66,6 +66,9 @@
case TOKEN_MACDEF:
break;
+ case TOKEN_OPEN:
+ case TOKEN_COMMA:
+ case TOKEN_CLOSE:
case TOKEN_SIMPLE:
case TOKEN_STRING:
shipout_text (obs, TOKEN_DATA_TEXT (td), strlen (TOKEN_DATA_TEXT (td)));
@@ -76,7 +79,7 @@
if (sym == NULL || SYMBOL_TYPE (sym) == TOKEN_VOID
|| (SYMBOL_TYPE (sym) == TOKEN_FUNC
&& SYMBOL_BLIND_NO_ARGS (sym)
- && peek_input () != '('))
+ && peek_token () != TOKEN_OPEN))
{
#ifdef ENABLE_CHANGEWORD
shipout_text (obs, TOKEN_DATA_ORIG_TEXT (td),
@@ -134,11 +137,10 @@
switch (t)
{ /* TOKSW */
- case TOKEN_SIMPLE:
- text = TOKEN_DATA_TEXT (&td);
- if ((*text == ',' || *text == ')') && paren_level == 0)
+ case TOKEN_COMMA:
+ case TOKEN_CLOSE:
+ if (paren_level == 0)
{
-
/* The argument MUST be finished, whether we want it or not. */
obstack_1grow (obs, '\0');
text = obstack_finish (obs);
@@ -148,8 +150,12 @@
TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
TOKEN_DATA_TEXT (argp) = text;
}
- return (boolean) (*TOKEN_DATA_TEXT (&td) == ',');
+ return (boolean) (t == TOKEN_COMMA);
}
+ /* fallthru */
+ case TOKEN_OPEN:
+ case TOKEN_SIMPLE:
+ text = TOKEN_DATA_TEXT (&td);
if (*text == '(')
paren_level++;
@@ -198,7 +204,6 @@
collect_arguments (symbol *sym, struct obstack *argptr,
struct obstack *arguments)
{
- int ch; /* lookahead for ( */
token_data td;
token_data *tdp;
boolean more_args;
@@ -209,8 +214,7 @@
tdp = (token_data *) obstack_copy (arguments, &td, sizeof (td));
obstack_grow (argptr, &tdp, sizeof (tdp));
- ch = peek_input ();
- if (ch == '(')
+ if (peek_token () == TOKEN_OPEN)
{
next_token (&td); /* gobble parenthesis */
do
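The consume argument that the patch adds to match_input can be shown in isolation. The sketch below is an assumption-laden simplification: instead of m4's input stack with push-back, it uses a hypothetical `stream` struct holding a string and a read position, and a hypothetical `match_prefix` function. The contract is the same as the one described in the patch's comment: a successful match discards the input only when consume is true, and a failed match never changes the stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input stream: a NUL-terminated string plus a read
   position, which must not exceed the length of the string.  */
struct stream
{
  const char *text;
  size_t pos;
};

/* Return true if the unread input starts with the non-empty string S.
   When CONSUME is true, a successful match advances past S; otherwise,
   and on any failed match, the read position is left where it was.  */
bool
match_prefix (struct stream *in, const char *s, bool consume)
{
  size_t len = strlen (s);

  assert (len > 0);
  if (strncmp (in->text + in->pos, s, len) != 0)
    return false;
  if (consume)
    in->pos += len;
  return true;
}
```

A peeking caller passes false and can ask the same question again later, while a tokenizing caller passes true and continues reading after the delimiter.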