[11/18] argv_ref speedup: support composite arguments
From: Eric Blake
Subject: [11/18] argv_ref speedup: support composite arguments
Date: Tue, 22 Jan 2008 13:59:57 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Next in the series. Until now, every byte of rescanned input has been
copied, so that the argument collection engine only had to deal with
contiguous text. With this patch, the argument collection engine can
create composite tokens, in which links of the token chain can come from
back-references in the input engine. The input engine has a new
placeholder (CHAR_QUOTE), similar to the placeholder for builtins, which
represents a series of rescanned bytes that all came from the same
quoting rules. Meanwhile, all of the argv accessor methods flatten the
text of a composite token only when it is needed, rather than wasting
effort on flattening it up front for an argument that is never used. As a
result, memory usage drops (dramatically on boxed recursion, but even
real-life autoconf and unboxed recursion test cases see some benefit).
More importantly, with less copying, m4 operates much faster when
rescanning back-references. This patch still flattens composite
arguments into contiguous text in push_arg (i.e. there are no references
to a reference yet), and still handles argument lists one argument at a
time, so the speedup comes entirely from a better coefficient, not from
any reduction in complexity.
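As a rough illustration of the as-needed flattening described above, here
is a minimal sketch in C. It is not the actual m4 code: the names
struct link, struct composite, composite_append, composite_len and
composite_text are hypothetical, and real m4 allocates on obstacks rather
than with malloc. The point is only that links refer to existing text, and
a contiguous copy is built the first time an accessor asks for one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified link of a token chain.  */
struct link
{
  struct link *next;
  const char *str;	/* Text of this link; need not be NUL-terminated.  */
  size_t len;		/* Number of bytes of STR that belong to the link.  */
};

/* Hypothetical composite argument: a chain plus a lazily built copy.  */
struct composite
{
  struct link *chain;	/* First link of the chain.  */
  struct link *end;	/* Last link of the chain.  */
  char *flat;		/* NULL until contiguous text is requested.  */
};

/* Append link L, referring to LEN bytes at STR, without copying.  */
static void
composite_append (struct composite *c, struct link *l,
		  const char *str, size_t len)
{
  assert (c && l);
  l->next = NULL;
  l->str = str;
  l->len = len;
  if (c->end)
    c->end->next = l;
  else
    c->chain = l;
  c->end = l;
  /* Any earlier flattened copy is now stale.  */
  free (c->flat);
  c->flat = NULL;
}

/* The length is available without flattening.  */
static size_t
composite_len (const struct composite *c)
{
  size_t total = 0;
  const struct link *l;
  for (l = c->chain; l; l = l->next)
    total += l->len;
  return total;
}

/* Flatten on first use, then reuse the cached copy.  */
static const char *
composite_text (struct composite *c)
{
  if (!c->flat)
    {
      const struct link *l;
      char *p = malloc (composite_len (c) + 1);
      if (!p)
	abort ();
      c->flat = p;
      for (l = c->chain; l; l = l->next)
	{
	  memcpy (p, l->str, l->len);
	  p += l->len;
	}
      *p = '\0';
    }
  return c->flat;
}
```

In this sketch a caller that only needs composite_len never pays for the
copy, which is the effect the accessor changes are meant to have for
arguments that are collected but never used.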
2008-01-22 Eric Blake <address@hidden>
Stage 11: full circle for single argument references.
* src/m4.h (struct token_chain): Add quote_age member.
(struct token_data): Add end member to chain alternate.
(make_text_link): New prototype.
* src/input.c (CHAR_QUOTE): New macro.
(word_start): Pre-allocate.
(set_word_regexp): Simplify.
(make_text_link): Export, and handle new fields.
(next_char, next_char_1): Add parameter.
(append_quote_token): New function.
(match_input, next_token): Adjust callers to handle quoted input
blocks.
* src/macro.c (struct macro_arguments): Add wrapper member.
(expand_argument): Accept composite blocks from input engine.
(expand_macro): Reduce refcounts of composite arguments.
(collect_arguments, arg_token, arg_mark, make_argv_ref): Update to
use new fields.
(arg_type, arg_text, arg_equal, arg_len): Treat composite
arguments as text.
(push_arg, push_args): Handle composites.
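
To make the role of the CHAR_QUOTE placeholder and the quote_age member
more concrete, here is a small sketch in C of a reader over a chain of
links. It is a simplification, not the code from the patch: struct link,
chain_read and chain_take are hypothetical names, the sentinel values are
the ones in the m4/m4private.h hunk of the patch below, and the real
append_quote_token copies the link onto an obstack instead of detaching
it. A link whose quote age matches the current one is handed over whole;
once a link has been partly consumed byte by byte, its quote age is
cleared so it can no longer be passed through as a unit.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinels outside the range of unsigned char, as in the patch.  */
#define CHAR_EOF	256
#define CHAR_BUILTIN	257
#define CHAR_QUOTE	258
#define CHAR_RETRY	259

/* Hypothetical, simplified link of an input chain.  */
struct link
{
  struct link *next;
  unsigned int quote_age;	/* Quote age of this link, or 0.  */
  const char *str;
  size_t len;
};

/* Read from the chain at *HEAD.  Return CHAR_QUOTE, without consuming
   anything, if ALLOW_QUOTE and the first link was collected under
   CURRENT_AGE; otherwise return the next byte, or CHAR_RETRY when the
   chain is exhausted.  */
static int
chain_read (struct link **head, unsigned int current_age, int allow_quote)
{
  struct link *l;
  while ((l = *head) != NULL)
    {
      if (allow_quote && l->quote_age && l->quote_age == current_age)
	return CHAR_QUOTE;
      if (l->len)
	{
	  /* Partial consumption invalidates the quote age.  */
	  l->quote_age = 0;
	  l->len--;
	  return (unsigned char) *l->str++;
	}
      *head = l->next;
    }
  return CHAR_RETRY;
}

/* After CHAR_QUOTE, detach and return the whole first link.  */
static struct link *
chain_take (struct link **head)
{
  struct link *l = *head;
  assert (l);
  *head = l->next;
  l->next = NULL;
  return l;
}
```

A caller that sees CHAR_QUOTE takes the link with chain_take and appends
it to the argument being collected, so none of its bytes are copied.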
- --
Don't work too hard, make some time for fun as well!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFHlllN84KuGfSFAYARAn/zAJ4g1+FB+zY+1Wh/N3zyI6RxQBjrKgCffgKm
te0swNG/6ja6EH1Y5kxSdoo=
=VdmM
-----END PGP SIGNATURE-----
From 5307d448bacdf7f588a95f7bc44c520ce80827a6 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Mon, 21 Jan 2008 12:04:45 -0700
Subject: [PATCH] Stage 11: full circle for single argument references.
Pass quoted strings through to argument collection in a single
action, so that an argument can be reused throughout macro
recursion if it remains unchanged.
Memory impact: noticeable improvement, due to more reuse in
argument collection stacks.
Speed impact: noticeable improvement, due to less copying.
* m4/m4module.h (m4_arg_text): Add parameter.
(M4ARG): Adjust.
* m4/m4private.h (CHAR_QUOTE): New input engine sentinel.
(m4__make_text_link): New prototype.
(struct m4_symbol_chain): Add quote_age member.
(struct m4_symbol_value): Add end member to chained symbol.
(struct m4_macro_args): Add wrapper member.
* m4/symtab.c (m4_symbol_value_print): Print composite tokens.
(m4_symbol_value_copy, m4_symbol_value_delete): Recognize
composite tokens.
* m4/input.c (make_text_link): Rename...
(m4__make_text_link): ...to this, and export.
(m4_push_string_finish): Adjust caller.
(make_text_link, m4__push_symbol): Update new field.
(file_read, builtin_read, string_read, composite_read, next_char):
Add parameter.
(m4_skip_line, match_input, consume_syntax): Adjust callers.
(append_quote_token): New function.
(m4__next_token): Pass quoted strings onto argument collection.
(m4_print_token) [DEBUG_INPUT]: Update.
* m4/macro.c (expand_argument): Collect composite arguments.
(collect_arguments): Update new field.
(expand_macro): Reduce ref-count of back-references after use.
(arg_mark, m4_arg_symbol, m4_make_argv_ref): Adjust to new member
names.
(m4_is_arg_text): Also recognize composite symbols as text.
(m4_arg_text, m4_arg_len): Merge composite symbols as needed.
(m4_arg_equal): Compare composite symbols.
(m4_push_arg, m4_push_args): Handle composite symbols.
(m4_arg_symbol): Relax assertion.
(process_macro): Use single-argument references.
* m4/output.c (m4_shipout_string_trunc): Update comment.
* tests/macros.at (Rescanning macros): Augment test.
Signed-off-by: Eric Blake <address@hidden>
---
ChangeLog | 43 ++++++++++
m4/input.c | 236 +++++++++++++++++++++++++++++++++++-------------------
m4/m4module.h | 9 +-
m4/m4private.h | 17 +++-
m4/macro.c | 239 +++++++++++++++++++++++++++++++++++++++++++++++--------
m4/output.c | 3 +-
m4/symtab.c | 150 ++++++++++++++++++++++++++---------
tests/macros.at | 20 +++++-
8 files changed, 557 insertions(+), 160 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index cc00596..782b475 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,46 @@
+2008-01-21 Eric Blake <address@hidden>
+
+ Stage 11: full circle for single argument references.
+ Pass quoted strings through to argument collection in a single
+ action, so that an argument can be reused throughout macro
+ recursion if it remains unchanged.
+ Memory impact: noticeable improvement, due to more reuse in
+ argument collection stacks.
+ Speed impact: noticeable improvement, due to less copying.
+ * m4/m4module.h (m4_arg_text): Add parameter.
+ (M4ARG): Adjust.
+ * m4/m4private.h (CHAR_QUOTE): New input engine sentinel.
+ (m4__make_text_link): New prototype.
+ (struct m4_symbol_chain): Add quote_age member.
+ (struct m4_symbol_value): Add end member to chained symbol.
+ (struct m4_macro_args): Add wrapper member.
+ * m4/symtab.c (m4_symbol_value_print): Print composite tokens.
+ (m4_symbol_value_copy, m4_symbol_value_delete): Recognize
+ composite tokens.
+ * m4/input.c (make_text_link): Rename...
+ (m4__make_text_link): ...to this, and export.
+ (m4_push_string_finish): Adjust caller.
+ (make_text_link, m4__push_symbol): Update new field.
+ (file_read, builtin_read, string_read, composite_read, next_char):
+ Add parameter.
+ (m4_skip_line, match_input, consume_syntax): Adjust callers.
+ (append_quote_token): New function.
+ (m4__next_token): Pass quoted strings onto argument collection.
+ (m4_print_token) [DEBUG_INPUT]: Update.
+ * m4/macro.c (expand_argument): Collect composite arguments.
+ (collect_arguments): Update new field.
+ (expand_macro): Reduce ref-count of back-references after use.
+ (arg_mark, m4_arg_symbol, m4_make_argv_ref): Adjust to new member
+ names.
+ (m4_is_arg_text): Also recognize composite symbols as text.
+ (m4_arg_text, m4_arg_len): Merge composite symbols as needed.
+ (m4_arg_equal): Compare composite symbols.
+ (m4_push_arg, m4_push_args): Handle composite symbols.
+ (m4_arg_symbol): Relax assertion.
+ (process_macro): Use single-argument references.
+ * m4/output.c (m4_shipout_string_trunc): Update comment.
+ * tests/macros.at (Rescanning macros): Augment test.
+
2008-01-16 Eric Blake <address@hidden>
Stage 10: avoid extra copying of strings and comments.
diff --git a/m4/input.c b/m4/input.c
index 6dcaac0..0dcb0ae 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -93,29 +93,28 @@
between input blocks must update the context accordingly. */
static int file_peek (m4_input_block *);
-static int file_read (m4_input_block *, m4 *, bool);
+static int file_read (m4_input_block *, m4 *, bool, bool);
static void file_unget (m4_input_block *, int);
static bool file_clean (m4_input_block *, m4 *, bool);
static void file_print (m4_input_block *, m4 *, m4_obstack *);
static int builtin_peek (m4_input_block *);
-static int builtin_read (m4_input_block *, m4 *, bool);
+static int builtin_read (m4_input_block *, m4 *, bool, bool);
static void builtin_unget (m4_input_block *, int);
static void builtin_print (m4_input_block *, m4 *, m4_obstack *);
static int string_peek (m4_input_block *);
-static int string_read (m4_input_block *, m4 *, bool);
+static int string_read (m4_input_block *, m4 *, bool, bool);
static void string_unget (m4_input_block *, int);
static void string_print (m4_input_block *, m4 *, m4_obstack *);
static int composite_peek (m4_input_block *);
-static int composite_read (m4_input_block *, m4 *, bool);
+static int composite_read (m4_input_block *, m4 *, bool, bool);
static void composite_unget (m4_input_block *, int);
static bool composite_clean (m4_input_block *, m4 *, bool);
static void composite_print (m4_input_block *, m4 *, m4_obstack *);
-static void make_text_link (m4_obstack *, m4_symbol_chain **,
- m4_symbol_chain **);
static void init_builtin_token (m4 *, m4_symbol_value *);
+static void append_quote_token (m4_obstack *, m4_symbol_value *);
static bool match_input (m4 *, const char *, bool);
-static int next_char (m4 *, bool);
+static int next_char (m4 *, bool, bool);
static int peek_char (m4 *);
static bool pop_input (m4 *, bool);
static void unget_input (int);
@@ -133,9 +132,10 @@ struct input_funcs
int (*peek_func) (m4_input_block *);
/* Read input, return an unsigned char, CHAR_BUILTIN if it is a
- builtin, or CHAR_RETRY if none available. If SAFE, then do not
- alter the current file or line. */
- int (*read_func) (m4_input_block *, m4 *, bool safe);
+ builtin, or CHAR_RETRY if none available. If ALLOW_QUOTE, then
+ CHAR_QUOTE may be returned. If SAFE, then do not alter the
+ current file or line. */
+ int (*read_func) (m4_input_block *, m4 *, bool allow_quote, bool safe);
/* Unread a single unsigned character or CHAR_BUILTIN, must be the
same character previously read by read_func. */
@@ -269,7 +269,8 @@ file_peek (m4_input_block *me)
}
static int
-file_read (m4_input_block *me, m4 *context, bool safe M4_GNUC_UNUSED)
+file_read (m4_input_block *me, m4 *context, bool allow_quote M4_GNUC_UNUSED,
+ bool safe M4_GNUC_UNUSED)
{
int ch;
@@ -397,7 +398,7 @@ builtin_peek (m4_input_block *me)
static int
builtin_read (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
- bool safe M4_GNUC_UNUSED)
+ bool allow_quote M4_GNUC_UNUSED, bool safe M4_GNUC_UNUSED)
{
if (me->u.u_b.read)
return CHAR_RETRY;
@@ -479,7 +480,7 @@ string_peek (m4_input_block *me)
static int
string_read (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
- bool safe M4_GNUC_UNUSED)
+ bool allow_quote M4_GNUC_UNUSED, bool safe M4_GNUC_UNUSED)
{
if (!me->u.u_s.len)
return CHAR_RETRY;
@@ -560,7 +561,7 @@ m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level)
next->funcs = &composite_funcs;
next->u.u_c.chain = next->u.u_c.end = NULL;
}
- make_text_link (current_input, &next->u.u_c.chain, &next->u.u_c.end);
+ m4__make_text_link (current_input, &next->u.u_c.chain, &next->u.u_c.end);
chain = (m4_symbol_chain *) obstack_alloc (current_input, sizeof *chain);
if (next->u.u_c.end)
next->u.u_c.end->next = chain;
@@ -568,6 +569,7 @@ m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level)
next->u.u_c.chain = chain;
next->u.u_c.end = chain;
chain->next = NULL;
+ chain->quote_age = m4_get_symbol_value_quote_age (value);
chain->str = m4_get_symbol_value_text (value);
chain->len = m4_get_symbol_value_len (value);
chain->level = level;
@@ -611,7 +613,8 @@ m4_push_string_finish (void)
next->u.u_s.len = len;
}
else
- make_text_link (current_input, &next->u.u_c.chain, &next->u.u_c.end);
+ m4__make_text_link (current_input, &next->u.u_c.chain,
+ &next->u.u_c.end);
next->prev = isp;
ret = isp = next;
input_change = true;
@@ -649,15 +652,19 @@ composite_peek (m4_input_block *me)
}
static int
-composite_read (m4_input_block *me, m4 *context, bool safe)
+composite_read (m4_input_block *me, m4 *context, bool allow_quote, bool safe)
{
m4_symbol_chain *chain = me->u.u_c.chain;
while (chain)
{
+ if (allow_quote && chain->quote_age == m4__quote_age (M4SYNTAX))
+ return CHAR_QUOTE;
if (chain->str)
{
if (chain->len)
{
+ /* Partial consumption invalidates quote age. */
+ chain->quote_age = 0;
chain->len--;
return to_uchar (*chain->str++);
}
@@ -668,8 +675,6 @@ composite_read (m4_input_block *me, m4 *context, bool safe)
assert (!"implemented yet");
abort ();
}
- if (safe)
- return CHAR_RETRY;
if (chain->level < SIZE_MAX)
m4__adjust_refcount (context, chain->level, false);
me->u.u_c.chain = chain = chain->next;
@@ -744,9 +749,9 @@ composite_print (m4_input_block *me, m4 *context, m4_obstack *obs)
/* Given an obstack OBS, capture any unfinished text as a link in the
chain that starts at *START and ends at *END. START may be NULL if
*END is non-NULL. */
-static void
-make_text_link (m4_obstack *obs, m4_symbol_chain **start,
- m4_symbol_chain **end)
+void
+m4__make_text_link (m4_obstack *obs, m4_symbol_chain **start,
+ m4_symbol_chain **end)
{
m4_symbol_chain *chain;
size_t len = obstack_object_size (obs);
@@ -762,6 +767,7 @@ make_text_link (m4_obstack *obs, m4_symbol_chain **start,
*start = chain;
*end = chain;
chain->next = NULL;
+ chain->quote_age = 0;
chain->str = str;
chain->len = len;
chain->level = SIZE_MAX;
@@ -905,13 +911,43 @@ init_builtin_token (m4 *context, m4_symbol_value *token)
VALUE_MAX_ARGS (token) = block->u.u_b.builtin->max_args;
}
+/* When a QUOTE token is seen, convert VALUE to a composite (if it is
+ not one already), consisting of any unfinished text on OBS, as well
+ as the quoted token from the top of the input stack. Use OBS for
+ any additional allocations needed to store the token chain. */
+static void
+append_quote_token (m4_obstack *obs, m4_symbol_value *value)
+{
+ m4_symbol_chain *src_chain = isp->u.u_c.chain;
+ m4_symbol_chain *chain;
+ assert (isp->funcs == &composite_funcs && obs);
+
+ if (value->type == M4_SYMBOL_VOID)
+ {
+ value->type = M4_SYMBOL_COMP;
+ value->u.u_c.chain = value->u.u_c.end = NULL;
+ }
+ assert (value->type == M4_SYMBOL_COMP);
+ m4__make_text_link (obs, &value->u.u_c.chain, &value->u.u_c.end);
+ chain = (m4_symbol_chain *) obstack_copy (obs, src_chain, sizeof *chain);
+ if (value->u.u_c.end)
+ value->u.u_c.end->next = chain;
+ else
+ value->u.u_c.chain = chain;
+ value->u.u_c.end = chain;
+ value->u.u_c.end->next = NULL;
+ isp->u.u_c.chain = src_chain->next;
+}
+
/* Low level input is done a character at a time. The function
next_char () is used to read and advance the input to the next
- character. If RETRY, then avoid returning CHAR_RETRY by popping
- input. */
+ character. If ALLOW_QUOTE, and the current input matches the
+ current quote age, return CHAR_QUOTE and leave consumption of data
+ for append_quote_token. If RETRY, then avoid returning CHAR_RETRY
+ by popping input. */
static int
-next_char (m4 *context, bool retry)
+next_char (m4 *context, bool allow_quote, bool retry)
{
int ch;
@@ -931,7 +967,8 @@ next_char (m4 *context, bool retry)
}
assert (isp->funcs->read_func);
- while ((ch = isp->funcs->read_func (isp, context, !retry)) != CHAR_RETRY
+ while (((ch = isp->funcs->read_func (isp, context, allow_quote, !retry))
+ != CHAR_RETRY)
|| !retry)
{
/* if (!IS_IGNORE (ch)) */
@@ -960,7 +997,9 @@ peek_char (m4 *context)
assert (block->funcs->peek_func);
if ((ch = block->funcs->peek_func (block)) != CHAR_RETRY)
{
- return /* (IS_IGNORE (ch)) ? next_char (context, true) : */ ch;
+/* if (IS_IGNORE (ch)) */
+/* return next_char (context, false, true); */
+ return ch;
}
block = block->prev;
@@ -969,7 +1008,7 @@ peek_char (m4 *context)
/* The function unget_input () puts back a character on the input
stack, using an existing input_block if possible. This is not safe
- to call except immediately after next_char(context, false). */
+ to call except immediately after next_char(context, allow, false). */
static void
unget_input (int ch)
{
@@ -987,7 +1026,7 @@ m4_skip_line (m4 *context, const char *name)
const char *file = m4_get_current_file (context);
int line = m4_get_current_line (context);
- while ((ch = next_char (context, true)) != CHAR_EOF && ch != '\n')
+ while ((ch = next_char (context, false, true)) != CHAR_EOF && ch != '\n')
;
if (ch == CHAR_EOF)
/* current_file changed; use the previous value we cached. */
@@ -1032,14 +1071,14 @@ match_input (m4 *context, const char *s, bool consume)
if (s[1] == '\0')
{
if (consume)
- next_char (context, true);
+ next_char (context, false, true);
return true; /* short match */
}
- next_char (context, true);
+ next_char (context, false, true);
for (n = 1, t = s++; (ch = peek_char (context)) == to_uchar (*s++); )
{
- next_char (context, true);
+ next_char (context, false, true);
n++;
if (*s == '\0') /* long match */
{
@@ -1071,29 +1110,35 @@ match_input (m4 *context, const char *s, bool consume)
/* While the current input character has the given SYNTAX, append it
to OBS. Take care not to pop input source unless the next source
- would continue the chain. Return true unless the chain ended with
+ would continue the chain. Return true if the chain ended with
CHAR_EOF. */
static bool
consume_syntax (m4 *context, m4_obstack *obs, unsigned int syntax)
{
int ch;
+ bool allow_quote = m4__safe_quotes (M4SYNTAX);
assert (syntax);
while (1)
{
/* It is safe to call next_char without first checking
peek_char, except at input source boundaries, which we detect
- by CHAR_RETRY. We exploit the fact that CHAR_EOF and
- CHAR_MACRO do not satisfy any syntax categories. */
- while ((ch = next_char (context, false)) != CHAR_RETRY
+ by CHAR_RETRY. We exploit the fact that CHAR_EOF,
+ CHAR_BUILTIN, and CHAR_QUOTE do not satisfy any syntax
+ categories. */
+ while ((ch = next_char (context, allow_quote, false)) != CHAR_RETRY
&& m4_has_syntax (M4SYNTAX, ch, syntax))
- obstack_1grow (obs, ch);
- if (ch == CHAR_RETRY)
+ {
+ assert (ch < CHAR_EOF);
+ obstack_1grow (obs, ch);
+ }
+ if (ch == CHAR_RETRY || ch == CHAR_QUOTE)
{
ch = peek_char (context);
if (m4_has_syntax (M4SYNTAX, ch, syntax))
{
+ assert (ch < CHAR_EOF);
obstack_1grow (obs, ch);
- next_char (context, true);
+ next_char (context, false, true);
continue;
}
return ch == CHAR_EOF;
@@ -1141,13 +1186,13 @@ m4_input_exit (void)
}
-/* Parse and return a single token from the input stream, built in
- TOKEN. See m4__token_type for the valid return types, along with a
- description of what TOKEN will contain. If LINE is not NULL, set
- *LINE to the line number where the token starts. If OBS, expand
- safe tokens (strings and comments) directly into OBS rather than in
- a temporary staging area. Report errors (unterminated comments or
- strings) on behalf of CALLER, if non-NULL.
+/* Parse and return a single token from the input stream, constructed
+ into TOKEN. See m4__token_type for the valid return types, along
+ with a description of what TOKEN will contain. If LINE is not
+ NULL, set *LINE to the line number where the token starts. If OBS,
+ expand safe tokens (strings and comments) directly into OBS rather
+ than in a temporary staging area. Report errors (unterminated
+ comments or strings) on behalf of CALLER, if non-NULL.
If OBS is NULL or the token expansion is unknown, the token text is
collected on the obstack token_stack, which never contains more
@@ -1177,7 +1222,6 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
do {
obstack_free (&token_stack, token_bottom);
-
/* Must consume an input character, but not until CHAR_BUILTIN is
handled. */
ch = peek_char (context);
@@ -1186,28 +1230,29 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
#ifdef DEBUG_INPUT
xfprintf (stderr, "next_token -> EOF\n");
#endif
- next_char (context, true);
+ next_char (context, false, true);
return M4_TOKEN_EOF;
}
if (ch == CHAR_BUILTIN) /* BUILTIN TOKEN */
{
init_builtin_token (context, token);
- next_char (context, true);
+ next_char (context, false, true);
#ifdef DEBUG_INPUT
m4_print_token ("next_token", M4_TOKEN_MACDEF, token);
#endif
return M4_TOKEN_MACDEF;
}
- next_char (context, true); /* Consume character we already peeked at. */
+ /* Consume character we already peeked at. */
+ next_char (context, false, true);
file = m4_get_current_file (context);
*line = m4_get_current_line (context);
if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ESCAPE))
{ /* ESCAPED WORD */
obstack_1grow (&token_stack, ch);
- if ((ch = next_char (context, true)) != CHAR_EOF)
+ if ((ch = next_char (context, false, true)) < CHAR_EOF)
{
obstack_1grow (&token_stack, ch);
if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ALPHA))
@@ -1234,12 +1279,13 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
quote_level = 1;
while (1)
{
- ch = next_char (context, true);
+ ch = next_char (context, obs && m4__quote_age (M4SYNTAX), true);
if (ch == CHAR_EOF)
m4_error_at_line (context, EXIT_FAILURE, 0, file, *line, caller,
_("end of file in string"));
-
- if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_RQUOTE))
+ if (ch == CHAR_QUOTE)
+ append_quote_token (obs, token);
+ else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_RQUOTE))
{
if (--quote_level == 0)
break;
@@ -1261,9 +1307,10 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
if (obs)
obs_safe = obs;
quote_level = 1;
+ assert (!m4__quote_age (M4SYNTAX));
while (1)
{
- ch = next_char (context, true);
+ ch = next_char (context, false, true);
if (ch == CHAR_EOF)
m4_error_at_line (context, EXIT_FAILURE, 0, file, *line, caller,
_("end of file in string"));
@@ -1290,11 +1337,14 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
if (obs && !m4_get_discard_comments_opt (context))
obs_safe = obs;
obstack_1grow (obs_safe, ch);
- while ((ch = next_char (context, true)) != CHAR_EOF
+ while ((ch = next_char (context, false, true)) < CHAR_EOF
&& !m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ECOMM))
obstack_1grow (obs_safe, ch);
if (ch != CHAR_EOF)
- obstack_1grow (obs_safe, ch);
+ {
+ assert (ch < CHAR_EOF);
+ obstack_1grow (obs_safe, ch);
+ }
else
m4_error_at_line (context, EXIT_FAILURE, 0, file, *line, caller,
_("end of file in comment"));
@@ -1308,12 +1358,15 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
obs_safe = obs;
obstack_grow (obs_safe, context->syntax->bcomm.string,
context->syntax->bcomm.length);
- while ((ch = next_char (context, true)) != CHAR_EOF
+ while ((ch = next_char (context, false, true)) < CHAR_EOF
&& !MATCH (context, ch, context->syntax->ecomm.string, true))
obstack_1grow (obs_safe, ch);
if (ch != CHAR_EOF)
- obstack_grow (obs_safe, context->syntax->ecomm.string,
- context->syntax->ecomm.length);
+ {
+ assert (ch < CHAR_EOF);
+ obstack_grow (obs_safe, context->syntax->ecomm.string,
+ context->syntax->ecomm.length);
+ }
else
m4_error_at_line (context, EXIT_FAILURE, 0, file, *line, caller,
_("end of file in comment"));
@@ -1343,6 +1396,7 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
else if (m4_is_syntax_single_quotes (M4SYNTAX)
&& m4_is_syntax_single_comments (M4SYNTAX))
{ /* EVERYTHING ELSE (SHORT QUOTES AND COMMENTS) */
+ assert (ch < CHAR_EOF);
obstack_1grow (&token_stack, ch);
if (m4_has_syntax (M4SYNTAX, ch,
@@ -1374,6 +1428,7 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
}
else /* EVERYTHING ELSE (LONG QUOTES OR COMMENTS) */
{
+ assert (ch < CHAR_EOF);
obstack_1grow (&token_stack, ch);
if (m4_has_syntax (M4SYNTAX, ch,
@@ -1394,16 +1449,21 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
}
} while (type == M4_TOKEN_NONE);
- if (obs_safe != obs)
+ if (token->type == M4_SYMBOL_VOID)
{
- len = obstack_object_size (&token_stack);
- obstack_1grow (&token_stack, '\0');
+ if (obs_safe != obs)
+ {
+ len = obstack_object_size (&token_stack);
+ obstack_1grow (&token_stack, '\0');
- m4_set_symbol_value_text (token, obstack_finish (&token_stack), len,
- m4__quote_age (M4SYNTAX));
+ m4_set_symbol_value_text (token, obstack_finish (&token_stack), len,
+ m4__quote_age (M4SYNTAX));
+ }
+ else
+ assert (type == M4_TOKEN_STRING);
}
else
- assert (type == M4_TOKEN_STRING);
+ assert (token->type == M4_SYMBOL_COMP && type == M4_TOKEN_STRING);
VALUE_MAX_ARGS (token) = -1;
#ifdef DEBUG_INPUT
@@ -1440,46 +1500,58 @@ m4__next_token_is_open (m4 *context)
int
m4_print_token (const char *s, m4__token_type type, m4_symbol_value *token)
{
- xfprintf (stderr, "%s: ", s ? s : "m4input");
+ m4_obstack obs;
+ size_t len;
+
+ obstack_init (&obs);
+ if (!s)
+ s = "m4input";
+ obstack_grow (&obs, s, strlen (s));
+ obstack_1grow (&obs, ':');
+ obstack_1grow (&obs, ' ');
switch (type)
{ /* TOKSW */
case M4_TOKEN_EOF:
- xfprintf (stderr, "eof\n");
+ obstack_grow (&obs, "eof", strlen ("eof"));
+ token = NULL;
break;
case M4_TOKEN_NONE:
- xfprintf (stderr, "none\n");
+ obstack_grow (&obs, "none", strlen ("none"));
+ token = NULL;
break;
case M4_TOKEN_STRING:
- xfprintf (stderr, "string\t\"%s\"\n", m4_get_symbol_value_text (token));
+ obstack_grow (&obs, "string\t", strlen ("string\t"));
break;
case M4_TOKEN_SPACE:
- xfprintf (stderr, "space\t\"%s\"\n", m4_get_symbol_value_text (token));
+ obstack_grow (&obs, "space\t", strlen ("space\t"));
break;
case M4_TOKEN_WORD:
- xfprintf (stderr, "word\t\"%s\"\n", m4_get_symbol_value_text (token));
+ obstack_grow (&obs, "word\t", strlen ("word\t"));
break;
case M4_TOKEN_OPEN:
- xfprintf (stderr, "open\t\"%s\"\n", m4_get_symbol_value_text (token));
+ obstack_grow (&obs, "open\t", strlen ("open\t"));
break;
case M4_TOKEN_COMMA:
- xfprintf (stderr, "comma\t\"%s\"\n", m4_get_symbol_value_text (token));
+ obstack_grow (&obs, "comma\t", strlen ("comma\t"));
break;
case M4_TOKEN_CLOSE:
- xfprintf (stderr, "close\t\"%s\"\n", m4_get_symbol_value_text (token));
+ obstack_grow (&obs, "close\t", strlen ("close\t"));
break;
case M4_TOKEN_SIMPLE:
- xfprintf (stderr, "simple\t\"%s\"\n", m4_get_symbol_value_text (token));
+ obstack_grow (&obs, "simple\t", strlen ("simple\t"));
break;
case M4_TOKEN_MACDEF:
- {
- const m4_builtin *bp;
- bp = m4_builtin_find_by_func (NULL, m4_get_symbol_value_func (token));
- assert (bp);
- xfprintf (stderr, "builtin\t<%s>{%s}\n", bp->name,
- m4_get_module_name (VALUE_MODULE (token)));
- }
+ obstack_grow (&obs, "builtin\t", strlen ("builtin\t"));
break;
+ default:
+ abort ();
}
+ if (token)
+ m4_symbol_value_print (token, &obs, true, "\"", "\"", SIZE_MAX, NULL);
+ obstack_1grow (&obs, '\n');
+ len = obstack_object_size (&obs);
+ fwrite (obstack_finish (&obs), 1, len, stderr);
+ obstack_free (&obs, NULL);
return 0;
}
#endif /* DEBUG_INPUT */
diff --git a/m4/m4module.h b/m4/m4module.h
index 03025af..330a90e 100644
--- a/m4/m4module.h
+++ b/m4/m4module.h
@@ -1,7 +1,7 @@
/* GNU m4 -- A simple macro processor
Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 1999, 2000, 2003,
- 2004, 2005, 2006, 2007 Free Software Foundation, Inc.
+ 2004, 2005, 2006, 2007, 2008 Free Software Foundation, Inc.
This file is part of GNU M4.
@@ -102,8 +102,9 @@ struct m4_macro
m4_module_import (context, STR (M), STR (S), obs)
/* Grab the text contents of argument I, or abort if the argument is
- not text. Assumes that `m4_macro_args *argv' is in scope. */
-#define M4ARG(i) m4_arg_text (argv, i)
+ not text. Assumes that `m4 *context' and `m4_macro_args *argv' are
+ in scope. */
+#define M4ARG(i) m4_arg_text (context, argv, i)
extern bool m4_bad_argc (m4 *, int, const char *,
unsigned int, unsigned int, bool);
@@ -304,7 +305,7 @@ extern unsigned int m4_arg_argc (m4_macro_args *);
extern m4_symbol_value *m4_arg_symbol (m4_macro_args *, unsigned int);
extern bool m4_is_arg_text (m4_macro_args *, unsigned int);
extern bool m4_is_arg_func (m4_macro_args *, unsigned int);
-extern const char *m4_arg_text (m4_macro_args *, unsigned int);
+extern const char *m4_arg_text (m4 *, m4_macro_args *, unsigned int);
extern bool m4_arg_equal (m4_macro_args *, unsigned int,
unsigned int);
extern bool m4_arg_empty (m4_macro_args *, unsigned int);
diff --git a/m4/m4private.h b/m4/m4private.h
index 630a9b7..6a08455 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -35,7 +35,7 @@ typedef enum {
M4_SYMBOL_TEXT, /* Plain text, u.u_t is valid. */
M4_SYMBOL_FUNC, /* Builtin function, u.func is valid. */
M4_SYMBOL_PLACEHOLDER, /* Placeholder for unknown builtin from -R. */
- M4_SYMBOL_COMP /* Composite symbol, u.chain is valid. */
+ M4_SYMBOL_COMP /* Composite symbol, u.u_c.c is valid. */
} m4__symbol_type;
#define BIT_TEST(flags, bit) (((flags) & (bit)) == (bit))
@@ -197,6 +197,7 @@ struct m4_symbol
struct m4_symbol_chain
{
m4_symbol_chain *next;/* Pointer to next link of chain. */
+ unsigned int quote_age; /* Quote_age of this link of chain, or 0. */
const char *str; /* NUL-terminated string if text, or NULL. */
size_t len; /* Length of str, or 0. */
size_t level; /* Expansion level of content, or SIZE_MAX. */
@@ -230,7 +231,11 @@ struct m4_symbol_value
unsigned int quote_age;
} u_t; /* Valid when type is TEXT, PLACEHOLDER. */
const m4_builtin * builtin;/* Valid when type is FUNC. */
- m4_symbol_chain * chain; /* Valid when type is COMP. */
+ struct
+ {
+ m4_symbol_chain * chain; /* First link of the chain. */
+ m4_symbol_chain * end; /* Last link of the chain. */
+ } u_c; /* Valid when type is COMP. */
} u;
};
@@ -248,6 +253,9 @@ struct m4_macro_args
bool_bitfield inuse : 1;
/* False if all arguments are just text or func, true if this argv
refers to another one. */
+ bool_bitfield wrapper : 1;
+ /* False if all arguments belong to this argv, true if some of them
+ include references to another. */
bool_bitfield has_ref : 1;
const char *argv0; /* The macro name being expanded. */
size_t argv0_len; /* Length of argv0. */
@@ -365,7 +373,8 @@ extern void m4__symtab_remove_module_references (m4_symbol_table*,
all other characters and sentinels. */
#define CHAR_EOF 256 /* Character return on EOF. */
#define CHAR_BUILTIN 257 /* Character return for BUILTIN token. */
-#define CHAR_RETRY 258 /* Character return for end of input block. */
+#define CHAR_QUOTE 258 /* Character return for quoted string. */
+#define CHAR_RETRY 259 /* Character return for end of input block. */
#define DEF_LQUOTE "`" /* Default left quote delimiter. */
#define DEF_RQUOTE "\'" /* Default right quote delimiter. */
@@ -451,6 +460,8 @@ typedef enum {
M4_TOKEN_MACDEF /* Macro's definition (see "defn"), M4_SYMBOL_FUNC. */
} m4__token_type;
+extern void m4__make_text_link (m4_obstack *, m4_symbol_chain **,
+ m4_symbol_chain **);
extern bool m4__push_symbol (m4 *, m4_symbol_value *, size_t);
extern m4__token_type m4__next_token (m4 *, m4_symbol_value *, int *,
m4_obstack *, const char *);
diff --git a/m4/macro.c b/m4/macro.c
index 9963409..683dd26 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -334,9 +334,15 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
len = obstack_object_size (obs);
if (argp->type == M4_SYMBOL_FUNC && !len)
return type == M4_TOKEN_COMMA;
- obstack_1grow (obs, '\0');
- VALUE_MODULE (argp) = NULL;
- m4_set_symbol_value_text (argp, obstack_finish (obs), len, age);
+ if (argp->type != M4_SYMBOL_COMP)
+ {
+ obstack_1grow (obs, '\0');
+ VALUE_MODULE (argp) = NULL;
+ m4_set_symbol_value_text (argp, obstack_finish (obs), len,
+ age);
+ }
+ else
+ m4__make_text_link (obs, NULL, &argp->u.u_c.end);
return type == M4_TOKEN_COMMA;
}
/* fallthru */
@@ -360,6 +366,20 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
case M4_TOKEN_STRING:
if (!expand_token (context, obs, type, &token, line, first))
age = 0;
+ if (token.type == M4_SYMBOL_COMP)
+ {
+ if (argp->type != M4_SYMBOL_COMP)
+ {
+ argp->type = M4_SYMBOL_COMP;
+ argp->u.u_c.chain = token.u.u_c.chain;
+ }
+ else
+ {
+ assert (argp->u.u_c.end);
+ argp->u.u_c.end->next = token.u.u_c.chain;
+ }
+ argp->u.u_c.end = token.u.u_c.end;
+ }
break;
case M4_TOKEN_MACDEF:
@@ -502,8 +522,23 @@ recursion limit of %zu exceeded, use -L<N> to change it"),
if (BIT_TEST (VALUE_FLAGS (value), VALUE_DELETED_BIT))
m4_symbol_value_delete (value);
- /* If argv contains references, those refcounts can be reduced now. */
- /* TODO - support references in argv. */
+ /* If argv contains references, those refcounts must be reduced now. */
+ if (argv->has_ref)
+ {
+ m4_symbol_chain *chain;
+ size_t i;
+ for (i = 0; i < argv->arraylen; i++)
+ if (argv->array[i]->type == M4_SYMBOL_COMP)
+ {
+ chain = argv->array[i]->u.u_c.chain;
+ while (chain)
+ {
+ if (chain->level < SIZE_MAX)
+ m4__adjust_refcount (context, chain->level, false);
+ chain = chain->next;
+ }
+ }
+ }
/* We no longer need argv, so reduce the refcount. Additionally, if
no other references to argv were created, we can free our portion
@@ -550,6 +585,7 @@ collect_arguments (m4 *context, const char *name, size_t len,
args.argc = 1;
args.inuse = false;
+ args.wrapper = false;
args.has_ref = false;
/* Must copy here, since we are consuming tokens, and since symbol
table can be changed during argument collection. */
@@ -587,11 +623,14 @@ collect_arguments (m4 *context, const char *name, size_t len,
&& m4_get_symbol_value_len (tokenp)
&& m4_get_symbol_value_quote_age (tokenp) != args.quote_age)
args.quote_age = 0;
+ else if (tokenp->type == M4_SYMBOL_COMP)
+ args.has_ref = true;
}
while (more_args);
}
argv = (m4_macro_args *) obstack_finish (argv_stack);
argv->argc = args.argc;
+ argv->has_ref = args.has_ref;
if (args.quote_age != m4__quote_age (M4SYNTAX))
argv->quote_age = 0;
argv->arraylen = args.arraylen;
@@ -674,8 +713,7 @@ process_macro (m4 *context, m4_symbol_value *value, m4_obstack *obs,
text = endp;
}
if (i < argc)
- m4_shipout_string (context, obs, M4ARG (i), m4_arg_len (argv, i),
- false);
+ m4_push_arg (context, obs, argv, i);
break;
case '#': /* number of arguments */
@@ -947,14 +985,14 @@ static void
arg_mark (m4_macro_args *argv)
{
argv->inuse = true;
- if (argv->has_ref)
+ if (argv->wrapper)
{
/* TODO for now we support only a single-length $@ chain. */
assert (argv->arraylen == 1
&& argv->array[0]->type == M4_SYMBOL_COMP
- && !argv->array[0]->u.chain->next
- && !argv->array[0]->u.chain->str);
- argv->array[0]->u.chain->argv->inuse = true;
+ && !argv->array[0]->u.u_c.chain->next
+ && !argv->array[0]->u.u_c.chain->str);
+ argv->array[0]->u.u_c.chain->argv->inuse = true;
}
}
@@ -970,7 +1008,7 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
if (argv->argc <= index)
return &empty_symbol;
- if (!argv->has_ref)
+ if (!argv->wrapper)
return argv->array[index - 1];
/* Must cycle through all array slots until we find index, since
wrappers can contain multiple arguments. */
@@ -979,7 +1017,7 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
value = argv->array[i];
if (value->type == M4_SYMBOL_COMP)
{
- m4_symbol_chain *chain = value->u.chain;
+ m4_symbol_chain *chain = value->u.u_c.chain;
/* TODO - for now we support only a single $@ chain. */
assert (!chain->next && !chain->str);
if (index < chain->argv->argc - (chain->index - 1))
@@ -994,7 +1032,6 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
else if (--index == 0)
break;
}
- assert (value->type != M4_SYMBOL_COMP);
return value;
}
@@ -1003,9 +1040,14 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
bool
m4_is_arg_text (m4_macro_args *argv, unsigned int index)
{
+ m4_symbol_value *value;
if (index == 0 || argv->argc <= index)
return true;
- return m4_is_symbol_value_text (m4_arg_symbol (argv, index));
+ value = m4_arg_symbol (argv, index);
+ /* Composite tokens are currently sequences of text only. */
+ if (m4_is_symbol_value_text (value) || value->type == M4_SYMBOL_COMP)
+ return true;
+ return false;
}
/* Given ARGV, return true if argument INDEX is a builtin function.
@@ -1020,37 +1062,125 @@ m4_is_arg_func (m4_macro_args *argv, unsigned int index)
/* Given ARGV, return the text at argument INDEX. Abort if the
argument is not text. Index 0 is always text, and indices beyond
- argc return the empty string. */
+ argc return the empty string. The result is always NUL-terminated,
+ even if it includes embedded NUL characters. */
const char *
-m4_arg_text (m4_macro_args *argv, unsigned int index)
+m4_arg_text (m4 *context, m4_macro_args *argv, unsigned int index)
{
m4_symbol_value *value;
+ m4_symbol_chain *chain;
+ m4_obstack *obs;
if (index == 0)
return argv->argv0;
if (argv->argc <= index)
return "";
value = m4_arg_symbol (argv, index);
- return m4_get_symbol_value_text (value);
+ if (m4_is_symbol_value_text (value))
+ return m4_get_symbol_value_text (value);
+ /* TODO - concatenate argv refs and functions? For now, we assume
+ all chain elements are text. */
+ assert (value->type == M4_SYMBOL_COMP);
+ chain = value->u.u_c.chain;
+ obs = m4_arg_scratch (context);
+ while (chain)
+ {
+ assert (chain->str);
+ obstack_grow (obs, chain->str, chain->len);
+ chain = chain->next;
+ }
+ obstack_1grow (obs, '\0');
+ return (char *) obstack_finish (obs);
}
/* Given ARGV, compare text arguments INDEXA and INDEXB for equality.
Both indices must be non-zero. Return true if the arguments
contain the same contents; often more efficient than
- !strcmp (m4_arg_text (argv, indexa), m4_arg_text (argv, indexb)). */
+ !strcmp (m4_arg_text (context, argv, indexa),
+ m4_arg_text (context, argv, indexb)). */
bool
m4_arg_equal (m4_macro_args *argv, unsigned int indexa, unsigned int indexb)
{
m4_symbol_value *sa = m4_arg_symbol (argv, indexa);
m4_symbol_value *sb = m4_arg_symbol (argv, indexb);
+ m4_symbol_chain tmpa;
+ m4_symbol_chain tmpb;
+ m4_symbol_chain *ca = &tmpa;
+ m4_symbol_chain *cb = &tmpb;
+ /* Quick tests. */
if (sa == &empty_symbol || sb == &empty_symbol)
return sa == sb;
+ if (m4_is_symbol_value_text (sa) && m4_is_symbol_value_text (sb))
+ return (m4_get_symbol_value_len (sa) == m4_get_symbol_value_len (sb)
+ && memcmp (m4_get_symbol_value_text (sa),
+ m4_get_symbol_value_text (sb),
+ m4_get_symbol_value_len (sa)) == 0);
+
+ /* Convert both arguments to chains, if not one already. */
/* TODO - allow builtin tokens in the comparison? */
- assert (m4_is_symbol_value_text (sa) && m4_is_symbol_value_text (sb));
- return (m4_get_symbol_value_len (sa) == m4_get_symbol_value_len (sb)
- && strcmp (m4_get_symbol_value_text (sa),
- m4_get_symbol_value_text (sb)) == 0);
+ if (m4_is_symbol_value_text (sa))
+ {
+ tmpa.next = NULL;
+ tmpa.str = m4_get_symbol_value_text (sa);
+ tmpa.len = m4_get_symbol_value_len (sa);
+ }
+ else
+ {
+ assert (sa->type == M4_SYMBOL_COMP);
+ ca = sa->u.u_c.chain;
+ }
+ if (m4_is_symbol_value_text (sb))
+ {
+ tmpb.next = NULL;
+ tmpb.str = m4_get_symbol_value_text (sb);
+ tmpb.len = m4_get_symbol_value_len (sb);
+ }
+ else
+ {
+ assert (sb->type == M4_SYMBOL_COMP);
+ cb = sb->u.u_c.chain;
+ }
+
+ /* Compare each link of the chain. */
+ while (ca && cb)
+ {
+ /* TODO support comparison against $@ refs. */
+ assert (ca->str && cb->str);
+ if (ca->len == cb->len)
+ {
+ if (memcmp (ca->str, cb->str, ca->len) != 0)
+ return false;
+ ca = ca->next;
+ cb = cb->next;
+ }
+ else if (ca->len < cb->len)
+ {
+ if (memcmp (ca->str, cb->str, ca->len) != 0)
+ return false;
+ tmpb.next = cb->next;
+ tmpb.str = cb->str + ca->len;
+ tmpb.len = cb->len - ca->len;
+ ca = ca->next;
+ cb = &tmpb;
+ }
+ else
+ {
+ assert (cb->len < ca->len);
+ if (memcmp (ca->str, cb->str, cb->len) != 0)
+ return false;
+ tmpa.next = ca->next;
+ tmpa.str = ca->str + cb->len;
+ tmpa.len = ca->len - cb->len;
+ ca = &tmpa;
+ cb = cb->next;
+ }
+ }
+
+ /* If we get this far, the two arguments are equal only if both
+ chains are exhausted. */
+ assert (ca != cb || !ca);
+ return ca == cb;
}
/* Given ARGV, return true if argument INDEX is the empty string.
@@ -1069,13 +1199,28 @@ size_t
m4_arg_len (m4_macro_args *argv, unsigned int index)
{
m4_symbol_value *value;
+ m4_symbol_chain *chain;
+ size_t len;
if (index == 0)
return argv->argv0_len;
if (argv->argc <= index)
return 0;
value = m4_arg_symbol (argv, index);
- return m4_get_symbol_value_len (value);
+ if (m4_is_symbol_value_text (value))
+ return m4_get_symbol_value_len (value);
+ /* TODO - for now, we assume all chain links are text. */
+ assert (value->type == M4_SYMBOL_COMP);
+ chain = value->u.u_c.chain;
+ len = 0;
+ while (chain)
+ {
+ assert (chain->str);
+ len += chain->len;
+ chain = chain->next;
+ }
+ assert (len);
+ return len;
}
/* Given ARGV, return the builtin function referenced by argument
@@ -1105,11 +1250,11 @@ m4_make_argv_ref (m4 *context, m4_macro_args *argv, const char *argv0,
/* When making a reference through a reference, point to the
original if possible. */
- if (argv->has_ref)
+ if (argv->wrapper)
{
/* TODO for now we support only a single-length $@ chain. */
assert (argv->arraylen == 1 && argv->array[0]->type == M4_SYMBOL_COMP);
- chain = argv->array[0]->u.chain;
+ chain = argv->array[0]->u.u_c.chain;
assert (!chain->next && !chain->str);
argv = chain->argv;
index += chain->index - 1;
@@ -1130,10 +1275,12 @@ m4_make_argv_ref (m4 *context, m4_macro_args *argv, const char *argv0,
chain = (m4_symbol_chain *) obstack_alloc (obs, sizeof *chain);
new_argv->arraylen = 1;
new_argv->array[0] = value;
+ new_argv->wrapper = true;
new_argv->has_ref = true;
value->type = M4_SYMBOL_COMP;
- value->u.chain = chain;
+ value->u.u_c.chain = value->u.u_c.end = chain;
chain->next = NULL;
+ chain->quote_age = argv->quote_age;
chain->str = NULL;
chain->len = 0;
chain->level = context->expansion_level - 1;
@@ -1170,9 +1317,23 @@ m4_push_arg (m4 *context, m4_obstack *obs, m4_macro_args *argv,
return;
}
/* TODO handle builtin tokens? */
- assert (value->type == M4_SYMBOL_TEXT);
- if (m4__push_symbol (context, value, context->expansion_level - 1))
- arg_mark (argv);
+ if (value->type == M4_SYMBOL_TEXT)
+ {
+ if (m4__push_symbol (context, value, context->expansion_level - 1))
+ arg_mark (argv);
+ }
+ else if (value->type == M4_SYMBOL_COMP)
+ {
+ /* TODO - really handle composites; for now, just flatten the
+ composite and push its text. */
+ m4_symbol_chain *chain = value->u.u_c.chain;
+ while (chain)
+ {
+ assert (chain->str);
+ obstack_grow (obs, chain->str, chain->len);
+ chain = chain->next;
+ }
+ }
}
/* Push series of comma-separated arguments from ARGV, which should
@@ -1184,6 +1345,7 @@ m4_push_args (m4 *context, m4_obstack *obs, m4_macro_args *argv, bool skip,
bool quote)
{
m4_symbol_value *value;
+ m4_symbol_chain *chain;
unsigned int i = skip ? 2 : 1;
const char *sep = ",";
size_t sep_len = 1;
@@ -1226,8 +1388,21 @@ m4_push_args (m4 *context, m4_obstack *obs, m4_macro_args *argv, bool skip,
else
use_sep = true;
/* TODO handle builtin tokens? */
- assert (value->type == M4_SYMBOL_TEXT);
- inuse |= m4__push_symbol (context, value, context->expansion_level - 1);
+ if (value->type == M4_SYMBOL_TEXT)
+ inuse |= m4__push_symbol (context, value,
+ context->expansion_level - 1);
+ else
+ {
+ /* TODO handle composite text. */
+ assert (value->type == M4_SYMBOL_COMP);
+ chain = value->u.u_c.chain;
+ while (chain)
+ {
+ assert (chain->str);
+ obstack_grow (obs, chain->str, chain->len);
+ chain = chain->next;
+ }
+ }
}
if (quote)
obstack_grow (obs, rquote, strlen (rquote));
diff --git a/m4/output.c b/m4/output.c
index f745efe..dc2194f 100644
--- a/m4/output.c
+++ b/m4/output.c
@@ -602,7 +602,8 @@ m4_shipout_string (m4 *context, m4_obstack *obs, const char *s, size_t len,
current quote characters around S. If LEN is SIZE_MAX, use the
string length of S instead. If MAX_LEN, reduce *MAX_LEN by LEN.
If LEN is larger than *MAX_LEN, then truncate output and return
- true; otherwise return false. */
+ true; otherwise return false. CONTEXT may be NULL if QUOTED is
+ false. */
bool
m4_shipout_string_trunc (m4 *context, m4_obstack *obs, const char *s,
size_t len, bool quoted, size_t *max_len)
diff --git a/m4/symtab.c b/m4/symtab.c
index 30a61ed..3ff6f0d 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -1,6 +1,6 @@
/* GNU m4 -- A simple macro processor
- Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2001, 2005, 2006, 2007
- Free Software Foundation, Inc.
+ Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2001, 2005, 2006,
+ 2007, 2008 Free Software Foundation, Inc.
This file is part of GNU M4.
@@ -326,10 +326,21 @@ m4_symbol_value_delete (m4_symbol_value *value)
m4_hash_apply (VALUE_ARG_SIGNATURE (value), arg_destroy_CB, NULL);
m4_hash_delete (VALUE_ARG_SIGNATURE (value));
}
- if (m4_is_symbol_value_text (value))
- free ((char *) m4_get_symbol_value_text (value));
- else if (m4_is_symbol_value_placeholder (value))
- free ((char *) m4_get_symbol_value_placeholder (value));
+ switch (value->type)
+ {
+ case M4_SYMBOL_TEXT:
+ free ((char *) m4_get_symbol_value_text (value));
+ break;
+ case M4_SYMBOL_PLACEHOLDER:
+ free ((char *) m4_get_symbol_value_placeholder (value));
+ break;
+ case M4_SYMBOL_VOID:
+ case M4_SYMBOL_FUNC:
+ break;
+ default:
+ assert (!"m4_symbol_value_delete");
+ abort ();
+ }
free (value);
}
}
@@ -392,10 +403,21 @@ m4_symbol_value_copy (m4_symbol_value *dest, m4_symbol_value *src)
assert (dest);
assert (src);
- if (m4_is_symbol_value_text (dest))
- free ((char *) m4_get_symbol_value_text (dest));
- else if (m4_is_symbol_value_placeholder (dest))
- free ((char *) m4_get_symbol_value_placeholder (dest));
+ switch (dest->type)
+ {
+ case M4_SYMBOL_TEXT:
+ free ((char *) m4_get_symbol_value_text (dest));
+ break;
+ case M4_SYMBOL_PLACEHOLDER:
+ free ((char *) m4_get_symbol_value_placeholder (dest));
+ break;
+ case M4_SYMBOL_VOID:
+ case M4_SYMBOL_FUNC:
+ break;
+ default:
+ assert (!"m4_symbol_value_copy");
+ abort ();
+ }
if (VALUE_ARG_SIGNATURE (dest))
{
@@ -411,19 +433,54 @@ m4_symbol_value_copy (m4_symbol_value *dest, m4_symbol_value *src)
/* Caller is supposed to free text token strings, so we have to
copy the string not just its address in that case. */
- if (m4_is_symbol_value_text (src))
+ switch (src->type)
{
- size_t len = m4_get_symbol_value_len (src);
- unsigned int age = m4_get_symbol_value_quote_age (src);
- m4_set_symbol_value_text (dest,
- xmemdup (m4_get_symbol_value_text (src),
- len + 1), len, age);
+ case M4_SYMBOL_TEXT:
+ {
+ size_t len = m4_get_symbol_value_len (src);
+ unsigned int age = m4_get_symbol_value_quote_age (src);
+ m4_set_symbol_value_text (dest,
+ xmemdup (m4_get_symbol_value_text (src),
+ len + 1), len, age);
+ }
+ break;
+ case M4_SYMBOL_FUNC:
+ /* Nothing further to do. */
+ break;
+ case M4_SYMBOL_PLACEHOLDER:
+ m4_set_symbol_value_placeholder (dest,
+ xstrdup (m4_get_symbol_value_placeholder
+ (src)));
+ break;
+ case M4_SYMBOL_COMP:
+ {
+ m4_symbol_chain *chain = src->u.u_c.chain;
+ size_t len = 0;
+ char *str;
+ char *p;
+ while (chain)
+ {
+ /* TODO for now, only text links are supported. */
+ assert (chain->str);
+ len += chain->len;
+ chain = chain->next;
+ }
+ p = str = xcharalloc (len + 1);
+ chain = src->u.u_c.chain;
+ while (chain)
+ {
+ memcpy (p, chain->str, chain->len);
+ p += chain->len;
+ chain = chain->next;
+ }
+ *p = '\0';
+ m4_set_symbol_value_text (dest, str, len, 0);
+ }
+ break;
+ default:
+ assert (!"m4_symbol_value_copy");
+ abort ();
}
- else if (m4_is_symbol_value_placeholder (src))
- m4_set_symbol_value_placeholder (dest,
- xstrdup (m4_get_symbol_value_placeholder
- (src)));
-
if (VALUE_ARG_SIGNATURE (src))
VALUE_ARG_SIGNATURE (dest) = m4_hash_dup (VALUE_ARG_SIGNATURE (src),
arg_copy_CB);
@@ -488,8 +545,9 @@ m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs, bool quote,
size_t len;
bool truncated = false;
- if (m4_is_symbol_value_text (value))
+ switch (value->type)
{
+ case M4_SYMBOL_TEXT:
text = m4_get_symbol_value_text (value);
len = m4_get_symbol_value_len (value);
if (maxlen < len)
@@ -497,27 +555,45 @@ m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs, bool quote,
len = maxlen;
truncated = true;
}
- }
- else if (m4_is_symbol_value_func (value))
- {
- const m4_builtin *bp = m4_get_symbol_value_builtin (value);
- text = bp->name;
- len = strlen (text);
- lquote = "<";
- rquote = ">";
- quote = true;
- }
- else if (m4_is_symbol_value_placeholder (value))
- {
+ break;
+ case M4_SYMBOL_FUNC:
+ {
+ const m4_builtin *bp = m4_get_symbol_value_builtin (value);
+ text = bp->name;
+ len = strlen (text);
+ lquote = "<";
+ rquote = ">";
+ quote = true;
+ }
+ break;
+ case M4_SYMBOL_PLACEHOLDER:
text = m4_get_symbol_value_placeholder (value);
/* FIXME - is it worth translating "placeholder for "? */
len = strlen (text);
lquote = "<placeholder for ";
rquote = ">";
quote = true;
- }
- else
- {
+ break;
+ case M4_SYMBOL_COMP:
+ {
+ m4_symbol_chain *chain = value->u.u_c.chain;
+ if (quote)
+ obstack_grow (obs, lquote, strlen (lquote));
+ while (chain)
+ {
+ /* TODO for now, assume all links are text. */
+ assert (chain->str);
+ if (m4_shipout_string_trunc (NULL, obs, chain->str, chain->len,
+ false, &maxlen))
+ break;
+ chain = chain->next;
+ }
+ if (quote)
+ obstack_grow (obs, rquote, strlen (rquote));
+ assert (!module);
+ return;
+ }
+ default:
assert (!"invalid token in symbol_value_print");
abort ();
}
diff --git a/tests/macros.at b/tests/macros.at
index 367d47e..3d74356 100644
--- a/tests/macros.at
+++ b/tests/macros.at
@@ -1,5 +1,5 @@
# Hand crafted tests for GNU M4. -*- Autotest -*-
-# Copyright (C) 2001, 2006, 2007 Free Software Foundation, Inc.
+# Copyright (C) 2001, 2006, 2007, 2008 Free Software Foundation, Inc.
# This file is part of GNU M4.
#
@@ -535,6 +535,24 @@ AT_CHECK_M4([in], [0], [[40
]])
AT_DATA([in], [[define(`echo', `$@')dnl
+define(`foo', echo(`01234567890123456789')echo(`98765432109876543210'))dnl
+foo
+]])
+
+AT_CHECK_M4([in], [0], [[0123456789012345678998765432109876543210
+]])
+
+AT_DATA([in], [[define(`a', `A')define(`echo', `$@')define(`join', `$1$2')dnl
+define(`abcdefghijklmnopqrstuvwxyz', `Z')dnl
+join(`a', `bcdefghijklmnopqrstuvwxyz')
+join(`a', echo(`bcdefghijklmnopqrstuvwxyz'))
+]])
+
+AT_CHECK_M4([in], [0], [[Z
+Z
+]])
+
+AT_DATA([in], [[define(`echo', `$@')dnl
echo(echo(`01234567890123456789', `01234567890123456789')
echo(`98765432109876543210', `98765432109876543210'))
len((echo(`01234567890123456789',
--
1.5.3.8
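
The m4_arg_equal hunk above compares two arguments link by link, so that
neither composite has to be flattened first. The sketch below restates
that walk on its own, as an illustration only. The names `text_link' and
`chain_equal' are hypothetical stand-ins (not part of m4), the links are
assumed to be non-empty text, and the $@ reference case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for m4_symbol_chain, text links only.  */
struct text_link
{
  struct text_link *next;
  const char *str;
  size_t len;
};

/* Return true if chains A and B spell the same bytes, no matter where
   the link boundaries fall.  Assumes every link is non-empty.  Neither
   chain is modified; partial links live in local scratch copies.  */
bool
chain_equal (const struct text_link *a, const struct text_link *b)
{
  struct text_link tmpa;
  struct text_link tmpb;

  while (a && b)
    {
      /* Compare only the bytes both current links can supply.  */
      size_t n = a->len < b->len ? a->len : b->len;
      if (memcmp (a->str, b->str, n) != 0)
        return false;
      if (a->len == b->len)
        {
          a = a->next;
          b = b->next;
        }
      else if (a->len < b->len)
        {
          /* A's link is used up; keep the unmatched tail of B.  */
          tmpb.next = b->next;
          tmpb.str = b->str + n;
          tmpb.len = b->len - n;
          a = a->next;
          b = &tmpb;
        }
      else
        {
          /* B's link is used up; keep the unmatched tail of A.  */
          tmpa.next = a->next;
          tmpa.str = a->str + n;
          tmpa.len = a->len - n;
          a = &tmpa;
          b = b->next;
        }
    }
  /* Equal only if both chains ran out at the same time.  */
  return a == NULL && b == NULL;
}
```

Comparing the chains "ab" + "cdef" and "abcd" + "ef" this way reports
equality without allocating anything, which is the point of doing the
comparison in place rather than via two m4_arg_text calls and strcmp.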
From c2c0a7ddc9f559d66a17184ea8be2c363dd4807c Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 27 Oct 2007 05:44:09 -0600
Subject: [PATCH] Stage 11: full circle for single argument references.
Pass quoted strings through to argument collection in a single
action, so that an argument can be reused throughout macro
recursion if it remains unchanged.
Memory impact: noticeable improvement, due to more reuse in
argument collection stacks.
Speed impact: noticeable improvement, due to less copying.
* src/m4.h (struct token_chain): Add quote_age member.
(struct token_data): Add end member to chain alternate.
(make_text_link): New prototype.
* src/input.c (CHAR_QUOTE): New macro.
(word_start): Pre-allocate.
(set_word_regexp): Simplify.
(make_text_link): Export, and handle new fields.
(next_char, next_char_1): Add parameter.
(append_quote_token): New function.
(match_input, next_token): Adjust callers to handle quoted input
blocks.
* src/macro.c (struct macro_arguments): Add wrapper member.
(expand_argument): Accept composite blocks from input engine.
(expand_macro): Reduce refcounts of composite arguments.
(collect_arguments, arg_token, arg_mark, make_argv_ref): Update to
use new fields.
(arg_type, arg_text, arg_equal, arg_len): Treat composite
arguments as text.
(push_arg, push_args): Handle composites.
(cherry picked from commit b1fef201f5d121e25e5dd61ec8ca3eac41a899ba)
Signed-off-by: Eric Blake <address@hidden>
---
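
For readers of the ChangeLog entry "(arg_type, arg_text, arg_equal,
arg_len): Treat composite arguments as text": the idea is that a chain
is only measured or flattened when an accessor actually needs the
bytes. The following is a rough, self-contained sketch of that idea,
not the code in this patch. It uses malloc where the patch uses an
obstack, and `str_link', `chain_len' and `chain_flatten' are
hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for token_chain, text links only.  */
struct str_link
{
  struct str_link *next;
  const char *str;
  size_t len;
};

/* Total number of bytes in CHAIN, without copying anything.  */
size_t
chain_len (const struct str_link *chain)
{
  size_t len = 0;
  for (; chain; chain = chain->next)
    len += chain->len;
  return len;
}

/* Flatten CHAIN into a freshly allocated, NUL-terminated buffer that
   the caller must free.  Links may contain embedded NUL bytes, so the
   length is also stored in *LEN_OUT when LEN_OUT is not NULL.  Returns
   NULL if allocation fails.  */
char *
chain_flatten (const struct str_link *chain, size_t *len_out)
{
  size_t len = chain_len (chain);
  char *buf = malloc (len + 1);
  char *p = buf;

  if (!buf)
    return NULL;
  for (; chain; chain = chain->next)
    {
      memcpy (p, chain->str, chain->len);
      p += chain->len;
    }
  *p = '\0';
  if (len_out)
    *len_out = len;
  return buf;
}
```

A caller that only needs the length can stop at chain_len and never pay
for the copy, which is where the memory and speed savings described in
the commit message come from.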
ChangeLog | 29 ++++++++
src/input.c | 207 +++++++++++++++++++++++++++++++++--------------------
src/m4.h | 25 ++++---
src/macro.c | 233 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
4 files changed, 376 insertions(+), 118 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 5ad26e3..15549a6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,32 @@
+2008-01-22 Eric Blake <address@hidden>
+
+ Stage 11: full circle for single argument references.
+ Pass quoted strings through to argument collection in a single
+ action, so that an argument can be reused throughout macro
+ recursion if it remains unchanged.
+ Memory impact: noticeable improvement, due to more reuse in
+ argument collection stacks.
+ Speed impact: noticeable improvement, due to less copying.
+ * src/m4.h (struct token_chain): Add quote_age member.
+ (struct token_data): Add end member to chain alternate.
+ (make_text_link): New prototype.
+ * src/input.c (CHAR_QUOTE): New macro.
+ (word_start): Pre-allocate.
+ (set_word_regexp): Simplify.
+ (make_text_link): Export, and handle new fields.
+ (next_char, next_char_1): Add parameter.
+ (append_quote_token): New function.
+ (match_input, next_token): Adjust callers to handle quoted input
+ blocks.
+ * src/macro.c (struct macro_arguments): Add wrapper member.
+ (expand_argument): Accept composite blocks from input engine.
+ (expand_macro): Reduce refcounts of composite arguments.
+ (collect_arguments, arg_token, arg_mark, make_argv_ref): Update to
+ use new fields.
+ (arg_type, arg_text, arg_equal, arg_len): Treat composite
+ arguments as text.
+ (push_arg, push_args): Handle composites.
+
2008-01-17 Eric Blake <address@hidden>
Stage 10: avoid extra copying of strings and comments.
diff --git a/src/input.c b/src/input.c
index bc73c6f..9f25e8f 100644
--- a/src/input.c
+++ b/src/input.c
@@ -153,6 +153,7 @@ static bool input_change;
#define CHAR_EOF 256 /* Character return on EOF. */
#define CHAR_MACRO 257 /* Character return for MACRO token. */
+#define CHAR_QUOTE 258 /* Character return for quoted string. */
/* Quote chars. */
STRING rquote;
@@ -167,7 +168,7 @@ STRING ecomm;
# define DEFAULT_WORD_REGEXP "[_a-zA-Z][_a-zA-Z0-9]*"
/* Table of characters that can start a word. */
-static char *word_start;
+static char word_start[256];
/* Current regular expression for detecting words. */
static struct re_pattern_buffer word_regexp;
@@ -201,7 +202,7 @@ static const char *token_type_string (token_type);
| chain that starts at *START and ends at *END. START may be NULL |
| if *END is non-NULL. |
`-------------------------------------------------------------------*/
-static void
+void
make_text_link (struct obstack *obs, token_chain **start, token_chain **end)
{
token_chain *chain;
@@ -218,6 +219,7 @@ make_text_link (struct obstack *obs, token_chain **start, token_chain **end)
*start = chain;
*end = chain;
chain->next = NULL;
+ chain->quote_age = 0;
chain->str = str;
chain->len = len;
chain->level = -1;
@@ -361,6 +363,7 @@ push_token (token_data *token, int level)
next->u.u_c.chain = chain;
next->u.u_c.end = chain;
chain->next = NULL;
+ chain->quote_age = TOKEN_DATA_QUOTE_AGE (token);
chain->str = TOKEN_DATA_TEXT (token);
chain->len = TOKEN_DATA_LEN (token);
chain->level = level;
@@ -563,19 +566,6 @@ pop_wrapup (void)
return true;
}
-/*-------------------------------------------------------------------.
-| When a MACRO token is seen, next_token () uses init_macro_token () |
-| to retrieve the value of the function pointer and store it in TD. |
-`-------------------------------------------------------------------*/
-
-static void
-init_macro_token (token_data *td)
-{
- assert (isp->type == INPUT_MACRO);
- TOKEN_DATA_TYPE (td) = TOKEN_FUNC;
- TOKEN_DATA_FUNC (td) = isp->u.func;
-}
-
/*--------------------------------------------------------------.
| Dump a representation of INPUT to the obstack OBS, for use in |
| tracing. |
@@ -699,16 +689,19 @@ peek_input (void)
| consisting of a newline alone is taken as belonging to the line it |
| ends, and the current line number is not incremented until the |
| next character is read. 99.9% of all calls will read from a |
-| string, so factor that out into a macro for speed. |
+| string, so factor that out into a macro for speed. If |
+| ALLOW_QUOTE, and the current input matches the current quote age, |
+| return CHAR_QUOTE and leave consumption of data for |
+| append_quote_token. |
`-------------------------------------------------------------------*/
-#define next_char() \
+#define next_char(AQ) \
(isp && isp->type == INPUT_STRING && isp->u.u_s.len && !input_change \
? (isp->u.u_s.len--, to_uchar (*isp->u.u_s.str++)) \
- : next_char_1 ())
+ : next_char_1 (AQ))
static int
-next_char_1 (void)
+next_char_1 (bool allow_quote)
{
int ch;
token_chain *chain;
@@ -765,10 +758,14 @@ next_char_1 (void)
chain = isp->u.u_c.chain;
while (chain)
{
+ if (allow_quote && chain->quote_age == current_quote_age)
+ return CHAR_QUOTE;
if (chain->str)
{
if (chain->len)
{
+ /* Partial consumption invalidates quote age. */
+ chain->quote_age = 0;
chain->len--;
return to_uchar (*chain->str++);
}
@@ -808,7 +805,7 @@ skip_line (const char *name)
const char *file = current_file;
int line = current_line;
- while ((ch = next_char ()) != CHAR_EOF && ch != '\n')
+ while ((ch = next_char (false)) != CHAR_EOF && ch != '\n')
;
if (ch == CHAR_EOF)
/* current_file changed to "" if we see CHAR_EOF, use the
@@ -825,6 +822,49 @@ skip_line (const char *name)
}
+/*-------------------------------------------------------------------.
+| When a MACRO token is seen, next_token () uses init_macro_token () |
+| to retrieve the value of the function pointer and store it in TD. |
+`-------------------------------------------------------------------*/
+
+static void
+init_macro_token (token_data *td)
+{
+ assert (isp->type == INPUT_MACRO);
+ TOKEN_DATA_TYPE (td) = TOKEN_FUNC;
+ TOKEN_DATA_FUNC (td) = isp->u.func;
+}
+
+/*-------------------------------------------------------------------.
+| When a QUOTE token is seen, convert TD to a composite (if it is |
+| not one already), consisting of any unfinished text on OBS, as |
+| well as the quoted token from the top of the input stack. Use OBS |
+| for any additional allocations needed to store the token chain. |
+`-------------------------------------------------------------------*/
+static void
+append_quote_token (struct obstack *obs, token_data *td)
+{
+ token_chain *src_chain = isp->u.u_c.chain;
+ token_chain *chain;
+ assert (isp->type == INPUT_CHAIN && obs && current_quote_age);
+
+ if (TOKEN_DATA_TYPE (td) == TOKEN_VOID)
+ {
+ TOKEN_DATA_TYPE (td) = TOKEN_COMP;
+ td->u.u_c.chain = td->u.u_c.end = NULL;
+ }
+ assert (TOKEN_DATA_TYPE (td) == TOKEN_COMP);
+ make_text_link (obs, &td->u.u_c.chain, &td->u.u_c.end);
+ chain = (token_chain *) obstack_copy (obs, src_chain, sizeof *chain);
+ if (td->u.u_c.end)
+ td->u.u_c.end->next = chain;
+ else
+ td->u.u_c.chain = chain;
+ td->u.u_c.end = chain;
+ td->u.u_c.end->next = NULL;
+ isp->u.u_c.chain = src_chain->next;
+}
+
/*------------------------------------------------------------------.
| This function is for matching a string against a prefix of the |
| input stream. If the string S matches the input and CONSUME is |
@@ -848,14 +888,14 @@ match_input (const char *s, bool consume)
if (s[1] == '\0')
{
if (consume)
- (void) next_char ();
+ next_char (false);
return true; /* short match */
}
- (void) next_char ();
+ next_char (false);
for (n = 1, t = s++; (ch = peek_input ()) == to_uchar (*s++); )
{
- (void) next_char ();
+ next_char (false);
n++;
if (*s == '\0') /* long match */
{
@@ -1016,7 +1056,6 @@ void
set_word_regexp (const char *caller, const char *regexp)
{
int i;
- char test[2];
const char *msg;
struct re_pattern_buffer new_word_regexp;
@@ -1048,15 +1087,10 @@ set_word_regexp (const char *caller, const char *regexp)
default_word_regexp = false;
set_quote_age ();
- if (word_start == NULL)
- word_start = (char *) xmalloc (256);
-
- word_start[0] = '\0';
- test[1] = '\0';
for (i = 1; i < 256; i++)
{
- test[0] = i;
- word_start[i] = re_search (&word_regexp, test, 1, 0, 0, NULL) >= 0;
+ char test = i;
+ word_start[i] = re_match (&word_regexp, &test, 1, 0, NULL) > 0;
}
}
@@ -1140,16 +1174,17 @@ safe_quotes (void)
/*--------------------------------------------------------------------.
-| Parse and return a single token from the input stream. A token |
-| can either be TOKEN_EOF, if the input_stack is empty; it can be |
-| TOKEN_STRING for a quoted string or comment; TOKEN_WORD for |
-| something that is a potential macro name; and TOKEN_SIMPLE for any |
-| single character that is not a part of any of the previous types. |
-| If LINE is not NULL, set *LINE to the line where the token starts. |
-| If OBS is not NULL, expand TOKEN_STRING directly into OBS rather |
-| than in token_stack temporary storage area. Report errors |
-| (unterminated comments or strings) on behalf of CALLER, if |
-| non-NULL. |
+| Parse a single token from the input stream, set TD to its |
+| contents, and return its type. A token is TOKEN_EOF if the |
+| input_stack is empty; TOKEN_STRING for a quoted string or comment; |
+| TOKEN_WORD for something that is a potential macro name; and |
+| TOKEN_SIMPLE for any single character that is not a part of any of |
+| the previous types. If LINE is not NULL, set *LINE to the line |
+| where the token starts. If OBS is not NULL, expand TOKEN_STRING |
+| directly into OBS rather than in token_stack temporary storage |
+| area, and TD could be a TOKEN_COMP instead of the usual |
+| TOKEN_TEXT. Report errors (unterminated comments or strings) on |
+| behalf of CALLER, if non-NULL. |
| |
| Next_token () returns the token type, and passes back a pointer to |
| the token data through TD. Non-string token text is collected on |
@@ -1165,7 +1200,6 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
int quote_level;
token_type type;
#ifdef ENABLE_CHANGEWORD
- int startpos;
char *orig_text = NULL;
#endif /* ENABLE_CHANGEWORD */
const char *file;
@@ -1181,19 +1215,20 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
line = &dummy;
/* Can't consume character until after CHAR_MACRO is handled. */
+ TOKEN_DATA_TYPE (td) = TOKEN_VOID;
ch = peek_input ();
if (ch == CHAR_EOF)
{
#ifdef DEBUG_INPUT
xfprintf (stderr, "next_token -> EOF\n");
#endif /* DEBUG_INPUT */
- next_char ();
+ next_char (false);
return TOKEN_EOF;
}
if (ch == CHAR_MACRO)
{
init_macro_token (td);
- next_char ();
+ next_char (false);
#ifdef DEBUG_INPUT
xfprintf (stderr, "next_token -> MACDEF (%s)\n",
find_builtin_by_addr (TOKEN_DATA_FUNC (td))->name);
@@ -1201,7 +1236,7 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
return TOKEN_MACDEF;
}
- next_char (); /* Consume character we already peeked at. */
+ next_char (false); /* Consume character we already peeked at. */
file = current_file;
*line = current_line;
if (MATCH (ch, bcomm.string, true))
@@ -1209,11 +1244,14 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
if (obs)
obs_td = obs;
obstack_grow (obs_td, bcomm.string, bcomm.length);
- while ((ch = next_char ()) != CHAR_EOF
+ while ((ch = next_char (false)) < CHAR_EOF
&& !MATCH (ch, ecomm.string, true))
obstack_1grow (obs_td, ch);
if (ch != CHAR_EOF)
- obstack_grow (obs_td, ecomm.string, ecomm.length);
+ {
+ assert (ch < CHAR_EOF);
+ obstack_grow (obs_td, ecomm.string, ecomm.length);
+ }
else
/* Current_file changed to "" if we see CHAR_EOF, use the
previous value we stored earlier. */
@@ -1225,10 +1263,10 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
else if (default_word_regexp && (isalpha (ch) || ch == '_'))
{
obstack_1grow (&token_stack, ch);
- while ((ch = peek_input ()) != CHAR_EOF && (isalnum (ch) || ch == '_'))
+ while ((ch = peek_input ()) < CHAR_EOF && (isalnum (ch) || ch == '_'))
{
obstack_1grow (&token_stack, ch);
- (void) next_char ();
+ next_char (false);
}
type = TOKEN_WORD;
}
@@ -1241,20 +1279,17 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
while (1)
{
ch = peek_input ();
- if (ch == CHAR_EOF)
+ if (ch >= CHAR_EOF)
break;
obstack_1grow (&token_stack, ch);
- startpos = re_search (&word_regexp,
- (char *) obstack_base (&token_stack),
- obstack_object_size (&token_stack), 0, 0,
- &regs);
- if (startpos != 0 ||
- regs.end [0] != obstack_object_size (&token_stack))
+ if (re_match (&word_regexp, (char *) obstack_base (&token_stack),
+ obstack_object_size (&token_stack), 0, &regs)
+ != obstack_object_size (&token_stack))
{
obstack_blank (&token_stack, -1);
break;
}
- next_char ();
+ next_char (false);
}
obstack_1grow (&token_stack, '\0');
@@ -1297,14 +1332,16 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
quote_level = 1;
while (1)
{
- ch = next_char ();
+ ch = next_char (obs != NULL && current_quote_age);
if (ch == CHAR_EOF)
/* Current_file changed to "" if we see CHAR_EOF, use
the previous value we stored earlier. */
m4_error_at_line (EXIT_FAILURE, 0, file, *line, caller,
_("end of file in string"));
- if (MATCH (ch, rquote.string, true))
+ if (ch == CHAR_QUOTE)
+ append_quote_token (obs, td);
+ else if (MATCH (ch, rquote.string, true))
{
if (--quote_level == 0)
break;
@@ -1316,35 +1353,49 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
obstack_grow (obs_td, lquote.string, lquote.length);
}
else
- obstack_1grow (obs_td, ch);
+ {
+ assert (ch < CHAR_EOF);
+ obstack_1grow (obs_td, ch);
+ }
}
type = TOKEN_STRING;
}
- TOKEN_DATA_TYPE (td) = TOKEN_TEXT;
- TOKEN_DATA_LEN (td) = obstack_object_size (obs_td);
- if (obs_td != obs)
+ if (TOKEN_DATA_TYPE (td) == TOKEN_VOID)
{
- obstack_1grow (obs_td, '\0');
- TOKEN_DATA_TEXT (td) = (char *) obstack_finish (obs_td);
- }
- else
- TOKEN_DATA_TEXT (td) = NULL;
- TOKEN_DATA_QUOTE_AGE (td) = current_quote_age;
+ TOKEN_DATA_TYPE (td) = TOKEN_TEXT;
+ TOKEN_DATA_LEN (td) = obstack_object_size (obs_td);
+ if (obs_td != obs)
+ {
+ obstack_1grow (obs_td, '\0');
+ TOKEN_DATA_TEXT (td) = (char *) obstack_finish (obs_td);
+ }
+ else
+ TOKEN_DATA_TEXT (td) = NULL;
+ TOKEN_DATA_QUOTE_AGE (td) = current_quote_age;
#ifdef ENABLE_CHANGEWORD
- if (orig_text == NULL)
- TOKEN_DATA_ORIG_TEXT (td) = TOKEN_DATA_TEXT (td);
+ if (orig_text == NULL)
+ TOKEN_DATA_ORIG_TEXT (td) = TOKEN_DATA_TEXT (td);
+ else
+ {
+ TOKEN_DATA_ORIG_TEXT (td) = orig_text;
+ TOKEN_DATA_LEN (td) = strlen (orig_text);
+ }
+#endif /* ENABLE_CHANGEWORD */
+#ifdef DEBUG_INPUT
+ xfprintf (stderr, "next_token -> %s (%s), len %zu\n",
+ token_type_string (type), TOKEN_DATA_TEXT (td),
+ TOKEN_DATA_LEN (td));
+#endif /* DEBUG_INPUT */
+ }
else
{
- TOKEN_DATA_ORIG_TEXT (td) = orig_text;
- TOKEN_DATA_LEN (td) = strlen (orig_text);
- }
-#endif /* ENABLE_CHANGEWORD */
+ assert (TOKEN_DATA_TYPE (td) == TOKEN_COMP && type == TOKEN_STRING);
#ifdef DEBUG_INPUT
- xfprintf (stderr, "next_token -> %s (%s), len %zu\n",
- token_type_string (type), TOKEN_DATA_TEXT (td),
- TOKEN_DATA_LEN (td));
+ xfprintf (stderr, "next_token -> %s <chain>\n",
+ token_type_string (type));
#endif /* DEBUG_INPUT */
+ }
return type;
}
diff --git a/src/m4.h b/src/m4.h
index ea3947f..474338b 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -271,19 +271,20 @@ enum token_data_type
TOKEN_VOID, /* Token still being constructed, u is invalid. */
TOKEN_TEXT, /* Straight text, u.u_t is valid. */
TOKEN_FUNC, /* Builtin function definition, u.func is valid. */
- TOKEN_COMP /* Composite argument, u.chain is valid. */
+ TOKEN_COMP /* Composite argument, u.u_c is valid. */
};
/* Composite tokens are built of a linked list of chains. */
struct token_chain
{
- token_chain *next; /* Pointer to next link of chain. */
- const char *str; /* NUL-terminated string if text, else NULL. */
- size_t len; /* Length of str, else 0. */
- int level; /* Expansion level of link content, or -1. */
- macro_arguments *argv;/* Reference to earlier $@. */
- unsigned int index; /* Argument index within argv. */
- bool flatten; /* True to treat builtins as text. */
+ token_chain *next; /* Pointer to next link of chain. */
+ unsigned int quote_age; /* Quote_age of this link of chain, or 0. */
+ const char *str; /* NUL-terminated string if text, or NULL. */
+ size_t len; /* Length of str, else 0. */
+ int level; /* Expansion level of link content, or -1. */
+ macro_arguments *argv; /* Reference to earlier $@. */
+ unsigned int index; /* Argument index within argv. */
+ bool flatten; /* True to treat builtins as text. */
};
/* The content of a token or macro argument. */
@@ -319,7 +320,12 @@ struct token_data
/* Composite text: a linked list of straight text and $@
placeholders. */
- token_chain *chain;
+ struct
+ {
+ token_chain *chain; /* First link of the chain. */
+ token_chain *end; /* Last link of the chain. */
+ }
+ u_c;
}
u;
};
@@ -342,6 +348,7 @@ token_type next_token (token_data *, int *, struct obstack *, const char *);
void skip_line (const char *);
/* push back input */
+void make_text_link (struct obstack *, token_chain **, token_chain **);
void push_file (FILE *, const char *, bool);
void push_macro (builtin_func *);
struct obstack *push_string_init (void);
diff --git a/src/macro.c b/src/macro.c
index ef18b8f..62af398 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -45,6 +45,9 @@ struct macro_arguments
bool_bitfield inuse : 1;
/* False if all arguments are just text or func, true if this argv
refers to another one. */
+ bool_bitfield wrapper : 1;
+ /* False if all arguments belong to this argv, true if some of them
+ include references to another. */
bool_bitfield has_ref : 1;
const char *argv0; /* The macro name being expanded. */
size_t argv0_len; /* Length of argv0. */
@@ -382,11 +385,16 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
return t == TOKEN_COMMA;
warn_builtin_concat (caller, TOKEN_DATA_FUNC (argp));
}
- obstack_1grow (obs, '\0');
- TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
- TOKEN_DATA_TEXT (argp) = (char *) obstack_finish (obs);
- TOKEN_DATA_LEN (argp) = len;
- TOKEN_DATA_QUOTE_AGE (argp) = age;
+ if (TOKEN_DATA_TYPE (argp) != TOKEN_COMP)
+ {
+ obstack_1grow (obs, '\0');
+ TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
+ TOKEN_DATA_TEXT (argp) = (char *) obstack_finish (obs);
+ TOKEN_DATA_LEN (argp) = len;
+ TOKEN_DATA_QUOTE_AGE (argp) = age;
+ }
+ else
+ make_text_link (obs, NULL, &argp->u.u_c.end);
return t == TOKEN_COMMA;
}
/* fallthru */
@@ -411,6 +419,23 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
case TOKEN_STRING:
if (!expand_token (obs, t, &td, line, first))
age = 0;
+ if (TOKEN_DATA_TYPE (&td) == TOKEN_COMP)
+ {
+ if (TOKEN_DATA_TYPE (argp) != TOKEN_COMP)
+ {
+ if (TOKEN_DATA_TYPE (argp) == TOKEN_FUNC)
+ warn_builtin_concat (caller, TOKEN_DATA_FUNC (argp));
+ TOKEN_DATA_TYPE (argp) = TOKEN_COMP;
+ argp->u.u_c.chain = td.u.u_c.chain;
+ argp->u.u_c.end = td.u.u_c.end;
+ }
+ else
+ {
+ assert (argp->u.u_c.end);
+ argp->u.u_c.end->next = td.u.u_c.chain;
+ argp->u.u_c.end = td.u.u_c.end;
+ }
+ }
break;
case TOKEN_MACDEF:
@@ -459,6 +484,7 @@ collect_arguments (symbol *sym, struct obstack *arguments,
args.argc = 1;
args.inuse = false;
+ args.wrapper = false;
args.has_ref = false;
args.argv0 = SYMBOL_NAME (sym);
args.argv0_len = strlen (args.argv0);
@@ -490,11 +516,14 @@ collect_arguments (symbol *sym, struct obstack *arguments,
&& TOKEN_DATA_LEN (tdp) > 0
&& TOKEN_DATA_QUOTE_AGE (tdp) != args.quote_age)
args.quote_age = 0;
+ else if (TOKEN_DATA_TYPE (tdp) == TOKEN_COMP)
+ args.has_ref = true;
}
while (more_args);
}
argv = (macro_arguments *) obstack_finish (argv_stack);
argv->argc = args.argc;
+ argv->has_ref = args.has_ref;
if (args.quote_age != quote_age ())
argv->quote_age = 0;
argv->arraylen = args.arraylen;
@@ -633,8 +662,23 @@ expand_macro (symbol *sym)
if (SYMBOL_DELETED (sym))
free_symbol (sym);
- /* If argv contains references, those refcounts can be reduced now. */
- /* TODO - support references in argv. */
+ /* If argv contains references, those refcounts must be reduced now. */
+ if (argv->has_ref)
+ {
+ token_chain *chain;
+ size_t i;
+ for (i = 0; i < argv->arraylen; i++)
+ if (TOKEN_DATA_TYPE (argv->array[i]) == TOKEN_COMP)
+ {
+ chain = argv->array[i]->u.u_c.chain;
+ while (chain)
+ {
+ if (chain->level >= 0)
+ adjust_refcount (chain->level, false);
+ chain = chain->next;
+ }
+ }
+ }
/* We no longer need argv, so reduce the refcount. Additionally, if
no other references to argv were created, we can free our portion
@@ -698,7 +742,7 @@ arg_token (macro_arguments *argv, unsigned int index)
token_data *token;
assert (index && index < argv->argc);
- if (!argv->has_ref)
+ if (!argv->wrapper)
return argv->array[index - 1];
/* Must cycle through all tokens, until we find index, since a ref
may occupy multiple indices. */
@@ -707,7 +751,7 @@ arg_token (macro_arguments *argv, unsigned int index)
token = argv->array[i];
if (TOKEN_DATA_TYPE (token) == TOKEN_COMP)
{
- token_chain *chain = token->u.chain;
+ token_chain *chain = token->u.u_c.chain;
/* TODO - for now we support only a single-length $@ chain. */
assert (!chain->next && !chain->str);
if (index < chain->argv->argc - (chain->index - 1))
@@ -731,14 +775,14 @@ static void
arg_mark (macro_arguments *argv)
{
argv->inuse = true;
- if (argv->has_ref)
+ if (argv->wrapper)
{
/* TODO for now we support only a single-length $@ chain. */
assert (argv->arraylen == 1
&& TOKEN_DATA_TYPE (argv->array[0]) == TOKEN_COMP
- && !argv->array[0]->u.chain->next
- && !argv->array[0]->u.chain->str);
- argv->array[0]->u.chain->argv->inuse = true;
+ && !argv->array[0]->u.u_c.chain->next
+ && !argv->array[0]->u.u_c.chain->str);
+ argv->array[0]->u.u_c.chain->argv->inuse = true;
}
}
@@ -761,17 +805,22 @@ arg_type (macro_arguments *argv, unsigned int index)
return TOKEN_TEXT;
token = arg_token (argv, index);
type = TOKEN_DATA_TYPE (token);
- assert (type != TOKEN_COMP);
+ /* Composite tokens are currently sequences of text only. */
+ if (type == TOKEN_COMP)
+ type = TOKEN_TEXT;
return type;
}
/* Given ARGV, return the text at argument INDEX. Abort if the
argument is not text. Index 0 is always text, and indices beyond
- argc return the empty string. */
+ argc return the empty string. The result is always NUL-terminated,
+ even if it includes embedded NUL characters. */
const char *
arg_text (macro_arguments *argv, unsigned int index)
{
token_data *token;
+ token_chain *chain;
+ struct obstack *obs;
if (index == 0)
return argv->argv0;
@@ -783,8 +832,18 @@ arg_text (macro_arguments *argv, unsigned int index)
case TOKEN_TEXT:
return TOKEN_DATA_TEXT (token);
case TOKEN_COMP:
- /* TODO - how to concatenate multiple arguments? For now, we expect
- only one element in the chain, and arg_token dereferences it. */
+ /* TODO - concatenate multiple arguments? For now, we assume
+ all elements are text. */
+ chain = token->u.u_c.chain;
+ obs = arg_scratch ();
+ while (chain)
+ {
+ assert (chain->str);
+ obstack_grow (obs, chain->str, chain->len);
+ chain = chain->next;
+ }
+ obstack_1grow (obs, '\0');
+ return (char *) obstack_finish (obs);
default:
break;
}
@@ -801,14 +860,84 @@ arg_equal (macro_arguments *argv, unsigned int indexa, unsigned int indexb)
{
token_data *ta = arg_token (argv, indexa);
token_data *tb = arg_token (argv, indexb);
+ token_chain tmpa;
+ token_chain tmpb;
+ token_chain *ca = &tmpa;
+ token_chain *cb = &tmpb;
+ /* Quick tests. */
if (ta == &empty_token || tb == &empty_token)
return ta == tb;
+ if (TOKEN_DATA_TYPE (ta) == TOKEN_TEXT
+ && TOKEN_DATA_TYPE (tb) == TOKEN_TEXT)
+ return (TOKEN_DATA_LEN (ta) == TOKEN_DATA_LEN (tb)
+ && memcmp (TOKEN_DATA_TEXT (ta), TOKEN_DATA_TEXT (tb),
+ TOKEN_DATA_LEN (ta)) == 0);
+
+ /* Convert both arguments to chains, if not one already. */
/* TODO - allow builtin tokens in the comparison? */
- assert (TOKEN_DATA_TYPE (ta) == TOKEN_TEXT
- && TOKEN_DATA_TYPE (tb) == TOKEN_TEXT);
- return (TOKEN_DATA_LEN (ta) == TOKEN_DATA_LEN (tb)
- && strcmp (TOKEN_DATA_TEXT (ta), TOKEN_DATA_TEXT (tb)) == 0);
+ if (TOKEN_DATA_TYPE (ta) == TOKEN_TEXT)
+ {
+ tmpa.next = NULL;
+ tmpa.str = TOKEN_DATA_TEXT (ta);
+ tmpa.len = TOKEN_DATA_LEN (ta);
+ }
+ else
+ {
+ assert (TOKEN_DATA_TYPE (ta) == TOKEN_COMP);
+ ca = ta->u.u_c.chain;
+ }
+ if (TOKEN_DATA_TYPE (tb) == TOKEN_TEXT)
+ {
+ tmpb.next = NULL;
+ tmpb.str = TOKEN_DATA_TEXT (tb);
+ tmpb.len = TOKEN_DATA_LEN (tb);
+ }
+ else
+ {
+ assert (TOKEN_DATA_TYPE (tb) == TOKEN_COMP);
+ cb = tb->u.u_c.chain;
+ }
+
+ /* Compare each link of the chain. */
+ while (ca && cb)
+ {
+ /* TODO support comparison against $@ refs. */
+ assert (ca->str && cb->str);
+ if (ca->len == cb->len)
+ {
+ if (memcmp (ca->str, cb->str, ca->len) != 0)
+ return false;
+ ca = ca->next;
+ cb = cb->next;
+ }
+ else if (ca->len < cb->len)
+ {
+ if (memcmp (ca->str, cb->str, ca->len) != 0)
+ return false;
+ tmpb.next = cb->next;
+ tmpb.str = cb->str + ca->len;
+ tmpb.len = cb->len - ca->len;
+ ca = ca->next;
+ cb = &tmpb;
+ }
+ else
+ {
+ assert (ca->len > cb->len);
+ if (memcmp (ca->str, cb->str, cb->len) != 0)
+ return false;
+ tmpa.next = ca->next;
+ tmpa.str = ca->str + cb->len;
+ tmpa.len = ca->len - cb->len;
+ ca = &tmpa;
+ cb = cb->next;
+ }
+ }
+
+ /* If we get this far, the two tokens are equal only if both chains
+ are exhausted. */
+ assert (ca != cb || ca == NULL);
+ return ca == cb;
}
/* Given ARGV, return true if argument INDEX is the empty string.
@@ -830,6 +959,8 @@ size_t
arg_len (macro_arguments *argv, unsigned int index)
{
token_data *token;
+ token_chain *chain;
+ size_t len;
if (index == 0)
return argv->argv0_len;
@@ -842,8 +973,18 @@ arg_len (macro_arguments *argv, unsigned int index)
assert ((token == &empty_token) == (TOKEN_DATA_LEN (token) == 0));
return TOKEN_DATA_LEN (token);
case TOKEN_COMP:
- /* TODO - how to concatenate multiple arguments? For now, we expect
- only one element in the chain, and arg_token dereferences it. */
+ /* TODO - concatenate multiple arguments? For now, we assume
+ all elements are text. */
+ chain = token->u.u_c.chain;
+ len = 0;
+ while (chain)
+ {
+ assert (chain->str);
+ len += chain->len;
+ chain = chain->next;
+ }
+ assert (len);
+ return len;
default:
break;
}
@@ -892,12 +1033,12 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
/* When making a reference through a reference, point to the
original if possible. */
- if (argv->has_ref)
+ if (argv->wrapper)
{
/* TODO - for now we support only a single-length $@ chain. */
assert (argv->arraylen == 1
&& TOKEN_DATA_TYPE (argv->array[0]) == TOKEN_COMP);
- chain = argv->array[0]->u.chain;
+ chain = argv->array[0]->u.u_c.chain;
assert (!chain->next && !chain->str);
argv = chain->argv;
index += chain->index - 1;
@@ -907,6 +1048,7 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
new_argv = (macro_arguments *)
obstack_alloc (obs, offsetof (macro_arguments, array));
new_argv->arraylen = 0;
+ new_argv->wrapper = false;
new_argv->has_ref = false;
}
else
@@ -918,10 +1060,12 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
chain = (token_chain *) obstack_alloc (obs, sizeof *chain);
new_argv->arraylen = 1;
new_argv->array[0] = token;
+ new_argv->wrapper = true;
new_argv->has_ref = true;
TOKEN_DATA_TYPE (token) = TOKEN_COMP;
- token->u.chain = chain;
+ token->u.u_c.chain = token->u.u_c.end = chain;
chain->next = NULL;
+ chain->quote_age = argv->quote_age;
chain->str = NULL;
chain->len = 0;
chain->level = expansion_level - 1;
@@ -955,9 +1099,23 @@ push_arg (struct obstack *obs, macro_arguments *argv, unsigned int index)
return;
token = arg_token (argv, index);
/* TODO handle func tokens? */
- assert (TOKEN_DATA_TYPE (token) == TOKEN_TEXT);
- if (push_token (token, expansion_level - 1))
- arg_mark (argv);
+ if (TOKEN_DATA_TYPE (token) == TOKEN_TEXT)
+ {
+ if (push_token (token, expansion_level - 1))
+ arg_mark (argv);
+ }
+ else if (TOKEN_DATA_TYPE (token) == TOKEN_COMP)
+ {
+ /* TODO - concatenate multiple arguments? For now, we assume
+ all elements are text. */
+ token_chain *chain = token->u.u_c.chain;
+ while (chain)
+ {
+ assert (chain->str);
+ obstack_grow (obs, chain->str, chain->len);
+ chain = chain->next;
+ }
+ }
}
/* Push series of comma-separated arguments from ARGV, which should
@@ -968,6 +1126,7 @@ void
push_args (struct obstack *obs, macro_arguments *argv, bool skip, bool quote)
{
token_data *token;
+ token_chain *chain;
unsigned int i = skip ? 2 : 1;
const char *sep = ",";
size_t sep_len = 1;
@@ -1007,8 +1166,20 @@ push_args (struct obstack *obs, macro_arguments *argv, bool skip, bool quote)
else
use_sep = true;
/* TODO handle func tokens? */
- assert (TOKEN_DATA_TYPE (token) == TOKEN_TEXT);
- inuse |= push_token (token, expansion_level - 1);
+ if (TOKEN_DATA_TYPE (token) == TOKEN_TEXT)
+ inuse |= push_token (token, expansion_level - 1);
+ else
+ {
+ /* TODO - handle composite text in push_token. */
+ assert (TOKEN_DATA_TYPE (token) == TOKEN_COMP);
+ chain = token->u.u_c.chain;
+ while (chain)
+ {
+ assert (chain->str);
+ obstack_grow (obs, chain->str, chain->len);
+ chain = chain->next;
+ }
+ }
}
if (quote)
obstack_grow (obs, rquote.string, rquote.length);
--
1.5.3.8
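
For readers who want to experiment with the link-by-link comparison that
arg_equal now performs, here is a minimal standalone sketch. It is not part
of the patch: struct text_link and chain_equal are hypothetical names that
keep only the next/str/len fields of token_chain, and the sketch assumes
every link is non-empty text (no $@ references, no builtins). The idea is
the same as in the hunk above: compare the shorter link's worth of bytes,
then carry the unconsumed tail of the longer link in a scratch link on the
stack, so nothing is ever flattened or copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the text-only fields of token_chain.  */
struct text_link
{
  const struct text_link *next; /* Next link of the chain, or NULL.  */
  const char *str;              /* Bytes of this link; may embed NUL.  */
  size_t len;                   /* Number of bytes in str.  */
};

/* Return true if chains A and B spell the same byte sequence, no
   matter where the link boundaries fall.  Assumes every link is
   non-empty.  */
static bool
chain_equal (const struct text_link *a, const struct text_link *b)
{
  struct text_link ta; /* Scratch link: unconsumed tail of A.  */
  struct text_link tb; /* Scratch link: unconsumed tail of B.  */

  while (a && b)
    {
      /* Compare only as many bytes as the shorter link holds.  */
      size_t n = a->len < b->len ? a->len : b->len;
      if (memcmp (a->str, b->str, n) != 0)
        return false;
      if (a->len == b->len)
        {
          a = a->next;
          b = b->next;
        }
      else if (a->len < b->len)
        {
          /* A's link is used up; keep the rest of B's link.  */
          tb.next = b->next;
          tb.str = b->str + n;
          tb.len = b->len - n;
          a = a->next;
          b = &tb;
        }
      else
        {
          /* B's link is used up; keep the rest of A's link.  */
          ta.next = a->next;
          ta.str = a->str + n;
          ta.len = a->len - n;
          a = &ta;
          b = b->next;
        }
    }
  /* Equal only if both chains ran out at the same time.  */
  return a == NULL && b == NULL;
}
```

Because memcmp is used with explicit lengths, embedded NUL bytes compare
correctly, which is also why the patch replaces strcmp with memcmp in the
plain-text fast path.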