[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: 26-gary-changeresyntax.patch
From: |
Eric Blake |
Subject: |
Re: 26-gary-changeresyntax.patch |
Date: |
Tue, 11 Jul 2006 06:51:04 -0600 |
User-agent: |
Thunderbird 1.5.0.4 (Windows/20060516) |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Gary V. Vaughan on 7/10/2006 5:45 PM:
> Hi Eric!
>
> Thanks for the review. New version attached.
>
> Okay to commit to HEAD?
Still a couple of nits; but fix those and I think you should commit (we
can tackle fallout later, if needs be).
>>>> +{
>>>> + int resyntax = -1;
>>>> +
>>>> + if (spec)
>> Can we guarantee that all callers pass non-NULL spec, since this is
>> static? Otherwise, passing a NULL spec silently returns -1.
>
> Nice catch. No we can't, but as we're making a NULL spec equal to
> RE_SYNTAX_EMACS it's not a problem now :-)
Actually, M4ARG guarantees a non-NULL string - if the user didn't pass
that many arguments, it is equivalent to "", not NULL. We really don't
pass NULL around, even though some of your code was making that assumption.
>> Your original proposal also suggested listing valid syntax-specs; is that
>> still worth doing here?
>
> It looked messy, and I don't want to get into calculating the terminal
> width to wrap nicely. The precedent from our other diagnostics is to
> complain in this style I think...
ACK. The proposed patch is just fine here.
>> Ouch. We used to have:
>> regexp(aab,a*)
>> 0
>> regexp(aab,a*,)
>>
>> But now with the resyntax parameter, we have no way to tell whether we
>> wanted the third argument to behave as a literal replacement of the empty
>> string, vs. an index of the match point:
>> regexp(aab,a*,,EMACS)
>> `0' or `'?
>
> For now, I've fixed this by allowing a 3 argument regexp (where there is
> a valid resyntax as the last argument):
>
> regexp(gnu emacs,e+)
> => 4
> regexp(gnu emacs, e+, )
> =>
> regexp(gnu emacs,e+,emacs)
> => 4
> regexp(gnu emacs,e+,emacs,emacs)
> => emacs
I'm not sure... regexp is a GNU invention, and we are certainly free to
document it this way, but I can't help but think we might break some
existing script. My thought overnight was that perhaps we invent a new
escape code for displaying the index:
regexp(gnu emacs, e+)dnl 2 args gives index
=> 4
regexp(gnu emacs, e+, )dnl 3 args gives literal third string
=>
regexp(gnu emacs, e+, , emacs)dnl 4 args gives literal third string...
=>
regexp(gnu emacs, e+, \_, emacs)dnl ...except new \_ escape gives index
=> 4
Since we are already doing escape processing on the replacement, adding a
new escape code would preserve backwards compatibility when there are
fewer than four arguments, while allowing the user to choose between
behaviors when using the new argument.
> But I'd be amazed if people are using one of our resyntax specs as the
> 3rd argument of a regexp() macro call.
Don't underestimate the user's ability to surprise :)
>
> In due course, we can reduce this probability even more, because I want
> to add named argument support to GNU M4, thusly:
>
> regexp(gnu emacs, e+, resyntax=emacs)
> => 4
And what if we want the replacement to be resyntax=emacs - is there an
escape for treating the = as part of the replacement rather than as a
designator for named argument?
>> Unrelated to your patch, but is it time to kill (or at least raise the
>> default) of this arbitrary limit? autoconf purposefully uses 1024 instead
>> of 250. We already document that the limit may go away in future releases.
>
> I don't think we can kill it easily as some limit is required to detect
> infinite loops. Raising the default seems like a reasonable thing to
> do. Hardware has come a long way in the last 12 years.
Let's do that in a separate patch, then - at least 1024, because that is
what autom4te uses.
>
>> And why a linear search? The map is sorted, so you could do
>> a binary search.
>
> The table is tiny, so it didn't seem worth the effort to write the
> harder code. Moot point now though, as the table is no longer sorted.
ACK - O(n) vs. O(log n) isn't noticeable when the maximum n is controlled
by a fixed-length table.
>
>>>> address@hidden regexprops-generic.texi
>> And where is the patch to bootstrap that ensures that we pull this file in
>> from gnulib? Or the patch to Makefile.am that ensures we distribute it?
>
> There is no gnulib-tool option to import texi files, as far as I can
> tell, I had planned on manually updating prior to release just like
> config.guess etc. I've added regexprops-generic.texi to Makefile.am.
Actually, I'm committing a patch to gnulib to make fdl.texi part of a
module; I can also make a module for regexprops-generic.texi while I'm at it.
> @@ -552,138 +548,137 @@
> the expansion to this argument. */
>
> /**
> - * regexp(VICTIM, REGEXP, [REPLACEMENT])
> - * eregexp(VICTIM, REGEXP, [REPLACEMENT])
> + * regexp(VICTIM, REGEXP, RESYNTAX)
> + * regexp(VICTIM, REGEXP, [REPLACEMENT], [RESYNTAX])
> **/
> -
> -static void
> -m4_regexp_do (m4 *context, m4_obstack *obs, int argc,
> - m4_symbol_value **argv, int syntax)
> +M4BUILTIN_HANDLER (regexp)
> {
> - const char *caller; /* calling macro name */
> - const char *victim; /* first argument */
> - const char *regexp; /* regular expression */
> -
> + const char *me; /* name of this macro */
> + const char *replace; /* optional replacement string */
> m4_pattern_buffer *buf; /* compiled regular expression */
> int startpos; /* start position of match */
> int length; /* length of first argument */
> + int resyntax;
> +
> + me = M4ARG (0);
> + replace = M4ARG (3);
> + resyntax = m4_get_regexp_syntax_opt (context);
>
> - caller = M4ARG (0);
> - victim = M4ARG (1);
> - regexp = M4ARG (2);
> + if (argc == 4)
If you take my suggestion of adding a \_ escape to imply the index of the
match, then this should no longer look at M4ARG(3) for a resyntax name.
> + {
> + resyntax = m4_regexp_syntax_encode (replace);
> +
> + /* The first case is the most difficult, because the empty string
> + is a valid RESYNTAX, yet we want `regexp(aab, a*, )' to return
> + an empty string as per M4 1.4.x! */
>
> Index: m4--devo--0/m4/resyntax.c
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ m4--devo--0/m4/resyntax.c 2006-07-11 00:33:54.000000000 +0100
> +
> +/* Return the internal code representing the syntax SPEC, or -1 if
> + SPEC is invalid. The `m4_syntax_map' table is searched case
> + insensitively, after replacing any spaces or dashes in SPEC with
> + underscore characters. Possible matches for the "GNU_M4" element
> + then, are "gnu m4", "GNU-m4" or "Gnu_M4". */
> +int
> +m4_regexp_syntax_encode (const char *spec)
> +{
> + const m4_resyntax *resyntax;
> + char *canonical;
> + char *p;
> +
> + /* Unless specified otherwise, return the historical GNU M4 default. */
> + if (!spec)
I don't think we ever pass a non-NULL spec; should we add an assert here
instead?
> + return RE_SYNTAX_EMACS;
> +
> + canonical = strdup (spec);
> address@hidden Changeresyntax
> address@hidden Changing the regular expression syntax
> +
> address@hidden regular expression syntax, changing
> address@hidden GNU extensions
> address@hidden {Builtin (gnu)} changeresyntax (@w{opt @var{resyntax}})
> +By default, the @sc{gnu} extensions @code{patsubst}, @code{regexp} and
> +more recently @code{renamesyms} continue to use emacs style regular
> +expression syntax (@pxref{Regular expression syntax}).
> +
> +The @code{changeresyntax} macro expands to nothing, but changes the
> +default regular expression syntax used by M4 according to the value of
> address@hidden, equivalent to passing @var{resyntax} as the argument to
> address@hidden when invoking @code{m4}. @xref{Invoking m4},
> +for more details. If @var{resyntax} is empty, or omitted the default
s/empty, or omitted /empty or omitted, /
> +settings are reverted to emacs style.
> address@hidden deffn
> +
> address@hidden Eregexp and Regexp
> address@hidden Regexp
> @section Searching for regular expressions
>
> @cindex regular expressions
> @cindex GNU extensions
> address@hidden {Builtin (gnu)} eregexp (@var{string}, @var{regexp}, @w{opt
> @var{replacement})}
> address@hidden {Builtin (gnu)} regexp (@var{string}, @var{regexp}, @w{opt
> @var{replacement},} @w{opt @var{resyntax})}
> Searching for regular expressions is done with the builtin
> @code{regexp}, which searches for @var{regexp} in @var{string}. The
> syntax of regular expressions is similar to that of Perl, @sc{gnu} Awk
> @@ -3014,13 +3114,18 @@
> is specified and matches, then it expands into @var{replacement}. If
> @var{regexp} does not match anywhere in @var{string}, it expands to -1.
>
> -The builtin macro @code{eregexp} is recognized only when given arguments.
> +If @var{resyntax} is given, the particular flavor of regular
> +expression understood with respect to @var{regexp} can be changed from
> +the current default. @xref{Changeresyntax}, for details of the values
> +that can be given for this argument.
> +
> +The builtin macro @code{regexp} is recognized only when given arguments.
> @end deffn
>
> @example
> -eregexp(`GNUs not Unix', `\<[a-z]\w+')
> +regexp(`GNUs not Unix', `\<[a-z]\w+')
> @result{}5
> -eregexp(`GNUs not Unix', `\<Q\w*')
> +regexp(`GNUs not Unix', `\<Q\w*')
> @result{}-1
> @end example
>
> @@ -3030,27 +3135,21 @@
> @samp{\&} being the text the entire regular expression matched.
>
> @example
> -eregexp(`GNUs not Unix', `\w(\w+)$', `*** \& *** \1 ***')
> +regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***')
> @result{}*** Unix *** nix ***
> @end example
>
> -Originally, regular expressions were much less powerful (basically only
> address@hidden was available), but to keep backward compatibility, new
> -operators were implemented with previously invalid sequences, such as
> address@hidden(}. The following macro is exactly equivalent to
> @code{eregexp},
> -but using the old, clumsy syntax.
> -
> address@hidden {Builtin (gnu)} regexp (@var{string}, @var{regexp}, @w{opt
> @var{replacement})}
> -Same as @code{eregexp}, but using the old and clumsy ``Basic Regular
> -Expression'' syntax, the same as in @sc{gnu} Emacs. @xref{Regexps, ,
> -Syntax of Regular Expressions, emacs, The @sc{gnu} Emacs Manual}.
> address@hidden deffn
> +If @var{resyntax} is given, @var{regexp} must be given according to
> +the syntax chosen, though the default regular expression syntax
> +remains unchanged for other invocations:
>
> @example
> -regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***')
> +regexp(`GNUs not Unix', `\w(\w+)$', `*** \& *** \1 ***', `POSIX_EXTENDED')
> @result{}*** Unix *** nix ***
> @end example
If you stick with your approach, where the third argument is checked to
see if it is a resyntax name rather than a replacement, then you will need
to document that and give an example in this node.
>
> address@hidden R @var{length} @key{NL} @var{string} @key{NL}
> +Sets the default regexp syntax, where @var{string} encodes one of the
> +regular expression syntaxes supported by @sc{gnu} M4.
Note to myself - I need to be more consistent about using @sc{gnu}, and
distinguishing between the package name of M4 vs. the executable @code{m4}.
Thanks for a good idea, and for responding to my review!
- --
Life is short - so eat dessert first!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFEs56384KuGfSFAYARAptPAJ9O/o9cuc5kHAe67N0NuU8OrzciBACeJsnj
BxFKWGyXF/vFUJEmH8B81sI=
=Fn0G
-----END PGP SIGNATURE-----