branch-1_4 patsubst replacement bug
From: Eric Blake
Subject: branch-1_4 patsubst replacement bug
Date: Fri, 14 Jul 2006 14:38:43 -0600
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
$ echo 'patsubst(abc,b,\)' | m4
a
Oops - we lost the c.
Also, it almost feels like an arbitrary limit that we can only handle 9
sub-expressions. Then again, even sed can only replace the first 9
sub-expressions, and the regex engine only allows 9 back-references within
the regexp, so it's probably not worth worrying about.
2006-07-14 Eric Blake <address@hidden>
* src/builtin.c (substitute): Warn on bad escape sequences.
Ignore trailing backslash.
* doc/m4.texinfo (Regexp): Add documentation for this.
* NEWS: Document this change.
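
For readers who want to see the intended behavior outside of m4, here is a
minimal, self-contained sketch of the replacement scan described in the
ChangeLog entry above. It is not the code from src/builtin.c: the names
match_regs, append and expand_replacement are hypothetical, a fixed buffer
stands in for the obstack, and plain fprintf stands in for M4ERROR. The
"sub-expression present" test is also a simplification of the check in the
real patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the regex match registers: offsets into the
   matched text.  Entry 0 is the whole match.  */
struct match_regs
{
  int num;        /* number of valid entries, including entry 0 */
  int start[10];
  int end[10];
};

/* Append LEN bytes of SRC to the NUL-terminated string in OUT, which has
   room for CAP bytes in total.  Excess input is silently truncated.  */
static void
append (char *out, size_t cap, const char *src, size_t len)
{
  size_t used = strlen (out);
  if (len > cap - used - 1)
    len = cap - used - 1;
  memcpy (out + used, src, len);
  out[used + len] = '\0';
}

/* Expand REPL into OUT for a match described by REGS within VICTIM.
   Return the number of warnings printed to stderr.  */
static int
expand_replacement (char *out, size_t cap, const char *victim,
                    const char *repl, const struct match_regs *regs)
{
  int warnings = 0;

  assert (cap > 0);
  out[0] = '\0';
  while (*repl != '\0')
    {
      char ch = *repl++;
      if (ch != '\\')
        {
          append (out, cap, &ch, 1);
          continue;
        }
      ch = *repl;
      if (ch == '\0')
        {
          /* Trailing backslash: warn and stop, so the scan never reads
             past the end of the replacement string.  */
          fprintf (stderr, "Warning: trailing \\ ignored in replacement\n");
          warnings++;
          break;
        }
      repl++;
      if (ch == '&')
        append (out, cap, victim + regs->start[0],
                (size_t) (regs->end[0] - regs->start[0]));
      else if (ch >= '1' && ch <= '9')
        {
          /* Only a single digit is consumed, so \10 is \1 followed by 0.  */
          int n = ch - '0';
          if (n < regs->num && regs->start[n] >= 0)
            append (out, cap, victim + regs->start[n],
                    (size_t) (regs->end[n] - regs->start[n]));
          else
            {
              fprintf (stderr, "Warning: sub-expression %d not present\n", n);
              warnings++;
            }
        }
      else
        append (out, cap, &ch, 1);  /* \x is a literal x */
    }
  return warnings;
}
```

With a match of `b' in `abc' and no sub-expressions, calling
expand_replacement with the replacement `\1\' should leave the output empty
and report two warnings, which mirrors the regexp example added to the
documentation below.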
- --
Life is short - so eat dessert first!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFEuADT84KuGfSFAYARAt5OAJ9sfcKFR86BQP3qNixp44IUHhYNhwCfavW/
pQ/lo+5QrIfIO0vmIdlOMoo=
=xgoc
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.37
diff -u -p -r1.1.1.1.2.37 NEWS
--- NEWS 13 Jul 2006 22:09:54 -0000 1.1.1.1.2.37
+++ NEWS 14 Jul 2006 20:21:23 -0000
@@ -47,6 +47,9 @@ Version 1.4.5 - ?? 2006, by ??? (CVS ve
* The popdef and undefine macros now correctly accept multiple arguments.
* Although changeword is on its last leg, if enabled, it now reverts to the
default (faster) regexp when passed the empty string.
+* The regexp and patsubst macros now warn and ignore a trailing backslash in
+ the replacement, and warn on \n for n larger than the number of
+ sub-expressions in the regexp.
Version 1.4.4b - 17 June 2006, by Eric Blake (CVS version 1.4.4a)
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.42
diff -u -p -r1.1.1.1.2.42 m4.texinfo
--- doc/m4.texinfo 14 Jul 2006 15:15:58 -0000 1.1.1.1.2.42
+++ doc/m4.texinfo 14 Jul 2006 20:21:24 -0000
@@ -2206,7 +2206,7 @@ foo
@error{}m4trace:8: -1- foo
@result{}FOO
@end example
-
+
@node Debug Output
@section Saving debugging output
@@ -3115,25 +3115,40 @@ If @var{replacement} is omitted, @code{r
the first match of @var{regexp} in @var{string}. If @var{regexp} does
not match anywhere in @var{string}, it expands to -1.
+If @var{replacement} is supplied, and there was a match, @code{regexp}
+changes the expansion to this argument, with @samp{\@var{n}} substituted
+by the text matched by the @var{n}th parenthesized sub-expression of
+@var{regexp}, up to nine sub-expressions.  The escape @samp{\&} is
+replaced by the text of the entire regular expression matched. For
+all other characters, @samp{\} treats the next character literally. A
+warning is issued if there were fewer sub-expressions than the
+@samp{\@var{n}} requested, or if there is a trailing @samp{\}.  If there
+was no match, @code{regexp} expands to the empty string.
+
+The builtin macro @code{regexp} is recognized only when given arguments.
+
@example
regexp(`GNUs not Unix', `\<[a-z]\w+')
@result{}5
regexp(`GNUs not Unix', `\<Q\w*')
@result{}-1
+regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***')
+@result{}*** Unix *** nix ***
+regexp(`GNUs not Unix', `\<Q\w*', `*** \& *** \1 ***')
+@result{}
@end example
-If @var{replacement} is supplied, @code{regexp} changes the expansion
-to this argument, with @samp{\@var{n}} substituted by the text
-matched by the @var{n}th parenthesized sub-expression of @var{regexp},
-@samp{\&} being the text the entire regular expression matched.
+Here are some more examples on the handling of backslash:
@example
-regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***')
-@result{}*** Unix *** nix ***
+regexp(`abc', `\(b\)', `\\\10\a')
+@result{}\b0a
+regexp(`abc', `b', `\1\')
+@error{}stdin:2: m4: Warning: sub-expression 1 not present
+@error{}stdin:2: m4: Warning: trailing \ ignored in replacement
+@result{}
@end example
-The builtin macro @code{regexp} is recognized only when given arguments.
-
@node Substr
@section Extracting substrings
@@ -3241,12 +3256,19 @@ to avoid infinite loops.
When a replacement is to be made, @var{replacement} is inserted into
the expansion, with @samp{\@var{n}} substituted by the text matched by
-the @var{n}th parenthesized sub-expression of @var{regexp}, @samp{\&}
-being the text the entire regular expression matched.
+the @var{n}th parenthesized sub-expression of @var{regexp}, for up to
+nine sub-expressions. The escape @samp{\&} is replaced by the text of
+the entire regular expression matched. For all other characters,
+@samp{\} treats the next character literally.  A warning is issued if
+there were fewer sub-expressions than the @samp{\@var{n}} requested, or
+if there is a trailing @samp{\}.
The @var{replacement} argument can be omitted, in which case the text
matched by @var{regexp} is deleted.
+The builtin macro @code{patsubst} is recognized only when given
+arguments.
+
@example
patsubst(`GNUs not Unix', `^', `OBS: ')
@result{}OBS: GNUs not Unix
@@ -3258,6 +3280,9 @@ patsubst(`GNUs not Unix', `\w+', `(\&)')
@result{}(GNUs) (not) (Unix)
patsubst(`GNUs not Unix', `[A-Z][a-z]+')
@result{}GN not @comment
+patsubst(`GNUs not Unix', `not', `NOT\')
+@error{}stdin:6: m4: Warning: trailing \ ignored in replacement
+@result{}GNUs NOT Unix
@end example
Here is a slightly more realistic example, which capitalizes individual
@@ -3276,8 +3301,21 @@ capitalize(`GNUs not Unix')
@result{}Gnus Not Unix
@end example
-The builtin macro @code{patsubst} is recognized only when given
-arguments.
+While @code{regexp} replaces the whole input with the replacement as
+soon as there is a match, @code{patsubst} replaces each
+@emph{occurrence} of a match and preserves non-matching pieces:
+
+@example
+define(`patreg',
+`patsubst($@@)
+regexp($@@)')dnl
+patreg(`bar foo baz Foo', `foo\|Foo', `FOO')
+@result{}bar FOO baz FOO
+@result{}FOO
+patreg(`aba abb 121', `\(.\)\(.\)\1', `\2\1\2')
+@result{}bab abb 212
+@result{}bab
+@end example
@node Format
@section Formatted output
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.24
diff -u -p -r1.1.1.1.2.24 builtin.c
--- src/builtin.c 14 Jul 2006 15:15:58 -0000 1.1.1.1.2.24
+++ src/builtin.c 14 Jul 2006 20:21:24 -0000
@@ -1649,6 +1649,14 @@ Warning: \\0 will disappear, use \\& ins
if (regs->end[ch] > 0)
obstack_grow (obs, victim + regs->start[ch],
regs->end[ch] - regs->start[ch]);
+ else
+ M4ERROR ((warning_status, 0, "\
+Warning: sub-expression %d not present", ch));
+ break;
+
+ case '\0':
+ M4ERROR ((warning_status, 0, "\
+Warning: trailing \\ ignored in replacement"));
break;
default: