branch-1_4 debian bug 311378 - 8-bit quotes
From: Eric Blake
Subject: branch-1_4 debian bug 311378 - 8-bit quotes
Date: Mon, 31 Jul 2006 20:53:16 -0600
User-agent: Thunderbird 1.5.0.5 (Windows/20060719)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=311378 complains:
$ m4 > samp1 <<EOF
> changequote(«,»)dnl
> define(a,b)dnl
> «a»
> EOF
«b»
And indeed, on platforms where char is signed, we had some sign-extension
bugs: getc() returns each byte as an unsigned char value (0 to 255), but
we compared that against bytes read through a plain char*, which are
negative for anything above 0x7f. With this patch, m4 should now be 8-bit
clean; I took the approach of always converting to unsigned char in the
parser.
Unfortunately, I don't know any good way to put an example of 8-bit
characters in the documentation. Info will faithfully reproduce literal
characters (but it may render horribly depending on your locale), while TeX
ignores 8-bit characters and needs a command for a glyph. So for now, I
left the examples in an @ignore block, so at least the testsuite will
ensure we don't regress.
2006-07-31 Eric Blake <address@hidden>
* src/input.c (peek_input, next_char, match_input): Be eight-bit
clean; fixes debian bug 311378.
* doc/m4.texinfo (Syntax): Describe eight-bit handling.
(Changequote, Changecom): Add examples to test this.
* NEWS: Document this fix.
* THANKS: Update.
Reported by Steven Augart.
- --
Life is short - so eat dessert first!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFEzsIb84KuGfSFAYARAjV6AKC4F7Y2rpNKr8LzY8Murz2fnAy01gCfY4pv
adcorwShehrSo21KhbyPdvg=
=dSGW
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.45
diff -u -p -r1.1.1.1.2.45 NEWS
--- NEWS 30 Jul 2006 03:18:12 -0000 1.1.1.1.2.45
+++ NEWS 1 Aug 2006 02:44:33 -0000
@@ -20,6 +20,7 @@ Version 1.4.6 - ?? 2006, by ?? (CVS ver
* The __file__ macro, and the -s/--synclines option, now show what
directory a file was found in when the -I/--include option or M4PATH
variable had an effect.
+* The changequote and changecom macros now work with 8-bit characters.
Version 1.4.5 - 15 July 2006, by Eric Blake (CVS version 1.4.4c)
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.13
diff -u -p -r1.1.1.1.2.13 input.c
--- src/input.c 30 Jul 2006 23:46:51 -0000 1.1.1.1.2.13
+++ src/input.c 1 Aug 2006 02:44:33 -0000
@@ -397,7 +397,7 @@ init_macro_token (token_data *td)
int
peek_input (void)
{
- register int ch;
+ int ch;
while (1)
{
@@ -407,7 +407,7 @@ peek_input (void)
switch (isp->type)
{
case INPUT_STRING:
- ch = isp->u.u_s.string[0];
+ ch = to_uchar (isp->u.u_s.string[0]);
if (ch != '\0')
return ch;
break;
@@ -446,13 +446,13 @@ peek_input (void)
#define next_char() \
(isp && isp->type == INPUT_STRING && isp->u.u_s.string[0] \
- ? *isp->u.u_s.string++ \
+ ? to_uchar (*isp->u.u_s.string++) \
: next_char_1 ())
static int
next_char_1 (void)
{
- register int ch;
+ int ch;
if (start_of_input_line)
{
@@ -468,7 +468,7 @@ next_char_1 (void)
switch (isp->type)
{
case INPUT_STRING:
- ch = *isp->u.u_s.string++;
+ ch = to_uchar (*isp->u.u_s.string++);
if (ch != '\0')
return ch;
break;
@@ -531,14 +531,14 @@ match_input (const char *s)
const char *t;
ch = peek_input ();
- if (ch != *s)
+ if (ch != to_uchar (*s))
return 0; /* fail */
(void) next_char ();
if (s[1] == '\0')
return 1; /* short match */
- for (n = 1, t = s++; (ch = peek_input ()) == *s++; n++)
+ for (n = 1, t = s++; (ch = peek_input ()) == to_uchar (*s++); n++)
{
(void) next_char ();
if (*s == '\0') /* long match */
@@ -564,9 +564,9 @@ match_input (const char *s)
`------------------------------------------------------------------------*/
#define MATCH(ch, s) \
- ((s)[0] == (ch) \
- && (ch) != '\0' \
- && ((s)[1] == '\0' \
+ (to_uchar ((s)[0]) == (ch) \
+ && (ch) != '\0' \
+ && ((s)[1] == '\0' \
|| (match_input ((s) + 1) ? (ch) = peek_input (), 1 : 0)))
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.57
diff -u -p -r1.1.1.1.2.57 m4.texinfo
--- doc/m4.texinfo 31 Jul 2006 20:28:12 -0000 1.1.1.1.2.57
+++ doc/m4.texinfo 1 Aug 2006 02:44:34 -0000
@@ -698,8 +698,12 @@ primitive is spelled within @code{m4}.
As @code{m4} reads its input, it separates it into @dfn{tokens}. A
token is either a name, a quoted string, or any single character, that
is not a part of either a name or a string. Input to @code{m4} can also
-contain comments. @acronym{GNU} @code{m4} does not yet understand locales; all
-operations are byte-oriented rather than character-oriented.
+contain comments. @acronym{GNU} @code{m4} does not yet understand
+locales; all operations are byte-oriented rather than
+character-oriented. However, @code{m4} is eight-bit clean, so you can
+use non-ASCII characters in quoted strings (@pxref{Changequote}),
+comments (@pxref{Changecom}), and macro names (@pxref{Indir}), with the
+exception of the NUL character (the zero byte).
@menu
* Names:: Macro names
@@ -2344,6 +2348,23 @@ foo
@result{}Macro foo.
@end example
+The quotation strings can safely contain eight-bit characters.
+@ignore
+Yuck. I know of no clean way to render an 8-bit character in both info
+and dvi. This example uses the `open-guillemot' and `close-guillemot'
+characters of the Latin-1 character set.
+
+@example
+define(`a', `b')
+@result{}
+«a»
+@result{}«b»
+changequote(`«', `»')
+@result{}
+«a»
+@result{}a
+@end example
+@end ignore
If no single character is appropriate, @var{start} and @var{end} can be
of any length.
@@ -2380,10 +2401,10 @@ calls of @code{changequote} must be made
and one for the new quotes.
Macros are recognized in preference to the begin-quote string, so if a
-prefix of @var{start} can be recognized as a macro name, the quoting
-mechanism is effectively disabled. Unless you use @code{changeword}
-(@pxref{Changeword}), this means that @var{start} should not begin with
-a letter or @samp{_} (underscore).
+prefix of @var{start} can be recognized as a potential macro name, the
+quoting mechanism is effectively disabled. Unless you use
+@code{changeword} (@pxref{Changeword}), this means that @var{start}
+should not begin with a letter or @samp{_} (underscore).
@example
define(`hi', `HI')
@@ -2490,11 +2511,29 @@ changecom(`#')
@result{}# comment again
@end example
+The comment strings can safely contain eight-bit characters.
+@ignore
+Yuck. I know of no clean way to render an 8-bit character in both info
+and dvi. This example uses the `open-guillemot' and `close-guillemot'
+characters of the Latin-1 character set.
+
+@example
+define(`a', `b')
+@result{}
+«a»
+@result{}«b»
+changecom(`«', `»')
+@result{}
+«a»
+@result{}«a»
+@end example
+@end ignore
+
Comments are recognized in preference to macros. However, this is not
compatible with other implementations, where macros take precedence over
comments, so it may change in a future release. For portability, this
-means that @var{start} should not have a prefix that begins with a
-letter or @samp{_} (underscore).
+means that @var{start} should not begin with a letter or @samp{_}
+(underscore).
@example
define(`hi', `HI')
@@ -4646,6 +4685,7 @@ the first time.
@bye
@c Local Variables:
+@c coding: ISO-8859-1
@c fill-column: 72
@c ispell-local-dictionary: "american"
@c indent-tabs-mode: nil