Re: branch-1_4 8-bit clean translit
From: Eric Blake
Subject: Re: branch-1_4 8-bit clean translit
Date: Sat, 11 Nov 2006 06:58:46 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.8) Gecko/20061025 Thunderbird/1.5.0.8 Mnenhy/0.7.4.666
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Eric Blake on 11/11/2006 5:54 AM:
>
> 2006-11-11 Eric Blake <address@hidden>
>
> * src/builtin.c: Remove unnecessary casts.
> (expand_ranges): Make 8-bit clean.
Ported to head as follows:
2006-11-11 Eric Blake <address@hidden>
* m4/macro.c (trace_format): Use canonical type name.
* m4/output.c (m4_freeze_diversions): Likewise.
* src/freeze.c (produce_module_dump, dump_symbol_CB)
(produce_frozen_state): Likewise.
* m4/m4private.h (to_uchar): Grab from branch.
* m4/input.c (string_peek, string_read): Use it.
* m4/utility.c (skip_space): Likewise.
* src/main.c (main): Likewise.
* doc/m4.texinfo (Translit): Remerge from branch.
* tests/builtins.at (translit): Test 8-bit range.
* modules/m4.c (m4_expand_ranges): Merge from branch.
- --
Life is short - so eat dessert first!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFVdcW84KuGfSFAYARAuSRAJoDCm5zj5rMti1TzJVrCTFLZ19KiACfTppi
OzKurlD1d32dP0v6G0q85zs=
=jsLg
-----END PGP SIGNATURE-----
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.77
diff -u -p -r1.77 m4.texinfo
--- doc/m4.texinfo 8 Nov 2006 19:06:00 -0000 1.77
+++ doc/m4.texinfo 11 Nov 2006 13:57:48 -0000
@@ -4904,14 +4904,15 @@ translation pass is made, even if charac
appear in @var{chars}.
As a @acronym{GNU} extension, both @var{chars} and @var{replacement} can
-contain character-ranges,
-e.g., @samp{a-z} (meaning all lowercase letters) or @samp{0-9} (meaning
-all digits). To include a dash @samp{-} in @var{chars} or
-@var{replacement}, place it first or last.
-
-It is not an error for the last character in the range to be `larger'
-than the first. In that case, the range runs backwards, i.e.,
-@samp{9-0} means the string @samp{9876543210}.
+contain character-ranges, e.g., @samp{a-z} (meaning all lowercase
+letters) or @samp{0-9} (meaning all digits). To include a dash @samp{-}
+in @var{chars} or @var{replacement}, place it first or last in the
+entire string, or as the last character of a range. Back-to-back ranges
+can share a common endpoint. It is not an error for the last character
+in the range to be `larger' than the first. In that case, the range
+runs backwards, i.e., @samp{9-0} means the string @samp{9876543210}.
+The expansion of a range is dependent on the underlying encoding of
+characters, so using ranges is not always portable between machines.
The macro @code{translit} is recognized only with parameters.
@end deffn
@@ -4923,17 +4924,21 @@ translit(`GNUs not Unix', `a-z', `A-Z')
@result{}GNUS NOT UNIX
translit(`GNUs not Unix', `A-Z', `z-a')
@result{}tmfs not fnix
+translit(`+,-12345', `+--1-5', `<;>a-c-a')
+@result{}<;>abcba
translit(`abcdef', `aabdef', `bcged')
@result{}bgced
@end example
-The first example deletes all uppercase letters, the second converts
-lowercase to uppercase, and the third `mirrors' all uppercase letters,
-while converting them to lowercase. The two first cases are by far the
-most common. The final example shows that @samp{a} is mapped to
-@samp{b}, not @samp{c}; the resulting @samp{b} is not further remapped
-to @samp{g}; the @samp{d} and @samp{e} are swapped, and the @samp{f} is
-discarded.
+In the @sc{ascii} encoding, the first example deletes all uppercase
+letters, the second converts lowercase to uppercase, and the third
+`mirrors' all uppercase letters, while converting them to lowercase.
+The first two cases are by far the most common, even though they are not
+portable to @sc{ebcdic} or other encodings. The fourth example shows a
+range ending in @samp{-}, as well as back-to-back ranges. The final
+example shows that @samp{a} is mapped to @samp{b}, not @samp{c}; the
+resulting @samp{b} is not further remapped to @samp{g}; the @samp{d} and
+@samp{e} are swapped, and the @samp{f} is discarded.
Omitting @var{chars} evokes a warning, but still produces output.
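As a cross-check of the semantics documented above, here is a minimal C sketch of the mapping step alone. It assumes both strings have already had their ranges expanded, and the name translit_sketch is hypothetical; this is not the code in modules/m4.c. It models the documented rules: the first occurrence of a byte in CHARS wins, a replaced byte is never translated again, and bytes past the end of REPLACEMENT are deleted.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical sketch of translit's mapping step.  FROM and TO are
   assumed to be range-expanded already.  OUT needs room for
   strlen (S) + 1 bytes, since translit never lengthens its input.  */
static char *
translit_sketch (const char *s, const char *from, const char *to, char *out)
{
  /* map[c]: -2 means "leave unchanged", -1 means "delete",
     anything else is the replacement byte value.  */
  int map[UCHAR_MAX + 1];
  size_t tolen = strlen (to);
  size_t i;
  char *p = out;

  for (i = 0; i <= UCHAR_MAX; i++)
    map[i] = -2;

  /* Only the first occurrence of a byte in FROM counts.  */
  for (i = 0; from[i] != '\0'; i++)
    {
      unsigned char c = (unsigned char) from[i];
      if (map[c] == -2)
        map[c] = i < tolen ? (unsigned char) to[i] : -1;
    }

  /* Single pass over S: each input byte is looked up exactly once,
     so the result of a replacement is never remapped.  */
  for (; *s != '\0'; s++)
    {
      int m = map[(unsigned char) *s];
      if (m == -2)
        *p++ = *s;
      else if (m >= 0)
        *p++ = (char) m;
    }
  *p = '\0';
  return out;
}
```

With the expanded strings from the fourth example, translit_sketch ("+,-12345", "+,-12345", "<;>abcba", buf) yields <;>abcba, matching the documented result.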
Index: m4/input.c
===================================================================
RCS file: /sources/m4/m4/m4/input.c,v
retrieving revision 1.56
diff -u -p -r1.56 input.c
--- m4/input.c 27 Oct 2006 17:03:51 -0000 1.56
+++ m4/input.c 11 Nov 2006 13:57:49 -0000
@@ -450,7 +450,7 @@ static struct input_funcs string_funcs =
static int
string_peek (m4_input_block *me)
{
- int ch = (unsigned char) *me->u.u_s.current;
+ int ch = to_uchar (*me->u.u_s.current);
return (ch == '\0') ? CHAR_RETRY : ch;
}
@@ -459,7 +459,7 @@ static int
string_read (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
bool retry M4_GNUC_UNUSED)
{
- int ch = (unsigned char) *me->u.u_s.current;
+ int ch = to_uchar (*me->u.u_s.current);
if (ch == '\0')
return CHAR_RETRY;
me->u.u_s.current++;
Index: m4/m4private.h
===================================================================
RCS file: /sources/m4/m4/m4/m4private.h,v
retrieving revision 1.70
diff -u -p -r1.70 m4private.h
--- m4/m4private.h 31 Oct 2006 02:24:50 -0000 1.70
+++ m4/m4private.h 11 Nov 2006 13:57:49 -0000
@@ -352,6 +352,16 @@ struct m4__search_path_info {
extern void m4__include_init (m4 *);
+/* Convert a possibly-signed character to an unsigned character. This is
+ a bit safer than casting to unsigned char, since it catches some type
+ errors that the cast doesn't. */
+#if HAVE_INLINE
+static inline unsigned char to_uchar (char ch) { return ch; }
+#else
+# define to_uchar(C) ((unsigned char) (C))
+#endif
+
+
/* Debugging the memory allocator. */
#if WITH_DMALLOC
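To make the motivation for to_uchar concrete, here is a small stand-alone sketch (not part of the patch) that copies the HAVE_INLINE variant and uses it in a simplified reader shaped like string_peek. Whether plain char is signed is implementation-defined; where it is, byte 0xE9 reads back negative, while to_uchar always yields 233, so it cannot be confused with a negative sentinel. SKETCH_RETRY is a hypothetical stand-in; the real CHAR_RETRY is defined elsewhere in m4 and its value is not shown here.

```c
/* Copy of the HAVE_INLINE variant of to_uchar.  Because the parameter
   has type char, the compiler gets a chance to diagnose some type
   errors (such as passing a pointer) that an explicit cast may hide.  */
static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Hypothetical stand-in for CHAR_RETRY; assumed to be negative.  */
enum { SKETCH_RETRY = -1 };

/* Simplified peek in the style of string_peek: returns the next byte
   as a value in 0..UCHAR_MAX, or the sentinel at the end of string.  */
static int
peek_sketch (const char *current)
{
  int ch = to_uchar (*current);
  return ch == '\0' ? SKETCH_RETRY : ch;
}
```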
Index: m4/macro.c
===================================================================
RCS file: /sources/m4/m4/m4/macro.c,v
retrieving revision 1.62
diff -u -p -r1.62 macro.c
--- m4/macro.c 27 Oct 2006 17:03:51 -0000 1.62
+++ m4/macro.c 11 Nov 2006 13:57:49 -0000
@@ -580,7 +580,7 @@ trace_format (m4 *context, const char *f
size_t z = va_arg (args, size_t);
char nbuf[INT_BUFSIZE_BOUND (size_t)];
- sprintf (nbuf, "%lu", (unsigned long) z);
+ sprintf (nbuf, "%lu", (unsigned long int) z);
s = nbuf;
}
break;
Index: m4/output.c
===================================================================
RCS file: /sources/m4/m4/m4/output.c,v
retrieving revision 1.37
diff -u -p -r1.37 output.c
--- m4/output.c 8 Nov 2006 05:11:47 -0000 1.37
+++ m4/output.c 11 Nov 2006 13:57:49 -0000
@@ -765,11 +765,11 @@ m4_freeze_diversions (m4 *context, FILE
fix frozen file format to support 64-bit
integers. */
if (file_stat.st_size < 0
- || file_stat.st_size != (unsigned long) file_stat.st_size)
+ || file_stat.st_size != (unsigned long int) file_stat.st_size)
m4_error (context, EXIT_FAILURE, errno,
_("diversion too large"));
fprintf (file, "D%d,%lu", diversion->divnum,
- (unsigned long) file_stat.st_size);
+ (unsigned long int) file_stat.st_size);
}
m4_insert_diversion_helper (context, diversion, node);
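The guard before the D record relies on a cast round-trip: a non-negative st_size survives conversion to unsigned long int unchanged only if it fits, and only then is it safe to print with %lu. A minimal sketch of the same check, with the hypothetical helpers fits_ulong and format_divert_header, formatting into a buffer instead of the frozen file:

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: nonzero if SIZE can be printed with %lu after
   an (unsigned long int) cast without losing information.  The
   negative test must come first: for SIZE == -1 the round-trip
   comparison alone can succeed on hosts where both operands end up
   converted to the same unsigned type.  */
static int
fits_ulong (off_t size)
{
  return !(size < 0 || size != (unsigned long int) size);
}

/* Format a D record header the way m4_freeze_diversions does.
   Returns the snprintf result, or -1 if SIZE does not fit.  */
static int
format_divert_header (char *buf, size_t bufsize, int divnum, off_t size)
{
  if (!fits_ulong (size))
    return -1;
  return snprintf (buf, bufsize, "D%d,%lu", divnum,
                   (unsigned long int) size);
}
```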
Index: m4/utility.c
===================================================================
RCS file: /sources/m4/m4/m4/utility.c,v
retrieving revision 1.54
diff -u -p -r1.54 utility.c
--- m4/utility.c 13 Oct 2006 16:46:47 -0000 1.54
+++ m4/utility.c 11 Nov 2006 13:57:49 -0000
@@ -62,7 +62,7 @@ m4_bad_argc (m4 *context, int argc, m4_s
static const char *
skip_space (m4 *context, const char *arg)
{
- while (m4_has_syntax (M4SYNTAX, (unsigned char) *arg, M4_SYNTAX_SPACE))
+ while (m4_has_syntax (M4SYNTAX, to_uchar (*arg), M4_SYNTAX_SPACE))
arg++;
return arg;
}
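skip_space looks each byte up in m4's syntax table through m4_has_syntax, so the argument must be a non-negative byte value; a plain char at or above 0x80 could be negative on signed-char hosts and index before the start of a table. The sketch below uses a hypothetical 256-entry table and flag (sketch_syntax, SKETCH_SPACE), since the real M4SYNTAX layout is not shown in this patch:

```c
#include <limits.h>

/* Hypothetical syntax flag and table; m4's real syntax table differs.  */
enum { SKETCH_SPACE = 1 };
static unsigned short sketch_syntax[UCHAR_MAX + 1];

static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

static void
sketch_syntax_init (void)
{
  sketch_syntax[' '] = SKETCH_SPACE;
  sketch_syntax['\t'] = SKETCH_SPACE;
  sketch_syntax['\n'] = SKETCH_SPACE;
}

/* Same shape as skip_space: advance past bytes flagged as space.
   to_uchar keeps the index in 0..UCHAR_MAX even for bytes >= 0x80,
   and the loop stops at the terminating NUL because entry 0 is 0.  */
static const char *
skip_space_sketch (const char *arg)
{
  while (sketch_syntax[to_uchar (*arg)] & SKETCH_SPACE)
    arg++;
  return arg;
}
```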
Index: modules/m4.c
===================================================================
RCS file: /sources/m4/m4/modules/m4.c,v
retrieving revision 1.91
diff -u -p -r1.91 m4.c
--- modules/m4.c 7 Nov 2006 19:18:10 -0000 1.91
+++ modules/m4.c 11 Nov 2006 13:57:49 -0000
@@ -920,8 +920,8 @@ M4BUILTIN_HANDLER (substr)
const char *
m4_expand_ranges (const char *s, m4_obstack *obs)
{
- char from;
- char to;
+ unsigned char from;
+ unsigned char to;
assert (obstack_object_size (obs) == 0);
for (from = '\0'; *s != '\0'; from = *s++)
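Only the start of m4_expand_ranges is visible in this hunk, so here is a self-contained sketch of how an expansion following the documented rules can work, assuming the same loop shape shown above but writing into a caller buffer instead of an obstack. The names expand_ranges_sketch and put_byte are hypothetical. Declaring from and to as unsigned char is the point of the change: bytes at or above 0x80 then compare greater than ASCII, so a range such as ~-» (0x7E to 0xBB in ISO-8859-1) runs forward.

```c
#include <string.h>

static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Append byte C to OUT if it fits, leaving room for the final NUL.  */
static void
put_byte (char *out, size_t outsize, size_t *len, unsigned char c)
{
  if (*len + 1 < outsize)
    out[(*len)++] = (char) c;
}

/* Hypothetical sketch of translit range expansion.  OUTSIZE must be
   at least 1; the result is always NUL-terminated (and truncated if
   OUT is too small).  */
static char *
expand_ranges_sketch (const char *s, char *out, size_t outsize)
{
  unsigned char from;   /* unsigned, so 0x80..0xFF sort above '~' */
  unsigned char to;
  size_t len = 0;

  for (from = '\0'; *s != '\0'; from = to_uchar (*s++))
    {
      if (*s == '-' && from != '\0')
        {
          to = to_uchar (*++s);
          if (to == '\0')
            {
              /* A dash at the very end stands for itself.  */
              put_byte (out, outsize, &len, '-');
              break;
            }
          else if (from <= to)
            {
              /* Forward range; FROM itself was already emitted.  */
              while (from++ < to)
                put_byte (out, outsize, &len, from);
            }
          else
            {
              /* Backward range, e.g. 9-0.  */
              while (--from >= to)
                put_byte (out, outsize, &len, from);
            }
          /* The loop increment now sets FROM to the range end, which
             is what lets back-to-back ranges share an endpoint.  */
        }
      else
        put_byte (out, outsize, &len, to_uchar (*s));
    }
  out[len] = '\0';
  return out;
}
```

For example, "a-c-a" expands to abcba and "+--1-5" to +,-12345, the two strings used in the new documentation example.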
Index: src/freeze.c
===================================================================
RCS file: /sources/m4/m4/src/freeze.c,v
retrieving revision 1.54
diff -u -p -r1.54 freeze.c
--- src/freeze.c 27 Oct 2006 17:03:51 -0000 1.54
+++ src/freeze.c 11 Nov 2006 13:57:49 -0000
@@ -142,7 +142,7 @@ produce_module_dump (FILE *file, lt_dlha
if (handle)
produce_module_dump (file, handle);
- fprintf (file, "M%lu\n", (unsigned long) strlen (name));
+ fprintf (file, "M%lu\n", (unsigned long int) strlen (name));
fputs (name, file);
fputc ('\n', file);
}
@@ -168,10 +168,10 @@ dump_symbol_CB (m4_symbol_table *symtab,
if (m4_is_symbol_text (symbol))
{
fprintf (file, "T%lu,%lu",
- (unsigned long) strlen (symbol_name),
- (unsigned long) strlen (m4_get_symbol_text (symbol)));
+ (unsigned long int) strlen (symbol_name),
+ (unsigned long int) strlen (m4_get_symbol_text (symbol)));
if (handle)
- fprintf (file, ",%lu", (unsigned long) strlen (module_name));
+ fprintf (file, ",%lu", (unsigned long int) strlen (module_name));
fputc ('\n', file);
fputs (symbol_name, file);
@@ -189,12 +189,12 @@ dump_symbol_CB (m4_symbol_table *symtab,
assert (!"INTERNAL ERROR: builtin not found in builtin table!");
fprintf (file, "F%lu,%lu",
- (unsigned long) strlen (symbol_name),
- (unsigned long) strlen (bp->name));
+ (unsigned long int) strlen (symbol_name),
+ (unsigned long int) strlen (bp->name));
if (handle)
fprintf (file, ",%lu",
- (unsigned long) strlen (module_name));
+ (unsigned long int) strlen (module_name));
fputc ('\n', file);
fputs (symbol_name, file);
@@ -241,8 +241,8 @@ produce_frozen_state (m4 *context, const
|| strcmp (m4_get_syntax_rquote (M4SYNTAX), DEF_RQUOTE))
{
fprintf (file, "Q%lu,%lu\n",
- (unsigned long) context->syntax->lquote.length,
- (unsigned long) context->syntax->rquote.length);
+ (unsigned long int) context->syntax->lquote.length,
+ (unsigned long int) context->syntax->rquote.length);
fputs (context->syntax->lquote.string, file);
fputs (context->syntax->rquote.string, file);
fputc ('\n', file);
@@ -254,8 +254,8 @@ produce_frozen_state (m4 *context, const
|| strcmp (m4_get_syntax_ecomm (M4SYNTAX), DEF_ECOMM))
{
fprintf (file, "C%lu,%lu\n",
- (unsigned long) context->syntax->bcomm.length,
- (unsigned long) context->syntax->ecomm.length);
+ (unsigned long int) context->syntax->bcomm.length,
+ (unsigned long int) context->syntax->ecomm.length);
fputs (context->syntax->bcomm.string, file);
fputs (context->syntax->ecomm.string, file);
fputc ('\n', file);
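Every length in the frozen file goes through the same idiom: the value is cast to unsigned long int and printed with %lu, since C89 has no printf conversion for size_t (C99 added %zu, which this code does not rely on). A sketch of writing the Q record in the layout shown above, using the hypothetical helper write_quote_record; it measures the strings with strlen, whereas produce_frozen_state uses the stored .length fields.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: emit "Q<llen>,<rlen>\n<lquote><rquote>\n",
   the same sequence of calls produce_frozen_state makes for the
   Q record.  Returns 0 on success, -1 on a write error.  */
static int
write_quote_record (FILE *file, const char *lquote, const char *rquote)
{
  if (fprintf (file, "Q%lu,%lu\n",
               (unsigned long int) strlen (lquote),
               (unsigned long int) strlen (rquote)) < 0)
    return -1;
  if (fputs (lquote, file) == EOF || fputs (rquote, file) == EOF)
    return -1;
  return fputc ('\n', file) == EOF ? -1 : 0;
}
```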
Index: src/main.c
===================================================================
RCS file: /sources/m4/m4/src/main.c,v
retrieving revision 1.101
diff -u -p -r1.101 main.c
--- src/main.c 8 Nov 2006 19:06:00 -0000 1.101
+++ src/main.c 11 Nov 2006 13:57:49 -0000
@@ -395,7 +395,7 @@ main (int argc, char *const *argv, char
/* In 1.4.x, -B<num> was a no-op option for compatibility with
Solaris m4. Warn if optarg is all numeric. FIXME -
silence this warning after 2.0. */
- if (isdigit ((unsigned char) *optarg))
+ if (isdigit (to_uchar (*optarg)))
{
char *end;
errno = 0;
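The -B hunk only shows the first test, isdigit (to_uchar (*optarg)); the <ctype.h> functions are defined only for EOF and values representable as unsigned char, which is why the conversion is needed. The rest of the real check is cut off in this hunk, so the following is only a sketch of an "all numeric" test in the same spirit, with the hypothetical name all_digits:

```c
#include <ctype.h>
#include <stdbool.h>

static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Hypothetical helper: true if ARG is non-empty and every byte is a
   decimal digit.  Each byte goes through to_uchar before isdigit,
   because passing a negative value other than EOF is undefined.  */
static bool
all_digits (const char *arg)
{
  if (*arg == '\0')
    return false;
  for (; *arg != '\0'; arg++)
    if (!isdigit (to_uchar (*arg)))
      return false;
  return true;
}
```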
Index: tests/builtins.at
===================================================================
RCS file: /sources/m4/m4/tests/builtins.at,v
retrieving revision 1.32
diff -u -p -r1.32 builtins.at
--- tests/builtins.at 8 Nov 2006 04:26:53 -0000 1.32
+++ tests/builtins.at 11 Nov 2006 13:57:49 -0000
@@ -932,6 +932,12 @@ AT_DATA([[in]],
AT_CHECK_M4([in], [0], [[c]m4_for([i],[1],[5000],[],[[d]])
])
+dnl This validates that ranges are built using unsigned chars.
+AT_DATA([in], [[translit(`«abc~', `~-»')
+]])
+AT_CHECK_M4([in], [0], [[abc
+]])
+
AT_CLEANUP
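The new test depends on how the range endpoints compare. Assuming the test input is ISO-8859-1 encoded, ~ is 0x7E and » is 0xBB. The sketch below (hypothetical helpers, not m4 code) checks which bytes the range ~-» covers when the endpoints are compared as signed char, as the old char-based code would on signed-char hosts, versus unsigned char. Converting 0xBB to signed char is implementation-defined; on the usual two's-complement hosts it yields -69, turning the range into a long backward run that also swallows a, b and c.

```c
#include <stdbool.h>

/* Hypothetical helper: does the range FROM-TO, in either direction,
   include byte value C?  */
static bool
range_contains (int from, int to, int c)
{
  return from <= to ? (from <= c && c <= to) : (to <= c && c <= from);
}

/* Endpoints compared as signed char, reproducing the old behavior
   deliberately (conversion of 0xBB is implementation-defined).  */
static bool
old_range_deletes (unsigned char c)
{
  return range_contains ((signed char) '~', (signed char) 0xBB,
                         (signed char) c);
}

/* Endpoints compared as unsigned char, as in the patched code.  */
static bool
new_range_deletes (unsigned char c)
{
  return range_contains ((unsigned char) '~', 0xBB, c);
}
```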