[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: format bug
From: |
Eric Blake |
Subject: |
Re: format bug |
Date: |
Tue, 29 May 2007 07:03:07 -0600 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.10) Gecko/20070221 Thunderbird/1.5.0.10 Mnenhy/0.7.5.666 |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Eric Blake on 5/28/2007 10:15 PM:
> Try this for a fun time:
>
> $ echo 'format(%*.*d,-1,-1,1)' | m4 | wc
> 1 1 2280281
>
> Oops - that was 2 million+ characters that I wasn't expecting! Patch
> coming up later. I don't think the bug can be exploited to run arbitrary
> code, but executing printf without enough arguments is never a good idea.
Fixed on cygwin with this (some of it borrowed from head, the rest needs
porting to head):
2007-05-29 Eric Blake <address@hidden>
Improve format support.
* m4/gnulib-cache.m4: Augment with 'gnulib-tool --import
vasprintf-posix'.
* src/format.c (format): Parse %'hhd, %a, %A. Avoid calling
printf with too few arguments, as in format(%*.*d,-1,-1,1).
* doc/m4.texinfo (Format): Expand tests, and improve
documentation.
* NEWS: Document this change.
Still broken on mingw, since mingw has a broken atof. Also, the gnulib
stdint and unistd modules are failing on mingw. I'm thinking of cutting
m4 1.4.9b today, because of the extensive gnulib changes, to hopefully get
a bit more of a shakedown from random platforms before going with the
official m4 1.4.10.
- --
Don't work too hard, make some time for fun as well!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGXCSK84KuGfSFAYARAi4rAKCkN0B32VIYp9wZmweL7kat+eGnKwCdGcB/
6YCw4TUMzEEndU96jajsKwI=
=WTbs
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.102
diff -u -p -r1.1.1.1.2.102 NEWS
--- NEWS 28 May 2007 20:24:27 -0000 1.1.1.1.2.102
+++ NEWS 29 May 2007 12:31:05 -0000
@@ -13,6 +13,9 @@ Version 1.4.10 - ?? ??? 2007, by ???? (
* Work around a number of corner-case POSIX compliance bugs in various
broken stdio libraries. In particular, the `syscmd' builtin behaves
more predictably when stdin is seekable.
+* The `format' builtin now understands formats such as %a, %A, and %'hhd,
+ and works around a number of platform printf bugs. Furthermore, the
+ sequence format(%*.*d,-1,-1,1) no longer outputs random data.
Version 1.4.9 - 23 Mar 2007, by Eric Blake (CVS version 1.4.8c)
Index: m4/gnulib-cache.m4
===================================================================
RCS file: /sources/m4/m4/m4/Attic/gnulib-cache.m4,v
retrieving revision 1.1.2.26
diff -u -p -r1.1.2.26 gnulib-cache.m4
--- m4/gnulib-cache.m4 28 May 2007 17:07:04 -0000 1.1.2.26
+++ m4/gnulib-cache.m4 29 May 2007 12:31:05 -0000
@@ -15,11 +15,11 @@
# Specification in the form of a command-line invocation:
-# gnulib-tool --import --dir=. --lib=libm4 --source-base=lib --m4-base=m4
--doc-base=doc --aux-dir=. --with-tests --no-libtool --macro-prefix=M4
avltree-oset binary-io clean-temp cloexec close-stream closein config-h error
fdl fflush fopen-safer free fseeko gendocs getopt gnupload mkstemp obstack
regex stdbool stdint stdlib-safer strtol unlocked-io verror version-etc-fsf
xalloc xvasprintf
+# gnulib-tool --import --dir=. --lib=libm4 --source-base=lib --m4-base=m4
--doc-base=doc --aux-dir=. --with-tests --no-libtool --macro-prefix=M4
avltree-oset binary-io clean-temp cloexec close-stream closein config-h error
fdl fflush fopen-safer free fseeko gendocs getopt gnupload mkstemp obstack
regex stdbool stdint stdlib-safer strtol unlocked-io vasprintf-posix verror
version-etc-fsf xalloc xvasprintf
# Specification in the form of a few gnulib-tool.m4 macro invocations:
gl_LOCAL_DIR([])
-gl_MODULES([avltree-oset binary-io clean-temp cloexec close-stream closein
config-h error fdl fflush fopen-safer free fseeko gendocs getopt gnupload
mkstemp obstack regex stdbool stdint stdlib-safer strtol unlocked-io verror
version-etc-fsf xalloc xvasprintf])
+gl_MODULES([avltree-oset binary-io clean-temp cloexec close-stream closein
config-h error fdl fflush fopen-safer free fseeko gendocs getopt gnupload
mkstemp obstack regex stdbool stdint stdlib-safer strtol unlocked-io
vasprintf-posix verror version-etc-fsf xalloc xvasprintf])
gl_AVOID([])
gl_SOURCE_BASE([lib])
gl_M4_BASE([m4])
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.126
diff -u -p -r1.1.1.1.2.126 m4.texinfo
--- doc/m4.texinfo 25 May 2007 20:29:17 -0000 1.1.1.1.2.126
+++ doc/m4.texinfo 29 May 2007 12:31:05 -0000
@@ -4887,15 +4887,27 @@ The macro @code{format} is recognized on
Its use is best described by a few examples:
address@hidden This test is a bit fragile, if someone tries to port to a
address@hidden platform without infinity.
@example
define(`foo', `The brown fox jumped over the lazy dog')
@result{}
format(`The string "%s" uses %d characters', foo, len(foo))
@result{}The string "The brown fox jumped over the lazy dog" uses 38 characters
+format(`%*.*d', `-1', `-1', `1')
address@hidden
format(`%.0f', `56789.9876')
@result{}56790
-len(format(`%-*X', `300', `1'))
address@hidden
+len(format(`%-*X', `5000', `1'))
address@hidden
+ifelse(format(`%010F', `infinity'), ` INF', `success',
+ format(`%010F', `infinity'), ` INFINITY', `success',
+ `fail')
address@hidden
+ifelse(format(`%.1A', `1.999'), `0X1.0P+1', `success',
+ format(`%.1A', `1.999'), `0X2.0P+0', `success',
+ `fail')
address@hidden
@end example
Using the @code{forloop} macro defined earlier (@pxref{Forloop}), this
@@ -4921,18 +4933,32 @@ forloop(`i', `1', `10', `format(`%6d squ
@end example
The builtin @code{format} is modeled after the ANSI C @samp{printf}
-function, and supports these @samp{%} specifiers: @samp{c},
address@hidden, @samp{d}, @samp{o}, @samp{x}, @samp{X}, @samp{u}, @samp{e},
address@hidden, @samp{f}, @samp{F}, @samp{g}, @samp{G}, and @samp{%}; it
-supports field widths and precisions, and the
-modifiers @samp{+}, @samp{-}, @address@hidden }}, @samp{0}, @samp{#}, @samp{h}
and
address@hidden For more details on the functioning of @code{printf}, see the
-C Library Manual.
-
-For now, unrecognized specifiers are silently ignored, but it is
-anticipated that a future release of @acronym{GNU} @code{m4} will support more
-specifiers, and give warnings when problems are encountered. Likewise,
-escape sequences are not yet recognized.
+function, and supports these @samp{%} specifiers: @samp{c}, @samp{s},
address@hidden, @samp{o}, @samp{x}, @samp{X}, @samp{u}, @samp{a}, @samp{A},
address@hidden, @samp{E}, @samp{f}, @samp{F}, @samp{g}, @samp{G}, and
address@hidden; it supports field widths and precisions, and the flags
address@hidden, @samp{-}, @samp{ }, @samp{0}, @samp{#}, and @samp{'}. For
+integer specifiers, the width modifiers @samp{hh}, @samp{h}, and
address@hidden are recognized, and for floating point specifiers, the width
+modifier @samp{l} is recognized. Items not yet supported include
+positional arguments, the @samp{n}, @samp{p}, @samp{S}, and @samp{C}
+specifiers, the @samp{z}, @samp{t}, @samp{j}, @samp{L} and @samp{ll}
+modifiers, and any platform extensions available in the native
address@hidden For more details on the functioning of @code{printf},
+see the C Library Manual, or the @acronym{POSIX} specification (for
+example, @samp{%a} is supported even on platforms that haven't yet
+implemented C99 hexadecimal floating point output natively).
+
+Unrecognized specifiers result in a warning. It is anticipated that a
+future release of @acronym{GNU} @code{m4} will support more specifiers,
+and give better warnings when various problems such as overflow are
+encountered. Likewise, escape sequences are not yet recognized.
+
address@hidden
+format(`%p', `0')
address@hidden:stdin:1: Warning: unrecognized specifier in `%p'
address@hidden
address@hidden example
@node Arithmetic
@chapter Macros for doing arithmetic
Index: src/format.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/format.c,v
retrieving revision 1.1.1.1.2.6
diff -u -p -r1.1.1.1.2.6 format.c
--- src/format.c 26 Oct 2006 21:11:56 -0000 1.1.1.1.2.6
+++ src/format.c 29 May 2007 12:31:05 -0000
@@ -1,6 +1,6 @@
/* GNU m4 -- A simple macro processor
- Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2006
+ Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2006, 2007
Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
@@ -24,24 +24,17 @@
#include "m4.h"
#include "xvasprintf.h"
-/* Simple varargs substitute. */
+/* Simple varargs substitute. We assume int and unsigned int are the
+ same size; likewise for long and unsigned long. */
#define ARG_INT(argc, argv) \
((argc == 0) ? 0 : \
(--argc, argv++, atoi (TOKEN_DATA_TEXT (argv[-1]))))
-#define ARG_UINT(argc, argv) \
- ((argc == 0) ? 0 : \
- (--argc, argv++, (unsigned int) atoi (TOKEN_DATA_TEXT (argv[-1]))))
-
#define ARG_LONG(argc, argv) \
((argc == 0) ? 0 : \
(--argc, argv++, atol (TOKEN_DATA_TEXT (argv[-1]))))
-#define ARG_ULONG(argc, argv) \
- ((argc == 0) ? 0 : \
- (--argc, argv++, (unsigned long) atol (TOKEN_DATA_TEXT (argv[-1]))))
-
#define ARG_STR(argc, argv) \
((argc == 0) ? "" : \
(--argc, argv++, TOKEN_DATA_TEXT (argv[-1])))
@@ -51,44 +44,60 @@
(--argc, argv++, atof (TOKEN_DATA_TEXT (argv[-1]))))
-/*------------------------------------------------------------------------.
-| The main formatting function. Output is placed on the obstack OBS, the |
-| first argument in ARGV is the formatting string, and the rest is |
-| arguments for the string. |
-`------------------------------------------------------------------------*/
+/*------------------------------------------------------------------.
+| The main formatting function. Output is placed on the obstack |
+| OBS, the first argument in ARGV is the formatting string, and the |
+| rest is arguments for the string. Warn rather than invoke |
+| unspecified behavior in the underlying printf when we do not |
+| recognize a format. |
+`------------------------------------------------------------------*/
void
format (struct obstack *obs, int argc, token_data **argv)
{
- char *fmt; /* format control string */
- const char *fstart; /* beginning of current format spec */
- int c; /* a simple character */
+ const char *f; /* format control string */
+ const char *fmt; /* position within f */
+ char fstart[] = "%'+- 0#*.*hhd"; /* current format spec */
+ char *p; /* position within fstart */
+ unsigned char c; /* a simple character */
/* Flags. */
- char flags; /* 1 iff treating flags */
+ char flags; /* flags to use in fstart */
+ enum {
+ THOUSANDS = 0x01, /* ' */
+ PLUS = 0x02, /* + */
+ MINUS = 0x04, /* - */
+ SPACE = 0x08, /* */
+ ZERO = 0x10, /* 0 */
+ ALT = 0x20, /* # */
+ DONE = 0x40 /* no more flags */
+ };
/* Precision specifiers. */
int width; /* minimum field width */
int prec; /* precision */
char lflag; /* long flag */
- char hflag; /* short flag */
+
+ /* Specifiers we are willing to accept. ok['x'] implies %x is ok.
+ Various modifiers reduce the set, in order to avoid undefined
+ behavior in printf. */
+ char ok[128];
/* Buffer and stuff. */
char *str; /* malloc'd buffer of formatted text */
- enum {INT, UINT, LONG, ULONG, DOUBLE, STR} datatype;
+ enum {INT, LONG, DOUBLE, STR} datatype;
- fmt = (char *) ARG_STR (argc, argv);
+ f = fmt = (const char *) ARG_STR (argc, argv);
+ memset (ok, 0, sizeof ok);
for (;;)
{
while ((c = *fmt++) != '%')
{
- if (c == 0)
+ if (c == '\0')
return;
obstack_1grow (obs, c);
}
- fstart = fmt - 1;
-
if (*fmt == '%')
{
obstack_1grow (obs, '%');
@@ -96,73 +105,145 @@ format (struct obstack *obs, int argc, t
continue;
}
+ p = fstart + 1; /* % */
+ lflag = 0;
+ ok['a'] = ok['A'] = ok['c'] = ok['d'] = ok['e'] = ok['E']
+ = ok['f'] = ok['F'] = ok['g'] = ok['G'] = ok['i'] = ok['o']
+ = ok['s'] = ok['u'] = ok['x'] = ok['X'] = 1;
+
/* Parse flags. */
- flags = 1;
+ flags = 0;
do
{
switch (*fmt)
{
- case '-': /* left justification */
+ case '\'': /* thousands separator */
+ ok['a'] = ok['A'] = ok['c'] = ok['e'] = ok['E']
+ = ok['o'] = ok['s'] = ok['x'] = ok['X'] = 0;
+ flags |= THOUSANDS;
+ break;
+
case '+': /* mandatory sign */
+ ok['c'] = ok['o'] = ok['s'] = ok['u'] = ok['x'] = ok['X'] = 0;
+ flags |= PLUS;
+ break;
+
case ' ': /* space instead of positive sign */
+ ok['c'] = ok['o'] = ok['s'] = ok['u'] = ok['x'] = ok['X'] = 0;
+ flags |= SPACE;
+ break;
+
case '0': /* zero padding */
+ ok['c'] = ok['s'] = 0;
+ flags |= ZERO;
+ break;
+
case '#': /* alternate output */
+ ok['c'] = ok['d'] = ok['i'] = ok['s'] = ok['u'] = 0;
+ flags |= ALT;
+ break;
+
+ case '-': /* left justification */
+ flags |= MINUS;
break;
default:
- flags = 0;
+ flags |= DONE;
break;
}
}
- while (flags && fmt++);
-
- /* Minimum field width. */
- width = -1;
+ while (!(flags & DONE) && fmt++);
+ if (flags & THOUSANDS)
+ *p++ = '\'';
+ if (flags & PLUS)
+ *p++ = '+';
+ if (flags & MINUS)
+ *p++ = '-';
+ if (flags & SPACE)
+ *p++ = ' ';
+ if (flags & ZERO)
+ *p++ = '0';
+ if (flags & ALT)
+ *p++ = '#';
+
+ /* Minimum field width; an explicit 0 is the same as not giving
+ the width. */
+ width = 0;
+ *p++ = '*';
if (*fmt == '*')
{
width = ARG_INT (argc, argv);
fmt++;
}
- else if (isdigit (to_uchar (*fmt)))
+ else
{
- do
+ while (isdigit (to_uchar (*fmt)))
{
+ width = 10 * width + *fmt - '0';
fmt++;
}
- while (isdigit (to_uchar (*fmt)));
}
- /* Maximum precision. */
+ /* Maximum precision; an explicit negative precision is the same
+ as not giving the precision. A lone '.' is a precision of 0. */
prec = -1;
+ *p++ = '.';
+ *p++ = '*';
if (*fmt == '.')
{
+ ok['c'] = 0;
if (*(++fmt) == '*')
{
prec = ARG_INT (argc, argv);
++fmt;
}
- else if (isdigit (to_uchar (*fmt)))
+ else
{
- do
+ prec = 0;
+ while (isdigit (to_uchar (*fmt)))
{
+ prec = 10 * prec + *fmt - '0';
fmt++;
}
- while (isdigit (to_uchar (*fmt)));
}
}
- /* Length modifiers. */
- lflag = (*fmt == 'l');
- hflag = (*fmt == 'h');
- if (lflag || hflag)
- fmt++;
-
- switch (*fmt++)
+ /* Length modifiers. We don't yet recognize ll, j, t, or z. */
+ if (*fmt == 'l')
{
+ *p++ = 'l';
+ lflag = 1;
+ fmt++;
+ ok['c'] = ok['s'] = 0;
+ }
+ else if (*fmt == 'h')
+ {
+ *p++ = 'h';
+ fmt++;
+ if (*fmt == 'h')
+ {
+ *p++ = 'h';
+ fmt++;
+ }
+ ok['a'] = ok['A'] = ok['c'] = ok['e'] = ok['E'] = ok['f'] = ok['F']
+ = ok['g'] = ok['G'] = ok['s'] = 0;
+ }
- case '\0':
- return;
+ c = *fmt++;
+ if (c > sizeof ok || !ok[c])
+ {
+ M4ERROR ((warning_status, 0,
+ "Warning: unrecognized specifier in `%s'", f));
+ if (c == '\0')
+ fmt--;
+ continue;
+ }
+ *p++ = c;
+ *p = '\0';
+ /* Specifiers. We don't yet recognize C, S, n, or p. */
+ switch (c)
+ {
case 'c':
datatype = INT;
break;
@@ -173,30 +254,15 @@ format (struct obstack *obs, int argc, t
case 'd':
case 'i':
- if (lflag)
- {
- datatype = LONG;
- }
- else
- {
- datatype = INT;
- }
- break;
-
case 'o':
case 'x':
case 'X':
case 'u':
- if (lflag)
- {
- datatype = ULONG;
- }
- else
- {
- datatype = UINT;
- }
+ datatype = lflag ? LONG : INT;
break;
+ case 'a':
+ case 'A':
case 'e':
case 'E':
case 'f':
@@ -207,86 +273,31 @@ format (struct obstack *obs, int argc, t
break;
default:
- continue;
+ abort ();
}
- c = *fmt;
- *fmt = '\0';
-
switch(datatype)
{
case INT:
- if (width != -1 && prec != -1)
- str = xasprintf (fstart, width, prec, ARG_INT(argc, argv));
- else if (width != -1)
- str = xasprintf (fstart, width, ARG_INT(argc, argv));
- else if (prec != -1)
- str = xasprintf (fstart, prec, ARG_INT(argc, argv));
- else
- str = xasprintf (fstart, ARG_INT(argc, argv));
- break;
-
- case UINT:
- if (width != -1 && prec != -1)
- str = xasprintf (fstart, width, prec, ARG_UINT(argc, argv));
- else if (width != -1)
- str = xasprintf (fstart, width, ARG_UINT(argc, argv));
- else if (prec != -1)
- str = xasprintf (fstart, prec, ARG_UINT(argc, argv));
- else
- str = xasprintf (fstart, ARG_UINT(argc, argv));
+ str = xasprintf (fstart, width, prec, ARG_INT(argc, argv));
break;
case LONG:
- if (width != -1 && prec != -1)
- str = xasprintf (fstart, width, prec, ARG_LONG(argc, argv));
- else if (width != -1)
- str = xasprintf (fstart, width, ARG_LONG(argc, argv));
- else if (prec != -1)
- str = xasprintf (fstart, prec, ARG_LONG(argc, argv));
- else
- str = xasprintf (fstart, ARG_LONG(argc, argv));
- break;
-
- case ULONG:
- if (width != -1 && prec != -1)
- str = xasprintf (fstart, width, prec, ARG_ULONG(argc, argv));
- else if (width != -1)
- str = xasprintf (fstart, width, ARG_ULONG(argc, argv));
- else if (prec != -1)
- str = xasprintf (fstart, prec, ARG_ULONG(argc, argv));
- else
- str = xasprintf (fstart, ARG_ULONG(argc, argv));
+ str = xasprintf (fstart, width, prec, ARG_LONG(argc, argv));
break;
case DOUBLE:
- if (width != -1 && prec != -1)
- str = xasprintf (fstart, width, prec, ARG_DOUBLE(argc, argv));
- else if (width != -1)
- str = xasprintf (fstart, width, ARG_DOUBLE(argc, argv));
- else if (prec != -1)
- str = xasprintf (fstart, prec, ARG_DOUBLE(argc, argv));
- else
- str = xasprintf (fstart, ARG_DOUBLE(argc, argv));
+ str = xasprintf (fstart, width, prec, ARG_DOUBLE(argc, argv));
break;
case STR:
- if (width != -1 && prec != -1)
- str = xasprintf (fstart, width, prec, ARG_STR(argc, argv));
- else if (width != -1)
- str = xasprintf (fstart, width, ARG_STR(argc, argv));
- else if (prec != -1)
- str = xasprintf (fstart, prec, ARG_STR(argc, argv));
- else
- str = xasprintf (fstart, ARG_STR(argc, argv));
+ str = xasprintf (fstart, width, prec, ARG_STR(argc, argv));
break;
default:
abort();
}
- *fmt = c;
-
/* NULL was returned on failure, such as invalid format string. For
now, just silently ignore that bad specifier. */
if (str == NULL)
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: format bug,
Eric Blake <=