[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: linkedhash-list vs. hash
From: |
Eric Blake |
Subject: |
Re: linkedhash-list vs. hash |
Date: |
Sat, 26 Jul 2008 14:06:40 -0600 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080708 Thunderbird/2.0.0.16 Mnenhy/0.7.5.666 |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Bruno Haible on 7/23/2008 5:40 AM:
| Eric Blake wrote:
|> both linkedhash-list and hash appear to fit the bill
|
| The main difference between linkedhash-list and hash is that the former
| implements a list in its own right, i.e. iterating over the elements always
| returns them in the defined order. Whereas in 'hash', you are effectively
| iterating over a set, not a list; i.e. the elements come out in random
order.
m4 doesn't care about the order of the iteration. So I went with the hash
module.
|
|> Does it really matter whether the set size is prime vs. 2^n-1 in how
|> likely a modulo operation in the hash is to cause collisions?
|
| I would not compromise on this - a modulo operation is fast on all hardware
| since 1995, and the time to search for the next prime around n is
| O(sqrt(n)*log(n)) - much less than the O(n) that you need for resizing the
| table.
Reasonable enough; so the dynamic calculation of a prime is not a strike
against the hash module.
|
| Also, when your hash table size is 2^n-1, you are calling malloc(8*(2^n-1)),
| which - in some malloc implementations - may end up allocating twice as much
| memory, because sizeof(malloc_header_size) + 8*(2^n-1) is just slightly
larger
| than a power of two.
Good point.
|
|> Another difference: linkedlist-hash is hard-coded on its growth
|> parameters, while hash allows the user to specify both fullness threshold
|> and growth factors
|
| Whether you need this or not, depends on the ratio between the number of
| lookups in the table vs. the number of insertions in the table. If this
| ratio is low, you profit from a larger growth factor (maybe around 1.5
or 2.0);
| if it is high, a smaller growth factor (around 1.2 or 1.4) is likely better.
| Anyway, I would first do some profiling to see whether it matters at all.
For 'autoconf -f' on coreutils: pre-patch noticed 2246308 lookups, 209873
collisions, using 509 buckets and 0 resizes. Overall time was 22.296 seconds.
post-patch, there were still 2246308 lookups, but only 20784 collisions
(an order of magnitude better). Also, this counted 2326034 hash
callbacks, 3433800 compare callbacks, using 3659 buckets and 5 resizes
from the default size (when m4 -H is not specified, the default is 509,
but since I used NULL for the tuning parameters, the hash module
pre-scales that to 641 buckets to anticipate 80% capacity at 509 entries).
~ Overall time was 22.169 seconds (barely noticeable).
At any rate, here's the series of patches I've pushed to m4 to switch from
hand-rolled to the gnulib hash module.
- --
Don't work too hard, make some time for fun as well!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkiLg9AACgkQ84KuGfSFAYC3+gCcCOurXGasT0LrzQ21G5AYbYvR
k7wAoNbgzj2ptKMrBFwZk2Tr5kzBa4LK
=KE1d
-----END PGP SIGNATURE-----
>From 0735b27919f180fcbc03e90681c0c849f40a63c5 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 22 Jul 2008 16:58:26 -0600
Subject: [PATCH] Backport faster hash lookups of pushdef stack collisions.
* src/m4.h (struct symbol): Replace shadowed bit by stack field.
(SYMBOL_SHADOWED): Delete.
* src/symtab.c (lookup_symbol): Manage pushdef stacks
independently from hash buckets, so that collisions with large
pushdef stacks can be quickly skipped over.
(symtab_print_list): Adjust caller.
* src/builtin.c (dump_symbol): Likewise.
* src/freeze.c (reverse_symbol_list, produce_frozen_state):
Likewise.
* doc/m4.texinfo (Using frozen files): Test the change.
Signed-off-by: Eric Blake <address@hidden>
---
ChangeLog | 12 ++++++++
doc/m4.texinfo | 16 +++++++++++
src/builtin.c | 2 +-
src/freeze.c | 25 +++++++++--------
src/m4.h | 5 +--
src/symtab.c | 77 ++++++++++++++++++++++++++++++++++---------------------
6 files changed, 91 insertions(+), 46 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index c393ee2..dda5fc0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,17 @@
2008-07-22 Eric Blake <address@hidden>
+ Backport faster hash lookups of pushdef stack collisions.
+ * src/m4.h (struct symbol): Replace shadowed bit by stack field.
+ (SYMBOL_SHADOWED): Delete.
+ * src/symtab.c (lookup_symbol): Manage pushdef stacks
+ independently from hash buckets, so that collisions with large
+ pushdef stacks can be quickly skipped over.
+ (symtab_print_list): Adjust caller.
+ * src/builtin.c (dump_symbol): Likewise.
+ * src/freeze.c (reverse_symbol_list, produce_frozen_state):
+ Likewise.
+ * doc/m4.texinfo (Using frozen files): Test the change.
+
Mention patch to make earlier autoconf happy.
* NEWS: Give URL to autoconf patch to avoid undefined popdef.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 79ea41d..e78061c 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -7084,6 +7084,22 @@ traceon(`undefined')dnl
@c Make sure freezing is successful.
address@hidden
+ifdef(`__unix__', ,
+ `errprint(` skipping: syscmd does not have unix semantics
+')m4exit(`77')')dnl
+changequote(`[', `]')dnl
+syscmd([echo 'changequote([,])pushdef([divnum],[hi])dnl' \
+ | ]__program__[ -F in.m4f \
+ && echo 'divnum popdef([divnum])divnum' \
+ | ]__program__[ -R in.m4f \
+ && rm in.m4f])status sysval
address@hidden 0
address@hidden 0
address@hidden example
+
address@hidden Detect inability to freeze.
+
@comment options: -F /none/such
@comment status: 1
@example
diff --git a/src/builtin.c b/src/builtin.c
index 0ea814a..f8a3f3c 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -826,7 +826,7 @@ static void
dump_symbol (symbol *sym, void *arg)
{
struct dump_symbol_data *data = (struct dump_symbol_data *) arg;
- if (!SYMBOL_SHADOWED (sym) && SYMBOL_TYPE (sym) != TOKEN_VOID)
+ if (SYMBOL_TYPE (sym) != TOKEN_VOID)
{
obstack_blank (data->obs, sizeof (symbol *));
data->base = (symbol **) obstack_base (data->obs);
diff --git a/src/freeze.c b/src/freeze.c
index 5e35c81..4d0eeb4 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -36,8 +36,8 @@ reverse_symbol_list (symbol *sym)
result = NULL;
while (sym)
{
- next = SYMBOL_NEXT (sym);
- SYMBOL_NEXT (sym) = result;
+ next = sym->stack;
+ sym->stack = result;
result = sym;
sym = next;
}
@@ -53,6 +53,7 @@ produce_frozen_state (const char *name)
{
FILE *file;
int h;
+ symbol *stack;
symbol *sym;
const builtin *bp;
@@ -95,15 +96,16 @@ produce_frozen_state (const char *name)
/* Dump all symbols. */
- for (h = 0; h < hash_table_size; h++)
+ for (stack = symtab[h = 0]; h < hash_table_size;
+ stack = (stack ? SYMBOL_NEXT (stack) : symtab[h++]))
{
-
- /* Process all entries in one bucket, from the last to the first.
+ if (!stack)
+ continue;
+ /* Process all entries in each stack from the last to the first.
This order ensures that, at reload time, pushdef's will be
executed with the oldest definitions first. */
-
- symtab[h] = reverse_symbol_list (symtab[h]);
- for (sym = symtab[h]; sym; sym = SYMBOL_NEXT (sym))
+ sym = stack = reverse_symbol_list (stack);
+ while (sym)
{
switch (SYMBOL_TYPE (sym))
{
@@ -140,11 +142,10 @@ produce_frozen_state (const char *name)
abort ();
break;
}
+ sym = sym->stack;
}
-
- /* Reverse the bucket once more, putting it back as it was. */
-
- symtab[h] = reverse_symbol_list (symtab[h]);
+ /* Reverse the stack once more, putting it back as it was. */
+ stack = reverse_symbol_list (stack);
}
/* Let diversions be issued from output.c module, its cleaner to have this
diff --git a/src/m4.h b/src/m4.h
index 3afe476..dcb4613 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -417,9 +417,9 @@ enum symbol_lookup
/* Symbol table entry. */
struct symbol
{
- struct symbol *next;
+ struct symbol *next; /* Next symbol with the same hash. */
+ struct symbol *stack; /* Stack of shadowed symbols of the same name. */
bool_bitfield traced : 1;
- bool_bitfield shadowed : 1;
bool_bitfield macro_args : 1;
bool_bitfield blind_no_args : 1;
bool_bitfield deleted : 1;
@@ -432,7 +432,6 @@ struct symbol
#define SYMBOL_NEXT(S) ((S)->next)
#define SYMBOL_TRACED(S) ((S)->traced)
-#define SYMBOL_SHADOWED(S) ((S)->shadowed)
#define SYMBOL_MACRO_ARGS(S) ((S)->macro_args)
#define SYMBOL_BLIND_NO_ARGS(S) ((S)->blind_no_args)
#define SYMBOL_DELETED(S) ((S)->deleted)
diff --git a/src/symtab.c b/src/symtab.c
index 7e76cfc..381d025 100644
--- a/src/symtab.c
+++ b/src/symtab.c
@@ -215,7 +215,6 @@ lookup_symbol (const char *name, size_t len, symbol_lookup
mode)
SYMBOL_TRACED (sym) = SYMBOL_TRACED (old);
SYMBOL_NAME (sym) = xmemdup0 (name, len);
SYMBOL_NAME_LEN (sym) = len;
- SYMBOL_SHADOWED (sym) = false;
SYMBOL_MACRO_ARGS (sym) = false;
SYMBOL_BLIND_NO_ARGS (sym) = false;
SYMBOL_DELETED (sym) = false;
@@ -223,7 +222,9 @@ lookup_symbol (const char *name, size_t len, symbol_lookup
mode)
SYMBOL_NEXT (sym) = SYMBOL_NEXT (old);
SYMBOL_NEXT (old) = NULL;
- (*spp) = sym;
+ sym->stack = old->stack;
+ old->stack = NULL;
+ *spp = sym;
}
return sym;
}
@@ -240,19 +241,21 @@ lookup_symbol (const char *name, size_t len,
symbol_lookup mode)
SYMBOL_TRACED (sym) = false;
SYMBOL_NAME (sym) = xmemdup0 (name, len);
SYMBOL_NAME_LEN (sym) = len;
- SYMBOL_SHADOWED (sym) = false;
SYMBOL_MACRO_ARGS (sym) = false;
SYMBOL_BLIND_NO_ARGS (sym) = false;
SYMBOL_DELETED (sym) = false;
SYMBOL_PENDING_EXPANSIONS (sym) = 0;
SYMBOL_NEXT (sym) = *spp;
- (*spp) = sym;
+ sym->stack = NULL;
+ *spp = sym;
if (mode == SYMBOL_PUSHDEF && cmp == 0)
{
- SYMBOL_SHADOWED (SYMBOL_NEXT (sym)) = true;
- SYMBOL_TRACED (sym) = SYMBOL_TRACED (SYMBOL_NEXT (sym));
+ sym->stack = sym->next;
+ sym->next = sym->stack->next;
+ sym->stack->next = NULL;
+ SYMBOL_TRACED (sym) = SYMBOL_TRACED (sym->stack);
}
return sym;
@@ -270,24 +273,30 @@ lookup_symbol (const char *name, size_t len,
symbol_lookup mode)
return NULL;
{
bool traced = false;
- symbol *result = sym;
- if (SYMBOL_NEXT (sym) != NULL
- && SYMBOL_SHADOWED (SYMBOL_NEXT (sym))
- && mode == SYMBOL_POPDEF)
+ symbol *result = sym;
+ if (sym->stack && mode == SYMBOL_POPDEF)
{
- SYMBOL_SHADOWED (SYMBOL_NEXT (sym)) = false;
- SYMBOL_TRACED (SYMBOL_NEXT (sym)) = SYMBOL_TRACED (sym);
+ SYMBOL_TRACED (sym->stack) = SYMBOL_TRACED (sym);
+ sym->stack->next = sym->next;
+ *spp = sym->stack;
+ sym->next = NULL;
+ sym->stack = NULL;
+ free_symbol (sym);
}
else
- traced = SYMBOL_TRACED (sym);
- do
{
- *spp = SYMBOL_NEXT (sym);
- free_symbol (sym);
- sym = *spp;
+ traced = SYMBOL_TRACED (sym);
+ *spp = sym->next;
+ do
+ {
+ symbol *old = sym;
+ sym = sym->stack;
+ old->next = NULL;
+ old->stack = NULL;
+ free_symbol (old);
+ }
+ while (sym);
}
- while (*spp != NULL && SYMBOL_SHADOWED (*spp)
- && mode == SYMBOL_DELETE);
if (traced)
{
sym = (symbol *) xmalloc (sizeof (symbol));
@@ -295,16 +304,16 @@ lookup_symbol (const char *name, size_t len,
symbol_lookup mode)
SYMBOL_TRACED (sym) = true;
SYMBOL_NAME (sym) = xmemdup0 (name, len);
SYMBOL_NAME_LEN (sym) = len;
- SYMBOL_SHADOWED (sym) = false;
SYMBOL_MACRO_ARGS (sym) = false;
SYMBOL_BLIND_NO_ARGS (sym) = false;
SYMBOL_DELETED (sym) = false;
SYMBOL_PENDING_EXPANSIONS (sym) = 0;
SYMBOL_NEXT (sym) = *spp;
- (*spp) = sym;
+ sym->stack = NULL;
+ *spp = sym;
}
- return result;
+ return result;
}
default:
@@ -389,19 +398,27 @@ static void
symtab_print_list (int i)
{
symbol *sym;
+ symbol *stack;
size_t h;
xprintf ("Symbol dump #%d:\n", i);
for (h = 0; h < hash_table_size; h++)
for (sym = symtab[h]; sym != NULL; sym = sym->next)
- xprintf ("\tname %s, len %zu, bucket %lu, addr %p, next %p, "
- "flags%s%s%s, pending %d\n",
- SYMBOL_NAME (sym), SYMBOL_NAME_LEN (sym),
- (unsigned long int) h, sym, SYMBOL_NEXT (sym),
- SYMBOL_TRACED (sym) ? " traced" : "",
- SYMBOL_SHADOWED (sym) ? " shadowed" : "",
- SYMBOL_DELETED (sym) ? " deleted" : "",
- SYMBOL_PENDING_EXPANSIONS (sym));
+ {
+ stack = sym;
+ do
+ {
+ xprintf ("\tname %s, len %zu, bucket %lu, addr %p, next %p, "
+ "stack %p, flags%s%s, pending %d\n",
+ SYMBOL_NAME (stack), SYMBOL_NAME_LEN (stack),
+ (unsigned long int) h, stack, SYMBOL_NEXT (stack),
+ stack->stack, SYMBOL_TRACED (stack) ? " traced" : "",
+ SYMBOL_DELETED (stack) ? " deleted" : "",
+ SYMBOL_PENDING_EXPANSIONS (stack));
+ stack = stack->stack;
+ }
+ while (stack);
+ }
}
#endif /* DEBUG_SYM */
--
1.5.6.4
>From 38a274a6e3c1e13e2d9d1c803d64b1a5414ff603 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 22 Jul 2008 17:18:10 -0600
Subject: [PATCH] Make symbol table opaque.
* src/m4.h (symtab, hash_table_size): No longer export.
(symtab_init): Add parameter.
* src/symtab.c (symtab): Make static, so we can later change
implementations.
(hash_table_size): New private size variable.
(symtab_init): Take size hint, rather than reading a global.
* src/m4.c (hash_table_size): Delete.
(main): Track -H as local variable instead.
* src/freeze.c (dump_symbol_CB): New function, extracted from...
(produce_frozen_state): ...here, to use hack_all_symbols rather
than raw traversal.
Signed-off-by: Eric Blake <address@hidden>
---
ChangeLog | 13 +++++++
src/freeze.c | 115 ++++++++++++++++++++++++++++++----------------------------
src/m4.c | 7 ++--
src/m4.h | 7 +---
src/symtab.c | 17 ++++-----
5 files changed, 86 insertions(+), 73 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index dda5fc0..ed2acce 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,18 @@
2008-07-22 Eric Blake <address@hidden>
+ Make symbol table opaque.
+ * src/m4.h (symtab, hash_table_size): No longer export.
+ (symtab_init): Add parameter.
+ * src/symtab.c (symtab): Make static, so we can later change
+ implementations.
+ (hash_table_size): New private size variable.
+ (symtab_init): Take size hint, rather than reading a global.
+ * src/m4.c (hash_table_size): Delete.
+ (main): Track -H as local variable instead.
+ * src/freeze.c (dump_symbol_CB): New function, extracted from...
+ (produce_frozen_state): ...here, to use hack_all_symbols rather
+ than raw traversal.
+
Backport faster hash lookups of pushdef stack collisions.
* src/m4.h (struct symbol): Replace shadowed bit by stack field.
(SYMBOL_SHADOWED): Delete.
diff --git a/src/freeze.c b/src/freeze.c
index 4d0eeb4..3a48b03 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -44,6 +44,65 @@ reverse_symbol_list (symbol *sym)
return result;
}
+/*-------------------------------------------------------------------.
+| Dump a stack of pushdef references to the stream F. Designed as a |
+| callback for hack_all_symbols. |
+`-------------------------------------------------------------------*/
+
+static void
+dump_symbol_CB (symbol *sym, void *f)
+{
+ FILE *file = (FILE *) f;
+ symbol *stack;
+ const builtin *bp;
+
+ /* Process all entries in each stack from the last to the first.
+ This order ensures that, at reload time, pushdef's will be
+ executed with the oldest definitions first. */
+ sym = stack = reverse_symbol_list (sym);
+ while (sym)
+ {
+ switch (SYMBOL_TYPE (sym))
+ {
+ case TOKEN_TEXT:
+ xfprintf (file, "T%d,%d\n",
+ (int) SYMBOL_NAME_LEN (sym),
+ (int) strlen (SYMBOL_TEXT (sym)));
+ fwrite (SYMBOL_NAME (sym), 1, SYMBOL_NAME_LEN (sym), file);
+ fputs (SYMBOL_TEXT (sym), file);
+ fputc ('\n', file);
+ break;
+
+ case TOKEN_FUNC:
+ bp = find_builtin_by_addr (SYMBOL_FUNC (sym));
+ if (bp == NULL)
+ {
+ assert (!"dump_symbol_CB");
+ abort ();
+ }
+ xfprintf (file, "F%d,%d\n",
+ (int) SYMBOL_NAME_LEN (sym),
+ (int) strlen (bp->name));
+ fwrite (SYMBOL_NAME (sym), 1, SYMBOL_NAME_LEN (sym), file);
+ fputs (bp->name, file);
+ fputc ('\n', file);
+ break;
+
+ case TOKEN_VOID:
+ /* Ignore placeholder tokens that exist due to traceon. */
+ break;
+
+ default:
+ assert (!"dump_symbol_CB");
+ abort ();
+ break;
+ }
+ sym = sym->stack;
+ }
+ /* Reverse the stack once more, putting it back as it was. */
+ reverse_symbol_list (stack);
+}
+
/*------------------------------------------------.
| Produce a frozen state to the given file NAME. |
`------------------------------------------------*/
@@ -52,10 +111,6 @@ void
produce_frozen_state (const char *name)
{
FILE *file;
- int h;
- symbol *stack;
- symbol *sym;
- const builtin *bp;
file = fopen (name, O_BINARY ? "wb" : "w");
if (!file)
@@ -96,57 +151,7 @@ produce_frozen_state (const char *name)
/* Dump all symbols. */
- for (stack = symtab[h = 0]; h < hash_table_size;
- stack = (stack ? SYMBOL_NEXT (stack) : symtab[h++]))
- {
- if (!stack)
- continue;
- /* Process all entries in each stack from the last to the first.
- This order ensures that, at reload time, pushdef's will be
- executed with the oldest definitions first. */
- sym = stack = reverse_symbol_list (stack);
- while (sym)
- {
- switch (SYMBOL_TYPE (sym))
- {
- case TOKEN_TEXT:
- xfprintf (file, "T%d,%d\n",
- (int) SYMBOL_NAME_LEN (sym),
- (int) strlen (SYMBOL_TEXT (sym)));
- fwrite (SYMBOL_NAME (sym), 1, SYMBOL_NAME_LEN (sym), file);
- fputs (SYMBOL_TEXT (sym), file);
- fputc ('\n', file);
- break;
-
- case TOKEN_FUNC:
- bp = find_builtin_by_addr (SYMBOL_FUNC (sym));
- if (bp == NULL)
- {
- assert (!"produce_frozen_state");
- abort ();
- }
- xfprintf (file, "F%d,%d\n",
- (int) SYMBOL_NAME_LEN (sym),
- (int) strlen (bp->name));
- fwrite (SYMBOL_NAME (sym), 1, SYMBOL_NAME_LEN (sym), file);
- fputs (bp->name, file);
- fputc ('\n', file);
- break;
-
- case TOKEN_VOID:
- /* Ignore placeholder tokens that exist due to traceon. */
- break;
-
- default:
- assert (!"produce_frozen_state");
- abort ();
- break;
- }
- sym = sym->stack;
- }
- /* Reverse the stack once more, putting it back as it was. */
- stack = reverse_symbol_list (stack);
- }
+ hack_all_symbols (dump_symbol_CB, file);
/* Let diversions be issued from output.c module, its cleaner to have this
piece of code there. */
diff --git a/src/m4.c b/src/m4.c
index c73e275..7f0af7e 100644
--- a/src/m4.c
+++ b/src/m4.c
@@ -38,9 +38,6 @@ int sync_output = 0;
/* Debug (-d[flags]). */
int debug_level = 0;
-/* Hash table size (should be a prime) (-Hsize). */
-size_t hash_table_size = HASHMAX;
-
/* Disable GNU extensions (-G). */
int no_gnu_extensions = 0;
@@ -412,6 +409,8 @@ main (int argc, char *const *argv, char *const *envp)
const char *frozen_file_to_read = NULL;
const char *frozen_file_to_write = NULL;
const char *macro_sequence = "";
+ /* Hash table size (should be a prime) (-Hsize). */
+ size_t hash_table_size = HASHMAX;
set_program_name (argv[0]);
retcode = EXIT_SUCCESS;
@@ -592,7 +591,7 @@ main (int argc, char *const *argv, char *const *envp)
input_init ();
output_init ();
- symtab_init ();
+ symtab_init (hash_table_size);
set_macro_sequence (macro_sequence);
include_env_init ();
diff --git a/src/m4.h b/src/m4.h
index dcb4613..011439e 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -127,7 +127,6 @@ typedef unsigned int bool_bitfield;
/* Option flags. */
extern int sync_output; /* -s */
extern int debug_level; /* -d */
-extern size_t hash_table_size; /* -H */
extern int no_gnu_extensions; /* -G */
extern int prefix_all_builtins; /* -P */
extern size_t max_debug_argument_length;/* -l */
@@ -448,10 +447,8 @@ typedef void hack_symbol (symbol *, void *);
#define HASHMAX 509 /* default, overridden by -Hsize */
-extern symbol **symtab;
-
-void free_symbol (symbol *sym);
-void symtab_init (void);
+void free_symbol (symbol *);
+void symtab_init (size_t);
symbol *lookup_symbol (const char *, size_t, symbol_lookup);
void hack_all_symbols (hack_symbol *, void *);
diff --git a/src/symtab.c b/src/symtab.c
index 381d025..7420e50 100644
--- a/src/symtab.c
+++ b/src/symtab.c
@@ -93,18 +93,17 @@ profile_memcmp (const char *s1, const char *s2, size_t l)
`----------------------------------------------------------------------*/
/* Pointer to symbol table. */
-symbol **symtab;
+static symbol **symtab;
+
+/* Number of buckets in symbol table. */
+static size_t hash_table_size;
void
-symtab_init (void)
+symtab_init (size_t size)
{
- size_t i;
- symbol **s;
-
- s = symtab = (symbol **) xnmalloc (hash_table_size, sizeof (symbol *));
-
- for (i = 0; i < hash_table_size; i++)
- s[i] = NULL;
+ hash_table_size = size;
+ symtab = (symbol **) xnmalloc (hash_table_size, sizeof *symtab);
+ memset (symtab, 0, hash_table_size * sizeof *symtab);
#ifdef DEBUG_SYM
atexit (show_profile); /* Ignore failure, since this is debug code. */
--
1.5.6.4
>From ffaa885fc0efb39bf0ca0df0a592a1c39f92729c Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 26 Jul 2008 13:27:02 -0600
Subject: [PATCH] Use hash module for self-growing symbol table.
* m4/gnulib-cache.m4: Import hash module.
* src/m4.h (struct symbol): Remove next member, change stack to be
circular.
(SYMBOL_NEXT): Delete.
(symtab_free): New prototype.
* src/symtab.c (show_profile) [DEBUG_SYM]: Track more hash
statistics, and dump to /dev/tty rather than stderr.
(symtab): Change type.
(hash_table_size): Delete.
(symtab_hasher, symtab_comparator, symtab_free_entry): New
functions.
(symtab_init, lookup_symbol, hack_all_symbols): Rewrite to wrap
external hash table.
(symtab_free): New function.
(symtab_debug) [DEBUG_SYM]: Adjust client.
* src/m4.c (main): Call symbol table cleanup.
* src/freeze.c (dump_symbol_CB, reverse_symbol_list): Deal with
fact that pushdef stack is now circular.
Signed-off-by: Eric Blake <address@hidden>
---
ChangeLog | 22 ++++
m4/gnulib-cache.m4 | 3 +-
src/freeze.c | 25 ++++--
src/m4.c | 5 +
src/m4.h | 5 +-
src/symtab.c | 271 +++++++++++++++++++++++++++++++--------------------
6 files changed, 213 insertions(+), 118 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index ed2acce..de44226 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,25 @@
+2008-07-26 Eric Blake <address@hidden>
+
+ Use hash module for self-growing symbol table.
+ * m4/gnulib-cache.m4: Import hash module.
+ * src/m4.h (struct symbol): Remove next member, change stack to be
+ circular.
+ (SYMBOL_NEXT): Delete.
+ (symtab_free): New prototype.
+ * src/symtab.c (show_profile) [DEBUG_SYM]: Track more hash
+ statistics, and dump to /dev/tty rather than stderr.
+ (symtab): Change type.
+ (hash_table_size): Delete.
+ (symtab_hasher, symtab_comparator, symtab_free_entry): New
+ functions.
+ (symtab_init, lookup_symbol, hack_all_symbols): Rewrite to wrap
+ external hash table.
+ (symtab_free): New function.
+ (symtab_debug) [DEBUG_SYM]: Adjust client.
+ * src/m4.c (main): Call symbol table cleanup.
+ * src/freeze.c (dump_symbol_CB, reverse_symbol_list): Deal with
+ fact that pushdef stack is now circular.
+
2008-07-22 Eric Blake <address@hidden>
Make symbol table opaque.
diff --git a/m4/gnulib-cache.m4 b/m4/gnulib-cache.m4
index f716908..a512830 100644
--- a/m4/gnulib-cache.m4
+++ b/m4/gnulib-cache.m4
@@ -15,7 +15,7 @@
# Specification in the form of a command-line invocation:
-# gnulib-tool --import --dir=. --local-dir=local --lib=libm4
--source-base=lib --m4-base=m4 --doc-base=doc --aux-dir=build-aux --with-tests
--no-libtool --macro-prefix=M4 announce-gen assert autobuild avltree-oset
binary-io clean-temp cloexec close-stream closein config-h error fdl fflush
flexmember fopen-safer fseeko gendocs getopt git-version-gen gnumakefile
gnupload gpl-3.0 intprops memmem mkstemp obstack obstack-printf-posix progname
quote regex stdbool stdint stdlib-safer strtod strtol unlocked-io
vasnprintf-posix verror version-etc version-etc-fsf xalloc xmemdup0 xprintf
xvasprintf-posix
+# gnulib-tool --import --dir=. --local-dir=local --lib=libm4
--source-base=lib --m4-base=m4 --doc-base=doc --aux-dir=build-aux --with-tests
--no-libtool --macro-prefix=M4 announce-gen assert autobuild avltree-oset
binary-io clean-temp cloexec close-stream closein config-h error fdl fflush
flexmember fopen-safer fseeko gendocs getopt git-version-gen gnumakefile
gnupload gpl-3.0 hash intprops memmem mkstemp obstack obstack-printf-posix
progname quote regex stdbool stdint stdlib-safer strtod strtol unlocked-io
vasnprintf-posix verror version-etc version-etc-fsf xalloc xmemdup0 xprintf
xvasprintf-posix
# Specification in the form of a few gnulib-tool.m4 macro invocations:
gl_LOCAL_DIR([local])
@@ -42,6 +42,7 @@ gl_MODULES([
gnumakefile
gnupload
gpl-3.0
+ hash
intprops
memmem
mkstemp
diff --git a/src/freeze.c b/src/freeze.c
index 3a48b03..2a7d9dc 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -30,17 +30,24 @@
static symbol *
reverse_symbol_list (symbol *sym)
{
- symbol *result;
+ symbol *first = sym;
symbol *next;
+ symbol *prev = sym;
+ symbol *result;
- result = NULL;
- while (sym)
+ assert (sym);
+ if (sym->stack == sym)
+ return sym;
+ next = sym->stack;
+ do
{
- next = sym->stack;
- sym->stack = result;
- result = sym;
+ result = prev;
sym = next;
+ next = sym->stack;
+ sym->stack = prev;
+ prev = sym;
}
+ while (prev != first);
return result;
}
@@ -59,8 +66,9 @@ dump_symbol_CB (symbol *sym, void *f)
/* Process all entries in each stack from the last to the first.
This order ensures that, at reload time, pushdef's will be
executed with the oldest definitions first. */
- sym = stack = reverse_symbol_list (sym);
- while (sym)
+ stack = sym;
+ sym = reverse_symbol_list (sym);
+ do
{
switch (SYMBOL_TYPE (sym))
{
@@ -99,6 +107,7 @@ dump_symbol_CB (symbol *sym, void *f)
}
sym = sym->stack;
}
+ while (sym != stack->stack);
/* Reverse the stack once more, putting it back as it was. */
reverse_symbol_list (stack);
}
diff --git a/src/m4.c b/src/m4.c
index 7f0af7e..551d80c 100644
--- a/src/m4.c
+++ b/src/m4.c
@@ -683,8 +683,13 @@ main (int argc, char *const *argv, char *const *envp)
undivert_all ();
}
output_exit ();
+#ifndef NDEBUG
+ /* Only spend time freeing memory to help isolate leaks; if
+ assertions are disabled, save the time and exit now. */
free_regex ();
quotearg_free ();
+ symtab_free ();
+#endif /* NDEBUG */
#ifdef DEBUG_REGEX
if (trace_file)
fclose (trace_file);
diff --git a/src/m4.h b/src/m4.h
index 011439e..ff0377a 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -416,8 +416,7 @@ enum symbol_lookup
/* Symbol table entry. */
struct symbol
{
- struct symbol *next; /* Next symbol with the same hash. */
- struct symbol *stack; /* Stack of shadowed symbols of the same name. */
+ struct symbol *stack; /* Circular list for pushdef stack of symbol. */
bool_bitfield traced : 1;
bool_bitfield macro_args : 1;
bool_bitfield blind_no_args : 1;
@@ -429,7 +428,6 @@ struct symbol
token_data data; /* Type should be only TOKEN_TEXT or TOKEN_FUNC. */
};
-#define SYMBOL_NEXT(S) ((S)->next)
#define SYMBOL_TRACED(S) ((S)->traced)
#define SYMBOL_MACRO_ARGS(S) ((S)->macro_args)
#define SYMBOL_BLIND_NO_ARGS(S) ((S)->blind_no_args)
@@ -449,6 +447,7 @@ typedef void hack_symbol (symbol *, void *);
void free_symbol (symbol *);
void symtab_init (size_t);
+void symtab_free (void);
symbol *lookup_symbol (const char *, size_t, symbol_lookup);
void hack_all_symbols (hack_symbol *, void *);
diff --git a/src/symtab.c b/src/symtab.c
index 7420e50..a9160c8 100644
--- a/src/symtab.c
+++ b/src/symtab.c
@@ -33,6 +33,8 @@
#include "m4.h"
+#include "hash.h"
+
#ifdef DEBUG_SYM
/* When evaluating hash table performance, this profiling code shows
how many collisions were encountered. */
@@ -47,19 +49,28 @@ struct profile
static struct profile profiles[5];
static symbol_lookup current_mode;
+static unsigned long long hash_entry;
+static unsigned long long comparator_entry;
+static size_t current_size;
+static unsigned int resizes;
/* On exit, show a profile of symbol table performance. */
static void
show_profile (void)
{
int i;
+ FILE *f = fopen ("/dev/tty", "w");
for (i = 0; i < 5; i++)
{
- xfprintf(stderr, "m4: lookup mode %d called %d times, %d compares, "
+ xfprintf(f, "m4: lookup mode %d called %d times, %d compares, "
"%d misses, %lld bytes\n",
i, profiles[i].entry, profiles[i].comparisons,
profiles[i].misses, profiles[i].bytes);
}
+ xfprintf(f, "m4: %llu hash callbacks, %llu compare callbacks, "
+ "%zu buckets, %u resizes\n",
+ hash_entry, comparator_entry, current_size, resizes - 1);
+ fclose (f);
}
/* Like memcmp (S1, S2, L), but also track profiling statistics. */
@@ -87,33 +98,12 @@ profile_memcmp (const char *s1, const char *s2, size_t l)
#endif /* DEBUG_SYM */
-/*----------------------------------------------------------------------.
-| Initialise the symbol table, by allocating the necessary storage, and |
-| zeroing all the entries. |
-`----------------------------------------------------------------------*/
-
/* Pointer to symbol table. */
-static symbol **symtab;
-
-/* Number of buckets in symbol table. */
-static size_t hash_table_size;
-
-void
-symtab_init (size_t size)
-{
- hash_table_size = size;
- symtab = (symbol **) xnmalloc (hash_table_size, sizeof *symtab);
- memset (symtab, 0, hash_table_size * sizeof *symtab);
-
-#ifdef DEBUG_SYM
- atexit (show_profile); /* Ignore failure, since this is debug code. */
-#endif /* DEBUG_SYM */
-}
+static Hash_table *symtab;
/*--------------------------------------------------.
| Return a hashvalue for a string S of length LEN. |
`--------------------------------------------------*/
-
static size_t
hash (const char *s, size_t len)
{
@@ -126,6 +116,82 @@ hash (const char *s, size_t len)
return val;
}
+/*----------------------------------------------------.
+| Wrap our hash inside signature expected by hash.h. |
+`----------------------------------------------------*/
+static size_t
+symtab_hasher (const void *entry, size_t buckets)
+{
+#ifdef DEBUG_SYM
+ hash_entry++;
+ if (buckets != current_size)
+ {
+ resizes++;
+ current_size = buckets;
+ }
+#endif /* DEBUG_SYM */
+ const symbol *sym = (const symbol *) entry;
+ return hash (SYMBOL_NAME (sym), SYMBOL_NAME_LEN (sym)) % buckets;
+}
+
+/*----------------------------------------------.
+| Compare two hash table entries for equality. |
+`----------------------------------------------*/
+static bool
+symtab_comparator (const void *entry_a, const void *entry_b)
+{
+#ifdef DEBUG_SYM
+ comparator_entry++;
+#endif /* DEBUG_SYM */
+ const symbol *sym_a = (const symbol *) entry_a;
+ const symbol *sym_b = (const symbol *) entry_b;
+ return (SYMBOL_NAME_LEN (sym_a) == SYMBOL_NAME_LEN (sym_b)
+ && memcmp (SYMBOL_NAME (sym_a), SYMBOL_NAME (sym_b),
+ SYMBOL_NAME_LEN (sym_a)) == 0);
+}
+
+/*---------------------------.
+| Reclaim an entry on exit. |
+`---------------------------*/
+static void
+symtab_free_entry (void *entry)
+{
+ symbol *sym = (symbol *) entry;
+ while (sym->stack != sym)
+ {
+ symbol *old = sym->stack;
+ sym->stack = old->stack;
+ assert (!SYMBOL_PENDING_EXPANSIONS (old));
+ free_symbol (old);
+ }
+ assert (!SYMBOL_PENDING_EXPANSIONS (sym));
+ free_symbol (sym);
+}
+
+/*--------------------------------------------------------------.
+| Initialize the symbol table, with SIZE as a hint for expected |
+| number of entries. |
+`--------------------------------------------------------------*/
+void
+symtab_init (size_t size)
+{
+ symtab = hash_initialize (size, NULL, symtab_hasher, symtab_comparator,
+ symtab_free_entry);
+
+#ifdef DEBUG_SYM
+ atexit (show_profile); /* Ignore failure, since this is debug code. */
+#endif /* DEBUG_SYM */
+}
+
+/*------------------------.
+| Clean up entire table. |
+`------------------------*/
+void
+symtab_free (void)
+{
+ hash_free (symtab);
+}
+
/*--------------------------------------------.
| Free all storage associated with a symbol. |
`--------------------------------------------*/
@@ -162,38 +228,23 @@ free_symbol (symbol *sym)
symbol *
lookup_symbol (const char *name, size_t len, symbol_lookup mode)
{
- size_t h;
- int cmp = 1;
- symbol *sym, *prev;
- symbol **spp;
+ symbol *sym;
+ symbol *entry;
+ symbol tmp;
#if DEBUG_SYM
current_mode = mode;
profiles[mode].entry++;
#endif /* DEBUG_SYM */
- h = hash (name, len);
- sym = symtab[h % hash_table_size];
-
- for (prev = NULL; sym != NULL; prev = sym, sym = sym->next)
- {
- cmp = (len < SYMBOL_NAME_LEN (sym) ? -1 : len > SYMBOL_NAME_LEN (sym) ? 1
- : memcmp (SYMBOL_NAME (sym), name, len));
- if (cmp >= 0)
- break;
- }
-
- /* If just searching, return status of search. */
-
- if (mode == SYMBOL_LOOKUP)
- return cmp == 0 ? sym : NULL;
-
- /* Symbol not found. */
-
- spp = (prev != NULL) ? &prev->next : &symtab[h % hash_table_size];
+ tmp.name = (char *) name;
+ tmp.len = len;
+ entry = (symbol *) hash_lookup (symtab, &tmp);
switch (mode)
{
+ case SYMBOL_LOOKUP:
+ return entry ? entry->stack : NULL;
case SYMBOL_INSERT:
@@ -202,14 +253,15 @@ lookup_symbol (const char *name, size_t len,
symbol_lookup mode)
a new one; if not, just return the symbol. If not found, just
insert the name, and return the new symbol. */
- if (cmp == 0 && sym != NULL)
+ if (entry)
{
+ sym = entry->stack;
if (SYMBOL_PENDING_EXPANSIONS (sym) > 0)
{
symbol *old = sym;
SYMBOL_DELETED (old) = true;
- sym = (symbol *) xmalloc (sizeof (symbol));
+ sym = (symbol *) xmalloc (sizeof *sym);
SYMBOL_TYPE (sym) = TOKEN_VOID;
SYMBOL_TRACED (sym) = SYMBOL_TRACED (old);
SYMBOL_NAME (sym) = xmemdup0 (name, len);
@@ -219,11 +271,20 @@ lookup_symbol (const char *name, size_t len,
symbol_lookup mode)
SYMBOL_DELETED (sym) = false;
SYMBOL_PENDING_EXPANSIONS (sym) = 0;
- SYMBOL_NEXT (sym) = SYMBOL_NEXT (old);
- SYMBOL_NEXT (old) = NULL;
- sym->stack = old->stack;
+ if (old == entry)
+ {
+ old = (symbol *) hash_delete (symtab, entry);
+ assert (entry == old);
+ sym->stack = sym;
+ entry = (symbol *) hash_insert (symtab, sym);
+ assert (sym == entry);
+ }
+ else
+ {
+ entry->stack = sym;
+ sym->stack = old->stack;
+ }
old->stack = NULL;
- *spp = sym;
}
return sym;
}
@@ -231,11 +292,13 @@ lookup_symbol (const char *name, size_t len,
symbol_lookup mode)
case SYMBOL_PUSHDEF:
- /* Insert a name in the symbol table. If there is already a symbol
- with the name, insert this in front of it, and mark the old
- symbol as "shadowed". */
-
- sym = (symbol *) xmalloc (sizeof (symbol));
+ /* Insert a name in the symbol table. If there is already a
+ symbol with the name, add it to the pushdef stack. Since the
+ hash table does not allow the insertion of duplicates, the
+ pushdef stack is a circular chain; the hash entry is the
+ oldest entry, which points to the newest entry; all other
+ entries point to the next older entry. */
+ sym = (symbol *) xmalloc (sizeof *sym);
SYMBOL_TYPE (sym) = TOKEN_VOID;
SYMBOL_TRACED (sym) = false;
SYMBOL_NAME (sym) = xmemdup0 (name, len);
@@ -245,17 +308,19 @@ lookup_symbol (const char *name, size_t len,
symbol_lookup mode)
SYMBOL_DELETED (sym) = false;
SYMBOL_PENDING_EXPANSIONS (sym) = 0;
- SYMBOL_NEXT (sym) = *spp;
- sym->stack = NULL;
- *spp = sym;
-
- if (mode == SYMBOL_PUSHDEF && cmp == 0)
+ if (entry)
{
- sym->stack = sym->next;
- sym->next = sym->stack->next;
- sym->stack->next = NULL;
+ assert (mode == SYMBOL_PUSHDEF);
+ sym->stack = entry->stack;
+ entry->stack = sym;
SYMBOL_TRACED (sym) = SYMBOL_TRACED (sym->stack);
}
+ else
+ {
+ sym->stack = sym;
+ entry = (symbol *) hash_insert (symtab, sym);
+ assert (sym == entry);
+ }
return sym;
case SYMBOL_DELETE:
@@ -268,37 +333,36 @@ lookup_symbol (const char *name, size_t len,
symbol_lookup mode)
definition is still in use, let the caller free the memory
after it is done with the symbol. */
- if (cmp != 0 || sym == NULL)
+ if (!entry)
return NULL;
{
bool traced = false;
- symbol *result = sym;
- if (sym->stack && mode == SYMBOL_POPDEF)
+ symbol *result = sym = entry->stack;
+ if (sym != entry && mode == SYMBOL_POPDEF)
{
SYMBOL_TRACED (sym->stack) = SYMBOL_TRACED (sym);
- sym->stack->next = sym->next;
- *spp = sym->stack;
- sym->next = NULL;
+ entry->stack = sym->stack;
sym->stack = NULL;
free_symbol (sym);
}
else
{
traced = SYMBOL_TRACED (sym);
- *spp = sym->next;
- do
+ while (sym != entry)
{
symbol *old = sym;
sym = sym->stack;
- old->next = NULL;
old->stack = NULL;
free_symbol (old);
}
- while (sym);
+ sym = (symbol *) hash_delete (symtab, entry);
+ assert (sym == entry);
+ sym->stack = NULL;
+ free_symbol (sym);
}
if (traced)
{
- sym = (symbol *) xmalloc (sizeof (symbol));
+ sym = (symbol *) xmalloc (sizeof *sym);
SYMBOL_TYPE (sym) = TOKEN_VOID;
SYMBOL_TRACED (sym) = true;
SYMBOL_NAME (sym) = xmemdup0 (name, len);
@@ -308,9 +372,9 @@ lookup_symbol (const char *name, size_t len, symbol_lookup
mode)
SYMBOL_DELETED (sym) = false;
SYMBOL_PENDING_EXPANSIONS (sym) = 0;
- SYMBOL_NEXT (sym) = *spp;
- sym->stack = NULL;
- *spp = sym;
+ sym->stack = sym;
+ entry = (symbol *) hash_insert (symtab, sym);
+ assert (sym == entry);
}
return result;
}
@@ -335,20 +399,17 @@ lookup_symbol (const char *name, size_t len,
symbol_lookup mode)
void
hack_all_symbols (hack_symbol *func, void *data)
{
- size_t h;
- symbol *sym;
+ symbol *sym = (symbol *) hash_get_first (symtab);
symbol *next;
- for (h = 0; h < hash_table_size; h++)
+ while (sym)
{
/* We allow func to call SYMBOL_POPDEF, which can invalidate
sym, so we must grab the next element to traverse before
calling func. */
- for (sym = symtab[h]; sym != NULL; sym = next)
- {
- next = SYMBOL_NEXT (sym);
- func (sym, data);
- }
+ next = (symbol *) hash_get_next (symtab, sym);
+ func (sym->stack, data);
+ sym = next;
}
}
@@ -396,28 +457,26 @@ symtab_debug (void)
static void
symtab_print_list (int i)
{
- symbol *sym;
+ symbol *sym = (symbol *) hash_get_first (symtab);
symbol *stack;
- size_t h;
xprintf ("Symbol dump #%d:\n", i);
- for (h = 0; h < hash_table_size; h++)
- for (sym = symtab[h]; sym != NULL; sym = sym->next)
- {
- stack = sym;
- do
- {
- xprintf ("\tname %s, len %zu, bucket %lu, addr %p, next %p, "
- "stack %p, flags%s%s, pending %d\n",
- SYMBOL_NAME (stack), SYMBOL_NAME_LEN (stack),
- (unsigned long int) h, stack, SYMBOL_NEXT (stack),
- stack->stack, SYMBOL_TRACED (stack) ? " traced" : "",
- SYMBOL_DELETED (stack) ? " deleted" : "",
- SYMBOL_PENDING_EXPANSIONS (stack));
- stack = stack->stack;
- }
- while (stack);
- }
+ while (sym)
+ {
+ stack = sym->stack;
+ do
+ {
+ xprintf ("\tname %s, len %zu, addr %p, "
+ "stack %p, flags%s%s, pending %d\n",
+ SYMBOL_NAME (stack), SYMBOL_NAME_LEN (stack),
+ stack, stack->stack, SYMBOL_TRACED (stack) ? " traced" : "",
+ SYMBOL_DELETED (stack) ? " deleted" : "",
+ SYMBOL_PENDING_EXPANSIONS (stack));
+ stack = stack->stack;
+ }
+ while (stack != sym);
+ sym = (symbol *) hash_get_next (symtab, sym);
+ }
}
#endif /* DEBUG_SYM */
--
1.5.6.4
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: linkedhash-list vs. hash,
Eric Blake <=