[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Poor hashtable collision performance
From: |
Eric Blake |
Subject: |
Poor hashtable collision performance |
Date: |
Sat, 17 Apr 2021 07:38:54 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 |
Consider the following program I wrote to divide a 50,000 byte string
into a stack of one character per definition (this is supposed to be an
O(n log n) task, because it repeatedly cuts the input in half, and
should be more efficient than iterating on
substr(`$1',0,1)$0(substr(`$1',1) with its O(n^2) complexity):
$ cat tmp
define(`chew', `ifelse($1, 1, `pushdef(`stack', substr(`$2',0, 1))',
`$0(eval($1/2), substr(`$2', 0, eval($1/2))_)$0(eval($1 - $1/2),
substr(`$2', eval($1/2)))')')dnl
chew(50000, format(`%050000d', 0))dnl
$ time m4 tmp
real 0m22.264s
user 0m22.180s
sys 0m0.010s
$ time m4 -H517 tmp
real 0m0.321s
user 0m0.316s
sys 0m0.004s
What gives? It turns out that with the default hash size of 509,
'stack' and 'substr' hash to the same bucket, and as the pushdef depth
of stack increases, the lookup time for 'substr' gets progressively
worse. With -H517, the two names hash to different buckets and no
longer interfere. I'm working on a patch to separate the pushdef chain
from the hash bucket chain, so that hash collisions only have to visit
one instance of a colliding name, not the entire stack.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Poor hashtable collision performance,
Eric Blake <=