Re: performance of m4-1.9a (was: popdef(undefined), __m4_version__)
From: Eric Blake
Subject: Re: performance of m4-1.9a (was: popdef(undefined), __m4_version__)
Date: Mon, 11 Aug 2008 20:38:36 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Ralf Wildenhues <Ralf.Wildenhues <at> gmx.de> writes:
>
> Hello again, and apologies for breaking the threading,
No problem. In truth, this is enough of an independent topic to be worth the
broken threading.
>
> I've done a wee bit of measuring now. Time for running autoconf in OpenMPI
> is 15s with branch-1_4 and branch-1.6, 27s with master, and 23s when master
> is configured --disable-shared.
Thanks for the stats.
>
> Then, a gprof comparison between 1.6 and master shows that another significant
> part of the slowdown is due to the fact that master has to do an indirect
> function call for every character in next_char. Can't the module interface
> use larger boundaries than character for its interface, like reading a whole
> token or so? I mean, we're talking about roughly 140M function calls here.
Sweet! Your measurements confirm what I already suspected, and a performance
patch is already in the pipeline. Once I port stage 29 from the argv_ref
branch (currently at [1], although that branch is still being rewound at
times as I rebase in various bug fixes), the input engine will do just that:
read blocks of data rather than individual bytes.
[1] http://git.savannah.gnu.org/gitweb/?p=m4.git;a=commitdiff;h=32c3fec7
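
To make the difference concrete, here is a minimal sketch of the two
interface styles. It is not the m4 code: struct input_source, its two
function pointers, and the newline-counting helpers are all hypothetical
names invented for illustration. The point is only that the bytewise loop
pays one indirect call per character, while the blockwise loop pays one per
run of characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input source; these names are illustrative, not m4's. */
struct input_source {
    const char *data;
    size_t len;
    size_t pos;
    /* Per-character interface: one indirect call per byte. */
    int (*next_char)(struct input_source *);
    /* Block interface: one indirect call per run of bytes. */
    const char *(*next_block)(struct input_source *, size_t *);
};

static int string_next_char(struct input_source *s)
{
    return s->pos < s->len ? (unsigned char) s->data[s->pos++] : -1;
}

static const char *string_next_block(struct input_source *s, size_t *n)
{
    const char *p = s->data + s->pos;
    *n = s->len - s->pos;
    s->pos = s->len;
    return *n ? p : NULL;       /* NULL signals end of input */
}

static void source_init(struct input_source *s, const char *text)
{
    assert(text != NULL);
    s->data = text;
    s->len = strlen(text);
    s->pos = 0;
    s->next_char = string_next_char;
    s->next_block = string_next_block;
}

/* Count newlines one byte at a time; *calls gets the indirect call count. */
static size_t count_newlines_bytewise(struct input_source *s, size_t *calls)
{
    size_t lines = 0;
    *calls = 0;
    for (;;) {
        int c = s->next_char(s);
        ++*calls;
        if (c < 0)
            break;
        if (c == '\n')
            lines++;
    }
    return lines;
}

/* Same result, but the source hands over whole blocks. */
static size_t count_newlines_blockwise(struct input_source *s, size_t *calls)
{
    size_t lines = 0;
    *calls = 0;
    for (;;) {
        size_t n, i;
        const char *p = s->next_block(s, &n);
        ++*calls;
        if (p == NULL)
            break;
        for (i = 0; i < n; i++)
            if (p[i] == '\n')
                lines++;
    }
    return lines;
}
```

With a string source both helpers return the same count, but the number of
indirect calls grows with the input length in the first case and stays
small in the second.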
>
> Then, I saw that debug stuff like m4_set_current_{file,line} was called veeery
> often (more than once per character). Rebuilding optimized with -DNDEBUG got
> master to 18s (with --disable-shared).
OK, something I will take a look at improving. The speed from -DNDEBUG comes
from avoiding the overhead of a function, thanks to inline accessor macros, but
avoiding changing the current line and file more than necessary seems like a
good idea. At any rate, './configure --disable-assert' is very much a
performance improvement, on all of the m4 branches.
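
For anyone unfamiliar with the pattern, the following is a rough sketch of
how an accessor can be a checked function in debug builds and an inline
macro when NDEBUG is defined. The struct and accessor names here are
placeholders chosen for the example, not the real m4 declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context object; field names are assumptions. */
struct context {
    const char *current_file;
    int current_line;
};

/* Out-of-line accessors: a real function call, with sanity checks. */
void set_current_line(struct context *ctx, int line)
{
    assert(ctx != NULL);
    ctx->current_line = line;
}

int get_current_line(struct context *ctx)
{
    assert(ctx != NULL);
    return ctx->current_line;
}

#ifdef NDEBUG
/* Optimized builds: macros defined after the functions shadow them, so
   later callers touch the field directly with no call overhead. */
# define set_current_line(ctx, line) ((void) ((ctx)->current_line = (line)))
# define get_current_line(ctx) ((ctx)->current_line)
#endif
```

Callers are written the same way in both configurations; only the cost per
call changes. This does nothing about how often the setter is invoked,
which is the separate issue mentioned above.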
>
> The gprof output files seem to indicate that next_char is called much more
> often from m4__next_token in master than next_char_1 is from next_token in
> branch-1.6. However, gcov output does not confirm this, so I guess this is
> an artifact from finite sampling density (and the amount that next_char_1
> is faster) or inlining artifacts.
This doesn't surprise me: in branch-1.6, the macro next_char inlines the common
case of rereading from a string, avoiding a number of next_char_1 calls, but in
master, there is no inlining because all access is done through indirect
functions.
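
Roughly, the branch-1.6 approach looks like the simplified sketch below. The
struct layout, the NEXT_CHAR and next_char_slow names, and the slow_calls
counter are all assumptions made for the example; the real macro handles
more cases. The macro consumes a byte directly when the current input is a
non-empty string, and only falls back to the out-of-line function
otherwise.

```c
#include <assert.h>
#include <stddef.h>

enum input_type { INPUT_STRING, INPUT_OTHER };

/* Hypothetical input block; a simplified stand-in, not m4's structure. */
struct input_block {
    enum input_type type;
    const char *str;
    size_t len;
};

static struct input_block *isp;     /* current input, global for brevity */
static unsigned long slow_calls;    /* how often the slow path was taken */

static void push_string(struct input_block *blk, const char *s, size_t len)
{
    assert(s != NULL);
    blk->type = INPUT_STRING;
    blk->str = s;
    blk->len = len;
    isp = blk;
}

/* Out-of-line fallback: handles end of input and non-string sources. */
static int next_char_slow(void)
{
    slow_calls++;
    if (isp == NULL || isp->len == 0)
        return -1;
    isp->len--;
    return (unsigned char) *isp->str++;
}

/* Fast path inlined at every use: no function call while rereading
   from a non-empty string. */
#define NEXT_CHAR()                                     \
    (isp && isp->type == INPUT_STRING && isp->len       \
     ? (isp->len--, (unsigned char) *isp->str++)        \
     : next_char_slow())
```

With indirect functions in place of the macro, every one of those reads
becomes a real call, which is consistent with the difference seen between
the two branches.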
--
Eric Blake