[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug-gperf] shrinking the asso_values array
From: |
Bruno Haible |
Subject: |
Re: [bug-gperf] shrinking the asso_values array |
Date: |
Thu, 05 Sep 2019 02:13:18 +0200 |
User-agent: |
KMail/5.1.3 (Linux/4.4.0-159-generic; KDE/5.18.0; x86_64; ; ) |
Hi Mark,
> FOURTH SUGGESTION ( MEDIUM DIFFICULTY )
>
> I was wondering if it would be worthwhile to use an "if" statement
> and a shorter static array asso_values in the "hash" function in some
> cases. The reason would be that the array is 256 bytes and loading
> different parts of the array could use up to 4 cache lines. Loading
> caches lines is very expensive from the point of view of the CPU
> (sometimes 100 cycles) as opposed to an if statement (which is a lot less,
> but is blocking.
>
> I looked at permut2.gperf in the examples directory of the gperf
> distribution. I will talk about this and next suggestion via examples
> related to that file.
>
> In our case, the largest hash value is 5 and all but four entries in
> the array are 6.
> Recall that in the perumt2.gperf example, only the letters x, y and z are
> used.
>
> Instead of
>
> return len + asso_values[(unsigned char) str[1] + 1] +
> assoc_values[(unsigned char)[str[0]];
>
> in the hash function, there could be an optimized(?) version of the following
>
> unsigned int returned_value = 6;
> if ( len <= 2 && 'x' <= str[1] && str[1] <= 'z' && 'x' <= str[0] &&
> str[0] <= 'z' )
> {
> unsigned char first = (unsigned char) str[1] - 'x' + 1;
> unsigned char second = (unsigned char) str[0] - 'x';
> returned_value len + modified_asso_values[first] +
> modified_asso_values[second];
> }
> return returned_value;
On today's CPUs, with their long pipelines of prepared instructions, the
condition you are writing there would cost 5 conditional jumps. Or, when
we leave out the redundant 'len <= 2' test (already done by the caller)
and assume an optimizing compiler that optimizes
'x' <= str[1] && str[1] <= 'z'
to
(unsigned int)(str[1] - 'x') <= 'z' - 'x'
we still have 2 conditional jumps.
> If either letter is out of range, the array is not accessed at all.
Optimizing cache behaviour at the expense of conditional jumps is not
good. You need a reasonable balance between both.
Bruno