[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: How does gawk allocate memory for arrays?
From: |
Andrew J. Schorr |
Subject: |
Re: How does gawk allocate memory for arrays? |
Date: |
Mon, 30 May 2022 23:02:49 -0400 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Mon, May 30, 2022 at 09:46:09PM -0500, Ed Morton wrote:
> That's what I suspected and I was asking out of curiosity if anyone know what
> that size was and if the expansion occurs in chunks that are the same size as
> the original or different sizes.
>From str_array.c:
/* grow_table --- grow a hash table */
static void
grow_table(NODE *symbol)
{
BUCKET **old, **new;
BUCKET *chain, *next;
int i, j;
unsigned long oldsize, newsize, k;
unsigned long hash1;
/*
* This is an array of primes. We grow the table by an order of
* magnitude each time (not just doubling) so that growing is a
* rare operation. We expect, on average, that it won't happen
* more than twice. The final size is also chosen to be small
* enough so that MS-DOG mallocs can handle it. When things are
* very large (> 8K), we just double more or less, instead of
* just jumping from 8K to 64K.
*/
static const unsigned long sizes[] = {
13, 127, 1021, 8191, 16381, 32749, 65497,
131101, 262147, 524309, 1048583, 2097169,
4194319, 8388617, 16777259, 33554467,
67108879, 134217757, 268435459, 536870923,
1073741827
};
And int_array.c:grow_int_table appears to use the same sizes.
However, cint_array.c appears to have a different approach for
arrays of consecutive integers:
/*
* To store 2^n integers, allocate top-level array of size n, elements
* of which are 1-Dimensional (leaf-array) of geometrically increasing
* size (power of 2).
*
* [0] --> [ 0 ]
* [1] --> [ 1 ]
* |2| --> [ 2 | 3 ]
* |3| --> [ 4 | 5 | 6 | 7 ]
* |.|
* |k| --> [ 2^(k - 1)| ... | 2^k - 1 ]
* ...
*
* For a given integer n (> 0), the leaf-array is at 1 + floor(log2(n)).
*
* The idea for the geometrically increasing array sizes is from:
* Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays.
* Bagwell, Phil (2002).
* http://infoscience.epfl.ch/record/64410/files/techlists.pdf
*
* Disadvantage:
* Worst case memory waste > 99% and will happen when each of the
* leaf arrays contains only a single element. Even with consecutive
* integers, memory waste can be as high as 50%.
*
* Solution: Hashed Array Trees (HATs).
*
*/
If the subscripts are record numbers, then I guess the "cint" array
type is the relevant one.
Regards,
Andy