How does gawk allocate memory for arrays?
From: Ed Morton
Subject: How does gawk allocate memory for arrays?
Date: Mon, 30 May 2022 08:54:23 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1
A question came up recently that can be reduced to comparing the line
numbers associated with unique key values in 2 files that are over 10G
lines each. While the values are unique, they only need to be compared
within groups of 10 contiguous lines (in reality there are other things
going on that are irrelevant to this discussion).
So we had:
* Approach 1: while reading file1, build one array that holds 10G of
  entries, then while reading file2 just do a hash lookup for each line.
* Approach 2: while reading file1, for every 10 lines build an array
  that holds those 10 entries, then getline the corresponding 10 lines
  from file2 into a second 10-entry array, then loop through all the
  values in the file1 array comparing them to the file2 array, then
  delete both arrays and start over.
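A minimal sketch of Approach 2 under stated assumptions (the file names `file1`/`file2`, one key per line, and the toy 10-line inputs are all hypothetical; the real task has extra details the OP omitted):

```shell
#!/bin/sh
# Toy inputs: one key per line (an assumption about the real data layout).
printf '%s\n' k01 k02 k03 k04 k05 k06 k07 k08 k09 k10 > file1
printf '%s\n' k10 k09 k08 k07 k06 k05 x x x x > file2

awk '
{
    a[FNR % 10] = $0                      # buffer up to 10 lines of file1
    if (FNR % 10 == 0) {                  # block complete:
        for (i = 0; i < 10; i++)          # getline the matching 10 lines
            if ((getline line < "file2") > 0)
                b[i] = line
        for (i in a)                      # compare the two small blocks
            for (j in b)
                if (a[i] == b[j])
                    print "match:", a[i]
        delete a                          # clear whole arrays (gawk
        delete b                          # supports "delete arr")
    }
}' file1 | sort      # sort only to make the demo deterministic;
                     # for-in traversal order is unspecified in awk
# prints "match: k05" through "match: k10", one per line
```

At any instant only 20 entries are live, versus 10G for Approach 1, which is why the memory difference is so stark.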
Obviously the 2nd approach was going to use far less memory, but
according to the OP it was also an order of magnitude faster than the
1st approach. That got me wondering how awk arrays are allocated, e.g.
is there a default size that gets allocated initially, with new chunks
of memory allocated as needed? If so, what is that size? I expect I
could find and read the code, but I'm really hoping not to have to do
that and that the design is documented somewhere, or that someone can
just tell me what it is.
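Short of reading the source, one rough way to get a feel for the allocation behavior is to watch peak memory as the array grows. A sketch, assuming GNU time is installed at /usr/bin/time (its %M format prints max resident set size in KB; this probes observable behavior, not gawk's documented internals):

```shell
#!/bin/sh
# Probe peak RSS for arrays of doubling size. Assumes GNU time at
# /usr/bin/time; BSD time lacks -f, so guard and skip if unavailable.
if /usr/bin/time -f '%M' true >/dev/null 2>&1; then
    for n in 1000000 2000000 4000000; do
        # GNU time prints the formatted line to stderr after awk exits
        /usr/bin/time -f "$n keys: %M KB max RSS" \
            awk -v n="$n" 'BEGIN { for (i = 0; i < n; i++) a[i] = i }'
    done
else
    echo "GNU time not available; skipping probe"
fi
```

If peak RSS grows in large jumps between runs rather than smoothly per element, that would suggest the underlying table is resized in chunks (e.g. by rehashing into a larger table) rather than extended one entry at a time.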
Ed.