Interesting... I would try the 'hash-map' implementation from gnulib.
unordered_set is effectively a hash-map implementation, and it's pretty good. Moreover, I stored the elements by pointer without copying them, yet gperf was extremely slow, so slow that it couldn't even finish. Perhaps I had a bug, but the code did seem OK.
With thousands of keys, it has to build a hash map holding thousands of entries, which might be the reason it was so slow.
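For reference, storing elements by pointer in an unordered_set can look roughly like the sketch below. It is only an illustration under assumptions: Keyword, KeywordPtrHash, KeywordPtrEqual and count_duplicates are hypothetical names, not taken from gperf. The point is that the set hashes and compares through the pointer, so no keyword text is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a keyword record; not gperf's actual class.
struct Keyword {
    std::string text;
};

// Hash through the pointer so the set keys on the keyword text,
// not on the pointer value.
struct KeywordPtrHash {
    std::size_t operator()(const Keyword* k) const {
        assert(k != nullptr);
        return std::hash<std::string>{}(k->text);
    }
};

// Two pointers are "equal" when the keywords they point to have the same text.
struct KeywordPtrEqual {
    bool operator()(const Keyword* a, const Keyword* b) const {
        return a->text == b->text;
    }
};

using KeywordSet =
    std::unordered_set<const Keyword*, KeywordPtrHash, KeywordPtrEqual>;

// Returns how many keywords repeat the text of an earlier keyword.
// Only pointers into `keywords` are stored; the strings are never copied.
std::size_t count_duplicates(const std::vector<Keyword>& keywords) {
    KeywordSet seen;
    seen.reserve(keywords.size());  // size the bucket array up front to avoid rehashing
    std::size_t duplicates = 0;
    for (const Keyword& k : keywords) {
        if (!seen.insert(&k).second) {
            ++duplicates;
        }
    }
    return duplicates;
}
```

Even with this layout, every insert still hashes the full keyword text and allocates a node, so a set that is rebuilt many times may cost far more than a plain bit array.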
By the way, which profiler would you recommend for CPU-profiling of a program
like gperf? I have a couple of old notes regarding profiling (below), but
can't really tell which one to start with.
I definitely prefer Visual Studio; it comes with both sampling and instrumentation profilers. It readily shows which functions take the most time, and which lines in those functions consume the most CPU. For example, the heaviest function and line in gperf is "_collision_detector->set_bit(hashcode)", which takes 23% of the CPU time of the entire program.
If you have a Mac, Xcode also has a decent, usable profiler.
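To make the hot line concrete, here is a rough sketch of the kind of bit-array collision check that a call like set_bit(hashcode) might perform. The CollisionDetector class below is hypothetical and written only for illustration; it is not gperf's actual code, and the real implementation may differ in storage and in how it is cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical collision detector: one flag per possible hash value.
class CollisionDetector {
public:
    explicit CollisionDetector(std::size_t size) : seen_(size, false) {}

    // Marks hashcode as used. Returns true if it was already marked,
    // i.e. two keys produced the same hash value.
    bool set_bit(std::size_t hashcode) {
        assert(hashcode < seen_.size());
        bool was_set = seen_[hashcode];
        seen_[hashcode] = true;
        return was_set;
    }

    // Forgets all marks so the detector can be reused for the next trial.
    void clear() { seen_.assign(seen_.size(), false); }

private:
    std::vector<bool> seen_;
};
```

A check like this is cheap per call, so a high share in the profile would mostly reflect how often it is called rather than how expensive each call is.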
Bruno
--- --- --- --- --- --- --- --- --- --- --- --- ---
Profilers with call-graph functionality
=======================================
See https://en.wikipedia.org/wiki/Call_graph#Free_software_call-graph_generators
Comparisons:
http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/
Profiling with perf
-------------------
Doc:
https://perf.wiki.kernel.org/index.php/Tutorial#Period_and_rate
http://www.brendangregg.com/perf.html
Works on: Linux with packages 'perf' and 'linux-tools-<version>' installed.
To get just the important methods:
# perf record -c 1000 src/wc -Lm < mbc.txt
# perf report
To get the call graph as well:
# perf record -c 1000 -a --call-graph fp src/wc -Lm < mbc.txt
# perf record -c 2000 -a --call-graph dwarf src/wc -Lm < mbc.txt
# perf report --call-graph --stdio
???
Profiling with valgrind
-----------------------
Doc:
http://valgrind.org/docs/manual/cl-manual.html
$ valgrind --tool=callgrind src/wc -m < mbc.txt
$ callgrind_annotate callgrind.out.10379
$ callgrind_annotate --tree=calling callgrind.out.10379
$ kcachegrind callgrind.out.10379 # then switch to the callee map
Works on: Linux (sampling + call-tree), macOS (only sampling, not call-tree)
Profiling with gprof
--------------------
Compile and link with "-pg".
Visualization: https://stackoverflow.com/questions/2439060/is-it-possible-to-get-a-graphical-representation-of-gprof-results
Works on Linux and other systems with libc_g. Not clang!
Profiling with gperftools
-------------------------
Doc: https://github.com/gperftools/gperftools
1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
$ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
$ pprof --text <path/to/binary> /tmp/prof.out # -pg-like text output
$ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output
On macOS: Cannot map addresses to symbols. => Unusable.