Re: Performance improvement for large keysets
From: Bruno Haible
Subject: Re: Performance improvement for large keysets
Date: Thu, 30 Jan 2020 04:09:17 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-171-generic; KDE/5.18.0; x86_64; ; )

Hi Pavel,
> > Probably it can be optimized even more, by using a hash table in this place
> > (mapping an undetermined_chars array to the list of keywords that have this
> > same undetermined_chars array)...
>
> This will heavily depend on hash table implementation, as building and
> updating it might be more expensive. I did try to use std::unordered_set
> and it was too slow
Interesting... I would try the 'hash-map' implementation from gnulib.
By the way, which profiler would you recommend for CPU-profiling of a program
like gperf? I have a couple of old notes regarding profiling (below), but
can't really tell which one to start with.
Bruno
--- --- --- --- --- --- --- --- --- --- --- --- ---
Profilers with call-graph functionality
=======================================
See https://en.wikipedia.org/wiki/Call_graph#Free_software_call-graph_generators
Comparisons:
http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/
Profiling with perf
-------------------
Doc:
https://perf.wiki.kernel.org/index.php/Tutorial#Period_and_rate
http://www.brendangregg.com/perf.html
Works on: Linux with packages 'perf' and 'linux-tools-<version>' installed.
To get just a flat profile of the most expensive functions:
# perf record -c 1000 src/wc -Lm < mbc.txt
# perf report
To get the call graph as well:
# perf record -c 1000 -a --call-graph fp src/wc -Lm < mbc.txt
# perf record -c 2000 -a --call-graph dwarf src/wc -Lm < mbc.txt
# perf report --call-graph --stdio
???
Profiling with valgrind
-----------------------
Doc:
http://valgrind.org/docs/manual/cl-manual.html
$ valgrind --tool=callgrind src/wc -m < mbc.txt
$ callgrind_annotate callgrind.out.10379
$ callgrind_annotate --tree=calling callgrind.out.10379
$ kcachegrind callgrind.out.10379   # then switch to the callee map view
Works on: Linux (sampling + call-tree), macOS (only sampling, not call-tree)
Profiling with gprof
--------------------
Compile and link with "-pg".
Visualization:
https://stackoverflow.com/questions/2439060/is-it-possible-to-get-a-graphical-representation-of-gprof-results
Works on: Linux and other systems with libc_g. Does not work with clang!
Profiling with gperftools
-------------------------
Doc: https://github.com/gperftools/gperftools
1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
$ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
$ pprof --text <path/to/binary> /tmp/prof.out # -pg-like text output
$ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output
On macOS: Cannot map addresses to symbols. => Unusable.