Re: Performance improvement for large keysets

bug-gperf

From:	Bruno Haible
Subject:	Re: Performance improvement for large keysets
Date:	Thu, 30 Jan 2020 04:09:17 +0100
User-agent:	KMail/5.1.3 (Linux/4.4.0-171-generic; KDE/5.18.0; x86_64; ; )

Hi Pavel,

> > Probably it can be optimized even more, by using a hash table in this place
> > (mapping an undetermined_chars array to the list of keywords that have this
> > same undetermined_chars array)...
> 
> This will heavily depend on hash table implementation, as building and
> updating it might be more expensive. I did try to use std::unordered_set
> and it was too slow

Interesting... I would try the 'hash-map' implementation from gnulib.
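
For illustration, the grouping idea quoted above could look roughly like the
following. This is only a sketch: it uses std::unordered_map rather than the
gnulib 'hash-map' module, and the Keyword struct, the hash functor and the
function name are hypothetical stand-ins, not gperf's actual classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a keyword; gperf's real data structures differ.
struct Keyword {
    std::string text;
    std::vector<unsigned int> undetermined_chars;  // assumed to be kept in a canonical order
};

// FNV-1a-style mixing over the array elements (an assumption, not gperf's hash).
struct UndeterminedHash {
    std::size_t operator()(const std::vector<unsigned int>& chars) const {
        std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
        for (unsigned int c : chars) {
            h ^= c;
            h *= 1099511628211ULL;  // FNV-1a prime
        }
        return static_cast<std::size_t>(h);
    }
};

using KeywordGroups =
    std::unordered_map<std::vector<unsigned int>,
                       std::vector<const Keyword*>,
                       UndeterminedHash>;

// Map each distinct undetermined_chars array to the keywords that share it.
// The returned pointers refer to elements of 'keywords', which must outlive the map.
KeywordGroups group_by_undetermined_chars(const std::vector<Keyword>& keywords) {
    KeywordGroups groups;
    for (const Keyword& kw : keywords)
        groups[kw.undetermined_chars].push_back(&kw);
    return groups;
}
```

Any group holding more than one keyword is then a set of keywords with the
same undetermined_chars array. Whether building and updating such a table
is cheaper than the current approach is exactly the open question here, and
would need to be measured.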

By the way, which profiler would you recommend for CPU-profiling of a program
like gperf? I have a couple of old notes regarding profiling (below), but
can't really tell which one to start with.

Bruno

---   ---   ---   ---   ---   ---   ---   ---   ---   ---   ---   ---   ---

Profilers with call-graph functionality
=======================================

See https://en.wikipedia.org/wiki/Call_graph#Free_software_call-graph_generators

Comparisons:
http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/


Profiling with perf
-------------------
Doc:
https://perf.wiki.kernel.org/index.php/Tutorial#Period_and_rate
http://www.brendangregg.com/perf.html

Works on: Linux with packages 'perf' and 'linux-tools-<version>' installed.

To get just a flat profile of the hottest functions:

# perf record -c 1000 src/wc -Lm < mbc.txt
# perf report

To get the call graph as well (record with either frame-pointer or DWARF
unwinding, then report):

# perf record -c 1000 -a --call-graph fp src/wc -Lm < mbc.txt
# perf record -c 2000 -a --call-graph dwarf src/wc -Lm < mbc.txt
# perf report --call-graph --stdio
???


Profiling with valgrind
-----------------------
Doc:
http://valgrind.org/docs/manual/cl-manual.html

$ valgrind --tool=callgrind src/wc -m < mbc.txt
$ callgrind_annotate callgrind.out.10379
$ callgrind_annotate --tree=calling callgrind.out.10379
$ kcachegrind callgrind.out.10379   # then switch to the callee map

Works on: Linux (flat profile + call tree), macOS (flat profile only, no call tree)


Profiling with gprof
--------------------
Compile and link with "-pg".
Visualization: 
https://stackoverflow.com/questions/2439060/is-it-possible-to-get-a-graphical-representation-of-gprof-results

Works on: Linux and other systems with libc_g. Does not work with clang.


Profiling with gperftools
-------------------------
Doc: https://github.com/gperftools/gperftools

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
     $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
     $ pprof --text <path/to/binary> /tmp/prof.out      # -pg-like text output
     $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output
On macOS: Cannot map addresses to symbols. => Unusable.

Current Thread

Performance improvement for large keysets, Pavel P, 2020/01/29
- Re: Performance improvement for large keysets, Bruno Haible, 2020/01/29
  - Re: Performance improvement for large keysets, Pavel P, 2020/01/29
    - Re: Performance improvement for large keysets, Bruno Haible <=
    - Re: Performance improvement for large keysets, Pavel P, 2020/01/30
