Floating-point error and reproducibility are a very complex topic.
Why this behavior?
I am not a GCC expert, but my guess is that the compiler is being asked
(through the Makefile or the distribution defaults, e.g. an option such as
-march=native) to target the CPU it is running on, and it then emits the
instructions it considers optimal for that particular CPU. Different CPUs
have different floating-point hardware; this is how the binaries compiled
on A and B can differ. Different instructions and hardware can then produce
differing roundoff error in some steps of a large calculation.
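As a hypothetical illustration (this is not your code): whether a
multiply-add is computed with two rounded operations or with a single fused
multiply-add (an FMA instruction, which only some targets provide) changes
the rounding of the very same expression:

  /* fma_demo.c -- illustrative only */
  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
      double eps = ldexp(1.0, -52);   /* one ulp of 1.0 */
      double a = 1.0 + eps;
      double b = 1.0 - eps;
      double c = -1.0;

      /* a*b rounds to 1.0 first, so the sum is exactly 0.0 */
      double separate = a * b + c;

      /* fma() rounds only once: exact a*b + c = -2^-104, about -4.9e-32 */
      double fused = fma(a, b, c);

      printf("separate: %.17g\n", separate);
      printf("fused   : %.17g\n", fused);
      return 0;
  }

Compile with something like "gcc -O2 fma_demo.c -lm". On a target with FMA
and contraction enabled, the compiler may turn a * b + c itself into an FMA
and the two printed values coincide; on a generic x86-64 target they
differ. Same source, different instructions, different roundoff.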
Which one is the good one?
That can only be determined by careful numerical analysis, by testing
whether the accuracy is sufficient for your purposes, or both. That is
beyond the scope of this forum. Perhaps both results are good, depending on
your actual needs. For example, in my field of atmospheric forecasting, an
accuracy of 1% would be astoundingly good.
Can I solve the issue?
There are some compiler options specifically intended to improve
reproducibility. Look for them in your compiler's documentation; a couple
of gcc examples are sketched below.
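For example, with gcc (a sketch only; the file name and the usual GSL link
flags here are illustrative, adapt them to your Makefile):

  gcc -O2 -ffp-contract=off -fexcess-precision=standard myprog.c -lgsl -lgslcblas -lm

-ffp-contract=off stops the compiler from fusing a multiply and an add into
a single FMA instruction, and -fexcess-precision=standard keeps
intermediate results at their declared precision. Also check that nothing
in the Makefile adds -ffast-math or -Ofast; those trade reproducibility for
speed.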
One approach is to experimentally disable some or all of the
compiler-controlled optimizations. This only gets you closer to
reproducibility and is not much help in determining actual correctness.
Typically I try "gcc -O0" as a first step toward reproducibility.
Also look for a way to display the applied optimizations for a particular
compile, and compare them between A and B (see the commands sketched
below). Perhaps all you care about is finding the one or two specific
optimizations that can be disabled to make the results identical in your
case.
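With gcc, one way to do that (a sketch; run the same commands on both
machines and diff the resulting files) is:

  gcc -O2 -Q --help=optimizers > opts.txt
  gcc -O2 -march=native -Q --help=target > target.txt

The first command lists which optimizations are enabled at that level; the
second shows what -march=native resolves to on that machine, which is where
two different CPUs would diverge if your Makefile uses it (or a similar
-march/-mtune option). Any line that differs between A and B is a candidate
to pin down or disable.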
Try running the binary built on machine B on machine A. If it crashes
(typically with an "illegal instruction" error), the diagnostic might tell
you specifically which hardware feature of B is absent from A.
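You can also compare the CPU feature flags directly, without running
anything:

  grep -m1 flags /proc/cpuinfo > cpuflags.txt

Do this on each machine and diff the two files; entries such as fma, avx2,
etc. show which instruction-set extensions each CPU offers.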
On Wed, Nov 13, 2024 at 12:10 PM Patrick Dupre <pdupre@gmx.com> wrote:
Hello,
The same application (a relatively heavy code) gives different values when
it is compiled and run on 2 different machines.
Both run F40 (latest update) with
gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3)
One:
Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
The other one:
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Actually, this happens when I use the gsl library.
gsl-devel-2.7.1-8.fc40.x86_64
for integration (gsl_integration_cquad).
Before integration, the values are strictly identical.
The same Makefile is used.
Now, if I copy the binary generated on machine A to machine B,
I get the same results as when it is run on machine A.
The sizes of the two binaries are slightly different.
I conclude that the issue is due to the compiler.
Indeed, the difference in the generated values seems pretty constant,
i.e., it seems proportional to the value itself: of the order of 2.7e-8
(relative difference),
i.e., much larger than the accuracy of the machine: < 1e-35.
Which one is the good one?
Why this behavior?
Can I solve the issue?
Thanks for any help.
===========================================================================
Patrick DUPRÉ | email: pdupre@gmx.com
===========================================================================