This bugzilla service is closed. All entries have been migrated to

Bug 1728

Summary: Non-deterministic results with unaligned reductions
Product: Eigen Reporter: Jose Rubio <jose.rubio>
Component: Core - vectorizationAssignee: Nobody <eigen.nobody>
Status: DECISIONNEEDED ---    
Severity: Accuracy Problem CC: chtz, gael.guennebaud, jacob.benoit.1, jose.rubio, markos
Priority: Low    
Version: 3.3 (current stable)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 1608    

Description Jose Rubio 2019-07-04 17:02:01 UTC
OS: Debian 4.9.30-2+deb9u5 (2017-09-19) x86_64 GNU/Linux

I observed subtle differences when summing a sparse matrix across different runs. 

This test reproduces the issue (fails around 50% of the time it's run)

#include <Eigen/Sparse>
#include <gtest/gtest.h>
#include <stdlib.h>
#include <time.h>

TEST(Sparse, Reduction)
    int nrows = 11300;
    int ncols = 600;
    int num_non_zeros = 100;

    std::vector<Eigen::Triplet<float, int> > triplets;
    for (int i = 0; i < num_non_zeros; i++) {
        int row = rand() % nrows;
        int col = rand() % ncols;
        float value = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
        triplets.push_back(Eigen::Triplet<float, int>(row, col, value));

    Eigen::SparseMatrix<float, 0, int> mat = Eigen::SparseMatrix<float, 0, int>(nrows, ncols);
    mat.setFromTriplets(triplets.begin(), triplets.end());

    int num_trials = 10000;
    for (int tr = 0; tr < num_trials; tr++) {
        Eigen::SparseMatrix<float, 0, int> mat2 = Eigen::SparseMatrix<float, 0, int>(nrows, ncols);
        mat2.setFromTriplets(triplets.begin(), triplets.end());
        EXPECT_TRUE(mat.sum() == mat2.sum());

Comment 1 Christoph Hertzberg 2019-07-04 17:31:09 UTC
Can't reproduce your error -- I copied this locally into the sparse_basic unit test, replacing the `EXPECT_TRUE` by a `VERIFY_IS_EQUAL` and tried a few different clang and gcc versions. And I would actually be a bit surprised to see an error here.

Please tell:
* What compiler are you using?
* What compilation options? (e.g., any non-associative math optimizations could lead to different sums inside and outside the loop)
* With which seeds does the test fail? (A few examples would suffice)
* Are you on 3.3 head, or 3.3.7?
Comment 2 Jose Rubio 2019-07-04 21:59:37 UTC
Compiler: g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Options:-msse3 -mavx -fopenmp -march=native -funroll-loops -mfpmath=sse -fno-guess-branch-probability

Eigen Version: 3.3.7

I don't think the seeds are relevant. It fails more or less half of the times I run the test.
Comment 3 Christoph Hertzberg 2019-07-05 09:24:56 UTC
The culprit seems to be "-mavx"

What happens is that the sum-reduction does as many aligned memory-accesses as possible, so if the coefficients are aligned differently, slightly different sums are calculated (due to non-associativity).
You get the same behavior when summing non-aligned dense vectors.

I'd say this is a WONTFIX, since expecting exact results with floating-point math does not really make sense.
An alternative would be to let redux-operations always start at the beginning (if `EIGEN_UNALIGNED_VECTORIZE` is enabled).
Comment 4 Christoph Hertzberg 2019-07-05 10:56:02 UTC
Actually, another alternative would be to make SparseMatrix use an aligned allocator (perhaps optionally, depending on the Options), but that would introduce ABI incompatibilities.
And of course, disabling vectorization would be an option (which would cost performance, of course).
Comment 5 Jose Rubio 2019-07-05 12:04:08 UTC
I haven't managed to reproduce the bug using dense matrices, nor have I noticed this non-deterministic behavior with the rest of the dense vectorized operations in the project. I guess for the time being we'll drop the use of the sparse module, as we need consistent results across runs.
Comment 6 Christoph Hertzberg 2019-07-05 12:44:00 UTC
This is a simple example with dense vectors which occasionally fails:

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::VectorXf v0 = Eigen::VectorXf::Random(99), v1(100);
  v1.tail(99) = v0;
  std::cout << "Diff: " << v0.sum() - v1.tail(99).sum() << "\n";

Your test should be fine without AVX (on 64bit systems), since memory will be 16byte aligned automatically).

With some effort, it should actually be possible to get deterministic behavior, even with aligned loads (assuming the reduction is commutative, and there is a neutral element).

Something like:
  // choose k, so that data + k is aligned
  Packet sum = {-0.0, ..., data[0], ..., data[k-1]};
  Index i;
  for(i=k; i <= n-PacketSize; i+=PacketSize)
    sum = padd(sum, pload<Aligned>(data + i);
  Packet lastPacket = {data[i], ..., data[n-1], -0.0, ..., -0.0};
  sum = padd(sum, lastPacket);

  // Now content of sum will be the same, except for rotation, regardless of k
  // predux must always reduce upper half + lower half, in remaining sub-vector
  return predux(sum);

`Packet` should of course be two (or four) vectors to compensate for latency.
Generating the first and last Packet could cause some overhead, which may not really be worth it, though.

Changing this to DECISIONNEEDED.
Comment 7 Christoph Hertzberg 2019-07-05 12:45:42 UTC
Maybe, first benchmark if always using unaligned loads makes a difference (on modern hardware). If not, use that (at least if EIGEN_UNALIGNED_VECTORIZE is enabled).
Comment 8 Jose Rubio 2019-07-05 13:18:21 UTC
So indeed dropping AVX seems to fix the issue. We'll see if we can go without it and we'll consider the alternatives otherwise. Thanks for the effort!
Comment 9 Nobody 2019-12-04 18:42:07 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: