OS: Debian 4.9.30-2+deb9u5 (2017-09-19) x86_64 GNU/Linux I observed subtle differences when summing a sparse matrix across different runs. This test reproduces the issue (fails around 50% of the time it's run) #include <Eigen/Sparse> #include <gtest/gtest.h> #include <stdlib.h> #include <time.h> TEST(Sparse, Reduction) { srand(time(NULL)); int nrows = 11300; int ncols = 600; int num_non_zeros = 100; std::vector<Eigen::Triplet<float, int> > triplets; for (int i = 0; i < num_non_zeros; i++) { int row = rand() % nrows; int col = rand() % ncols; float value = static_cast<float>(rand()) / static_cast<float>(RAND_MAX); triplets.push_back(Eigen::Triplet<float, int>(row, col, value)); } Eigen::SparseMatrix<float, 0, int> mat = Eigen::SparseMatrix<float, 0, int>(nrows, ncols); mat.reserve(num_non_zeros); mat.setFromTriplets(triplets.begin(), triplets.end()); int num_trials = 10000; for (int tr = 0; tr < num_trials; tr++) { Eigen::SparseMatrix<float, 0, int> mat2 = Eigen::SparseMatrix<float, 0, int>(nrows, ncols); mat2.reserve(num_non_zeros); mat2.setFromTriplets(triplets.begin(), triplets.end()); EXPECT_TRUE(mat.sum() == mat2.sum()); } }
Can't reproduce your error -- I copied this locally into the sparse_basic unit test, replacing the `EXPECT_TRUE` by a `VERIFY_IS_EQUAL` and tried a few different clang and gcc versions. And I would actually be a bit surprised to see an error here. Please tell: * What compiler are you using? * What compilation options? (e.g., any non-associative math optimizations could lead to different sums inside and outside the loop) * With which seeds does the test fail? (A few examples would suffice) * Are you on 3.3 head, or 3.3.7?
Compiler: g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 Options:-msse3 -mavx -fopenmp -march=native -funroll-loops -mfpmath=sse -fno-guess-branch-probability Eigen Version: 3.3.7 I don't think the seeds are relevant. It fails more or less half of the times I run the test.
The culprit seems to be "-mavx" What happens is that the sum-reduction does as many aligned memory-accesses as possible, so if the coefficients are aligned differently, slightly different sums are calculated (due to non-associativity). You get the same behavior when summing non-aligned dense vectors. I'd say this is a WONTFIX, since expecting exact results with floating-point math does not really make sense. An alternative would be to let redux-operations always start at the beginning (if `EIGEN_UNALIGNED_VECTORIZE` is enabled).
Actually, another alternative would be to make SparseMatrix use an aligned allocator (perhaps optionally, depending on the Options), but that would introduce ABI incompatibilities. And of course, disabling vectorization would be an option (which would cost performance, of course).
I haven't managed to reproduce the bug using dense matrices, nor have I noticed this non-deterministic behavior with the rest of the dense vectorized operations in the project. I guess for the time being we'll drop the use of the sparse module, as we need consistent results across runs.
This is a simple example with dense vectors which occasionally fails: #include <Eigen/Core> #include <iostream> int main() { srand(time(0)); Eigen::VectorXf v0 = Eigen::VectorXf::Random(99), v1(100); v1.tail(99) = v0; std::cout << "Diff: " << v0.sum() - v1.tail(99).sum() << "\n"; } Your test should be fine without AVX (on 64bit systems), since memory will be 16byte aligned automatically). With some effort, it should actually be possible to get deterministic behavior, even with aligned loads (assuming the reduction is commutative, and there is a neutral element). Something like: // choose k, so that data + k is aligned Packet sum = {-0.0, ..., data[0], ..., data[k-1]}; Index i; for(i=k; i <= n-PacketSize; i+=PacketSize) sum = padd(sum, pload<Aligned>(data + i); Packet lastPacket = {data[i], ..., data[n-1], -0.0, ..., -0.0}; sum = padd(sum, lastPacket); // Now content of sum will be the same, except for rotation, regardless of k // predux must always reduce upper half + lower half, in remaining sub-vector return predux(sum); `Packet` should of course be two (or four) vectors to compensate for latency. Generating the first and last Packet could cause some overhead, which may not really be worth it, though. Changing this to DECISIONNEEDED.
Maybe, first benchmark if always using unaligned loads makes a difference (on modern hardware). If not, use that (at least if EIGEN_UNALIGNED_VECTORIZE is enabled).
So indeed dropping AVX seems to fix the issue. We'll see if we can go without it and we'll consider the alternatives otherwise. Thanks for the effort!
-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1728.