1728
2019-07-04 17:02:01 +0000
Non-deterministic results with unaligned reductions
2019-12-04 18:42:07 +0000
1
1
1
Unclassified
Eigen
Core - vectorization
3.3 (current stable)
All
All
DECISIONNEEDED
Low
Accuracy Problem
---
1608
1
jose.rubio
eigen.nobody
chtz
gael.guennebaud
jacob.benoit.1
jose.rubio
markos
oldest_to_newest
8187
0
jose.rubio
2019-07-04 17:02:01 +0000
OS: Debian 4.9.30-2+deb9u5 (2017-09-19) x86_64 GNU/Linux
I observed subtle differences when summing a sparse matrix across different runs.
This test reproduces the issue (it fails roughly 50% of the time it is run):
#include <Eigen/Sparse>
#include <gtest/gtest.h>
#include <stdlib.h>
#include <time.h>
TEST(Sparse, Reduction)
{
  srand(time(NULL));
  int nrows = 11300;
  int ncols = 600;
  int num_non_zeros = 100;
  std::vector<Eigen::Triplet<float, int> > triplets;
  for (int i = 0; i < num_non_zeros; i++) {
    int row = rand() % nrows;
    int col = rand() % ncols;
    float value = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
    triplets.push_back(Eigen::Triplet<float, int>(row, col, value));
  }
  Eigen::SparseMatrix<float, 0, int> mat(nrows, ncols);
  mat.reserve(num_non_zeros);
  mat.setFromTriplets(triplets.begin(), triplets.end());
  int num_trials = 10000;
  for (int tr = 0; tr < num_trials; tr++) {
    Eigen::SparseMatrix<float, 0, int> mat2(nrows, ncols);
    mat2.reserve(num_non_zeros);
    mat2.setFromTriplets(triplets.begin(), triplets.end());
    EXPECT_TRUE(mat.sum() == mat2.sum());
  }
}
8188
1
chtz
2019-07-04 17:31:09 +0000
Can't reproduce your error -- I copied this locally into the sparse_basic unit test, replacing the `EXPECT_TRUE` by a `VERIFY_IS_EQUAL` and tried a few different clang and gcc versions. And I would actually be a bit surprised to see an error here.
Please tell:
* What compiler are you using?
* What compilation options? (e.g., any non-associative math optimizations could lead to different sums inside and outside the loop)
* With which seeds does the test fail? (A few examples would suffice)
* Are you on 3.3 head, or 3.3.7?
8189
2
jose.rubio
2019-07-04 21:59:37 +0000
Compiler: g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Options: -msse3 -mavx -fopenmp -march=native -funroll-loops -mfpmath=sse -fno-guess-branch-probability
Eigen Version: 3.3.7
I don't think the seeds are relevant; it fails more or less half of the time I run the test.
8190
3
chtz
2019-07-05 09:24:56 +0000
The culprit seems to be "-mavx".
What happens is that the sum-reduction performs as many aligned memory accesses as possible, so if the coefficients happen to be aligned differently across allocations, the additions are grouped differently and slightly different sums are calculated (floating-point addition is not associative).
You get the same behavior when summing non-aligned dense vectors.
I'd say this is a WONTFIX, since expecting bit-exact results with floating-point math does not really make sense.
An alternative would be to let redux-operations always start at the beginning (if `EIGEN_UNALIGNED_VECTORIZE` is enabled).
8191
4
chtz
2019-07-05 10:56:02 +0000
Actually, another alternative would be to make SparseMatrix use an aligned allocator (perhaps optionally, depending on the Options), but that would introduce ABI incompatibilities.
And of course, disabling vectorization would be an option (which would cost performance, of course).
8192
5
jose.rubio
2019-07-05 12:04:08 +0000
I haven't managed to reproduce the bug using dense matrices, nor have I noticed this non-deterministic behavior with the rest of the dense vectorized operations in the project. I guess for the time being we'll drop the use of the sparse module, as we need consistent results across runs.
8193
6
chtz
2019-07-05 12:44:00 +0000
This is a simple example with dense vectors which occasionally fails:
#include <Eigen/Core>
#include <cstdlib>
#include <ctime>
#include <iostream>

int main() {
  srand(time(0));
  Eigen::VectorXf v0 = Eigen::VectorXf::Random(99), v1(100);
  v1.tail(99) = v0;
  std::cout << "Diff: " << v0.sum() - v1.tail(99).sum() << "\n";
}
Your test should be fine without AVX (on 64-bit systems), since memory will be 16-byte aligned automatically.
With some effort, it should actually be possible to get deterministic behavior, even with aligned loads (assuming the reduction is commutative, and there is a neutral element).
Something like:
// choose k so that data + k is aligned
Packet sum = {-0.0, ..., data[0], ..., data[k-1]};
Index i;
for (i = k; i <= n - PacketSize; i += PacketSize)
  sum = padd(sum, pload<Aligned>(data + i));
Packet lastPacket = {data[i], ..., data[n-1], -0.0, ..., -0.0};
sum = padd(sum, lastPacket);
// Now the content of sum will be the same, except for rotation, regardless of k.
// predux must always reduce upper half + lower half of the remaining sub-vector.
return predux(sum);
`Packet` should of course be two (or four) vectors to compensate for latency.
Generating the first and last Packet could cause some overhead, which may not really be worth it, though.
Changing this to DECISIONNEEDED.
8194
7
chtz
2019-07-05 12:45:42 +0000
Maybe first benchmark whether always using unaligned loads makes a difference on modern hardware. If it doesn't, use unaligned loads (at least when EIGEN_UNALIGNED_VECTORIZE is enabled).
8195
8
jose.rubio
2019-07-05 13:18:21 +0000
So dropping AVX does indeed seem to fix the issue. We'll see if we can go without it, and we'll consider the alternatives otherwise. Thanks for the effort!
10172
9
eigen.nobody
2019-12-04 18:42:07 +0000
-- GitLab Migration Automatic Message --
This bug has been migrated to gitlab.com's GitLab instance and has been closed to further activity.
You can subscribe and participate further in the new issue through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1728.