Created attachment 756
I've attached a minimal test case that demonstrates the problem. When it is compiled with -fsanitize=undefined, a runtime error is reported. In this minimal test case the code appears to recover and produce some output (I haven't checked whether it is correct). In my larger real-world examples, however, release builds without -fsanitize crash outright with segmentation faults. Below I've reprinted the code, the compile commands, and the output from running the example, along with the versions of g++ and clang++ that I tested with.
Eigen::SparseMatrix<double> mat(4, 4);
mat.coeffRef(0,0) = 4;
mat.coeffRef(1,0) = -1;
mat.coeffRef(2, 0) = -1;
mat.coeffRef(3, 0) = 0.87499999999999978;
mat.coeffRef(0, 1) = -1;
mat.coeffRef(1, 1) = 4;
mat.coeffRef(2, 1) = 0;
mat.coeffRef(3, 1) = 0.125;
mat.coeffRef(0, 2) = -1;
mat.coeffRef(1, 2) = 0;
mat.coeffRef(2, 2) = 4;
mat.coeffRef(3, 2) = 0.1249999999999999;
mat.coeffRef(0, 3) = 0.87499999999999978;
mat.coeffRef(1, 3) = 0.125;
mat.coeffRef(2, 3) = 0.1249999999999999;
mat.coeffRef(3, 3) = 0;
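// Assumption: the next three lines are missing from the reprinted excerpt; SparseLU is inferred from the tests discussed later in this thread.
Eigen::VectorXd b(4), sol;
Eigen::SparseLU<Eigen::SparseMatrix<double> > solver;
solver.compute(mat);
b(0) = -0.5;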
b(1) = 0.0;
b(2) = 0.0;
b(3) = 0.0;
sol = solver.solve(b);
std::cout << sol << std::endl;
clang++ -fsanitize=undefined -std=c++11 -o eigen_testcase_clang eigen_testcase.cpp
g++ -fsanitize=undefined -std=c++11 -o eigen_testcase_gcc eigen_testcase.cpp
Running the test binaries for each compiler:
/usr/local/include/Eigen/src/Core/CoreEvaluators.h:181:14: runtime error: reference binding to null pointer of type 'Scalar' (aka 'double')
/usr/local/include/Eigen/src/Core/CoreEvaluators.h:181:75: runtime error: reference binding to null pointer of type 'Scalar'
$ g++ --version
g++ (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ clang++ --version
Ubuntu clang version 3.6.2-1 (tags/RELEASE_362/final) (based on LLVM 3.6.2)
Thread model: posix
I forgot to mention that I had no problems with Eigen 3.2.10.
The following may be relevant. I built Eigen's test suite in Debug mode, added -fsanitize=address -fsanitize=undefined, and ran the SparseLU tests by running:
and the tests failed, giving the following output:
[ 0%] Built target sparselu_4
[100%] Built target sparselu_2
[100%] Built target sparselu_1
[100%] Built target sparselu_3
[100%] Built target sparselu
Test project /home/andreas/Build/eigen/eigen-eigen-26667be4f70b/target
Start 603: sparselu_1
1/4 Test #603: sparselu_1 .......................***Failed 39.68 sec
Start 604: sparselu_2
2/4 Test #604: sparselu_2 .......................***Failed 42.62 sec
Start 605: sparselu_3
3/4 Test #605: sparselu_3 .......................***Failed 46.51 sec
Start 606: sparselu_4
4/4 Test #606: sparselu_4 .......................***Failed 73.16 sec
0% tests passed, 4 tests failed out of 4
Label Time Summary:
Official = 201.97 sec
Total Test time (real) = 205.02 sec
The following tests FAILED:
603 - sparselu_1 (Failed)
604 - sparselu_2 (Failed)
605 - sparselu_3 (Failed)
606 - sparselu_4 (Failed)
Errors while running CTest
It doesn't say the reason for failure, and I'm not at all familiar with the code base, so I'm not sure if it fails for the same reason. In any case, I thought this might be relevant.
The first version where the SparseLU tests fail (with -fsanitize) is
Created attachment 757
Failed unit tests in Eigen 3.3 when running with -fsanitize=undefined,address
For completeness, I've added an attachment detailing the failed unit tests in Eigen 3.3 when running with GCC's -fsanitize=address,undefined in Debug mode.
As is apparent from the tests, a number of the other sparse implementations also fail, though this is likely for the same reason (?).
As far as I debugged this, it appears to be caused by expressions like
where X is the evaluator of some empty expression that has a null data pointer, and coeffRef essentially returns something like this:
*(m_data + 0 + 0*m_outerstride);
If m_data is nullptr, this is undefined behavior, and a compiler that can prove in advance that the pointer is null is allowed to apply arbitrarily aggressive optimizations.
A safer implementation would be to have all evaluators also (or only) implement a coeffPtr(row, col) function or, since this seems to be a common use case, provide a dataPtr() function that just returns the internal pointer.
And I agree with Andreas, we should definitely run our test suite with -fsanitize more regularly (at least before new releases).
This issue reduces to a simple dense product with empty rows:
Eigen::MatrixXd w(4,1), A(4,4), C(4,4);
w.block(0,0,0,1) = A.block(0,0,0,4) * C.block(0,0,4,1);
Fix for the initial test-case, need to backport it and check other unit tests.
Summary: Bug 1356: fix calls to evaluator::coeffRef(0,0) to get the address of the destination
by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).
All sparse tests are green for me.