Bug 1356 - SparseLU .solve() crashes for various input
Status: RESOLVED FIXED
Product: Eigen
Classification: Unclassified
Component: Sparse
Version: 3.3 (current stable)
Hardware: All All
Importance: High Crash
Assigned To: Nobody
Reported: 2016-12-02 10:48 UTC by Andreas Longva
Modified: 2016-12-05 14:44 UTC (History)



Attachments
Minimal testcase (10.00 KB, application/x-tar)
2016-12-02 10:48 UTC, Andreas Longva
Failed unit tests in Eigen 3.3 when running with -fsanitize=undefined,address (1.59 KB, text/plain)
2016-12-05 13:19 UTC, Andreas Longva

Description Andreas Longva 2016-12-02 10:48:06 UTC
Created attachment 756: Minimal testcase

I've attached a minimal testcase that demonstrates the problem. When compiled with -fsanitize=undefined, the program reports a runtime error when run. In this minimal test case the program still runs to completion and produces some output (presumably correct; I haven't checked). However, my larger real-world examples crash outright with segmentation faults in release builds that do not enable -fsanitize. Below I've included the code, the compile commands, the output from running the example, and the versions of g++ and clang++ that I tested with.

Code:

    #include <Eigen/Sparse>
    #include <Eigen/SparseLU>
    
    #include <iostream>
    
    int main()
    {
        Eigen::SparseMatrix<double> mat(4, 4);
        mat.coeffRef(0,0) =   4;
        mat.coeffRef(1,0) =  -1;
        mat.coeffRef(2, 0) = -1;
        mat.coeffRef(3, 0) =  0.87499999999999978;
        mat.coeffRef(0, 1) = -1;
        mat.coeffRef(1, 1) =  4;
        mat.coeffRef(2, 1) =  0;
        mat.coeffRef(3, 1) =  0.125;
        mat.coeffRef(0, 2) = -1;
        mat.coeffRef(1, 2) =  0;
        mat.coeffRef(2, 2) =  4;
        mat.coeffRef(3, 2) =  0.1249999999999999;
        mat.coeffRef(0, 3) =  0.87499999999999978;
        mat.coeffRef(1, 3) =  0.125;
        mat.coeffRef(2, 3) =  0.1249999999999999;
        mat.coeffRef(3, 3) =  0;
    
        Eigen::VectorXd b(4);
        b(0) = -0.5;
        b(1) = 0.0;
        b(2) = 0.0;
        b(3) = 0.0;
    
        Eigen::SparseLU<Eigen::SparseMatrix<double>> solver;
        solver.analyzePattern(mat);
        solver.factorize(mat);
    
        Eigen::VectorXd sol(4);
        sol = solver.solve(b);
    
        std::cout << sol << std::endl;
        return 0;
    }

Running `make`:

    $ make
    clang++ -fsanitize=undefined -std=c++11 -o eigen_testcase_clang eigen_testcase.cpp 
    g++ -fsanitize=undefined -std=c++11 -o eigen_testcase_gcc eigen_testcase.cpp
    
Running the test binaries for each compiler:

    $ ./eigen_testcase_clang 
    /usr/local/include/Eigen/src/Core/CoreEvaluators.h:181:14: runtime error: reference binding to null pointer of type 'Scalar' (aka 'double')
    -0.00431034
     0.0150862
     0.0150862
    -0.517241
      
    $ ./eigen_testcase_gcc
    /usr/local/include/Eigen/src/Core/CoreEvaluators.h:181:75: runtime error: reference binding to null pointer of type 'Scalar'
    -0.00431034
     0.0150862
     0.0150862
    -0.517241
    
Compiler versions:
    
    $ g++ --version
    g++ (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010
    Copyright (C) 2015 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    $ clang++ --version
    Ubuntu clang version 3.6.2-1 (tags/RELEASE_362/final) (based on LLVM 3.6.2)
    Target: x86_64-pc-linux-gnu
    Thread model: posix
Comment 1 Andreas Longva 2016-12-02 10:50:14 UTC
I forgot to mention that I had no problems in 3.2.10.
Comment 2 Andreas Longva 2016-12-05 10:28:44 UTC
The following may be relevant. I built Eigen's test suite in Debug mode, added -fsanitize=address -fsanitize=undefined, and ran the SparseLU tests by running:

./check.sh sparselu

and the tests failed, giving the following output:

[  0%] Built target sparselu_4
[100%] Built target sparselu_2
[100%] Built target sparselu_1
[100%] Built target sparselu_3
[100%] Built target sparselu
Test project /home/andreas/Build/eigen/eigen-eigen-26667be4f70b/target
    Start 603: sparselu_1
1/4 Test #603: sparselu_1 .......................***Failed   39.68 sec
    Start 604: sparselu_2
2/4 Test #604: sparselu_2 .......................***Failed   42.62 sec
    Start 605: sparselu_3
3/4 Test #605: sparselu_3 .......................***Failed   46.51 sec
    Start 606: sparselu_4
4/4 Test #606: sparselu_4 .......................***Failed   73.16 sec

0% tests passed, 4 tests failed out of 4

Label Time Summary:
Official    = 201.97 sec

Total Test time (real) = 205.02 sec

The following tests FAILED:
	603 - sparselu_1 (Failed)
	604 - sparselu_2 (Failed)
	605 - sparselu_3 (Failed)
	606 - sparselu_4 (Failed)
Errors while running CTest



It doesn't say the reason for failure, and I'm not at all familiar with the code base, so I'm not sure if it fails for the same reason. In any case, I thought this might be relevant.
Comment 3 Christoph Hertzberg 2016-12-05 12:48:51 UTC
The first version where the SparseLU tests fail (with -fsanitize) is
https://bitbucket.org/eigen/eigen/commits/7231911cef80
Comment 4 Andreas Longva 2016-12-05 13:19:52 UTC
Created attachment 757: Failed unit tests in Eigen 3.3 when running with -fsanitize=undefined,address
Comment 5 Andreas Longva 2016-12-05 13:22:39 UTC
For completeness, I've added an attachment detailing the failed unit tests in Eigen 3.3 when running with GCC's -fsanitize=address,undefined in Debug mode.

As is apparent from the tests, a number of the other sparse implementations also fail, though this is likely for the same reason (?).
Comment 6 Christoph Hertzberg 2016-12-05 13:50:41 UTC
As far as I have debugged this, it appears to be caused by expressions like
  (& X.coeffRef(0,0))

where X is the evaluator of some empty expression whose data pointer is null, and coeffRef essentially returns something like this:
  *(m_data + 0 + 0*m_outerstride);

If m_data is a nullptr, this is undefined behavior, and the compiler is allowed to do insane optimizations if it knows in advance that it actually is a nullptr.

A safer implementation would be to have all evaluators also (or only) implement a coeffPtr(row, col) function or, as this seems to be a common use case, provide a dataPtr() function that just returns the internal pointer.
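
To make the difference concrete, here is a minimal standalone sketch. It does not use Eigen, and ToyEvaluator, destinationAddress and toyEvaluatorSelfCheck are hypothetical names invented for illustration; only the coeffRef/dataPtr split mirrors the idea above. It is meant as an illustration of the pattern, not as the actual fix.

```cpp
#include <cstddef>

// Hypothetical stand-in for an evaluator over column-major storage.
// None of these names are Eigen's; they only mirror the shape of the problem.
struct ToyEvaluator {
    double*        data;         // nullptr when the expression is empty
    std::ptrdiff_t outerStride;  // distance between consecutive columns

    // Mirrors the problematic pattern: forms a reference by dereferencing.
    // Calling this when data == nullptr is undefined behavior, even if the
    // caller only takes the address of the result.
    double& coeffRef(std::ptrdiff_t row, std::ptrdiff_t col) const {
        return *(data + row + col * outerStride);
    }

    // Safer accessor: hands back the stored pointer without dereferencing,
    // so it is well defined for an empty expression.
    double* dataPtr() const { return data; }
};

// Returns the destination address without ever dereferencing it.
inline double* destinationAddress(const ToyEvaluator& eval) {
    return eval.dataPtr();
}

// Small self-check covering both the empty and the non-empty case.
inline bool toyEvaluatorSelfCheck() {
    ToyEvaluator empty{nullptr, 0};
    double storage[4] = {1.0, 2.0, 3.0, 4.0};
    ToyEvaluator full{storage, 2};

    bool ok = destinationAddress(empty) == nullptr;   // no dereference happens
    ok = ok && destinationAddress(full) == storage;
    ok = ok && &full.coeffRef(1, 1) == storage + 3;   // fine: data is non-null
    return ok;
}
```

The point of the sketch is that callers which only need the destination address go through dataPtr() and never form a reference, so the empty case stays well defined; coeffRef remains usable wherever the data pointer is known to be non-null.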

And I agree with Andreas: we should definitely run our test suite with -fsanitize more regularly (at least before new releases).
Comment 7 Gael Guennebaud 2016-12-05 13:52:46 UTC
This issue reduces to a simple dense product whose operands and result have zero rows:

Eigen::MatrixXd w(4,1), A(4,4), C(4,4);
A.setRandom();
C.setRandom();
w.block(0,0,0,1) = A.block(0,0,0,4) * C.block(0,0,4,1);
Comment 8 Gael Guennebaud 2016-12-05 14:11:05 UTC
Fix for the initial test case; I still need to backport it and check the other unit tests.

https://bitbucket.org/eigen/eigen/commits/2b0dd72c3777/
Summary:     Bug 1356: fix calls to evaluator::coeffRef(0,0) to get the address of the destination
by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).
Comment 9 Gael Guennebaud 2016-12-05 14:44:01 UTC
Backport: https://bitbucket.org/eigen/eigen/commits/216b67e06d89/

All sparse tests are green for me.
