This bugzilla service is closed. All entries have been migrated to https://gitlab.com/libeigen/eigen
Bug 1723 - BDCSVD: segmentation faults for some matrices
Summary: BDCSVD: segmentation faults for some matrices
Status: NEW
Alias: None
Product: Eigen
Classification: Unclassified
Component: SVD
Version: 3.3 (current stable)
Hardware: x86 - AVX Linux
Importance: High Crash
Assignee: Nobody
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-13 11:24 UTC by David Aceituno
Modified: 2019-12-04 18:40 UTC (History)
CC List: 3 users



Attachments
Matrix in binary format (20.14 KB, application/x-sega-cd-rom)
2019-06-13 11:24 UTC, David Aceituno

Description David Aceituno 2019-06-13 11:24:54 UTC
Created attachment 944
Matrix in binary format

BDCSVD fails with an assertion error (in Debug mode) or segmentation fault (in Release mode, -DNDEBUG), when using Intel MKL on a Xeon Gold 6130 machine with flag -mavx.
I am using Eigen version 3.3.7.

This is the GDB backtrace:

#0  0x0000000001b0b7fb in raise ()
#1  0x0000000001bd8b38 in abort ()
#2  0x0000000001bd26c4 in __assert_fail_base ()
#3  0x0000000001bd271e in __assert_fail ()
#4  0x000000000043d57c in Eigen::DenseCoeffsBase<Eigen::Ref<Eigen::Array<long, 1, -1, 1, 1, -1>, 0, Eigen::InnerStride<1> >, 0>::operator() (this=0x7fffffff6890, index=-1)
    at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h:180
#5  0x0000000000437a4a in Eigen::BDCSVD<Eigen::Matrix<std::complex<double>, -1, -1, 0, -1, -1> >::perturbCol0 (this=0x7fffffff8660, col0=..., diag=..., perm=..., singVals=..., shifts=..., mus=..., zhat=...)
    at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/SVD/BDCSVD.h:924
#6  0x000000000042d632 in Eigen::BDCSVD<Eigen::Matrix<std::complex<double>, -1, -1, 0, -1, -1> >::computeSVDofM (this=0x7fffffff8660, firstCol=0, n=46, U=..., singVals=..., V=...)
    at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/SVD/BDCSVD.h:638
#7  0x00000000004258cb in Eigen::BDCSVD<Eigen::Matrix<std::complex<double>, -1, -1, 0, -1, -1> >::divide (this=0x7fffffff8660, firstCol=0, lastCol=45, firstRowW=0, firstColW=0, shift=0)
    at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/SVD/BDCSVD.h:534
#8  0x000000000041f51a in Eigen::BDCSVD<Eigen::Matrix<std::complex<double>, -1, -1, 0, -1, -1> >::compute (this=0x7fffffff8660, matrix=..., computationOptions=40)
    at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/SVD/BDCSVD.h:278
#9  0x00000000004028fc in main () at /home/x_davac/svd_bug/main.cpp:53


In line 924 of eigen3/Eigen/src/SVD/BDCSVD.h, an index is out of bounds because l == 0:
    Index j = i<k ? i : perm(l-1);
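
The failing condition above (i >= k while l == 0) can be isolated in a few lines. This is only a sketch using std::vector in place of Eigen's arrays; pick_j is a hypothetical helper, not an Eigen function, and it only shows how to detect the out-of-bounds case, not how to correct the underlying numerical problem.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Index = std::ptrdiff_t;

// Hypothetical helper (not part of Eigen). Mirrors the expression
//     Index j = i<k ? i : perm(l-1);
// but reports failure instead of reading perm(-1).
// Returns true and sets j when a valid index exists, false otherwise.
bool pick_j(const std::vector<Index>& perm, Index l, Index k, Index& j) {
    assert(l >= 0 && l < static_cast<Index>(perm.size()));
    const Index i = perm[static_cast<std::size_t>(l)];
    if (i < k) {                 // first branch of the original ternary
        j = i;
        return true;
    }
    if (l > 0) {                 // perm(l-1) is in bounds
        j = perm[static_cast<std::size_t>(l - 1)];
        return true;
    }
    return false;                // i >= k and l == 0: the case from the backtrace
}
```

A caller would skip the current term (or report an error) whenever pick_j returns false.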

    
I have tried to reproduce the error on these machines (from /proc/cpuinfo):
    - Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz   (Ubuntu 19.04)  ---> Success always
    - Intel(R) Xeon(R) CPU E5-1660 v4 @ 3.20GHz  (Ubuntu 16.04)  ---> Success always
    - Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz   (CentOS 7.6)    ---> Failure with mkl + vectorization
    

    
The error disappears in the following cases:
    - Edit line 95 of eigen3/Eigen/src/SVD/BDCSVD.h
        from typedef Matrix<Scalar, Dynamic, Dynamic, ColMajor> MatrixX;
        into typedef Matrix<Scalar, Dynamic, Dynamic, ColMajor | DontAlign> MatrixX;
    - Remove define EIGEN_USE_BLAS (or EIGEN_USE_MKL_ALL)
    - Remove -mavx (or compile with -march=nehalem or older)
    
   
    
The error remains when changing the following: 
    - GNU C++ 7.3.0  <-> Clang 6.0.1
    - Intel MKL 2018 <-> Intel MKL 2019 v4
    - -std=c++17     <-> (none) 
    
    
A minimal code to reproduce the error can be found here: https://github.com/DavidAce/svd_bug
The CMake project should compile fine if your MKL installation is in a standard path (/opt/intel/mkl, $HOME/intel/mkl, etc.) or if its location is given by MKL_ROOT or LD_LIBRARY_PATH.
The same matrix is attached in binary format as well as hardcoded into the source file main.cpp.
Oddly, the hardcoded one succeeds. Presumably the error is sensitive to the precision of the numbers, since the hardcoded values are not bit-identical to the binary ones.
To read the attached binary file, use:

#include <fstream>
#include <Eigen/Dense>

// Reads a matrix stored as: rows, cols (each a Derived::Index), then rows*cols
// scalars laid out in the matrix's own storage order.
template<typename Derived>
void read(const char* filename, Eigen::MatrixBase<Derived>& matrix){
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    typename Derived::Index rows=0, cols=0;
    in.read(reinterpret_cast<char*>(&rows), sizeof(typename Derived::Index));
    in.read(reinterpret_cast<char*>(&cols), sizeof(typename Derived::Index));
    matrix.derived().resize(rows, cols);
    in.read(reinterpret_cast<char*>(matrix.derived().data()), rows*cols*sizeof(typename Derived::Scalar));
    in.close();
}

as shown in main.cpp in the github link above.
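
For anyone who wants to inspect or regenerate such a file without Eigen, the layout read above can be sketched with the standard library alone. This assumes the header fields are std::ptrdiff_t (Eigen's default Index type) and the scalars are std::complex<double>, as in the backtrace. RawMatrix, write_raw and read_raw are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <fstream>
#include <vector>

using Index = std::ptrdiff_t;              // assumed: Eigen's default Index
using Scalar = std::complex<double>;       // assumed: scalar type from the backtrace

// Hypothetical container: dimensions plus the raw scalars in file order.
struct RawMatrix {
    Index rows = 0;
    Index cols = 0;
    std::vector<Scalar> data;
};

// Writes rows, cols, then rows*cols scalars. Returns false on I/O failure.
bool write_raw(const char* filename, const RawMatrix& m) {
    assert(m.rows >= 0 && m.cols >= 0 &&
           m.data.size() == static_cast<std::size_t>(m.rows) * static_cast<std::size_t>(m.cols));
    std::ofstream out(filename, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(&m.rows), sizeof(Index));
    out.write(reinterpret_cast<const char*>(&m.cols), sizeof(Index));
    out.write(reinterpret_cast<const char*>(m.data.data()),
              static_cast<std::streamsize>(m.data.size() * sizeof(Scalar)));
    return static_cast<bool>(out);
}

// Reads the same layout back. Returns false if the file is missing or truncated.
bool read_raw(const char* filename, RawMatrix& m) {
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    if (!in) return false;
    Index rows = 0, cols = 0;
    in.read(reinterpret_cast<char*>(&rows), sizeof(Index));
    in.read(reinterpret_cast<char*>(&cols), sizeof(Index));
    if (!in || rows < 0 || cols < 0) return false;
    m.rows = rows;
    m.cols = cols;
    m.data.resize(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    in.read(reinterpret_cast<char*>(m.data.data()),
            static_cast<std::streamsize>(m.data.size() * sizeof(Scalar)));
    return static_cast<bool>(in);
}
```

Unlike the Eigen reader, this version checks the stream state, so a missing or short file is reported instead of silently producing an empty or partially filled matrix.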

    
I can provide more matrices that fail if necessary.
Comment 1 David Aceituno 2019-08-08 06:38:09 UTC
I changed the title:
From: BDCSVD fails for some matrices with MKL+vectorization on a Xeon cpu
To  : BDCSVD segmentation faults for some matrices


Digging further, I managed to reproduce the error on other CPUs as well. I was wrong to report "Success always" for the other machines in the problem description. I was also wrong in thinking this had anything to do with MKL or AVX alignment. Some matrices simply fail, and whether they do is very sensitive to the exact values of their entries.


In any case, the problem still occurs at line 925 of BDCSVD.h.
The relevant part of the back-trace is here:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000626ed6 in Eigen::BDCSVD<Eigen::Matrix<std::complex<double>, -1, -1, 0, -1, -1> >::perturbCol0 (this=<optimized out>, col0=..., diag=..., perm=..., singVals=..., shifts=..., mus=..., zhat=...)
    at Eigen3/include/eigen3/Eigen/src/SVD/BDCSVD.h:925
925               prod *= ((singVals(j)+dk) / ((diag(i)+dk))) * ((mus(j)+(shifts(j)-dk)) / ((diag(i)-dk)));


To keep things stable, I have produced a patch that does not fix the problem but avoids the segmentation fault (or the failed assert in Debug mode). With it, BDCSVD returns U, S and V full of NaNs, so at least I can detect the failure and switch to Jacobi or LAPACKE's SVD implementation (which succeeds).
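
The detect-and-switch step could look roughly like the sketch below. It is written against the standard library only, so SvdResult, all_finite and svd_with_fallback are hypothetical names. With Eigen, the check would presumably be svd.singularValues().allFinite() on the patched BDCSVD result, with Eigen::JacobiSVD as the fallback.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical result type; a real one would also carry U and V.
struct SvdResult {
    std::vector<double> singular_values;
};

// True when every entry is neither NaN nor infinite.
bool all_finite(const std::vector<double>& v) {
    for (double x : v) {
        if (!std::isfinite(x)) return false;
    }
    return true;
}

// Runs `primary`; if its result contains non-finite values, runs `fallback`.
// `used_fallback` reports which solver produced the returned result.
template <typename Primary, typename Fallback>
SvdResult svd_with_fallback(Primary primary, Fallback fallback, bool& used_fallback) {
    SvdResult r = primary();
    used_fallback = !all_finite(r.singular_values);
    if (used_fallback) {
        r = fallback();
        // Debug-only sanity check: the fallback solver is expected to succeed.
        assert(all_finite(r.singular_values));
    }
    return r;
}
```

Checking only the singular values keeps the test cheap. A stricter variant would also scan U and V.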


Eigen 3.3.7 patch below:



diff --git a/Eigen/src/SVD/BDCSVD.h b/Eigen/src/SVD/BDCSVD.h
index 1134d66e7..bb9944b41 100644
--- a/Eigen/src/SVD/BDCSVD.h
+++ b/Eigen/src/SVD/BDCSVD.h
@@ -921,7 +921,17 @@ void BDCSVD<MatrixType>::perturbCol0
         Index i = perm(l);
         if(i!=k)
         {
-          Index j = i<k ? i : perm(l-1);
+          //Sometimes we get i >= k and l == 0, leading to perm(l-1) being out of bounds
+          //Here we make sure that perm isn't accessed out of bounds.
+          //However, when this happens, the resulting U,S and V^T matrices will usually contain
+          //NAN's, but at least we then get a chance to do something about it, instead of segfault.
+
+          //Index j = i<k ? i : perm(l-1);
+          Index j;
+          if (i<k) j = i;
+          else if (l >  0 and l < m) j = perm(l-1);
+          else continue;
+
           prod *= ((singVals(j)+dk) / ((diag(i)+dk))) * ((mus(j)+(shifts(j)-dk)) / ((diag(i)-dk)));
 #ifdef EIGEN_BDCSVD_DEBUG_VERBOSE
           if(i!=k && std::abs(((singVals(j)+dk)*(mus(j)+(shifts(j)-dk)))/((diag(i)+dk)*(diag(i)-dk)) - 1) > 0.9 )
Comment 2 Nobody 2019-12-04 18:40:51 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1723.
