Difference between revisions of "SparseMatrix"

From Eigen

Revision as of 13:01, 24 June 2008

Preliminaries

What's the plan?

Ideally:

  • define our own classes for storage and basic algebra (sum, product, triangular solver)
  • a good support of CCS/CRS sparse matrix is certainly the priority as it is the most common storage format
  • we could probably adapt the linear solver algorithms of GMM++ (or ITL) to directly use our matrix classes
    • see the licensing issues
  • provide the ability to use more optimized backends like TAUCS or SuperLU as well as backends for eigen value solvers via a unified API


Current state

  • we already have some proof of concept code for CCS matrix (class SparseMatrix) and std::map based matrix (HashMatrix).
    • unlike GMM++, our CCS matrix can be filled dynamically in a coherent order, i.e. with increasing i+j*rows (no random write)
    • internally, a HashMatrix is simply a std::vector< std::map<int,Scalar> >. It is simple to implement and easy to use, but it is also painfully slow! It would be really nice if we could find a better alternative for efficient random writes.
  • each expression defines an InnerIterator which allows efficient traversal of the inner coefficients of a sparse (or dense) matrix. Of course, these InnerIterators can be nested exactly like expressions are nested, so that our sparse matrices already support expression templates!
  • InnerIterators are implemented for:
    • a default implementation for MatrixBase
    • CwiseUnaryOp
    • CwiseBinaryOp (with some shortcomings, see below)
    • SparseMatrix
    • HashMatrix
  • efficient code for the product of two CCS matrices into a CCS result.
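
The HashMatrix representation described above can be sketched in a few lines. This is an illustrative stand-in (class and member names are invented for the example, not Eigen's actual API):

```cpp
#include <map>
#include <vector>

// Illustrative sketch of the HashMatrix idea: one std::map per column gives
// cheap random writes, at the cost of logarithmic access and poor locality.
class HashMatrix {
public:
    HashMatrix(int rows, int cols) : m_rows(rows), m_cols(cols) {
        m_data.resize(cols);
    }

    // random write access, any order, O(log nnz_per_column)
    double& coeffRef(int i, int j) { return m_data[j][i]; }

    // read access; absent coefficients are implicitly zero
    double coeff(int i, int j) const {
        std::map<int, double>::const_iterator it = m_data[j].find(i);
        return it == m_data[j].end() ? 0.0 : it->second;
    }

    int nonZeros() const {
        int nnz = 0;
        for (int j = 0; j < (int)m_data.size(); ++j)
            nnz += (int)m_data[j].size();
        return nnz;
    }

private:
    int m_rows, m_cols;
    std::vector<std::map<int, double> > m_data;  // one sorted map per column
};
```

The std::map per inner vector is exactly what makes random writes easy to implement, and also what makes this representation slow: every access pays a logarithmic lookup plus poor cache locality.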


Issues

The major issue with sparse matrices is that they can only be efficiently traversed in a specific order, i.e., per column for a CCS matrix and per row for a CRS matrix. The nightmare starts when you have to deal with expressions mixing CCS matrices (column major) and CRS matrices (row major).
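
A minimal stand-in for the CCS layout makes this constraint concrete (the struct is illustrative, not Eigen's class): traversing column by column reads the three arrays linearly, while a row major traversal of the same data would have to search every column.

```cpp
#include <vector>

// Minimal sketch of Compressed Column Storage: column j's nonzeros live in
// values[outerIndex[j] .. outerIndex[j+1]-1], with their row indices in
// innerIndex. This layout is why a CCS matrix is only cheap to traverse
// column by column.
struct CCSMatrix {
    int rows, cols;
    std::vector<int> outerIndex;   // size cols+1: start of each column
    std::vector<int> innerIndex;   // row index of each stored coefficient
    std::vector<double> values;    // the nonzero coefficients
};

// Traverse in the matrix's natural (column major) order.
double sumNonZeros(const CCSMatrix& m) {
    double sum = 0.0;
    for (int j = 0; j < m.cols; ++j)
        for (int k = m.outerIndex[j]; k < m.outerIndex[j + 1]; ++k)
            sum += m.values[k];
    return sum;
}
```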


Generic API to fill sparse/dense matrices

The current SparseMatrix can be filled like this:

SparseMatrix m(rows,cols);
m.startFill(2*cols);
// 2*cols is a hint on the number of nonzero entries
// note that startFill() deletes all previous elements in the matrix
m.fill(2,0) = rand();
m.fill(3,0) = rand();
m.fill(0,2) = rand();
m.fill(7,2) = rand();
m.fill(12,9) = rand();
m.fill(8,11) = rand();
m.endFill();

At that point m.nonZeros() == 6 and you cannot add any other nonzero entries. For instance x=m(0,0); will return zero, while m(0,0)=x; will trigger an assertion. Of course you can still update an existing nonzero: m(7,2) += 1;.

To allow treating any matrix as a sparse matrix, we could add dummy startFill(), fill() and endFill() members to MatrixBase, but maybe we can find a better API? Indeed, while efficient, the current scheme has two major limitations:

  • we cannot update the matrix, only set it from zero
  • to be consistent, for dense matrices startFill() should do a setZero(), which is not really nice

For instance, we could imagine something based on the facade design pattern, where you could request a random setter, a coherent setter, an updater, or whatever else is needed. For dense matrices these facade objects would degenerate to simple references.
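
To make the facade idea concrete, here is a hypothetical sketch (all names are invented for the example): a setter object that buffers random writes in std::maps and packs them back into a compact column major representation when it goes out of scope.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical compact target: per column, sorted (row, value) pairs.
struct PackedColumns {
    int rows, cols;
    std::vector<std::vector<std::pair<int, double> > > data;
    PackedColumns(int r, int c) : rows(r), cols(c), data(c) {}
};

// Facade for random writes: buffers in std::maps, packs on destruction.
class RandomSetter {
public:
    explicit RandomSetter(PackedColumns& target)
        : m_target(target), m_buffer(target.cols) {
        // import existing nonzeros so updates (not only overwrites) work
        for (int j = 0; j < target.cols; ++j)
            for (int k = 0; k < (int)target.data[j].size(); ++k)
                m_buffer[j][target.data[j][k].first] = target.data[j][k].second;
    }

    // random write/update access, any order
    double& operator()(int i, int j) { return m_buffer[j][i]; }

    // pack the maps back into the compact, sorted representation
    ~RandomSetter() {
        for (int j = 0; j < m_target.cols; ++j)
            m_target.data[j].assign(m_buffer[j].begin(), m_buffer[j].end());
    }

private:
    PackedColumns& m_target;
    std::vector<std::map<int, double> > m_buffer;
};
```

The brace scope of the setter controls when the packing happens, which is exactly the kind of ergonomics the facade pattern buys here.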


Binary operators (e.g., operator+)

Here the problem is that we cannot mix a CCS with a CRS. In such a case one of the arguments has to be evaluated to the other's format (let's pick the default format). This requires some modifications in Eigen/Core:

  • probably needs to add a SparseBit flag
  • needs a more advanced mechanism to determine the storage order of a CwiseBinaryOp<> expression
  • needs to modify nesting in ei_traits<CwiseBinaryOp<>> such that if an argument is sparse and has a different storage order than the expression itself, then it has to be evaluated to a sparse temporary.
  • needs to modify ei_eval such that a sparse xpr gets evaluated to a sparse matrix (simply add an optional template parameter equal to Xpr::Flags&SparseBit and do the specialization in Eigen/Sparse)
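
The nesting rule proposed above can be sketched in plain template metaprogramming. RowMajorBit exists in Eigen; SparseBit is the flag this page proposes, and the helper names (ei_select, ei_same, ei_nested_arg) are invented for the example: a sparse argument whose storage order differs from the enclosing expression's is nested by value (i.e., evaluated into a sparse temporary), while a matching argument is nested as a cheap reference.

```cpp
// Illustrative flags (SparseBit is hypothetical here).
enum { RowMajorBit = 0x1, SparseBit = 0x2 };

// tiny compile-time if and type-equality helpers
template <bool Cond, class Then, class Else> struct ei_select { typedef Then type; };
template <class Then, class Else> struct ei_select<false, Then, Else> { typedef Else type; };
template <class A, class B> struct ei_same { enum { value = 0 }; };
template <class A> struct ei_same<A, A> { enum { value = 1 }; };

// Decide how an argument of a binary expression should be nested.
template <class Arg, unsigned int ExprFlags>
struct ei_nested_arg {
    enum { mustEval = (Arg::Flags & SparseBit)
                   && ((Arg::Flags ^ ExprFlags) & RowMajorBit) };
    typedef typename ei_select<(mustEval != 0),
                               typename Arg::PlainObject,  // sparse temporary
                               const Arg&>::type type;     // cheap reference
};
```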


Transpose

Basically, here the challenge is to copy a CRS matrix to a CCS one. I can imagine three strategies:

  • copy to a temporary column major HashMatrix and then compress it to a CCS (slow)
  • perform a column major traversal of the CRS matrix:
      • create rows InnerIterators iters;
      • for j=0..cols do
        • for i=0..rows do
          • if iters[i].index()==j then
            • res(i,j) = iters[i].value; iters[i]++
    • of course, creating rows iterators might be too expensive, so we need a way to provide lightweight iterators where the reference to the expression, the current outer index, etc. are not stored. Maybe we could slightly extend the InnerIterators such that they can loop over an arbitrary number of outer columns/rows.
  • the third option is like the first one but with a much more efficient data structure. Indeed, during a row major processing the coefficients of the column major destination matrix will be set in coherent order per column, so we can use a special temporary sparse matrix with a column/row vector of linked fixed size vectors. This seems to be the best option. Since it is a temporary, there is no need to convert it to a compact CCS matrix.
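
For comparison, the standard two-pass "count, prefix-sum, scatter" conversion is also worth keeping in mind: it copies a CRS matrix into a CCS one in O(nnz + rows + cols) with no intermediate dynamic structure. The struct below is a minimal stand-in, not Eigen's class:

```cpp
#include <vector>

// Minimal stand-in: outer has size (rows or cols)+1 depending on the order.
struct Sparse {
    int rows, cols;
    std::vector<int> outer;
    std::vector<int> inner;
    std::vector<double> values;
};

// Convert a row major (CRS) matrix into a column major (CCS) one.
Sparse crsToCcs(const Sparse& a) {
    Sparse r;
    r.rows = a.rows; r.cols = a.cols;
    r.outer.assign(a.cols + 1, 0);
    r.inner.resize(a.values.size());
    r.values.resize(a.values.size());

    // pass 1: count nonzeros per column, then prefix-sum into column starts
    for (int k = 0; k < (int)a.inner.size(); ++k) r.outer[a.inner[k] + 1]++;
    for (int j = 0; j < a.cols; ++j) r.outer[j + 1] += r.outer[j];

    // pass 2: scatter; 'next' tracks the write position inside each column
    std::vector<int> next(r.outer.begin(), r.outer.end() - 1);
    for (int i = 0; i < a.rows; ++i)
        for (int k = a.outer[i]; k < a.outer[i + 1]; ++k) {
            int j = a.inner[k];
            r.inner[next[j]] = i;
            r.values[next[j]] = a.values[k];
            next[j]++;
        }
    return r;
}
```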


Matrix product (again !)

Here we have to investigate all the possible combinations of the two operands and the result (2*2*2 = 8 possibilities):

  lhs | rhs | res | comments
  ----|-----|-----|---------
  col | col | col | Loop over rhs in column major order (j, k) and accumulate res.col(j) += rhs(k,j) * lhs.col(k). The accumulation needs a single column temporary which is reset for each j. If the result of the product is dense (nnz > 4%), the best is to allocate a dense vector; otherwise a sorted linked list performs best (the accumulation of each column of lhs is coherent, so we can exploit this coherence to optimize search and insertion in the linked list).
  col | col | row | Weird case: evaluate to a col major temporary and then transpose.
  row | col |  *  | Loop over the result coefficients in the preferred order and perform coherent dot products.
  row | row |  *  | Like col col *.
  col | row | col | If transpose is fast, evaluate rhs to a column major format; otherwise use a dynamic temporary for the result (array of sorted linked lists).
  col | row | row | Like col row col.
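
The first row of the table (col times col into col, with the dense-vector variant of the accumulator) can be sketched as follows; the struct is a minimal stand-in, not Eigen's class:

```cpp
#include <algorithm>
#include <vector>

// Minimal CCS stand-in: outer has size cols+1.
struct CCS {
    int rows, cols;
    std::vector<int> outer, inner;
    std::vector<double> values;
};

// res = lhs * rhs, all three column major, using a dense column accumulator.
CCS multiply(const CCS& lhs, const CCS& rhs) {
    CCS res;
    res.rows = lhs.rows; res.cols = rhs.cols;
    res.outer.push_back(0);
    std::vector<double> acc(lhs.rows);      // dense column temporary
    std::vector<bool> used(lhs.rows);
    for (int j = 0; j < rhs.cols; ++j) {
        std::fill(acc.begin(), acc.end(), 0.0);
        std::fill(used.begin(), used.end(), false);
        for (int k = rhs.outer[j]; k < rhs.outer[j + 1]; ++k) {
            int col = rhs.inner[k];         // res.col(j) += rhs(col,j) * lhs.col(col)
            double x = rhs.values[k];
            for (int l = lhs.outer[col]; l < lhs.outer[col + 1]; ++l) {
                acc[lhs.inner[l]] += x * lhs.values[l];
                used[lhs.inner[l]] = true;
            }
        }
        for (int i = 0; i < lhs.rows; ++i)  // harvest, keeping rows sorted
            if (used[i]) { res.inner.push_back(i); res.values.push_back(acc[i]); }
        res.outer.push_back((int)res.inner.size());
    }
    return res;
}
```

With a dense accumulator the harvesting pass costs O(rows) per column, which is the reason the sorted-linked-list variant wins when the result is very sparse.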


Review of some existing libs

GMM++

  • native support of basics (add, mul, etc)
  • native support of various iterative linear solvers including various pre-conditioners for selfadjoint and non-selfadjoint matrix (most of them come from ITL)
  • can use SuperLU for direct solvers
  • provides various sparse matrix formats:
    • dense array (col or row) of std::map<int,scalar>
      • pro: easy to implement, relatively fast random access (read/write)
      • cons: not ideal to use as input of the algorithms, one dimension is dense
    • Compressed Col/Row Storage (CCS/CRS) (see http://netlib.org/linalg/html_templates/node91.html)
      • pro: compact storage (overhead = (rows/cols + nnz) * sizeof(int), where nnz = number of nonzeros), fast to loop over the nonzeros, compatible with other very optimized C libraries (TAUCS, SuperLU)
      • cons: no efficient random access (GMM++ does not provide write access to such matrices), one dimension is dense
    • block matrix (not documented)


MTL4 / ITL

  • native support of basics (in MTL4)
  • various iterative linear solvers (in ITL)
  • provides CCS/CRS matrix format (read only)
  • writes/updates of a sparse matrix are done via a generic facade/proxy:
    • this object allocates a fixed number of coefficients per column (e.g. 5) and manages overflow using std::maps
    • when this object is deleted (after the write operations), this dynamic representation is packed into the underlying matrix
    • for dense matrices this facade object degenerates to the matrix itself, allowing a uniform interface to fill/update both dense and sparse matrices
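
The fixed-slots-plus-overflow mechanism described above can be sketched as follows. This is our reading of the idea, with invented names, not MTL4's actual code; the final pack-into-the-matrix step is omitted:

```cpp
#include <map>
#include <vector>

// Each column owns a small fixed number of slots; once a column overflows,
// extra entries spill into a per-column std::map.
class SlottedInserter {
public:
    SlottedInserter(int cols, int slotsPerCol)
        : m_slots(slotsPerCol), m_cols(cols),
          m_rowIdx(cols * slotsPerCol, -1),
          m_vals(cols * slotsPerCol, 0.0),
          m_overflow(cols) {}

    // accumulate v into coefficient (i, j)
    void update(int i, int j, double v) {
        int base = j * m_slots;
        for (int s = 0; s < m_slots; ++s) {
            if (m_rowIdx[base + s] == i) { m_vals[base + s] += v; return; }
            if (m_rowIdx[base + s] < 0)  { m_rowIdx[base + s] = i; m_vals[base + s] = v; return; }
        }
        m_overflow[j][i] += v;   // column j is full: spill to the map
    }

    int overflowCount(int j) const { return (int)m_overflow[j].size(); }

private:
    int m_slots, m_cols;
    std::vector<int> m_rowIdx;                   // slot row indices, -1 = free
    std::vector<double> m_vals;                  // slot values
    std::vector<std::map<int, double> > m_overflow;
};
```

As long as most columns stay within their slot budget, writes are array accesses; only the rare overflowing columns pay the std::map cost.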


boost::ublas

  • provides only basic linear algebra
  • provides tons of sparse matrix/vector formats:
    • CCS/CRS
    • std::vector< std::map<int,value> >
    • std::map< int, std::map<int,value> >
    • coordinate matrix: similar to CCS/CRS but with unsorted entries, and the same coeff can appear multiple times (its effective value is the sum of its occurrences)
    • and some other variants
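
The coordinate-matrix semantics mentioned above (unsorted, duplicate-tolerant triplets) boil down to a sort-and-merge step; a minimal sketch with invented names:

```cpp
#include <algorithm>
#include <vector>

// An entry of a coordinate (triplet) matrix: row, column, value.
struct Triplet { int i, j; double v; };

// column major ordering: sort by column, then by row
bool byPosition(const Triplet& a, const Triplet& b) {
    return a.j != b.j ? a.j < b.j : a.i < b.i;
}

// Sort the triplets and merge duplicates by summation, yielding the
// sorted, duplicate-free stream a CCS matrix could be built from.
std::vector<Triplet> coalesce(std::vector<Triplet> t) {
    std::sort(t.begin(), t.end(), byPosition);
    std::vector<Triplet> out;
    for (int k = 0; k < (int)t.size(); ++k) {
        if (!out.empty() && out.back().i == t[k].i && out.back().j == t[k].j)
            out.back().v += t[k].v;   // duplicate coefficient: accumulate
        else
            out.push_back(t[k]);
    }
    return out;
}
```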


Other related libs

  • sparselib++ / IML++: basics, various iterative solvers (in IML++)
  • SuperLU: direct solver
  • UMFPACK/CHOLMOD: direct solvers (LU/Cholesky), LGPL, depend on LAPACK
  • IETL: eigen problems
  • TAUCS: direct and iterative solvers, depends on LAPACK and METIS


Links

  • Common storage formats: http://netlib.org/linalg/html_templates/node90.html
  • A comparison of free libs: http://www.netlib.org/utk/people/JackDongarra/la-sw.html
  • A collection of sparse matrices: http://www.cise.ufl.edu/research/sparse/matrices/