Difference between revisions of "SparseMatrix"

From Eigen
Jump to: navigation, search
(Inner coherent access)
Line 1: Line 1:
 
== Preliminaries ==
 
== Preliminaries ==
  
* '''CCS''' for Compressed Column Storage is the most common sparse storage scheme (see http://netlib.org/linalg/html_templates/node90.html)
+
* '''CCS''' stands for Compressed Column Storage. It is the most common sparse storage scheme (see http://netlib.org/linalg/html_templates/node90.html).
 
* CRS: Compressed Row Storage, like CCS but row major
 
* CRS: Compressed Row Storage, like CCS but row major
 
* nnz: number of nonzero entries
 
* nnz: number of nonzero entries
  
 
== What's the plan ? ==
 
== What's the plan ? ==
 +
  
 
Ideally:
 
Ideally:
Line 17: Line 18:
 
== Current state ==
 
== Current state ==
  
We already have some proof of concept code for CCS (CRS) matrix (class SparseMatrix).
+
For the first step the goal is to focus on CCS (CRS) sparse matrix. See the class SparseMatrix.
* unlike GMM++, our CCS matrix can be filled dynamically in a coherent order, i.e. with increasing i+j*rows (no random write)
+
* our SparseMatrix matrix can be filled dynamically in a coherent order, i.e. with increasing i+j*rows (no random write)
* unlike some other sparse libraries (e.g., CSparse), the coefficients of our CCS matrices are always sorted
+
* a SparseMatrix can also be set randomly via the class RandomSetter (it temporarily store the matrix coefficient into a set of hash map, built-in support for std::map, std::hash_map, google::dense_hash_map, and google::sparse_hash_map)
* supported operations (for every types):
+
* unlike several other sparse libraries (e.g., CSparse), the coefficients of our CCS matrices are always sorted.
** add, sub, scale, transpose etc., all as expression templates
+
* efficient shallow copy for matrices marked as rvalue
** efficient transposition (when a transpose expression gets evaluated)
+
* most of basic operations are supported in a generic way:
 +
** add, sub, scale, transpose etc. (all via efficient expression templates)
 +
** efficient transposition (i.e., when a transpose expression gets evaluated)
 
** efficient sparse-sparse matrix product (no SSE yet)
 
** efficient sparse-sparse matrix product (no SSE yet)
 
** preliminary support for sub matrices (blocks) but not very good yet
 
** preliminary support for sub matrices (blocks) but not very good yet
 
** triangular solver with a dense right-hand side (vector or matrix)
 
** triangular solver with a dense right-hand side (vector or matrix)
 +
* LDLT Cholesky factorization with direct solver
 +
** adapted from T. Davis's LDL library
 
* LLT Cholesky factorization with direct solver
 
* LLT Cholesky factorization with direct solver
 
** for CCS only
 
** for CCS only
 
** 3 backends:
 
** 3 backends:
*** a built-in one, very basic but useful to perform an incomplete factorization (double and float only, currently)
+
*** a built-in one, very basic but useful to perform an incomplete factorization (currently, double and float only)
*** cholmod (double and float only, currently)
+
*** cholmod (currently, double and float only)
*** TAUCS (double and float only, currently)
+
*** TAUCS (currently, double and float only)
 
* LU factorization with direct solver
 
* LU factorization with direct solver
 
** no built-in implementation yet, 2 external backends:
 
** no built-in implementation yet, 2 external backends:
Line 38: Line 43:
 
** all backends give access to the matrices L and U, the permutations (P and Q) and the determinant
 
** all backends give access to the matrices L and U, the permutations (P and Q) and the determinant
  
We already have some more flexible matrices which can be used to initialize a sparse matrix with more flexibility:
 
* HashMatrix which is simply a std::vector< std::map<int,Scalar> >. However this is not very efficient, both in term of memory and write access. A better tradeoff is probably to allocate a hashmap for a couple of columns (or rows) and move to google's outstanding densehashmap data structure (well we can also make the hashmap type a template parameter).
 
* a LinkedVectorMatrix types which will probably be removed since more efficient approach can be employed.
 
  
Internally, each expression defines a InnerIterator which allows to efficiently traverse the inner coefficients of a sparse (or dense) matrix. Of course, these InnerIterator can be nested exactly like expressions are nested such that our sparse matrices already support expression templates !
+
Internally, each expression defines a InnerIterator which allows to efficiently traverse the inner coefficients of a sparse (or dense) matrix. Of course, these InnerIterator can be nested exactly like expressions are nested such that our sparse matrices support expression templates !
 
* InnerIterator are implemented for:
 
* InnerIterator are implemented for:
 
** a default implementation for MatrixBase (suitable for any dense expressions)
 
** a default implementation for MatrixBase (suitable for any dense expressions)
Line 48: Line 50:
 
** CwiseBinaryOp (with some shortcomings, see below)
 
** CwiseBinaryOp (with some shortcomings, see below)
 
** SparseMatrix
 
** SparseMatrix
** HashMatrix
 
 
** Block (with some shortcomings, see below)
 
** Block (with some shortcomings, see below)
 +
  
 
== Write/Update access patterns ==
 
== Write/Update access patterns ==
Line 57: Line 59:
 
=== Fully coherent access ===
 
=== Fully coherent access ===
  
Here we assume the matrix is set in a fully coherent order, i.e., such that the coefficients (i,j) are set with increasing i + j*outersize where i is the inner coordinates and j the outer coordinate. In such a case we can directly set a compressed sparse matrix as we would fill a dynamic vector. In order to reduce memory allocations/memory copies, it is important to be able to give a hint about the expected number of non-zeros such that we are able to preallocate enough memory. Of course using a more dynamic data structure like a linked list of small array would probably performs better compared to resizing a single large buffer but since this not a standard storage format and that the standard compressed scheme works pretty well it's probably better to use it directly.
+
Here we assume the matrix is set in a fully coherent order, i.e., such that the coefficients (i,j) are set with increasing i + j*outersize where i is the inner coordinates and j the outer coordinate. In such a case we can directly set a compressed sparse matrix as we would fill a dynamic vector. In order to reduce memory allocations/memory copies, it is important to be able to give a hint about the expected number of non-zeros such that we are able to preallocate enough memory.
  
Updating a sparse matrix using this access pattern can be done by filling a new temporary matrix followed by an efficient shallow copy.
+
Updating a sparse matrix using this access pattern can be done by filling a new temporary matrix followed by an efficient shallow copy. The current API is:
 
+
<source lang="cpp">
=== Inner coherent access ===
+
 
+
Here we assume the coefficients (i,j) are set with increasing inner coordinate i for each outer vector j. For instance the following sequence of coordinates is valid:
+
(2,10) (4,7) (1,12) (4,10) (3,12) (2,1)
+
On the other hand, at that point it is forbidden to set the coefficient (2,12) since (3,12) has already been set. A typical use case is to copy a row-major matrix B to a column-major matrix A: since B has to be traversed in a row major order the sequence of coordinates (i,j) won't fit the requirements of the fully coherent access pattern but perfectly match the current one.
+
 
+
'''deprecated: '''
+
''To implement such a behavior, we need a dynamic data structure per inner vector. Since there is no such standard storage scheme, we are free to choose whatever we want. Currently, this access pattern is implemented by mean of small linked array in the LinkedVectorMatrix class. Still to do:''
+
* ''make the granularity of the chunks configurable at runtime ?''
+
* ''write a memory allocator for the chunks shared at the matrix level ?''
+
* ''write a clean linked vector class for reuse.''
+
 
+
 
+
A simpler and more efficient solution, which is already implemented in MatrixBase::operator=, consists to process the input matrix/data in two passes. In the first pass we simply count the number of elements per inner vector of the target matrix. This requires only one integer per inner vector. Then we can already initialize the outer indices (starts of each inner vector) using a prefix sum. Finally, the data are processed a second time to perform the actual insertion. If the non zero coefficients of the input requires some cost to be extracted, then the input must be evaluated to a temporary compressed matrix (with an opposite storage order). This evaluation/copy could be done during the first pass. Still todo: write an easy to use MatrixSetter implementing this concept.
+
 
+
=== Outer coherent access ===
+
 
+
Here each inner vector j is filled randomly, but once we have started to fill the inner vector j, it is forbidden to update any inner vector k with k<j. This scheme occurs in matrix product and triangular solver. The current solution is to allocate a dense vector, fill it, and push it into the sparse matrix once we are done with this inner vector. Currently this strategy is implemented manually when needed, i.e., there is no special class for that yet (unlike LinkedVectorMatrix for inner coherent access).
+
 
+
=== Random access ===
+
 
+
=> currently by mean of an array of std::map but I'd like to try:
+
* array of a custom sorted linked lists (if the number of non zero per inner vector is low, i.e. < 24, this might be much faster)
+
* use a custom binary tree impl. See also the "elastic binary tree" in haproxy (http://haproxy.1wt.eu/) which seems to be 5 times faster than std::map for insertions without being slower for lookups.
+
** hm... actually haproxy's ebt is faster only if elements are not inserted too randomly. anyways this optimization if of very very low priority for now !
+
 
+
== Issues ==
+
 
+
The major issue with sparse matrices is that they can only be efficiently traversed in a specific order, i.e., per column for a CCS matrix and per row for a CRS matrix. The nighmare starts when you have to deal with expressions mixing CCS matrices (column major) and CRS matrices (row major).
+
 
+
 
+
=== Generic API to fill a sparse/dense matrices ===
+
 
+
The current SparseMatrix can be filled like that:
+
<pre>
+
 
SparseMatrix m(rows,cols);
 
SparseMatrix m(rows,cols);
 
m.startFill(2*cols);
 
m.startFill(2*cols);
Line 107: Line 74:
 
m.fill(8,11) = rand();
 
m.fill(8,11) = rand();
 
m.endFill();
 
m.endFill();
</pre>
+
</source>
 
At that point m.nonZeros() == 6 and you cannot add any other nonzero entries. For instance x=m(0,0); will returns zero, while m(0,0)=x; will issue an assert. Of course you can still update an existing nonzero: m(7,2) += 1;.
 
At that point m.nonZeros() == 6 and you cannot add any other nonzero entries. For instance x=m(0,0); will returns zero, while m(0,0)=x; will issue an assert. Of course you can still update an existing nonzero: m(7,2) += 1;.
 
+
However, similarly to RandomSetter, we could add a CoherentSetter wrapper, such that the API would be:
To allow to treat any matrix as a sparse matrix we could define dummy startFill(), fill() and endFill() members to MatrixBase but maybe we could find some better API ? Indeed, while efficient there are two major limitations:
+
* we cannot update the matrix, only set it from zero
+
* to be consistent, for dense matrices startFill() should do a setZero() that is not really nice
+
 
+
For instance we could imagine something based on the facade design pattern where you could request for a random setter or coherent setter or an updater or whatever else is needed ? For dense matrices these facade objects would degenerate to a simple references. This last solution is currently experimented in SparseSetter.
+
 
+
==== FullyCoherentAccessPattern ====
+
 
+
Requirements:
+
* notify the start of the filling
+
* special insertion mechanism
+
* notify the end of the filling
+
 
+
Current API:
+
 
<source lang="cpp">
 
<source lang="cpp">
 
SparseMatrix<float> m;
 
SparseMatrix<float> m;
 
{
 
{
   SparseSetter<MatrixType, FullyCoherentAccessPattern> w(m);
+
   CoherentSetter<MatrixType> w(m);
  w->startFill();
+
 
   for (int j=0; j<cols; ++j)
 
   for (int j=0; j<cols; ++j)
 
     for (int i=0; i<rows; ++i)
 
     for (int i=0; i<rows; ++i)
       if (nonzero) w->fill(i,j) = some_non_zero_value;
+
       if (nonzero(i,j)) w(i,j) = some_non_zero_value;
  w->endFill();
+
 
}
 
}
 
</source>
 
</source>
  
However, the startFill and endFill could be hidden by SparseSetter, thus:
 
<source lang="cpp">
 
SparseMatrix<float> m;
 
{
 
  SparseSetter<MatrixType, FullyCoherentAccessPattern> w(m);
 
  for (int j=0; j<cols; ++j)
 
    for (int i=0; i<rows; ++i)
 
      if (nonzero) w->fill(i,j) = some_non_zero_value;
 
}
 
</source>
 
  
  
==== InnerCoherentAccessPattern ====
+
=== Inner coherent access ===
  
Requirements:
+
Here we assume the coefficients (i,j) are set with increasing inner coordinate i for each outer vector j. For instance the following sequence of coordinates is valid:
* notify the start of the filling (clear)
+
(2,10) (4,7) (1,12) (4,10) (3,12) (2,1)
* special insertion mechanism
+
On the other hand, at that point it is forbidden to set the coefficient (2,12) since (3,12) has already been set. From the user point of view, the solution is to create a temporary row-major matrix, fill it in fully coherent manner, and copy it to the target col-major matrix.
  
Current API:
+
From the implementation point of view, still remains the issue to copy a row-major matrix to a col-major one. A simple and efficient solution, which is already implemented in MatrixBase::operator=, consists to process the input matrix/data in two passes.
<source lang="cpp">
+
# In the first pass we simply count the number of elements per inner vector of the target matrix. This requires only one integer per inner vector. Then we can already initialize the outer indices (starts of each inner vector) using a prefix sum.
SparseMatrix<float> m;
+
# Finally, the data are processed a second time to perform the actual insertion.
{
+
If the non zero coefficients of the input requires some cost to be extracted, then the input must be evaluated to a temporary compressed matrix (with an opposite storage order). This evaluation/copy could be done during the first pass.
  SparseSetter<MatrixType, InnerCoherentAccessPattern> w(m);
+
  w->startFill();
+
  for (int i=0; i<rows; ++i)
+
    for ()
+
      if (nonzero) w->fill(i,rand()) = some_non_zero_value;
+
}
+
</source>
+
  
  
==== OuterCoherentAccessPattern ====
+
=== Outer coherent access ===
  
Requirements:
+
Here each inner vector j is filled randomly, but once we have started to fill the inner vector j, it is forbidden to update any inner vector k with k<j. This scheme occurs in matrix product and triangular solver. Our current solution is called AmbiVector. It implements a vector which can be either a dense one (if the vector appears to be quite dense), or a linked list if the vector is really sparse. However, for a triangular solver with sparse right hand side, the recommended solution seems to be to compute the so called elimination tree.
* notify the start of a column
+
* special insertion function
+
* notify the end of a column
+
  
Proposed API:
 
<source lang="cpp">
 
SparseMatrix<float> m;
 
{
 
  SparseSetter<MatrixType, OuterCoherentAccessPattern> w(m);
 
  for (int j=0; j<cols; ++j)
 
  {
 
    w->startFillInner(j);
 
    for ()
 
      if (nonzero) w->fill(rand(),j) = some_non_zero_value;
 
    w->endFillInner(j);
 
  }
 
}
 
</source>
 
  
==== RandomAccessPattern ====
+
=== Random access ===
  
Current API:
+
Implemented via a RandomSetter wrapper which internally store the matrix coefficient in a set of hash map. Current API:
 
<source lang="cpp">
 
<source lang="cpp">
 
{
 
{
   SparseSetter<MatrixType, RandomAccessPattern> w(m);
+
   RandomSetter<MatrixType> w(m);
   for (...) w->coeffRef(rand(),rand()) = some_non_zero_value;
+
   for (...)
 +
    w(rand(),rand()) = some_non_zero_value;
 
}
 
}
 
</source>
 
</source>
  
  
==== Comments ====
+
== Issues ==
 +
 
 +
The major issue with sparse matrices is that they can only be efficiently traversed in a specific order, i.e., per column for a CCS matrix and per row for a CRS matrix. The nightmare starts when you have to deal with expressions mixing CCS matrices (column major) and CRS matrices (row major).
  
Issues:
 
* currently the SparseSetter works as a pointer to the actual matrix or a temporary matrix, maybe it would be better to only access functions defined in SparseSetter, e.g.:
 
SparseMatrix<float> m;
 
{
 
  SparseSetter<MatrixType, FullyCoherentAccessPattern> w(m);
 
  for (int j=0; j<cols; ++j)
 
    for (int i=0; i<rows; ++i)
 
      if (nonzero) w(i,j) = some_non_zero_value;
 
}
 
* in above we only discussed about filling a matrix from scratch, what about updates ?
 
** when you update you want to either erase previous values or accumulate
 
* the API should allow to use sub-matrix expressions, e.g.:
 
<source lang="cpp">
 
SparseMatrix<float> m;
 
{
 
  SparseSetter<MatrixType, OuterCoherentAccessPattern> w(m);
 
  for (int j=0; j<cols; ++j)
 
  {
 
    w->startFillInner(j);
 
    for (k)
 
      w->col(j) += alpha * m2.col(k);
 
    w->endFillInner(j);
 
  }
 
}
 
</source>
 
  
 
=== Binary operators (aka. operator+) ===
 
=== Binary operators (aka. operator+) ===
  
 
Here the problem is that we cannot mix a CCS with a CRS. In such a case one of the argument have to be evaluated to the other format (let's pick the default format). This requires some modifications in Eigen/Core:
 
Here the problem is that we cannot mix a CCS with a CRS. In such a case one of the argument have to be evaluated to the other format (let's pick the default format). This requires some modifications in Eigen/Core:
* probably needs to add a SparseBit flag
 
 
* needs a more advanced mechanism to determine the storage order of a CwiseBinaryOP<> expression
 
* needs a more advanced mechanism to determine the storage order of a CwiseBinaryOP<> expression
* needs to modify nesting in ei_traits<CwiseBinaryOP<>> such that if an argument is sparse and has a different storage order than the expression itself, then it has to be evaluated to a sparse temporary.
+
* needs to modify nesting in ei_traits<CwiseBinaryOP<> > such that if an argument is sparse and has a different storage order than the expression itself, then it has to be evaluated to a sparse temporary.
* needs to modify ei_eval such that a sparse xpr gets evaluated to a sparse matrix (simply add an optional template parameter equal to Xpr::Flags&SparseBit and do the specialization in Eigen/Sparse)
+
 
+
 
+
=== Transpose ===
+
 
+
Basically, here the challenge is to copy a CRS matrix to a CCS one. I can imagine three strategies:
+
* copy to a temporary column major HashMatrix and then compress it to a CCS (slow)
+
 
+
* perform a column major traversal of the CRS matrix:
+
*** create rows InnerIterators iters;
+
*** for j=0..cols do
+
**** for i=0..rows do
+
***** if iters[i].index()==j then
+
****** res(i,j) = iters[i].value; iters[i]++
+
** of course creating rows iterators might be too expensive so we need a way to provide lightweight iterators where the reference to the expression, the current outer index etc. are not stored. Maybe we could slightly update the InnerIterator such that they can loop over an arbitrary number of outer columns/rows.
+
 
+
* the third options is like the first one but with a much more efficient data structure. Indeed, during a row major processing the coefficients of the column major destination matrix will be set in coherent order per column. So we can use a special temporary sparse matrix with a column/row vector of linked fixed size vectors. This seems to be the best option. If it is a temporary there is no need to convert it to a compact CCS matrix.
+
  
  
Line 300: Line 177:
 
* native support of basics (add, mul, etc)
 
* native support of basics (add, mul, etc)
 
* native support of various iterative linear solvers including various pre-conditioners for selfadjoint and non-selfadjoint matrix (most of them come from ITL)
 
* native support of various iterative linear solvers including various pre-conditioners for selfadjoint and non-selfadjoint matrix (most of them come from ITL)
* can use superLU for direct solvers
+
* can use superLU (direct solver)
 
* provides various sparse matrix formats:
 
* provides various sparse matrix formats:
 
** dense array (col or row) of std::map<int,scalar>
 
** dense array (col or row) of std::map<int,scalar>
Line 334: Line 211:
 
* sparselib++ / IML++: basics, various iterative solvers (in IML++)
 
* sparselib++ / IML++: basics, various iterative solvers (in IML++)
 
* superLU: direct solver
 
* superLU: direct solver
* UMFPACK/CHOLMOD: direct solvers (LU/cholesky), LGPL, depend on lapack
+
* suitesparse by T. Davis (LGPL2.1+). It includes:
 +
** UMFPACK/CHOLMOD: direct solvers (LU/cholesky), depend on BLAS
 +
** LDL: basic LDL^T Cholesky factorization with direct solver
 +
** CSparse: the code of the famous T. Davis's book on sparse matrix
 +
*** it includes basic linear algebra ops, triangular solver, LLT, LU, QR
 +
** CXSparse: extension of CSparse for complex
 
* IETL: eigen problems
 
* IETL: eigen problems
* TAUCS: direct and iterative solvers, depend on lapack and metis
+
* TAUCS: direct and iterative solvers, depend on BLAS and metis
  
  

Revision as of 10:07, 7 November 2008

Preliminaries

What's the plan ?

Ideally:

  • define our own classes for storage and basic algebra (sum, product, triangular solver)
  • a good support of CCS/CRS sparse matrix is certainly the priority as it is the most common storage format
  • we could probably adapt the linear solver algorithms of GMM++ (or ITL) to directly use our matrix classes
    • see the licensing issues
  • provide the ability to use more optimized backends like TAUCS or SuperLU as well as backends for eigen value solvers via a unified API


Current state

For the first step the goal is to focus on CCS (CRS) sparse matrix. See the class SparseMatrix.

  • our SparseMatrix matrix can be filled dynamically in a coherent order, i.e. with increasing i+j*rows (no random write)
  • a SparseMatrix can also be set randomly via the class RandomSetter (it temporarily store the matrix coefficient into a set of hash map, built-in support for std::map, std::hash_map, google::dense_hash_map, and google::sparse_hash_map)
  • unlike several other sparse libraries (e.g., CSparse), the coefficients of our CCS matrices are always sorted.
  • efficient shallow copy for matrices marked as rvalue
  • most of basic operations are supported in a generic way:
    • add, sub, scale, transpose etc. (all via efficient expression templates)
    • efficient transposition (i.e., when a transpose expression gets evaluated)
    • efficient sparse-sparse matrix product (no SSE yet)
    • preliminary support for sub matrices (blocks) but not very good yet
    • triangular solver with a dense right-hand side (vector or matrix)
  • LDLT Cholesky factorization with direct solver
    • adapted from T. Davis's LDL library
  • LLT Cholesky factorization with direct solver
    • for CCS only
    • 3 backends:
      • a built-in one, very basic but useful to perform an incomplete factorization (currently, double and float only)
      • cholmod (currently, double and float only)
      • TAUCS (currently, double and float only)
  • LU factorization with direct solver
    • no built-in implementation yet, 2 external backends:
      • SuperLU (all types, float/double/complex<*>)
      • umfpack (double and complex<double> only)
    • all backends give access to the matrices L and U, the permutations (P and Q) and the determinant


Internally, each expression defines a InnerIterator which allows to efficiently traverse the inner coefficients of a sparse (or dense) matrix. Of course, these InnerIterator can be nested exactly like expressions are nested such that our sparse matrices support expression templates !

  • InnerIterator are implemented for:
    • a default implementation for MatrixBase (suitable for any dense expressions)
    • CwiseUnaryOp
    • CwiseBinaryOp (with some shortcomings, see below)
    • SparseMatrix
    • Block (with some shortcomings, see below)


Write/Update access patterns

When dealing with sparse matrix, a critical operation is the efficient set/update of the coefficients. In order to offer optimal performance it is necessary to propose different solutions according to the required access pattern. For instance, a temporary sparse matrix based on a std::map can handle any kind of access pattern, but it also performs very poorly if you are able to fill a matrix in a coherent order. We can identify four different access pattern schemes with their respective technical solutions, ranging from the most efficient to the most flexible.

Fully coherent access

Here we assume the matrix is set in a fully coherent order, i.e., such that the coefficients (i,j) are set with increasing i + j*outersize where i is the inner coordinates and j the outer coordinate. In such a case we can directly set a compressed sparse matrix as we would fill a dynamic vector. In order to reduce memory allocations/memory copies, it is important to be able to give a hint about the expected number of non-zeros such that we are able to preallocate enough memory.

Updating a sparse matrix using this access pattern can be done by filling a new temporary matrix followed by an efficient shallow copy. The current API is:

SparseMatrix m(rows,cols);
m.startFill(2*cols);
// 2*cols is a hint on the number of nonzero entries
// note that startFill() delete all previous elements in the matrix
m.fill(2,0) = rand();
m.fill(3,0) = rand();
m.fill(0,2) = rand();
m.fill(7,2) = rand();
m.fill(12,9) = rand();
m.fill(8,11) = rand();
m.endFill();

At that point m.nonZeros() == 6 and you cannot add any other nonzero entries. For instance x=m(0,0); will returns zero, while m(0,0)=x; will issue an assert. Of course you can still update an existing nonzero: m(7,2) += 1;. However, similarly to RandomSetter, we could add a CoherentSetter wrapper, such that the API would be:

SparseMatrix<float> m;
{
  CoherentSetter<MatrixType> w(m);
  for (int j=0; j<cols; ++j)
    for (int i=0; i<rows; ++i)
      if (nonzero(i,j)) w(i,j) = some_non_zero_value;
}


Inner coherent access

Here we assume the coefficients (i,j) are set with increasing inner coordinate i for each outer vector j. For instance the following sequence of coordinates is valid:

(2,10) (4,7) (1,12) (4,10) (3,12) (2,1)

On the other hand, at that point it is forbidden to set the coefficient (2,12) since (3,12) has already been set. From the user point of view, the solution is to create a temporary row-major matrix, fill it in fully coherent manner, and copy it to the target col-major matrix.

From the implementation point of view, still remains the issue to copy a row-major matrix to a col-major one. A simple and efficient solution, which is already implemented in MatrixBase::operator=, consists to process the input matrix/data in two passes.

  1. In the first pass we simply count the number of elements per inner vector of the target matrix. This requires only one integer per inner vector. Then we can already initialize the outer indices (starts of each inner vector) using a prefix sum.
  2. Finally, the data are processed a second time to perform the actual insertion.

If the non zero coefficients of the input requires some cost to be extracted, then the input must be evaluated to a temporary compressed matrix (with an opposite storage order). This evaluation/copy could be done during the first pass.


Outer coherent access

Here each inner vector j is filled randomly, but once we have started to fill the inner vector j, it is forbidden to update any inner vector k with k<j. This scheme occurs in matrix product and triangular solver. Our current solution is called AmbiVector. It implements a vector which can be either a dense one (if the vector appears to be quite dense), or a linked list if the vector is really sparse. However, for a triangular solver with sparse right hand side, the recommended solution seems to be to compute the so called elimination tree.


Random access

Implemented via a RandomSetter wrapper which internally store the matrix coefficient in a set of hash map. Current API:

{
  RandomSetter<MatrixType> w(m);
  for (...)
    w(rand(),rand()) = some_non_zero_value;
}


Issues

The major issue with sparse matrices is that they can only be efficiently traversed in a specific order, i.e., per column for a CCS matrix and per row for a CRS matrix. The nightmare starts when you have to deal with expressions mixing CCS matrices (column major) and CRS matrices (row major).


Binary operators (aka. operator+)

Here the problem is that we cannot mix a CCS with a CRS. In such a case one of the argument have to be evaluated to the other format (let's pick the default format). This requires some modifications in Eigen/Core:

  • needs a more advanced mechanism to determine the storage order of a CwiseBinaryOP<> expression
  • needs to modify nesting in ei_traits<CwiseBinaryOP<> > such that if an argument is sparse and has a different storage order than the expression itself, then it has to be evaluated to a sparse temporary.


Matrix product (again !)

Here we have to investigate all the possible combinations for the 2 operands and the results (2*2*2=8 possibilities):

lhs rhs res comments
col col col Loop over rhs in a column major order (j, k), accumulate res.co(j) += rhs(k,j) * lhs.col(k). The accumulation needs a single column temporary which is initialized for each j. If the result of the product is dense (nnz > 4%), then the best is to allocate a dense vector, otherwise a sorted linked list peforms best (the accumulation of each column of lhs is coherent, so we can exploit this coherence to optimize search and insertion in the linked list)
col col row weird case, let's evaluate to a col major temporary and then transpose
row col * Loop over the result coefficient in the preferred order and perform coherent dot products
row row * like col col *
col row col if transpose is fast then evaluate rhs to a column major format, otherwise use a dynamic temporary for the results (array of sorted linked lists)
col row row like col row col


Review of some existing libs

GMM++

  • native support of basics (add, mul, etc)
  • native support of various iterative linear solvers including various pre-conditioners for selfadjoint and non-selfadjoint matrix (most of them come from ITL)
  • can use superLU (direct solver)
  • provides various sparse matrix formats:
    • dense array (col or row) of std::map<int,scalar>
      • pro: easy to implement, relatively fast random access (read/write)
      • cons: no ideal to use as input of the algorithms, one dimension is dense
    • Compressed Col/Row Storage (CCS/CRS) (see http://netlib.org/linalg/html_templates/node91.html)
      • pro: compact storage (overhead = (rows/cols + nnz) * sizeof(int) where nnz = number of non zeros), fast to loop over the non-zeros, compatible with other very optimized C library (TAUCS, SuperLU)
      • cons: random access (GMM++ does not provide write access to such matrices), one dimension is dense
    • block matrix (not documented)


MTL4 / ITL

  • native support of basics (in MTL4)
  • various iterative linear solvers (in ITL)
  • provides CCS/CRS matrix format (read only)
  • writes/update of a sparse matrix are done via a generic facade/proxy:
    • this object allocate a fixed number of coefficients per column (e.g. 5) and manage overflow using std::maps
    • when this object is deleted (after the write operations) this dynamic representation is packed to the underlying matrix
    • for dense matrix this facade object degenerate to the matrix itself allowing a uniform interface to fill/update both dense and sparse matrix


boost::ublas

  • provides only basic linear algebra
  • provides tons of sparse matrix/vector formats:
    • CCS/CRS
    • std::vector< std::map<int,value> >
    • std::map< int, std::map<int,value> >
    • coordinate matrix: similar to CCS/CRS but with unsorted entries and the same coeff can appear multiple time (its real value is the sum of its occurances)
    • and some other variants


Other related libs

  • sparselib++ / IML++: basics, various iterative solvers (in IML++)
  • superLU: direct solver
  • suitesparse by T. Davis (LGPL2.1+). It includes:
    • UMFPACK/CHOLMOD: direct solvers (LU/cholesky), depend on BLAS
    • LDL: basic LDL^T Cholesky factorization with direct solver
    • CSparse: the code of the famous T. Davis's book on sparse matrix
      • it includes basic linear algebra ops, triangular solver, LLT, LU, QR
    • CXSparse: extension of CSparse for complex
  • IETL: eigen problems
  • TAUCS: direct and iterative solvers, depend on BLAS and metis


Links