Todo

Revision as of 12:57, 12 July 2008


Core module

This module is essentially complete and working, but needs improvement in these areas:

  • Better partial redux API
    • the current approach is rather limited, and the user has to deal with functors to use it; e.g., to compute the norm of each column of a matrix, you have to write:
res = m.square().template verticalRedux<ei_scalar_sum_op<typename MatrixType::Scalar> >().cwise().sqrt();

A simple for loop is probably more readable:

for (int i=0; i<m.cols(); ++i) res[i] = m.col(i).norm();

So it would be nice to be able to write something more high-level, maybe like:

 res = m.columnWise().norm();
 res = m.foreachColumn().norm();

where columnWise (or whatever name is chosen) would return a proxy exposing the appropriate redux functions (in the spirit of Cwise).
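
For illustration, here is a minimal sketch of what such a proxy could look like. The names columnWise and ColumnWiseProxy are hypothetical, and MatrixType is assumed to be an Eigen-like type providing cols(), col() and norm():

 #include <vector>

 // Hypothetical sketch of a column-wise redux proxy; not actual Eigen API.
 template<typename MatrixType>
 struct ColumnWiseProxy
 {
     const MatrixType& m;
     explicit ColumnWiseProxy(const MatrixType& mat) : m(mat) {}

     // High-level equivalent of the explicit for loop above.
     std::vector<typename MatrixType::Scalar> norm() const
     {
         std::vector<typename MatrixType::Scalar> res(m.cols());
         for (int j = 0; j < m.cols(); ++j)
             res[j] = m.col(j).norm();
         return res;
     }
     // ...sum(), minCoeff(), maxCoeff(), etc. would follow the same pattern.
 };

 template<typename MatrixType>
 ColumnWiseProxy<MatrixType> columnWise(const MatrixType& m)
 {
     return ColumnWiseProxy<MatrixType>(m);
 }

A real implementation would of course return a row-vector expression rather than a std::vector, so that the result could itself be used in further expressions.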


  • Vectorization
    • Remove unaligned stores; instead, handle the first few unaligned coefficients of the left-hand-side expression without vectorization in the assignment (a toy version of this strategy is sketched after this list).
    • vectorize Redux: Sum can be used as a starting point. Needs a few more vectorization intrinsics, and perhaps a 'redux' field in the functor traits.
  • BLAS/LAPACK backend?
    • BLAS: apparently that would only be useful for matrix product and backward substitution. However, Gael already optimized the matrix product enough to beat the best free BLAS out there (though some proprietary BLAS such as MKL are still faster)... so perhaps a BLAS backend is not the top priority.
    • LAPACK: letting big algorithms (such as inversion, QR...) use a LAPACK backend should be easier (because here we deal with plain matrices not expressions) and more useful (because there are too many different algorithms to optimize for us to do it ourselves).
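
To make the unaligned-store point concrete, here is a toy version of the strategy with float data and SSE intrinsics. This is illustrative only, not Eigen's actual assignment code, and it sidesteps the hard case where source and destination have different alignment offsets:

 #include <xmmintrin.h>  // SSE
 #include <cstdint>

 // Sketch: dst[i] += src[i], doing the unaligned head without
 // vectorization, then using aligned stores for the bulk.
 void add_in_place(float* dst, const float* src, int n)
 {
     int i = 0;
     // Scalar loop until dst is 16-byte aligned.
     while (i < n && reinterpret_cast<std::uintptr_t>(dst + i) % 16 != 0)
     {
         dst[i] += src[i];
         ++i;
     }
     // Vectorized body: aligned stores to dst, unaligned loads from src.
     for (; i + 4 <= n; i += 4)
     {
         __m128 a = _mm_load_ps(dst + i);   // aligned load
         __m128 b = _mm_loadu_ps(src + i);  // possibly unaligned load
         _mm_store_ps(dst + i, _mm_add_ps(a, b));
     }
     // Scalar tail.
     for (; i < n; ++i)
         dst[i] += src[i];
 }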


  • Fast product for large matrices
    • See EigenInternals for an explanation of our cache-friendly algorithm (a toy blocked version is sketched after this list).
    • Issues/todo:
      • B gets evaluated if it is not column-major => offer two different algorithms?
      • the algorithm actually computes C += A*B; currently, C += (A*B).lazy() does get optimized via a specialization of operator+=. However, D = C + (A*B).lazy() is not optimized (extra temporary + set to zero), while it could be if rewritten as (D += C) += (A*B).lazy().
      • some trivial expressions get needlessly evaluated, such as negation (take care to write c = -(a*b)) or conjugation (complex case only)
        • more generally, if A has to be evaluated, then it could be evaluated directly into a blocky form.
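
For intuition, here is a toy cache-blocked kernel for C += A * B on column-major arrays. It only illustrates the blocking idea: real code (see EigenInternals) packs blocks into contiguous buffers and vectorizes the inner loop, and the block sizes below are arbitrary:

 #include <algorithm>

 // Toy cache-blocked C += A * B: A is rows x depth, B is depth x cols,
 // C is rows x cols, all column-major and tightly packed.
 void block_gemm_add(const float* A, const float* B, float* C,
                     int rows, int depth, int cols)
 {
     const int bk = 256, bi = 64;  // arbitrary illustrative block sizes
     for (int k0 = 0; k0 < depth; k0 += bk)
     {
         const int kend = std::min(k0 + bk, depth);
         for (int i0 = 0; i0 < rows; i0 += bi)
         {
             const int iend = std::min(i0 + bi, rows);
             // The (i0, k0) block of A stays hot in cache while we
             // sweep across all columns of B and C.
             for (int j = 0; j < cols; ++j)
                 for (int k = k0; k < kend; ++k)
                 {
                     const float bkj = B[k + j * depth];
                     for (int i = i0; i < iend; ++i)
                         C[i + j * rows] += A[i + k * rows] * bkj;
                 }
         }
     }
 }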


  • Misc Features
    • Shallow copy of dynamic matrices: we already have an efficient swap for them; what is really needed is to add .isTemporary()/.markAsTemporary() to MatrixStorage, which could use the last bit of the size to store the state. Then the Matrix constructor and operator= that take the same Matrix type as input can simply check the temporary state to select between a simple swap and a deep copy (see the Sparse module). Note that since the temporary state is known at compile time we could also use a flag, but then we would create artificial new types. (A toy version of the stolen-bit idea is sketched after this list.)
    • Peel loops where appropriate.
    • Consider the use of alloca (dynamic allocation on the stack for dynamic-sized temporaries). After some benchmarks, it seems this really pays off for matrices smaller than 16x16; for larger matrices the gain is not so obvious. In practice alloca cannot be called in MatrixStorage, and it has to be called in the function creating the temporary. This might be implemented using Matrix::map().
    • Another fine-tuning idea (very low priority): reuse a previously allocated temporary when possible. For instance, theoretically, the Cholesky decomposition could be done "in place", by working directly on the input matrix. Currently, Cholesky stores its own data. However, a typical use case is (m.adjoint() * m).cholesky()... In that case, if m is large enough, the matrix product creates a temporary, and so does Cholesky, while the latter could simply "pick up" the data allocated by the product, since we know this data cannot be reused by any other expression. This concept could probably be extended to a more general mechanism. (This is only applicable to dynamic-size matrices, and only in the case we don't use alloca; cf. the previous point.)
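
A toy version of the stolen-bit idea above, with a float-only storage; this is hypothetical, not actual MatrixStorage code:

 #include <algorithm>
 #include <cstddef>

 // Steal the lowest bit of the stored size to mark a matrix as temporary,
 // so that assignment can swap buffers instead of deep-copying.
 class ToyStorage
 {
     float*      m_data;
     std::size_t m_sizeAndFlag;  // (size << 1) | temporary bit
 public:
     explicit ToyStorage(std::size_t size)
       : m_data(new float[size]), m_sizeAndFlag(size << 1) {}
     ~ToyStorage() { delete[] m_data; }

     std::size_t size() const { return m_sizeAndFlag >> 1; }
     bool isTemporary() const { return (m_sizeAndFlag & 1) != 0; }
     void markAsTemporary()   { m_sizeAndFlag |= 1; }

     void swap(ToyStorage& other)
     {
         std::swap(m_data, other.m_data);
         std::swap(m_sizeAndFlag, other.m_sizeAndFlag);
     }

     // Assignment: shallow steal if the source is a temporary, deep copy
     // otherwise (equal sizes assumed; copy construction omitted for brevity).
     ToyStorage& operator=(ToyStorage& other)
     {
         if (other.isTemporary())
             swap(other);
         else
             std::copy(other.m_data, other.m_data + other.size(), m_data);
         return *this;
     }
 };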

LU module

  • Inversion: it's done, but more work is needed:
    • fixed-size 4x4: either merge the vectorization code by Markos (hopefully making it work with SSE as well as AltiVec) or improve Eigen's generic vectorization code enough to achieve comparable performance.
    • general case: evaluate doing at least one step of divide-and-conquer as in the size 4 case, to reduce overall complexity and improve parallelizability.
  • Determinant: only fixed-size <= 4 is done; the general case remains to be done. Perhaps make it just use an LU decomposition, as large determinants are seldom used anyway (a sketch of this approach follows this list).
  • LU decomposition: to do. Should have optimized paths for fixed-size <= 4. Very important, as this will be our main device for exact solving.
    • provide isInvertible(), inverse(), determinant(). It's not redundant to have them here too: once the LU decomposition is computed, these are much faster to compute.
    • provide rank(), kernelBasis(), imageBasis(), someAntecedent(vector).
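
To illustrate the determinant-via-LU remark: with partial pivoting, det(A) is the product of the diagonal of U times the sign of the row permutation. A self-contained sketch on a row-major array (not an API proposal):

 #include <algorithm>
 #include <cmath>
 #include <cstddef>
 #include <vector>

 // Determinant of an n x n row-major matrix via in-place LU with partial
 // pivoting: det(A) = (sign of the permutation) * prod(diag(U)).
 double determinant(std::vector<double> a, std::size_t n)  // works on a copy
 {
     double det = 1.0;
     for (std::size_t k = 0; k < n; ++k)
     {
         // Partial pivoting: pick the largest |a(i,k)| with i >= k.
         std::size_t p = k;
         for (std::size_t i = k + 1; i < n; ++i)
             if (std::abs(a[i*n + k]) > std::abs(a[p*n + k]))
                 p = i;
         if (a[p*n + k] == 0.0)
             return 0.0;  // singular matrix
         if (p != k)
         {
             for (std::size_t j = 0; j < n; ++j)
                 std::swap(a[k*n + j], a[p*n + j]);
             det = -det;  // each row swap flips the sign
         }
         det *= a[k*n + k];
         // Eliminate below the pivot.
         for (std::size_t i = k + 1; i < n; ++i)
         {
             const double f = a[i*n + k] / a[k*n + k];
             for (std::size_t j = k + 1; j < n; ++j)
                 a[i*n + j] -= f * a[k*n + j];
         }
     }
     return det;
 }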

Cholesky module

This would provide nice exact-solving capabilities for positive matrices.

  • The code for the Cholesky decomposition and the positive solver is done
  • still need to check the API
  • applications: positive exact solving, generalized eigenproblems, etc.
  • should linear regression use that, or SVD? SVD is said to be ideal for regression, but Cholesky seems good too. At least for now, one could implement linear regression with Cholesky, as this is needed soon (required for libavogadro to port to eigen2). A sketch of the Cholesky route follows this list.
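
Concretely, linear regression via Cholesky amounts to solving the normal equations (A^T A) x = A^T b, as in this self-contained sketch (not Eigen API). Note that forming A^T A squares the condition number, which is why SVD is usually considered the more robust choice:

 #include <cmath>
 #include <cstddef>
 #include <vector>

 // Least squares via the normal equations: solve (A^T A) x = A^T b with a
 // Cholesky factorization A^T A = L L^T. A is m x n, row-major, assumed to
 // have full column rank.
 std::vector<double> leastSquares(const std::vector<double>& A,
                                  const std::vector<double>& b,
                                  std::size_t m, std::size_t n)
 {
     // Form the normal equations.
     std::vector<double> AtA(n * n, 0.0), Atb(n, 0.0);
     for (std::size_t i = 0; i < m; ++i)
         for (std::size_t j = 0; j < n; ++j)
         {
             Atb[j] += A[i*n + j] * b[i];
             for (std::size_t k = 0; k < n; ++k)
                 AtA[j*n + k] += A[i*n + j] * A[i*n + k];
         }

     // In-place Cholesky: the lower triangle of AtA becomes L.
     for (std::size_t j = 0; j < n; ++j)
     {
         for (std::size_t k = 0; k < j; ++k)
             AtA[j*n + j] -= AtA[j*n + k] * AtA[j*n + k];
         AtA[j*n + j] = std::sqrt(AtA[j*n + j]);
         for (std::size_t i = j + 1; i < n; ++i)
         {
             for (std::size_t k = 0; k < j; ++k)
                 AtA[i*n + j] -= AtA[i*n + k] * AtA[j*n + k];
             AtA[i*n + j] /= AtA[j*n + j];
         }
     }

     // Solve L y = A^T b (forward), then L^T x = y (backward).
     std::vector<double> x(Atb);
     for (std::size_t i = 0; i < n; ++i)
     {
         for (std::size_t k = 0; k < i; ++k) x[i] -= AtA[i*n + k] * x[k];
         x[i] /= AtA[i*n + i];
     }
     for (std::size_t i = n; i-- > 0; )
     {
         for (std::size_t k = i + 1; k < n; ++k) x[i] -= AtA[k*n + i] * x[k];
         x[i] /= AtA[i*n + i];
     }
     return x;
 }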

QR module

We have QR, eigenvalues/vectors in the selfadjoint case, eigenvalues/vectors for real non-symmetric matrices, and matrix norm. Todo:

  • implement an eigensolver for the non-selfadjoint case (complex case)
    • need to get rid of the current code imported from JAMA, which is only suitable for real matrices
    • we already have a Hessenberg reduction code for real and complex matrix
    • need to implement the implicitly shifted QR algorithm; see http://www.acm.caltech.edu/~mlatini/research/qr_alg-feb04.pdf for a comprehensive introduction to this algorithm and a brief survey of acceleration techniques (the basic iteration is recalled after this list)
  • implement an efficient QR decomposition for 2x2 and 3x3 matrices, with optional computation of R (maybe using Gram-Schmidt instead of Householder transformations?)
  • generalized eigenproblem using the Cholesky decomposition (here or in the Cholesky module?), spectral radius, matrix exponential, and maybe more operator calculus by applying a cwiseUnaryOp to the eigenvalues vector.
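
For reference, one step of the explicitly shifted QR iteration, which the implicit variant reproduces without ever forming A_k - \sigma_k I (standard material, recalled here for convenience):

 A_k - \sigma_k I = Q_k R_k   (QR factorization of the shifted matrix)
 A_{k+1} = R_k Q_k + \sigma_k I = Q_k^* A_k Q_k

Each step is a similarity transform, so the eigenvalues are preserved; with a good shift \sigma_k (e.g. the Wilkinson shift), the last subdiagonal entry converges to zero very fast, at which point the problem deflates.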

SVD module

To do. Should provide SVD decomposition, pseudoinverse, what the vision folks need, and least squares / linear regression.

  • Maybe Gram-Schmidt goes here? It's really similar to SVD.
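
For reference, a minimal modified Gram-Schmidt on a column-major array (the variant usually preferred numerically over the classical formulation); purely illustrative, not a proposed API:

 #include <cmath>
 #include <cstddef>
 #include <vector>

 // Modified Gram-Schmidt: orthonormalize the n columns of an m x n
 // column-major matrix in place. Assumes linearly independent columns.
 void modifiedGramSchmidt(std::vector<double>& a, std::size_t m, std::size_t n)
 {
     for (std::size_t j = 0; j < n; ++j)
     {
         double* qj = &a[j*m];
         // Normalize column j.
         double norm = 0.0;
         for (std::size_t i = 0; i < m; ++i) norm += qj[i]*qj[i];
         norm = std::sqrt(norm);
         for (std::size_t i = 0; i < m; ++i) qj[i] /= norm;
         // Immediately remove the q_j component from all later columns.
         for (std::size_t k = j + 1; k < n; ++k)
         {
             double* qk = &a[k*m];
             double dot = 0.0;
             for (std::size_t i = 0; i < m; ++i) dot += qj[i]*qk[i];
             for (std::size_t i = 0; i < m; ++i) qk[i] -= dot * qj[i];
         }
     }
 }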

Geometry module

The main part is there: Transform, rotations (Quaternion, EulerAngles, Angle-Axis), cross product. Still needed:

  • API review
  • Extend EulerAngles to support any convention?
  • Add methods to create perspective/orthogonal projection matrices? (A perspective sketch follows this list.)
  • Need a fixed-size Gram-Schmidt here, so it's not clear whether it belongs here or in SVD (or QR!). Is this a case for having it in a separate module?
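
As a reference point for the projection item, here is the classic gluPerspective matrix (column-major, OpenGL clip conventions). Any Eigen API for this is hypothetical at this stage:

 #include <cmath>

 // gluPerspective-style projection matrix, column-major:
 // fovy in radians, aspect = width/height.
 void perspective(float fovy, float aspect, float zNear, float zFar,
                  float out[16])
 {
     const float f = 1.0f / std::tan(fovy / 2.0f);
     for (int i = 0; i < 16; ++i) out[i] = 0.0f;
     out[0]  = f / aspect;
     out[5]  = f;
     out[10] = (zFar + zNear) / (zNear - zFar);
     out[11] = -1.0f;                                 // column 2, row 3
     out[14] = 2.0f * zFar * zNear / (zNear - zFar);  // column 3, row 2
 }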

Sparse matrices/vectors

  • The goal is to provide seamless integration of sparse matrices on top of Eigen, with both basic linear algebra features and more advanced algorithms like linear solvers and eigenvalue solvers. These high-level algorithms could be implemented using external libraries while still exposing a unified API.
  • This module has just been started; see the dedicated SparseMatrix page for details.

Unit-tests

  • Keep adding more!

Documentation

  • Keep updating
  • all our issues with doxygen are now solved (i.e. we have workarounds)
  • Write special pages on certain topics, illustrate them with examples:
    • Compiler support/options
    • Vectorization
    • Parallelization
    • Fixed-size matrices/vectors
      • in particular, usage with OpenGL
    • Dynamic-size matrices/vectors