Difference between revisions of "Todo"

From Eigen
Jump to: navigation, search
(LU module)
(Cholesky module)
Line 51: Line 51:
  
 
== Cholesky module ==
 
== Cholesky module ==
* '''Available job:''' re-implement LDLt decomposition to use full-pivoting (e.g., see http://www.alglib.net/matrixops/symmetric/ldlt.php, http://eprints.ma.man.ac.uk/1101/01/covered/MIMS_ep2008_56.pdf)
 
** we still have to check if it is worth keeping the current version as a very efficient solver for small sizes?
 
 
* '''Available job:''' Could be slightly optimized using a block implementation for large sizes. More specifically, the current implementation relies on level-2 BLAS routines while a block implementation enable the use of the usually more efficient level-3 BLAS routines. One the other hand, currently, Eigen exhibits outstanding performances for level-2 BLAS routines, while level-3 one are not 100% optimal yet. Therefore, this optimization first requires to fully optimize large matrix-matrix products and large triangular solver involving a matrix for the right hand side.
 
* '''Available job:''' Could be slightly optimized using a block implementation for large sizes. More specifically, the current implementation relies on level-2 BLAS routines while a block implementation enable the use of the usually more efficient level-3 BLAS routines. One the other hand, currently, Eigen exhibits outstanding performances for level-2 BLAS routines, while level-3 one are not 100% optimal yet. Therefore, this optimization first requires to fully optimize large matrix-matrix products and large triangular solver involving a matrix for the right hand side.
  

Revision as of 01:05, 21 May 2009

Eigen Owl Todo.jpg

Notice to prospective contributors

Before you start spending time on any of these items, contact us about it!

  • Sometimes the Todo is outdated
  • Sometimes some items in the todo haven't been sufficiently discussed

So by contacting us you can avoid any bad surprise.

Complex numbers

Available jobs:

  • Implement our own complex numbers class
    • Of course we continue to support std::complex too.
    • Name: could be ei_complex or Complex (in namespace Eigen)
    • Letter in typedefs: could be "e" so that would give "MatrixXef"
    • Rationale: std::complex has many limitations
      • it is not officially using an array[2] storage layout which makes many optimizations unsafe
      • its real() and imag() return by value in certain implementations (for instance, GCC 4.0 / Mac)
      • with GCC, it doesn't take advantage of vectorization. For complex<double> since the size is 16 bytes we really need to vectorize internally.
    • So, code such a new complex class and make sure that Eigen works with it (update unit tests and documentation).
  • Optimize Eigen's handling of complex matrices.
    • For example, currently, if x is a real number and m is a complex matrix, then doing "x*m" first converts x to complex and does a complex product of every entry in m; this is bad, it would be much faster to keep x real and do real products which are much faster.
      • This is less trivial than it seems and requires reworking our Functors. Currently they take a Scalar parameter, I'm not sure why, I think that we could do better with less hassle if they didn't and instead, inside them the methods were templated.

Geometry module

The main part is there: Transform, rotations (Quaternion, Angle-Axis), cross product.

  • Available job: Add methods to create perspective/orthogonal projection matrix ?
  • Available job: Extend the current HyperPlane and ParametrizedLine classes with all features one could expect to see here, and complete this set of basic geometry primitives with at least AlignedBox and HyperSphere. Perhaps some other primitives could be added too.
  • Need a fixed-size Gram-Schmidt here, so it's not clear whether it belongs here or in SVD (or QR !). Is this a case for having it in a separate module?
  • Available job: Add quaternion fitting. This is an algorithm taking two sets of points and finding the rotation making the first set of points as close as possible to the second set of points. Of course this can be done by performing a Gram-Schmidt on the transition matrix, but there is a quaternion-based method that is reportedly faster, and in wide use in chemistry. This would be useful for OpenBabel. See this link: http://server.ccl.net/cca/software/SOURCES/C/quaternion-mol-fit/README.shtml and this paper: http://charles.karney.info/papers/jmgm07.pdf Update: see this thread on the mailing list: http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/04/msg00173.html (we still need someone to actually do make it happen)

Optimization module

Available job

This module should provide easy to use optimization routines for non-linear minimization, differential equations, etc. For instance, I'm thinking to algorithms such as Levenberg-Marquardt for least-squares problems, the multiple Runge-Kutta variants for ODE, etc. Those routines should provide a flexible interface. For instance, with respect to the derivatives, it should allow to either provide them manually, ask the routines to compute them using finite differences or using an automatic differentiation library (eg., adolc).

Note that this module is highly related to the Sparse module and other linear and dense solver modules (LU, Cholesky, etc.).

LU module

Available jobs:

  • Write fixed-size specializations. Not only of the LU decomposition but of all the methods like solve() and computeImage(), etc.
    • So far we only have specializations for inverse() and determinant(), that's not enough.
  • The benefit of vectorization is not nearly as high as it should be, there is room for improvement.
    • In PartialLU, this should be a matter of examining the asm carefully.
    • In (full pivoting) LU, the main limitation is that the maxCoeff() visitor is not vectorized. So vectorizing visitors would be most useful here. If not possible to do in such generality then consider adding a specialized function for that.
  • Make sure that everything goes as fast for row-major matrices as it does for column-major matrices. At least insofar as it doesn't require to write 2x more code everywhere. Also, the choice of code paths should be made at compile-time, i.e. compile only 1 path.

Cholesky module

  • Available job: Could be slightly optimized using a block implementation for large sizes. More specifically, the current implementation relies on level-2 BLAS routines while a block implementation enable the use of the usually more efficient level-3 BLAS routines. One the other hand, currently, Eigen exhibits outstanding performances for level-2 BLAS routines, while level-3 one are not 100% optimal yet. Therefore, this optimization first requires to fully optimize large matrix-matrix products and large triangular solver involving a matrix for the right hand side.

Sparse matrices/vectors

  • The goal is to provide a seamless integration of sparse matrix on top of Eigen with both basic linear algebra features and more advanced algorithms like linear and eigen value solvers. Those high level algorithms could be implemented using external libraries while providing a unified API.
  • This module is just started, see the specific SparseMatrix page for the details.

Unit-tests

  • Keep adding more!

Documentation

  • Keep updating
  • The online tutorial needs love. badly.
    • Especially the Geometry and Advances linear algebra pages
  • Write special pages on certain topics, illustrate them with examples:
    • compile-time switches (preprocessor symbols like EIGEN_NO_DEBUG etc)
    • Compiler support/options and Vectorization : already covered in the FAQ, but feel free to start a more detailed dox page
    • basic notions about Eigen expressions, MatrixBase, the standard typedefs such as PlainMatrixType and Scalar...
    • Thread safety
    • Fixed-size matrices/vectors and fixed-size variants of many methods like block()
      • in particular, usage with OpenGL
    • Dynamic-size matrices/vectors


Core module

This module is essentially complete and working, but needs improvement in these areas:

  • Vectorization
    • Vectorize diagonal product. The current diagonal product takes an actual matrix that just happens to be diagonal. We need instead it to take a DiagonalMatrix expression (that's just a wrapper around the diagonal vector). We can then remove the old path as it can be implemented (if needed, uncommon anyway) as m.diagonal().asDiagonal() * other. Since m.diagonal() doesn't have the PacketAccessBit, we must ensure that this propagates to asDiagonal() so Eigen knows that it can't vectorize this latter product.
    • vectorize fixed sizes that aren't a multiple of a packet but still are bigger than a packet. Example: Linear operations (like cwise expression assignment) on Matrix3f and Matrix3d.
      • Hm on second thought, doing that without runtime checks would require the data to be aligned. Which can't be the default as it can waste memory (though very little in the case of Matrix3d). But then if it's not the default it's hard to offer as an option because of binary compatibility...
    • Remove unaligned store, replace by a strategy of doing without-vectorization the first few unaligned coeffs of the left-hand-side expression in assignment. Also find a way to allow aligned loads of some parts of the rhs expression when it appears to be aligned with the result. Maybe we could only handle expressions such as: "mat.block() += some_xpr;" which occur very often in high level algorithms. Since, such expressions get transformed to: "mat.block() = BinaryOp(mat.block(),some_xpr);" one solution would be to update the MatrixBase::OP= operators such that *this get transformed to an aligned Map<> (assuming we add a stride to Map) or a "force align block" (requires to add a template parameter to Block)... For instance:
 MatrixBase::operator+=(other) {
   derived() = Map<Matrix<Scalar,RowsAtCompileTime,ColsAtCompileTime,...>, Aligned>(&coeffRef(0,0), cols(), rows(), stride()) + other;
 }
    • vectorize Redux : can use Sum as a starting point. Needs a few more vectorization intrinsics, and perhaps a 'redux' field in the functor traits.
    • vectorize more coeff wise operations (e.g., comparisons)
      • to vectorize comparisons we first need to change the comparison operators to return bitmasks of the same size as the scalar type with all bits equal to ones (or all zero). We can do that only for vectorized types. And good news: the compiler is clever enough to remove this extra stuff when it is not needed (even for double/int64 on a 32 bits system).
      • the second step is to vectorize select using bitmask operations, however make sure it is worth it because here both the "then" and "else" cases will be evaluated (use the expression cost)
      • must vectorize all() and any() as well, otherwise it is useless !
    • Explicitly vectorized loops are usually surrounded by two small scalar loops. We therefore have to take care the compiler does not try to vectorize them itself ! Hopefully it is smart enough to find out the iteration counts is always smaller than a packet size. Otherwise, gcc-4.4 will provide new function attributes allowing to tweak any -f options (as well as several -m options) on a per function basis.
  • use backends? (probaly LAPACK; not much point in a BLAS backend as we already beat most free BLAS's)
    • letting big algorithms (such as inversion, QR...) use a LAPACK backend should be feasible (because here we deal with plain matrices not expressions) and quite useful (because we don't really hope to have all big algorithms optimized ourselved).
    • Backends would probably be especially useful for the Sparse module.


  • Fast product for large matrices
    • See EigenInternals for an explanation of our cache friendly algorithm.
    • Issues/todo:
      • B gets evaluated if it is not column-major => offer two different algorithms ?
      • the algorithm actually computes C+=A*B, currently C+=(A*B).lazy() gets indeed optimized via a specialization of operator+=. However D=C+(A*B).lazy() is not optimized (extra temporary + set to zero) while it could be if rewritten like this: (D+=C)+=(A*B).lazy().
      • some trivial expressions get evaluated like negate (take care to write c = -(a*b)) or conjugate (only for complex)
        • more generally, if A as to be evaluated then it could be directly evaluated to a blocky form.


  • Misc Features
    • Add stride to Map ? Note the current workaround is to use block, e.g.: Map<MatrixXf>(ptr,stride,cols).block(offset,0,rows,cols)
    • The respective implementations of Map and the Block<DirectAccessBit> specialization could be somehow merged since they are basically the same, only the constructors and template parameters slightly diverge. In practice one could inherit the other, the question being in which order ? This actually depends on the previous point.
    • Shallow copy of dynamic matrix: we already have efficient swap for them, what is really needed is to add a .isTemporary()/.markAsTemporary() to MatrixStorage which could use the last bit if the size to store the state. Then the Matrix constructor and operator= that take the same Matrix type as input can simply check the temporary state to select between a simple swap or a deep copy (see the Sparse module). Note that since the temporary state is know at compile time we could also use a flag but then we create artificial new types.
    • Peel loops where appropriate.
    • (NOTE: isn't this outdated? We do use alloca currently) Consider the use of alloca (dynamic allocation on the stack for dynamic-sized temporary). After some benchmarks, it seems this really pay off for matrices smaller than 16x16, for larger matrices the gain is not so obvious. In practice alloca cannot be called in MatrixStorage, and it as to be called in the function creating the temporary. This might be implemented using Matrix::map().
    • Another fine tuning idea (very low priority): reuse a previously allocated temporary when possible. For instance, theoretically, the Cholesky decomposition could be done "in place", by working directly on the input matrix. Currently, Cholesky stores its own data. However, a typical use case is: (m.adjoint() * m).cholesky()... In that case, if m is large enough, the matrix product creates a temporary and Cholesky too while the latter could simply "pick" the data allocated by the product since we know this data cannot be reused by any other expression. I guess this concept could be extended to a more general mechanism. (this is only applicable for dynamic-size matrix and in the case we don't use alloca, cf. the previous point)

Array module

  • Misc features
    • add shift(int) and rotate(int) for vectors. The arg for both is an integer, guess negative would mean "on the left", and positive "on the right", thought that seems quite western-oriented.

Transcendental functions

It would be great if Eigen used Julien Pommier's fast, SSE optimized log/exp/sin/cos functions available from this page. Done for sin, cos, log, exp

Available job: do the same for other functions taking inspiration from [cephes] (tan, acos, asin, atan, etc.) note: have to discuss which functions are useful and which aren't.

Special matrices

(bjacob handling this at the moment) Goal; provide good support for: triangular, self-adjoint, diagonal, band, Toeplitz matrices. See this page: SpecialMatrix.

SVD module

The current implementation uses code from JAMA. It supports MxN rectangular matrices with M>=N of real scalar values (no complex yet). It also provides a robust linear solver. job (bjacob handling this at the moment):

  • rewrite the decomposition algorithm in a clean and understandable way,
  • with support for complex matrices,
  • Consider support for reduced versions : http://en.wikipedia.org/wiki/Singular_value_decomposition#Reduced_SVDs
  • and maybe with specializations for small fixed-size if needed (since it a high level algo, it should be automatic though).
    • I don't think we can expect GCC to adequately unroll loops in the case of fixed size. That alone may justify doing fixed-size specializations. Then quite often there also shortcuts available for small sizes.

QR module

at the moment the following seem mostly being taken care of by Andrea and bjacob

  • implement Givens QR : taken care of by Andrea
  • implement efficient QR decomposition for 2x2 and 3x3 matrix with optional computation of R (maybe using Gram-Schmitd instead of Householder transformations ? or Givens)
  • polish (partially rewrite) eigensolvers, probably split them away in a separate module (bjacob handling this, as this is very similar to SVD)
    • see http://www.acm.caltech.edu/~mlatini/research/qr_alg-feb04.pdf for a comprehensive introduction to shifted QR and a brief survey of accelerating techniques.
    • spectral radius, matrix exponential and maybe more operator calculus by applying a cwiseUnaryOp to the eigenvalues vector.
    • available job: generalized eigen problem using Cholesky decomposition (done for the Ax=lBx case, perhaps we want to provide API to handle the other cases: BAx=lx and ABx=lx ?)

Statistics module

This seems to be taken care of by Marton, see his branch at http://bitbucket.org/marton78/eigen2/ , feel free to contact him to offer help...

This module should support at least mean, variance, standard deviation, sorting, median, etc. all in a matrix-, column- and row-wise flavors. See the respective ML thread for some design discussion (http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2008/09/msg00014.html). Feature list proposal:

  • mean, variance, standard deviation, absolute deviation, skew, kurtosis
  • covariance / principal component analysis (PCA), dimension reductions
    • then the regression module could also be merged here (e.g., fitHyperplane could simply call the PCA function).
  • support for weighted data
  • sorting
  • median, percentiles
  • histogram

FFT module

An initial offering was started by Tim Molteno. See this ML thread for the details. He offered a template metaprogramming example that, while clever, had two drawbacks. It was only capable of power-of-2 transforms and the size must be known at compile time.

A new effort was started in May 2009 by Mark Borgerding to port/adapt his kissfft library to Eigen. The work is currently underway under a fork at BitBucket. If all goes well, this will be eventually merged into the mainline development.

Help is welcome. Please send an email on the mailing list or to Mark directly.

The major development milestones are:

  1. one-dimensional FFTs, forward and inverse for float, double,long double: this is roughly complete
  2. real-only optimization (this involves mostly cannibalizing the existing kissfft C code)
  3. multi-dimensional FFTs ( more cannibalizing from C version)
  4. backend for FFTW ( using this requires GPL-compatible application, or the commercial MIT license)
  5. backend for intel Math Kernel Library (for closed source applications)

kissfft, FFTW, and IMKL have much in common that allows for a consistent interface:

  • arbitrary FFT sizes
  • optimization for real-only FFTs (i.e. real x <=> half spectrum X)
  • multi-dimensional FFTs
  • the concept of a "plan"

The three libraries serve different niches in the FFT ecosystem:

  • kissfft is small, reasonably fast, permissive license (BSD revised)
  • FFTW is large, very fast, distributed under the terms of the GPL (v2?)
  • IMKL is large, very fast, commercial library usable in closed software

Allowing the different backends makes sure everybody can get what they need based on their individual requirements regarding licensing, finances, code size, and execution speed. The default choice will be the kissfft code since that offers a good balance between licensing ease, size, and speed. This will be easily overridden on system that have the other choices.

The FFT module is derived from the kissfft project, but has undergone and continues to undergo considerable adaptation for use in C++ and Eigen. The licensing for the derived work has been changed by Mark Borgerding, the kissfft copyright holder, to the Eigen dual LGPL/GPL license.