Todo


Notice to prospective contributors

Before you start spending time on any of these items, contact us about it!

  • Sometimes the Todo is outdated
  • Sometimes some items in the todo haven't been sufficiently discussed

So by contacting us first, you can avoid any bad surprises.

Transcendental functions

It would be great if Eigen used Julien Pommier's fast, SSE-optimized log/exp/sin/cos functions available from this page. This is done for sin, cos, log, and exp.

Available job: do the same for other functions, taking inspiration from [cephes] (tan, acos, asin, atan, etc.). Note: we still have to discuss which functions are useful and which aren't. A possible starting point is sketched below.
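As a hedged illustration (an assumption, not existing Eigen code), a vectorized tangent could be built directly on the sse_mathfun.h primitives mentioned above:

 #include <xmmintrin.h>
 #include "sse_mathfun.h" // Julien Pommier's sin_ps/cos_ps/sincos_ps
 
 // Sketch: tan = sin/cos via the one-pass sincos_ps.
 // No special handling of the poles near pi/2 + k*pi.
 static inline __m128 tan_ps(__m128 x)
 {
   __m128 s, c;
   sincos_ps(x, &s, &c); // computes sin(x) and cos(x) together
   return _mm_div_ps(s, c);
 }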

Special matrices

(bjacob is handling this at the moment.) Goal: provide good support for triangular, self-adjoint, diagonal, band, and Toeplitz matrices. See this page: SpecialMatrix.

SVD module

The current implementation uses code from JAMA. It supports MxN rectangular matrices with M>=N and real scalar values (no complex support yet). It also provides a robust linear solver. Available job (bjacob is handling this at the moment):

  • rewrite the decomposition algorithm in a clean and understandable way,
  • with support for complex matrices,
  • Consider support for reduced versions (recalled after this list): http://en.wikipedia.org/wiki/Singular_value_decomposition#Reduced_SVDs
  • and maybe with specializations for small fixed sizes if needed (since it is a high-level algorithm, it should be automatic though).
    • I don't think we can expect GCC to adequately unroll loops in the case of fixed size. That alone may justify doing fixed-size specializations. Then quite often there are also shortcuts available for small sizes.
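For reference, the reduced ("thin") variant mentioned above, in LaTeX (this is the standard textbook definition, nothing Eigen-specific): for an m x n matrix A with m >= n, the full and thin decompositions are

 A = U \Sigma V^*, \qquad U \in \mathbb{C}^{m \times m},\ \Sigma \in \mathbb{R}^{m \times n},\ V \in \mathbb{C}^{n \times n}
 
 A = U_n \Sigma_n V^*, \qquad U_n \in \mathbb{C}^{m \times n},\ \Sigma_n = \mathrm{diag}(\sigma_1, \dots, \sigma_n)

The thin form keeps only the first n columns of U, which is all that solving and least-squares need.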

QR module

We have QR, eigenvalues/vectors in the selfadjoint case, eigenvalues/vectors for real non-symmetric matrices, and matrix norm. Available jobs:

  • implement an eigensolver for the non-selfadjoint case (complex case)
    • need to get rid of the current code imported from JAMA, which is only suitable for real matrices
    • we already have Hessenberg reduction code for real and complex matrices
    • need to implement the implicitly shifted QR algorithm, see http://www.acm.caltech.edu/~mlatini/research/qr_alg-feb04.pdf for a comprehensive introduction to this algorithm and a brief survey of acceleration techniques.
  • implement an efficient QR decomposition for 2x2 and 3x3 matrices with optional computation of R (maybe using Gram-Schmidt instead of Householder transformations? see the sketch after this list)
  • generalized eigenproblem using the Cholesky decomposition (done for the Ax=lBx case; perhaps we want to provide an API to handle the other cases, BAx=lx and ABx=lx?)
  • spectral radius, matrix exponential, and maybe more operator calculus by applying a cwiseUnaryOp to the eigenvalues vector.
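To make the 2x2/3x3 idea concrete, here is a minimal sketch (an illustration, not module code) of a 3x3 QR by classical Gram-Schmidt; Householder would be more robust for nearly dependent columns:

 #include <Eigen/Core>
 using namespace Eigen;
 
 // Sketch: the columns of Q are the orthonormalized columns of m;
 // R = Q^T m is then upper triangular up to round-off.
 void qr3x3(const Matrix3f& m, Matrix3f& Q, Matrix3f& R)
 {
   Q.col(0) = m.col(0).normalized();
   Q.col(1) = (m.col(1) - Q.col(0) * Q.col(0).dot(m.col(1))).normalized();
   Q.col(2) = (m.col(2) - Q.col(0) * Q.col(0).dot(m.col(2))
                        - Q.col(1) * Q.col(1).dot(m.col(2))).normalized();
   R = Q.transpose() * m;
 }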

Geometry module

The main part is there: Transform, rotations (Quaternion, Angle-Axis), cross product.

  • Available job: Add methods to create perspective/orthogonal projection matrices? (A perspective sketch follows this list.)
  • Available job: Extend the current HyperPlane and ParametrizedLine classes with all features one could expect to see here, and complete this set of basic geometry primitives with at least AlignedBox and HyperSphere. Perhaps some other primitives could be added too.
  • We need a fixed-size Gram-Schmidt here, but it's not clear whether it belongs here or in SVD (or QR!). Is this a case for having it in a separate module?
  • Available job: Add quaternion fitting. This is an algorithm taking two sets of points and finding the rotation making the first set of points as close as possible to the second set of points. Of course this can be done by performing a Gram-Schmidt on the transition matrix, but there is a quaternion-based method that is reportedly faster, and in wide use in chemistry. This would be useful for OpenBabel. See this link: http://server.ccl.net/cca/software/SOURCES/C/quaternion-mol-fit/README.shtml and this paper: http://charles.karney.info/papers/jmgm07.pdf
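For the projection-matrix item above, a minimal sketch (the function name is an assumption; the matrix mirrors the well-known gluPerspective one):

 #include <Eigen/Core>
 #include <cmath>
 using namespace Eigen;
 
 // Sketch: right-handed, OpenGL-style perspective projection.
 // fovy is the vertical field of view in radians, aspect = width/height.
 Matrix4f perspective(float fovy, float aspect, float zNear, float zFar)
 {
   float f = 1.0f / std::tan(fovy / 2);
   Matrix4f m = Matrix4f::Zero();
   m(0,0) = f / aspect;
   m(1,1) = f;
   m(2,2) = (zFar + zNear) / (zNear - zFar);
   m(2,3) = 2 * zFar * zNear / (zNear - zFar);
   m(3,2) = -1;
   return m;
 }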

Statistics module

Available job

This module should support at least mean, variance, standard deviation, sorting, median, etc., all in matrix-, column- and row-wise flavors (a small sketch follows the list below). See the respective ML thread for some design discussion (http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2008/09/msg00014.html). Feature list proposal:

  • mean, variance, standard deviation, absolute deviation, skew, kurtosis
  • covariance / principal component analysis (PCA), dimension reductions
    • then the regression module could also be merged here (e.g., fitHyperplane could simply call the PCA function).
  • support for weighted data
  • sorting
  • median, percentiles
  • histogram
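As promised above, a rough sketch of the column-wise flavor, written against the current Cwise interface (the helper name is illustrative, not a proposed API):

 #include <Eigen/Core>
 using namespace Eigen;
 
 // Sketch: per-column mean and unbiased variance.
 void colMeanVariance(const MatrixXd& m, RowVectorXd& mean, RowVectorXd& var)
 {
   mean = m.colwise().sum() / double(m.rows());
   var.resize(m.cols());
   for (int j = 0; j < m.cols(); ++j)
     var(j) = (m.col(j).cwise() - mean(j)).cwise().square().sum()
              / double(m.rows() - 1);
 }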

Optimization module

Available job

This module should provide easy-to-use optimization routines for non-linear minimization, differential equations, etc. For instance, I'm thinking of algorithms such as Levenberg-Marquardt for least-squares problems, the various Runge-Kutta variants for ODEs, etc. Those routines should provide a flexible interface. For instance, with respect to the derivatives, it should allow one to either provide them manually, have the routines compute them using finite differences, or use an automatic differentiation library (e.g., adolc). A small Runge-Kutta sketch follows.
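As a flavor of the intended genericity (a sketch under assumed names, not a proposed API), one classical RK4 step for y' = f(t, y), with f an arbitrary functor:

 #include <Eigen/Core>
 using namespace Eigen;
 
 // Sketch: one fixed-step fourth-order Runge-Kutta update.
 template<typename Functor>
 VectorXd rk4Step(const Functor& f, double t, const VectorXd& y, double h)
 {
   VectorXd k1 = f(t,       y);
   VectorXd k2 = f(t + h/2, y + (h/2) * k1);
   VectorXd k3 = f(t + h/2, y + (h/2) * k2);
   VectorXd k4 = f(t + h,   y + h * k3);
   return y + (h/6) * (k1 + 2.0*k2 + 2.0*k3 + k4);
 }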

Note that this module is highly related to the Sparse module and other linear and dense solver modules (LU, Cholesky, etc.).

FFT module

An initial offering was started by Tim Molteno. See this ML thread for the details. He offered a template metaprogramming example that, while clever, had two drawbacks: it was only capable of power-of-2 transforms, and the size had to be known at compile time.

A new effort was started in May 2009 by Mark Borgerding to port/adapt his kissfft library to Eigen. The work is currently underway in a fork at BitBucket. If all goes well, this will eventually be merged into the mainline development.

Help is welcome. Please send an email to the mailing list or to Mark directly.

The major development milestones are:

  1. one-dimensional FFTs, forward and inverse, for float, double, long double: this is roughly complete
  2. real-only optimization (this involves mostly cannibalizing the existing kissfft C code)
  3. multi-dimensional FFTs ( more cannibalizing from C version)
  4. backend for FFTW ( using this requires GPL-compatible application, or the commercial MIT license)
  5. backend for intel Math Kernel Library (for closed source applications)

kissfft, FFTW, and IMKL have much in common that allows for a consistent interface:

  • arbitrary FFT sizes
  • optimization for real-only FFTs (i.e. real x <=> half spectrum X)
  • multi-dimensional FFTs
  • the concept of a "plan"

The three libraries serve different niches in the FFT ecosystem:

  • kissfft is small, reasonably fast, permissive license (BSD revised)
  • FFTW is large, very fast, distributed under the terms of the GPL (v2?)
  • IMKL is large, very fast, commercial library usable in closed software

Allowing the different backends makes sure everybody can get what they need based on their individual requirements regarding licensing, finances, code size, and execution speed. The default choice will be the kissfft code, since that offers a good balance between licensing ease, size, and speed. This will be easily overridden on systems that have the other choices.
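As a taste of where this is heading, a hypothetical usage sketch (the header, class, and method names below are illustrative of the work in progress, not a settled API):

 #include <vector>
 #include <complex>
 #include <Eigen/FFT> // assumed header name for the module
 
 int main()
 {
   Eigen::FFT<float> fft;                    // illustrative class name
   std::vector<float> timebuf(64, 0.0f);     // real input signal
   std::vector<std::complex<float> > freq;
   fft.fwd(freq, timebuf);                   // forward transform
   std::vector<float> back;
   fft.inv(back, freq);                      // inverse, recovers timebuf
   return 0;
 }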

The FFT module is derived from the kissfft project, but has undergone and continues to undergo considerable adaptation for use in C++ and Eigen. The licensing for the derived work has been changed by Mark Borgerding, the kissfft copyright holder, to the Eigen dual LGPL/GPL license.

LU module

  • It's all done now.
    • Well, more fixed-size specializations may still be useful.
    • In particular, we don't have any fixed-size specializations for the LU decomposition itself and the solver. At least the 2x2 and 3x3 cases would be very useful, if possible 4x4 too (a 2x2 flavor is sketched below).
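For instance, a minimal sketch of the direct 2x2 solve such a specialization could use (Cramer's rule; an illustration, not module code):

 #include <Eigen/Core>
 using namespace Eigen;
 
 // Sketch: direct 2x2 solve of A x = b via Cramer's rule.
 bool solve2x2(const Matrix2f& A, const Vector2f& b, Vector2f& x)
 {
   float det = A(0,0)*A(1,1) - A(0,1)*A(1,0);
   if (det == 0.0f) return false; // a real implementation would use a threshold
   x(0) = ( A(1,1)*b(0) - A(0,1)*b(1)) / det;
   x(1) = (-A(1,0)*b(0) + A(0,0)*b(1)) / det;
   return true;
 }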

Cholesky module

  • Available job: re-implement LDLt decomposition to use full-pivoting (e.g., see http://www.alglib.net/matrixops/symmetric/ldlt.php, http://eprints.ma.man.ac.uk/1101/01/covered/MIMS_ep2008_56.pdf)
    • we still have to check whether it is worth keeping the current version as a very efficient solver for small sizes
  • Available job: Could be slightly optimized using a block implementation for large sizes. More specifically, the current implementation relies on level-2 BLAS routines, while a block implementation enables the use of the usually more efficient level-3 BLAS routines (the blocked recursion is sketched after this list). On the other hand, Eigen currently exhibits outstanding performance for level-2 BLAS routines, while level-3 ones are not 100% optimal yet. Therefore, this optimization first requires fully optimizing large matrix-matrix products and large triangular solves with a matrix right-hand side.
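For reference, the standard blocked recursion behind that remark, in LaTeX (textbook material, not Eigen code): partition A and its Cholesky factor L conformally,

 A = \begin{pmatrix} A_{11} & A_{21}^T \\ A_{21} & A_{22} \end{pmatrix}
   = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
     \begin{pmatrix} L_{11}^T & L_{21}^T \\ 0 & L_{22}^T \end{pmatrix}

so that L_{11} L_{11}^T = A_{11} (small unblocked factorization), L_{21} = A_{21} L_{11}^{-T} (triangular solve with a matrix right-hand side), and L_{22} L_{22}^T = A_{22} - L_{21} L_{21}^T (a matrix-matrix update, i.e. level-3 BLAS, then recurse on the trailing block).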

Sparse matrices/vectors

  • The goal is to provide seamless integration of sparse matrices on top of Eigen, with both basic linear algebra features and more advanced algorithms like linear and eigenvalue solvers. Those high-level algorithms could be implemented using external libraries while providing a unified API.
  • This module has just been started; see the specific SparseMatrix page for the details.

Unit-tests

  • Keep adding more!

Documentation

  • Keep updating
  • all our issues with doxygen are now solved (i.e. we have workarounds)
  • Write special pages on certain topics, illustrate them with examples:
    • Compiler support/options
    • Vectorization
    • Parallelization
    • Fixed-size matrices/vectors
      • in particular, usage with OpenGL
    • Dynamic-size matrices/vectors


Core module

This module is essentially complete and working, but needs improvement in these areas:

  • Vectorization
    • Vectorize the diagonal product. The current diagonal product takes an actual matrix that just happens to be diagonal. We instead need it to take a DiagonalMatrix expression (that's just a wrapper around the diagonal vector). We can then remove the old path, as it can be implemented (if needed; it's uncommon anyway) as m.diagonal().asDiagonal() * other. Since m.diagonal() doesn't have the PacketAccessBit, we must ensure that this propagates to asDiagonal() so Eigen knows that it can't vectorize this latter product.
    • vectorize fixed sizes that aren't a multiple of a packet but still are bigger than a packet. Example: Linear operations (like cwise expression assignment) on Matrix3f and Matrix3d.
      • Hm, on second thought, doing that without runtime checks would require the data to be aligned, which can't be the default as it can waste memory (though very little in the case of Matrix3d). But then, if it's not the default, it's hard to offer as an option because of binary compatibility...
    • Remove unaligned stores; replace them with a strategy of handling, without vectorization, the first few unaligned coefficients of the left-hand-side expression in an assignment. Also find a way to allow aligned loads of some parts of the rhs expression when it appears to be aligned with the result. Maybe we could only handle expressions such as "mat.block() += some_xpr;", which occur very often in high-level algorithms. Since such expressions get transformed to "mat.block() = BinaryOp(mat.block(), some_xpr);", one solution would be to update the MatrixBase::OP= operators such that *this gets transformed to an aligned Map<> (assuming we add a stride to Map) or a "force align block" (requires adding a template parameter to Block)... For instance:
 // sketch only: assumes Map gains a stride parameter
 MatrixBase::operator+=(other) {
   // view *this as an aligned, strided Map, then assign (note: rows before cols)
   derived() = Map<Matrix<Scalar,RowsAtCompileTime,ColsAtCompileTime,...>, Aligned>(&coeffRef(0,0), rows(), cols(), stride()) + other;
 }
    • vectorize Redux: can use Sum as a starting point. Needs a few more vectorization intrinsics, and perhaps a 'redux' field in the functor traits.
    • vectorize more coeff wise operations (e.g., comparisons)
      • to vectorize comparisons we first need to change the comparison operators to return bitmasks of the same size as the scalar type, with all bits set to one (or all zero). We can do that only for vectorized types. And good news: the compiler is clever enough to remove this extra stuff when it is not needed (even for double/int64 on a 32-bit system).
      • the second step is to vectorize select using bitmask operations; however, make sure it is worth it, because here both the "then" and "else" cases will be evaluated (use the expression cost). An SSE sketch follows this list.
      • must vectorize all() and any() as well, otherwise it is useless!
    • Explicitly vectorized loops are usually surrounded by two small scalar loops. We therefore have to take care that the compiler does not try to vectorize them itself! Hopefully it is smart enough to find out that the iteration count is always smaller than a packet size. Otherwise, gcc-4.4 will provide new function attributes allowing one to tweak any -f options (as well as several -m options) on a per-function basis.
  • use backends? (probably LAPACK; not much point in a BLAS backend as we already beat most free BLASs)
    • letting big algorithms (such as inversion, QR, ...) use a LAPACK backend should be feasible (because here we deal with plain matrices, not expressions) and quite useful (because we can't realistically hope to optimize all the big algorithms ourselves).
    • Backends would probably be especially useful for the Sparse module.
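As promised in the comparisons item above, a minimal SSE sketch of a bitmask-based select (an illustration of the technique, not Eigen code):

 #include <xmmintrin.h>
 
 // Sketch: lane-wise (a < b ? thenv : elsev) using the all-ones/all-zeros
 // masks returned by SSE comparisons.
 static inline __m128 select_lt_ps(__m128 a, __m128 b, __m128 thenv, __m128 elsev)
 {
   __m128 mask = _mm_cmplt_ps(a, b);             // all bits set where a < b
   return _mm_or_ps(_mm_and_ps(mask, thenv),     // keep 'then' lanes
                    _mm_andnot_ps(mask, elsev)); // keep 'else' lanes
 }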


  • Fast product for large matrices
    • See EigenInternals for an explanation of our cache friendly algorithm.
    • Issues/todo:
      • B gets evaluated if it is not column-major => offer two different algorithms?
      • the algorithm actually computes C+=A*B; currently C+=(A*B).lazy() is indeed optimized via a specialization of operator+=. However, D=C+(A*B).lazy() is not optimized (extra temporary + set to zero), while it could be if rewritten like this: (D+=C)+=(A*B).lazy().
      • some trivial expressions get evaluated, like negate (take care to write c = -(a*b)) or conjugate (only for complex)
        • more generally, if A has to be evaluated, then it could be directly evaluated to a blocky form.


  • Misc Features
    • Add stride to Map ? Note the current workaround is to use block, e.g.: Map<MatrixXf>(ptr,stride,cols).block(offset,0,rows,cols)
    • The respective implementations of Map and the Block<DirectAccessBit> specialization could somehow be merged, since they are basically the same; only the constructors and template parameters slightly diverge. In practice one could inherit from the other, the question being in which order? This actually depends on the previous point.
    • Shallow copy of dynamic matrices: we already have an efficient swap for them; what is really needed is to add a .isTemporary()/.markAsTemporary() to MatrixStorage, which could use the last bit of the size to store the state. Then the Matrix constructor and operator= that take the same Matrix type as input can simply check the temporary state to select between a simple swap and a deep copy (see the Sparse module). Note that since the temporary state is known at compile time, we could also use a flag, but then we would create artificial new types.
    • Peel loops where appropriate.
    • (NOTE: isn't this outdated? We do use alloca currently.) Consider the use of alloca (dynamic allocation on the stack for dynamic-sized temporaries). After some benchmarks, it seems this really pays off for matrices smaller than 16x16; for larger matrices the gain is not so obvious. In practice alloca cannot be called in MatrixStorage; it has to be called in the function creating the temporary. This might be implemented using Matrix::map().
    • Another fine-tuning idea (very low priority): reuse a previously allocated temporary when possible. For instance, theoretically, the Cholesky decomposition could be done "in place", by working directly on the input matrix. Currently, Cholesky stores its own data. However, a typical use case is: (m.adjoint() * m).cholesky()... In that case, if m is large enough, the matrix product creates a temporary and Cholesky does too, while the latter could simply "pick" the data allocated by the product, since we know this data cannot be reused by any other expression. I guess this concept could be extended to a more general mechanism. (This is only applicable for dynamic-size matrices and in the case we don't use alloca, cf. the previous point.)

Array module

  • Misc features
    • add shift(int) and rotate(int) for vectors. The argument for both is an integer; presumably negative would mean "to the left" and positive "to the right", though that convention seems quite Western-oriented. (A semantics sketch follows.)
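To pin down the intended semantics (an illustration with assumed names, not Eigen API), rotate(+1) on [a b c d] would give [d a b c]:

 #include <Eigen/Core>
 using namespace Eigen;
 
 // Sketch: circular rotation of a vector's coefficients by n places
 // (positive n rotates "to the right", negative "to the left").
 VectorXf rotated(const VectorXf& v, int n)
 {
   const int size = v.size();
   n = ((n % size) + size) % size; // normalize, handles negative n
   VectorXf r(size);
   for (int i = 0; i < size; ++i)
     r((i + n) % size) = v(i);
   return r;
 }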