3.3

From Eigen
Revision as of 15:08, 16 June 2015 by Ggael (Talk | contribs)


Eigen 3.3 was released on Mmmm DD, 201Y.

Eigen 3.3 can be downloaded from the Download section on the Main Page. Since Eigen 3.2, the 3.3 development branch received more than 1400 commits [1] representing numerous major changes which are summarized below.


Changes already included in the 3.2.5 release

Eigen 3.3 includes all bug-fixes and improvements of the 3.2 branch, including:

  • Improve robustness of SimplicialLDLT to semidefinite problems by correctly handling structural zeros in AMD reordering.
  • Add a determinant() method to PermutationMatrix and SparseLU.
  • Various numerical robustness improvements in JacobiSVD, LDLT, 2x2 and 3x3 direct eigenvalues, ColPivHouseholderQR, FullPivHouseholderQR, RealSchur, BiCGSTAB, SparseLU, SparseQR, stableNorm(), Hyperplane::Through(a,b,c), Quaternion::angularDistance, SPQR.
  • Enable Mx0 * 0xN matrix products.
  • EIGEN_STACK_ALLOCATION_LIMIT: raise its default value to 128KB, use it to assert on the maximal size of fixed-size objects, and allow a value of 0 to mean "no limit".

See the respective change-logs for the details: 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5

Expression evaluators

In Eigen 3.3, the evaluation mechanism of expressions has been completely rewritten. Even though this is a major change, it mostly concerns internal details, and most users should not notice it. In a nutshell, until Eigen 3.3, the evaluation strategy of expressions and subexpressions was decided at the construction time of each expression, in a bottom-up approach. The new strategy consists in completely deferring all these choices until the whole expression has to be evaluated. Decisions are now made in a top-down fashion, allowing for more optimization opportunities, cleaner internal code, and easier extensibility.

Regarding optimizations, a typical example is the following:

MatrixXd A, B, C, D;
A.noalias() = B + C * D;

Prior to Eigen 3.3, the "C*D" subexpression would have been evaluated into a temporary by the expression representing the addition. In other words, this expression would have been compiled to the following code:

tmp = C * D;
A = B + tmp;

In Eigen 3.3, we now have a view of the complete expression and can generate the following temporary-free code:

A = B;
A.noalias() += C * D;

Index typedef

In Eigen 3.3, the "Index" typedef is now global and defined by default to std::ptrdiff_t:

namespace Eigen {
  typedef std::ptrdiff_t Index;
}

This Index type is used throughout Eigen as the preferred type for both sizes and indices. It can be controlled globally through the EIGEN_DEFAULT_INDEX_TYPE macro. The Eigen::DenseIndex and AnyExpression::Index typedefs are now deprecated; both are always equivalent to Eigen::Index.

For expressions storing an array of indices or sizes, the type for storage can be controlled per object through a template parameter. This type is consistently named "StorageIndex", and its default value is "int". See for instance the classes PermutationMatrix and SparseMatrix.

Warning: these changes might affect code that used the SparseMatrix::Index type. In Eigen 3.2, this type was not documented and was improperly defined as the storage type (e.g., int), whereas it is now deprecated and always defined as Eigen::Index. Code making use of SparseMatrix::Index will likely have to be changed to use SparseMatrix::StorageIndex instead.

Vectorization

Eigen 3.3 adds support for AVX (x86_64), FMA (x86_64) and VSX (PowerPC) SIMD instruction sets.

To enable AVX or FMA, you need to compile your code with these instruction sets enabled on the compiler side, for instance using the -mavx and -mfma options with gcc, clang, or icc. AVX brings up to a 2x speedup for single- and double-precision floating-point matrices by processing 8 and 4 scalar values at once, respectively. Complex numbers are also supported. To achieve best performance, AVX requires 32-byte aligned buffers. By default, Eigen's dense objects are thus automatically aligned on 32-byte boundaries when AVX is enabled. Alignment behavior can be controlled as detailed on this page.
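For instance, with gcc or clang the invocation might look like the following sketch (the Eigen include path and source file name are placeholders):

```
# Hypothetical invocations; adjust the include path and file names.
g++     -O2 -mavx -mfma -I /path/to/eigen my_prog.cpp -o my_prog
clang++ -O2 -mavx -mfma -I /path/to/eigen my_prog.cpp -o my_prog
```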

FMA stands for Fused Multiply-Add. Currently, only Intel's FMA instruction set, as introduced in the Haswell micro-architecture, is supported; it is explicitly exploited in matrix products, for which a 1.7x speedup is obtained.

Limitation - Currently, enabling AVX disables the vectorization of objects whose size is not a multiple of the full AVX register size (256 bits); in particular, this concerns Vector4f and Vector2d.

Dense products

  • In Eigen 3.3, the dense matrix-matrix product has been significantly redesigned to make the best use of recent CPU architectures (i.e., wide SIMD registers and FMA).
  • The heuristic to determine the different cache-level blocking sizes has been significantly improved.
  • The overhead for small products of dynamic sizes has been significantly reduced.

Dense decomposition

Sparse matrices

Sparse solvers

    • ConjugateGradient and BiCGSTAB now properly use a zero vector as the default initial guess.
    • Allow Lower|Upper as a template argument of ConjugateGradient and MINRES; in this case, the full matrix will be considered.

[1] $ hg log -r "branch(default) and 8f8013705345:: and not merge()" | grep changeset | wc -l


Unsupported modules