3.3

From Eigen
Revision as of 08:58, 22 June 2015 by Ggael (Talk | contribs)

Jump to: navigation, search

Eigen 3.3 alpha-1 was released on June ??, 2015.

Eigen 3.3 alpha-1 can be downloaded from the Download section on the Main Page. Since Eigen 3.2, the 3.3 development branch received more than 1500 commits [1] representing numerous major changes. It includes all bug-fixes and improvements of the 3.2 branch, as detailed in the respective change-logs: 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5

The other major changes are summarized below.

Expression evaluators

In Eigen 3.3, the evaluation mechanism of expressions has been completely rewritten. Even though this is really a major change, this mostly concerns internal details and most user should not notice it. In a nutshell, until Eigen 3.3, the evaluation strategy of expressions and subexpressions was decided at the construction time of expression in a bottom-up approach. The strategy consists in completely deferring all these choices until the whole expression has to be evaluated. Decisions are now made in a top-down fashion allowing for more optimization opportunities, cleaner internal code, and easier extensibility.

Regarding novel expression-level optimizations, a typical example is the following:

MatrixXd A, B, C, D;
A.noalias() = B + C * D;

Prior to Eigen 3.3, the "C*D" sub expression would have been evaluated into a temporary by the expression of the addition operator. In other word, this expression would have been compiled to the following code:

tmp = C * D;
A = B + tmp;

In Eigen 3.3, we can now have a view of the complete expression and generate the following temporary-free code:

A = B;
A.noalias() += C * D;

Index typedef

In Eigen 3.3, the "Index" typedef is now global and defined by default to std::ptrdiff_t:

namespace Eigen {
  typedef std::ptrdiff_t Index;
}

This Index type is used throughout Eigen as the preferred type for both sizes and indices. It can be controlled globally through the EIGEN_DEFAULT_INDEX_TYPE macro. The usage of Eigen::DenseIndex and AnyExpression::Index are now deprecated. They are always equivalent to Eigen::Index.

For expressions storing an array of indices or sizes, the type for storage can be controlled per object through a template parameter. This type is consistently named "StorageIndex", and its default value is "int". See for instance the PermutationMatrix and SparseMatrix classes.

Warning: these changes might affect codes that used the SparseMatrix::Index type. In Eigen 3.2, this type was not documented and it was improperly defined as the storage index type (e.g., int), whereas it is now deprecated and always defined as Eigen::Index. Code making use of SparseMatrix::Index, might thus likely have to be changed to use SparseMatrix::StorageIndex instead.

Vectorization

  • Eigen 3.3 adds support for AVX (x86_64), FMA (x86_64) and VSX (PowerPC) SIMD instruction sets.
    • To enable AVX or FMA, you need to compile your code with these instruction sets enabled on the compiler side, for instance using the -mavx and -mfma options with gcc, clang or icc. AVX brings up to a x2 speed up for single and double precision floating point matrices by processing 8 and 4 scalar values at once respectively. Complexes are also supported. To achieve best performance, AVX requires 32 bytes aligned buffers. By default, Eigen's dense objects are thus automatically aligned on 32 bytes when AVX is enabled. Alignment behaviors can be controlled as detailed in this page.
    • FMA stands for Fused-Multiple-Add. Currently, only Intel's FMA instruction set, as introduced in the Haswell micro-architecture, is supported, and it is explicitly exploited in matrix products for which a x1.7 speedup is obtained.
    • Limitation - Currently, enabling AVX disables the vectorization of objects which are not a multiple of the full AVX register size (256bits). For instance, this concerns Vector4f and Vector2d types.
  • Add vectorization of Quaternion::conjugate.

Dense products

  • The dense matrix-matrix product kernel has been significantly redesigned to make a best use of recent CPU architectures (i.e., wide SIMD registers, FMA).
  • The heuristic to determine the different cache-level blocking sizes has been significantly improved.
  • The overhead for small products of dynamic sizes has been significantly reduced by falling-back at runtime to a coefficient-based product implementation (aka., lazyProduct).
    • The criterion to switch to lazyProduct is: (m+n+k)<20
  • Enable Mx0 * 0xN matrix products.

Dense decompositions

  • Eigen 3.3 includes a new Divide & Conquer SVD algorithm through the BDCSVD class. This new algorithm can be more that one order of magnitude faster than JacobiSVD for large matrices.
  • Various numerical robustness improvements in JacobiSVD, LDLT, 2x2 and 3x3 direct eigenvalues, ColPivHouseholderQR, FullPivHouseholderQR, and RealSchur.
  • FullPivLU: pivoting strategy can now be customized for any scalar types.

Sparse matrices

  • MappedSparseMatrix is now deprecated and replaced by the more versatile Map<SparseMatrix> class.
  • Add support for Ref<SparseMatrix>.

Sparse solvers

  • Add a LeastSquareConjugateGradient solver for solving sparse problem of the form argmin_x |A x - b|^2 through the normal equation but without forming A^T A.
  • Improve robustness of SimplicialLDLT to semidefinite problems by correctly handling structural zeros in AMD reordering. Very useful for solving SPD problems with equality constraints.
  • ConjugateGradient and BiCGSTAB now properly use a zero vector as the default guess.
  • Allows Lower|Upper as a template argument of ConjugateGradient and MINRES: in this case the full matrix will be considered. This also simplify writing matrix-free wrappers to ConjugateGradient.
  • Improved numerical robustness in BiCGSTAB, SparseLU, SparseQR and SPQR.
  • Add a determinant() method to SparseLU.
  • Improved handling of inputs in both iterative and direct solvers.

Experimental CUDA support

Starting from Eigen 3.3, it is now possible to use Eigen's objects and algorithms within CUDA kernels. However, only a subset of features are supported to make sure that no dynamic allocation are triggered within a CUDA kernel.

Unsupported Tensor module

Eigen 3.3 includes a previous of a Tensor module for multi-dimensional arrays and tensors in unsupported/Eigen/CXX11/Tensor. It provides numerous features including slicing, coefficient-wise operations, reductions, contractions, convolution, multi-threading, CUDA, etc. This module is developed and used by Google.

Miscellaneous

  • Various numerical robustness improvements in stableNorm(), Hyperplane::Through(a,b,c), Quaternion::angularDistance.
  • Add numerous coefficient-wise methods to Array: isFinite, isNaN, isInf, arg, log10, atan, tanh, sinh, cosh, round, floor, ceil, logical not,
  • Add a determinant() method to PermutationMatrix.
  • EIGEN_STACK_ALLOCATION_LIMIT: Raise its default value to 128KB, make use of it to assert on maximal fixed size object, and allows it to be 0 to mean "no limit"
  • Conversions from Quaternion to AngleAxis implicitly normalize the input quaternion.
  • Improved support of c++11 while preserving a full compatibility with c++98.
  • Support for Eigen 2 as been removed. If you plane to migrate from Eigen 2 to 3, then it is recommended to first do the transition using the facilities offered in Eigen 3.2, drop all Eigen 2 deprecated features, and then move to Eigen 3.3.
  • The BLAS interface library does not require a fortran compiler anymore.
  • The LAPACK interface library has been extended with the following two SVD routines: ?gesdd, ? gesvd.

Known issues

  • Slight increase of compiler memory consumption.
  • MSVC might trigger some internal compiler errors. Please help us working around them if you hit one.
  • Unlike JacobiSVD, BDCSVD still involve heap memory allocations during the factorization which might be a problem for embedded systems.

Footnotes

[1] $ hg log -r "aaec59ac0b4e:: and not merge()" | grep changeset | wc -l