3.3

From Eigen
Revision as of 10:54, 2 December 2015 by Ggael (Talk | contribs)

Jump to: navigation, search

Eigen 3.3 alpha-1 was released on September 4, 2015.

Eigen 3.3 alpha-1 can be downloaded from the Download section on the Main Page. Since Eigen 3.2, the 3.3 development branch received more than 1660 commits [1] representing numerous major changes. It includes all bug-fixes and improvements of the 3.2 branch, as detailed in the respective change-logs: 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5

The other major changes are summarized below.

Expression evaluators

In Eigen 3.3, the evaluation mechanism of expressions has been completely rewritten. Even though this is really a major change, this mostly concerns internal details and most users should not notice it. In a nutshell, until Eigen 3.3, the evaluation strategy of expressions and subexpressions was decided at the construction time of expressions in a bottom-up approach. The novel strategy consists in completely deferring all these choices until the whole expression has to be evaluated. Decisions are now made in a top-down fashion allowing for more optimization opportunities, cleaner internal code, and easier extensibility.

Regarding novel expression-level optimizations, a typical example is the following:

MatrixXd A, B, C, D;
A.noalias() = B + C * D;

Prior to Eigen 3.3, the "C*D" subexpression would have been evaluated into a temporary by the expression representing the addition operator. In other words, this expression would have been compiled to the following code:

tmp = C * D;
A = B + tmp;

In Eigen 3.3, we can now have a view of the complete expression and generate the following temporary-free code:

A = B;
A.noalias() += C * D;

Index typedef

In Eigen 3.3, the "Index" typedef is now global and defined by default to std::ptrdiff_t:

namespace Eigen {
  typedef std::ptrdiff_t Index;
}

This "Index" type is used throughout Eigen as the preferred type for both sizes and indices. It can be controlled globally through the EIGEN_DEFAULT_INDEX_TYPE macro. The usage of Eigen::DenseIndex and AnyExpression::Index are now deprecated. They are always equivalent to Eigen::Index.

For expressions storing an array of indices or sizes, the type for storage can be controlled per object through a template parameter. This type is consistently named "StorageIndex", and its default value is "int". See for instance the PermutationMatrix and SparseMatrix classes.

Warning: these changes might affect codes that used the SparseMatrix::Index type. In Eigen 3.2, this type was not documented and it was improperly defined as the storage index type (e.g., int), whereas it is now deprecated and always defined as Eigen::Index. Codes making use of SparseMatrix::Index, might thus likely have to be changed to use SparseMatrix::StorageIndex instead.

Vectorization

  • Eigen 3.3 adds support for AVX (x86_64), FMA (x86_64) and VSX (PowerPC) SIMD instruction sets.
    • To enable AVX or FMA, you need to compile your code with these instruction sets enabled on the compiler side, for instance using the -mavx and -mfma options with gcc, clang or icc. AVX brings up to a x2 speed up for single and double precision floating point matrices by processing 8 and 4 scalar values at once respectively. Complexes are also supported. To achieve best performance, AVX requires 32 bytes aligned buffers. By default, Eigen's dense objects are thus automatically aligned on 32 bytes when AVX is enabled. Alignment behaviors can be controlled as detailed in this page.
    • FMA stands for Fused-Multiple-Add. Currently, only Intel's FMA instruction set, as introduced in the Haswell micro-architecture, is supported, and it is explicitly exploited in matrix products for which a x1.7 speedup can be expected.
    • When AVX is enabled, Eigen automatically falls back to half register sizes for fixed-size types which are not a multiple of the full AVX register size (256bits). For instance, this concerns Vector4f and Vector2d types.
  • Add vectorization of exp and log for AltiVec/VSX.
  • Add vectorization of Quaternion::conjugate for SSE/AVX

Dense products

  • The dense matrix-matrix product kernel has been significantly redesigned to make a best use of recent CPU architectures (i.e., wide SIMD registers, FMA).
  • The heuristic to determine the different cache-level blocking sizes has been significantly improved.
  • The overhead for small products of dynamic sizes has been significantly reduced by falling-back at runtime to a coefficient-based product implementation (aka., lazyProduct).
    • The criterion to switch to lazyProduct is: (m+n+k)<20
  • Enable Mx0 * 0xN matrix products.

Dense decompositions

  • Eigen 3.3 includes a new Divide & Conquer SVD algorithm through the BDCSVD class. This new algorithm can be more that one order of magnitude faster than JacobiSVD for large matrices.
  • Various numerical robustness improvements in JacobiSVD, LDLT, 2x2 and 3x3 direct eigenvalues, ColPivHouseholderQR, FullPivHouseholderQR, and RealSchur.
  • FullPivLU: pivoting strategy can now be customized for any scalar types.

Sparse matrices

  • MappedSparseMatrix is now deprecated and replaced by the more versatile Map<SparseMatrix> class.
  • Add support for Ref<SparseMatrix>.
  • Add OpenMP parallelization of sparse * dense products. Currently, this is limited to row-major sparse matrices.

Sparse solvers

  • Add a LeastSquareConjugateGradient solver for solving sparse problems of the form argmin_x |A x - b|^2 through the normal equation but without forming A^T A.
  • Improve robustness of SimplicialLDLT to semidefinite problems by correctly handling structural zeros in AMD reordering. Very useful for solving SPD problems with equality constraints.
  • Add OpenMP support in ConjugateGradient, BiCGSTAB, and LeastSquareConjugateGradient. See the respective class's doc for details on the best use of this feature.
  • ConjugateGradient and BiCGSTAB now properly use a zero vector as the default guess.
  • Allows Lower|Upper as a template argument of ConjugateGradient and MINRES: in this case the full matrix will be considered. This also simplifies the writing of matrix-free wrappers to ConjugateGradient.
  • Improved numerical robustness in BiCGSTAB, SparseLU, SparseQR and SPQR.
  • Add a determinant() method to SparseLU.
  • Improved handling of inputs in both iterative and direct solvers.

Experimental CUDA support

Starting from Eigen 3.3, it is now possible to use Eigen's objects and algorithms within CUDA kernels. However, only a subset of features are supported to make sure that no dynamic allocation is triggered within a CUDA kernel.

Unsupported Tensor module

Eigen 3.3 includes a preview of a Tensor module for multi-dimensional arrays and tensors in unsupported/Eigen/CXX11/Tensor. It provides numerous features including slicing, coefficient-wise operations, reductions, contractions, convolution, multi-threading, CUDA, etc. This module is mainly developed and used by Google.

Miscellaneous

  • Various numerical robustness improvements in stableNorm(), Hyperplane::Through(a,b,c), Quaternion::angularDistance.
  • Add numerous coefficient-wise methods to Array: isFinite, isNaN, isInf, arg, log10, atan, tanh, sinh, cosh, round, floor, ceil, logical not, and generalize pow(x,e).
  • Add a determinant() method to PermutationMatrix.
  • EIGEN_STACK_ALLOCATION_LIMIT: Raise its default value to 128KB, make use of it to assert on maximal fixed size objects, and allows it to be 0 to mean "no limit"
  • Conversions from Quaternion to AngleAxis implicitly normalize the input quaternion.
  • Improved support of c++11 while preserving a full compatibility with c++98.
  • Support for Eigen 2 as been removed. If you plane to migrate from Eigen 2 to 3, then it is recommended to first do the transition using the facilities offered in Eigen 3.2, drop all Eigen 2 deprecated features, and then move to Eigen 3.3.
  • The BLAS interface library does not require a fortran compiler anymore.
  • The LAPACK interface library has been extended with the following two SVD routines: ?gesdd, ? gesvd.

Known issues

Footnotes

[1] $ hg log -r "aaec59ac0b4e:: and not merge() and not branch(3.2)" | grep "changeset:" | wc -l