Difference between revisions of "3.3"
(→Dense decompositions) |
(→Vectorization) |
||
Line 65: | Line 65: | ||
=== Vectorization === | === Vectorization === | ||
− | * Eigen 3.3 adds support for '''AVX''' (x86_64), '''FMA''' (x86_64), '''VSX''' (PowerPC), and '''New in beta2:''' ZVector (s390x/zEC13) SIMD instruction sets. | + | * Eigen 3.3 adds support for '''AVX''' (x86_64), '''FMA''' (x86_64), '''VSX''' (PowerPC), and '''New in beta2:''' '''ZVector''' (s390x/zEC13) SIMD instruction sets. |
** To enable AVX or FMA, you need to compile your code with these instruction sets enabled on the compiler side, for instance using the -mavx and -mfma options with gcc, clang or icc. AVX brings up to a x2 speed up for single and double precision floating point matrices by processing 8 and 4 scalar values at once respectively. Complexes are also supported. To achieve best performance, AVX requires 32 bytes aligned buffers. By default, Eigen's dense objects are thus automatically aligned on 32 bytes when AVX is enabled. Alignment behaviors can be controlled as detailed in this page. | ** To enable AVX or FMA, you need to compile your code with these instruction sets enabled on the compiler side, for instance using the -mavx and -mfma options with gcc, clang or icc. AVX brings up to a x2 speed up for single and double precision floating point matrices by processing 8 and 4 scalar values at once respectively. Complexes are also supported. To achieve best performance, AVX requires 32 bytes aligned buffers. By default, Eigen's dense objects are thus automatically aligned on 32 bytes when AVX is enabled. Alignment behaviors can be controlled as detailed in this page. | ||
** FMA stands for Fused-Multiple-Add. Currently, only Intel's FMA instruction set, as introduced in the Haswell micro-architecture, is supported, and it is explicitly exploited in matrix products for which a x1.7 speedup can be expected. | ** FMA stands for Fused-Multiple-Add. Currently, only Intel's FMA instruction set, as introduced in the Haswell micro-architecture, is supported, and it is explicitly exploited in matrix products for which a x1.7 speedup can be expected. |
Revision as of 11:21, 3 June 2016
Note: this page will be updated on a regular basis until the official 3.3.0 release.
The latest 3.3 pre-release is Eigen 3.3 beta-1, which can be downloaded from the Download section on the Main Page. Since Eigen 3.2, the 3.3 development branch received more than 2000 commits [1] representing numerous major changes.
Release history:
- Eigen 3.3 alpha-1 was released on September 4, 2015. It includes about 1660 commits since 3.2. It includes all bug-fixes and improvements of the 3.2 branch up to the 3.2.5 version, as detailed in the respective change-logs: 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5.
- Eigen 3.3 beta-1 was released on December 16, 2015. It includes about 350 commits since 3.3 alpha1, as detailed in the respective change-log.
The other major changes since 3.2 are summarized below (search for "New in beta1", to quickly get an overview of the major novel features between alpha1 and beta1).
Contents
Changes that might impact existing code
Eigen 3.3 fixes a few shortcomings that might impact existing code:
- Eigen 3.3 clarifies the usage of the Index type in SparseMatrix, PermutationMatrix, Transpositions, as detailed below.
- By default Eigen assumes that their is no aliasing between the left and right hand sides of an assignment, at the exception of matrix product factors. In Eigen 3.3, this exception is slightly relaxed to enable global optimization of the expressions. In particular, if a factor of a matrix product appears also on the left-hand-side of the assignment and that this factor gets resized by the assignment and that the product is not directly assigned, then Eigen 3.3 will produce an erroneous result, because the factor will be resized before the evaluation takes place. Here are some examples:
// Let's assume that m!=n MatrixXd A(m,n), B(n,n), X(n,k); X = (A*X).cwiseAbs(); // OK in 3.2, but not OK in 3.3 because X gets resized X.noalias() = (A*X).eval().cwiseAbs(); // Correct solution in 3.3 (also OK in 3.2) X = (B*X).cwiseAbs(); // OK in both 3.2 and 3.3 because X is not resized X = A*X; // OK in both 3.2 and 3.3 because the product is directly assigned
- New in beta2: The normalize and normalized methods will now leave their input unchanged if its norm is 0 or too close to 0. Previously this resulted in a vector containing NaNs or infinities.
Expression evaluators
In Eigen 3.3, the evaluation mechanism of expressions has been completely rewritten. Even though this is really a major change, this mostly concerns internal details and most users should not notice it. In a nutshell, until Eigen 3.3, the evaluation strategy of expressions and subexpressions was decided at the construction time of expressions in a bottom-up approach. The novel strategy consists in completely deferring all these choices until the whole expression has to be evaluated. Decisions are now made in a top-down fashion allowing for more optimization opportunities, cleaner internal code, and easier extensibility.
Regarding novel expression-level optimizations, a typical example is the following:
MatrixXd A, B, C, D; A.noalias() = B + C * D;
Prior to Eigen 3.3, the "C*D" subexpression would have been evaluated into a temporary by the expression representing the addition operator. In other words, this expression would have been compiled to the following code:
tmp = C * D; A = B + tmp;
In Eigen 3.3, we can now have a view of the complete expression and generate the following temporary-free code:
A = B; A.noalias() += C * D;
Index typedef
In Eigen 3.3, the "Index" typedef is now global and defined by default to std::ptrdiff_t:
namespace Eigen { typedef std::ptrdiff_t Index; }
This "Index" type is used throughout Eigen as the preferred type for both sizes and indices. It can be controlled globally through the EIGEN_DEFAULT_INDEX_TYPE macro. The usage of Eigen::DenseIndex and AnyExpression::Index are now deprecated. They are always equivalent to Eigen::Index.
For expressions storing an array of indices or sizes, the type for storage can be controlled per object through a template parameter. This type is consistently named "StorageIndex", and its default value is "int". See for instance the PermutationMatrix and SparseMatrix classes.
Warning: these changes might affect codes that used the SparseMatrix::Index type. In Eigen 3.2, this type was not documented and it was improperly defined as the storage index type (e.g., int), whereas it is now deprecated and always defined as Eigen::Index. Codes making use of SparseMatrix::Index, might thus likely have to be changed to use SparseMatrix::StorageIndex instead.
Vectorization
- Eigen 3.3 adds support for AVX (x86_64), FMA (x86_64), VSX (PowerPC), and New in beta2: ZVector (s390x/zEC13) SIMD instruction sets.
- To enable AVX or FMA, you need to compile your code with these instruction sets enabled on the compiler side, for instance using the -mavx and -mfma options with gcc, clang or icc. AVX brings up to a x2 speed up for single and double precision floating point matrices by processing 8 and 4 scalar values at once respectively. Complexes are also supported. To achieve best performance, AVX requires 32 bytes aligned buffers. By default, Eigen's dense objects are thus automatically aligned on 32 bytes when AVX is enabled. Alignment behaviors can be controlled as detailed in this page.
- FMA stands for Fused-Multiple-Add. Currently, only Intel's FMA instruction set, as introduced in the Haswell micro-architecture, is supported, and it is explicitly exploited in matrix products for which a x1.7 speedup can be expected.
- When AVX is enabled, Eigen automatically falls back to half register sizes for fixed-size types which are not a multiple of the full AVX register size (256bits). For instance, this concerns Vector4f and Vector2d types.
- New in beta2: Eigen now enables vectorization of statically unaligned arrays through unaligned loads/stores. This concerns all architectures and all sizes. This new behavior can be disabled by defining EIGEN_UNALIGNED_VECTORIZE=0.
- New in beta1: Enable fixed-size alignment and vectorization on ARM.
- New in beta1: Add vectorization of round, ceil, floor, for SSE4.1/AVX.
- New in beta2: Add vectorization of tanh for float (SSE/AVX)
- Many ARM NEON improvements, including support for ARMv8 (64-bit code), VFPv4 (fused-multiply-accumulate instruction), and correct tuning of the target number of vector registers.
- Add vectorization of exp and log for AltiVec/VSX.
- Add vectorization of Quaternion::conjugate for SSE/AVX
Dense products
- The dense matrix-matrix product kernel has been significantly redesigned to make a best use of recent CPU architectures (i.e., wide SIMD registers, FMA).
- The heuristic to determine the different cache-level blocking sizes has been significantly improved.
- Reasonable defaults for cache sizes have been added for ARM, where we generally cannot query them at runtime.
- The overhead for small products of dynamic sizes has been significantly reduced by falling-back at runtime to a coefficient-based product implementation (aka., lazyProduct).
- The criterion to switch to lazyProduct is: (m+n+k)<20
- Enable Mx0 * 0xN matrix products.
- New in beta2: avoid dynamic memory allocation in fixed-size rank-updates, matrix products evaluated within a triangular part, and selfadjoint times matrix products.
- New in beta2: Any BLAS library can now be used as a backend when compiling with EIGEN_USE_BLAS.
Dense decompositions
- Eigen 3.3 includes a new Divide & Conquer SVD algorithm through the BDCSVD class. This new algorithm can be more than one order of magnitude faster than JacobiSVD for large matrices.
- New in beta2: Eigen 3.3 includes a new CompleteOrthogonalDecomposition algorithm which can be used for minimal-norm problems (usually faster than SVD).
- New in beta2: Add fast reciprocal condition number estimator in dense LU and Cholesky decompositions.
- Various numerical robustness improvements in JacobiSVD, LDLT, LLT, 2x2 and 3x3 direct eigenvalues, ColPivHouseholderQR, FullPivHouseholderQR, and RealSchur.
- FullPivLU: pivoting strategy can now be customized for any scalar types.
- New in beta1: Add LU::transpose().solve() and LU::adjoint().solve() API (doc).
Sparse matrices
- MappedSparseMatrix is now deprecated and replaced by the more versatile Map<SparseMatrix> class. New in beta1: Ref<SparseVector> is supported too.
- Add support for Ref<SparseMatrix>.
- Add OpenMP parallelization of sparse * dense products. Currently, this is limited to row-major sparse matrices.
- New in beta1: Extend setFromTripplets API to allow passing a functor object controlling how to collapse duplicated entries.
- New in beta1: Optimise assignment into a sparse.block() such that, for instance, row-by-row filling of a row-major sparse matrix is very efficient.
- New in beta1: Add support for dense.cwiseProduct(sparse), thus enabling (dense*sparse).diagonal() expressions.
- New in beta1: Add support for the direct evaluation of the product of two sparse matrices within a dense matrix.
- New in beta2: Add SparseVector::conservativeResize() method.
- New in beta2: Add support for "dense +/- sparse" operations.
Sparse solvers
- Add a LeastSquareConjugateGradient solver for solving sparse problems of the form argmin_x |A x - b|^2 through the normal equation but without forming A^T A.
- New in beta1: Add an IncompleteCholesky preconditioner suitable for ConjugateGradient and MINRES.
- Improve robustness of SimplicialLDLT to semidefinite problems by correctly handling structural zeros in AMD reordering. Very useful for solving SPD problems with equality constraints.
- Add OpenMP support in ConjugateGradient, BiCGSTAB, and LeastSquareConjugateGradient. See the respective class's doc for details on the best use of this feature.
- ConjugateGradient and BiCGSTAB now properly use a zero vector as the default guess.
- Allows Lower|Upper as a template argument of ConjugateGradient and MINRES: in this case the full matrix will be considered. This also simplifies the writing of matrix-free wrappers to ConjugateGradient.
- Improved numerical robustness in BiCGSTAB, SparseLU, SparseQR and SPQR.
- Add a determinant() method to SparseLU.
- Improved handling of generic inputs in both iterative and direct solvers.
- New in beta1: Improve support for matrix-free iterative solvers.
- New in beta1: Add access to UmfPack return code and parameters.
- New in beta2: Add logDeterminant and determinant functions to Cholmod* solvers.
Experimental CUDA support
Starting from Eigen 3.3, it is now possible to use Eigen's objects and algorithms within CUDA kernels. However, only a subset of features are supported to make sure that no dynamic allocation is triggered within a CUDA kernel.
Unsupported Tensor module
Eigen 3.3 includes a preview of a Tensor module for multi-dimensional arrays and tensors in unsupported/Eigen/CXX11/Tensor. It provides numerous features including slicing, coefficient-wise operations, reductions, contractions, convolution, multi-threading, CUDA, etc. This module is mainly developed and used by Google.
Miscellaneous
- Various numerical robustness improvements in stableNorm(), Hyperplane::Through(a,b,c), Quaternion::angularDistance.
- Add numerous coefficient-wise methods to Array: isFinite, isNaN, isInf, arg, log10, atan, tanh, sinh, cosh, round, floor, ceil, logical not, and generalize pow(x,e). New in beta1: sign, rsqrt, lgamma, erf, and erfc. New in beta2: igamma, igammac, digamma, polygamma, zeta, betainc.
- New in beta1: Add support for row/col-wise lpNorm().
- New in beta2: Add stableNormalize[d] methods: they are analogues to normalize[d] but with carefull handling of under/over-flow
- Add a determinant() method to PermutationMatrix.
- EIGEN_STACK_ALLOCATION_LIMIT: Raise its default value to 128KB, make use of it to assert on maximal fixed size objects, and allows it to be 0 to mean "no limit"
- Conversions from Quaternion to AngleAxis implicitly normalize the input quaternion.
- Improved support of c++11 while preserving a full compatibility with c++98.
- Eigen 2 support code has been removed. If you plan to migrate from Eigen 2 to 3, then it is recommended to first do the transition using the facilities offered in Eigen 3.2, drop all Eigen 2 deprecated features, and then move to Eigen 3.3.
- The BLAS interface library does not require a fortran compiler anymore.
- The LAPACK interface library has been extended with the following two SVD routines: ?gesdd, ? gesvd.
- New in beta2: enable zero-sized block at compile-time, e.g.: mat.block<0,0>(i,j)
- New in beta2: Add support for SelfadjointView::triangularView() and diagonal().
- New in beta2: Add Quaternion::UnitRandom() method.
- New in beta2: Add a EIGEN_MAX_CPP_VER option to limit the C++ version to be used, as well as fine grained options to control individual language features.
Known issues
- MSVC+clang frontend is not supported yet, see: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1122
- Unlike JacobiSVD, BDCSVD still involve heap memory allocations during the factorization which might be a problem for embedded systems.
- Slight increase of compiler memory consumption.
Footnotes
[1] $ hg log -r "aaec59ac0b4e:: and not merge() and not branch(3.2)" | grep "changeset:" | wc -l