Difference between revisions of "3.3"
(→Dense products) |
(→Known issues) |
||
(36 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | ''Note: this page will be updated on a regular basis until the official 3.3.0 release.'' | + | <!-- ''Note: this page will be updated on a regular basis until the official 3.3.0 release.'' --> |
− | + | Eigen 3.3 has been released on November 10, 2016. | |
− | Since Eigen 3.2, the 3.3 development branch received more than | + | It can be downloaded from the [[Main_Page#Download|Download]] section on the [[Main Page]]. |
+ | Since Eigen 3.2, the 3.3 development branch received more than 3500 commits [1] representing numerous major changes. | ||
Release history: | Release history: | ||
Line 8: | Line 9: | ||
* Eigen 3.3 alpha-1 was released on September 4, 2015. It includes about 1660 commits since 3.2. It includes all bug-fixes and improvements of the 3.2 branch up to the 3.2.5 version, as detailed in the respective change-logs: [[ChangeLog#Eigen_3.2.1|3.2.1]], [[ChangeLog#Eigen_3.2.2|3.2.2]], [[ChangeLog#Eigen_3.2.3|3.2.3]], [[ChangeLog#Eigen_3.2.4|3.2.4]], [[ChangeLog#Eigen_3.2.5|3.2.5]]. | * Eigen 3.3 alpha-1 was released on September 4, 2015. It includes about 1660 commits since 3.2. It includes all bug-fixes and improvements of the 3.2 branch up to the 3.2.5 version, as detailed in the respective change-logs: [[ChangeLog#Eigen_3.2.1|3.2.1]], [[ChangeLog#Eigen_3.2.2|3.2.2]], [[ChangeLog#Eigen_3.2.3|3.2.3]], [[ChangeLog#Eigen_3.2.4|3.2.4]], [[ChangeLog#Eigen_3.2.5|3.2.5]]. | ||
* Eigen 3.3 beta-1 was released on December 16, 2015. It includes about 350 commits since 3.3 alpha1, as detailed in the respective [[ChangeLog#Eigen_3.3-beta1|change-log]]. | * Eigen 3.3 beta-1 was released on December 16, 2015. It includes about 350 commits since 3.3 alpha1, as detailed in the respective [[ChangeLog#Eigen_3.3-beta1|change-log]]. | ||
+ | * Eigen 3.3 beta-2 was released on July 26, 2016. It includes more that 1000 commits since 3.3 beta1, as detailed in the respective [[ChangeLog#Eigen_3.3-beta2|change-log]]. | ||
+ | * Eigen 3.3 rc-1 was released on September 22, 2016. It includes more that 300 commits since 3.3 beta2, as detailed in the respective [[ChangeLog#Eigen_3.3-rc1|change-log]]. | ||
+ | * Eigen 3.3 rc-2 was released on November 04, 2016. It includes more that 90 commits since 3.3 rc1, as detailed in the respective [[ChangeLog#Eigen_3.3-rc2|change-log]]. | ||
− | The | + | The major changes since 3.2 are summarized below. |
=== Changes that might impact existing code === | === Changes that might impact existing code === | ||
Line 15: | Line 19: | ||
Eigen 3.3 fixes a few shortcomings that might impact existing code: | Eigen 3.3 fixes a few shortcomings that might impact existing code: | ||
* Eigen 3.3 clarifies the usage of the Index type in SparseMatrix, PermutationMatrix, Transpositions, as detailed [[#Index typedef | below]]. | * Eigen 3.3 clarifies the usage of the Index type in SparseMatrix, PermutationMatrix, Transpositions, as detailed [[#Index typedef | below]]. | ||
− | * By default Eigen assumes that | + | * The normalize and normalized methods will now leave their input unchanged if its norm is 0 or too close to 0. Previously this resulted in a vector containing NaNs or infinities. |
+ | * internal::significant_decimals_impl is deprecated and users of custom scalar types are encouraged to overload [http://eigen.tuxfamily.org/dox-devel/structEigen_1_1NumTraits.html NumTraits<>::digits10()]. | ||
+ | * Resolved in 3.3-rc2: <s>By default Eigen assumes that there is no [http://eigen.tuxfamily.org/dox/group__TopicAliasing.html aliasing] between the left and right hand sides of an assignment, at the exception of matrix product factors. In Eigen 3.3, this exception is slightly relaxed to enable global optimization of the expressions. In particular, if a factor of a matrix product appears also on the left-hand-side of the assignment '''and''' that this factor gets resized by the assignment '''and''' that the product is not directly assigned, then Eigen 3.3 will produce an erroneous result, because the factor will be resized before the evaluation takes place. Here are some examples: | ||
<source lang="cpp"> | <source lang="cpp"> | ||
// Let's assume that m!=n | // Let's assume that m!=n | ||
Line 24: | Line 30: | ||
X = A*X; // OK in both 3.2 and 3.3 because the product is directly assigned | X = A*X; // OK in both 3.2 and 3.3 because the product is directly assigned | ||
</source> | </source> | ||
− | + | </s> | |
=== Expression evaluators === | === Expression evaluators === | ||
Line 47: | Line 53: | ||
A.noalias() += C * D; | A.noalias() += C * D; | ||
</source> | </source> | ||
+ | |||
+ | This mechanism is applied to all expressions matching "d?=a+b*c" or "d?=a-b*c". | ||
=== Index typedef === | === Index typedef === | ||
Line 65: | Line 73: | ||
=== Vectorization === | === Vectorization === | ||
− | * Eigen 3.3 adds support for '''AVX''' (x86_64), '''FMA''' (x86_64), '''VSX''' (PowerPC), and '''New in beta2:''' ZVector (s390x/zEC13) SIMD instruction sets. | + | * Eigen 3.3 adds support for '''AVX''' (x86_64), '''FMA''' (x86_64), '''VSX''' (PowerPC), and '''New in beta2:''' '''ZVector''' (s390x/zEC13) and '''AVX512''' (xeon phi) SIMD instruction sets. |
** To enable AVX or FMA, you need to compile your code with these instruction sets enabled on the compiler side, for instance using the -mavx and -mfma options with gcc, clang or icc. AVX brings up to a x2 speed up for single and double precision floating point matrices by processing 8 and 4 scalar values at once respectively. Complexes are also supported. To achieve best performance, AVX requires 32 bytes aligned buffers. By default, Eigen's dense objects are thus automatically aligned on 32 bytes when AVX is enabled. Alignment behaviors can be controlled as detailed in this page. | ** To enable AVX or FMA, you need to compile your code with these instruction sets enabled on the compiler side, for instance using the -mavx and -mfma options with gcc, clang or icc. AVX brings up to a x2 speed up for single and double precision floating point matrices by processing 8 and 4 scalar values at once respectively. Complexes are also supported. To achieve best performance, AVX requires 32 bytes aligned buffers. By default, Eigen's dense objects are thus automatically aligned on 32 bytes when AVX is enabled. Alignment behaviors can be controlled as detailed in this page. | ||
** FMA stands for Fused-Multiple-Add. Currently, only Intel's FMA instruction set, as introduced in the Haswell micro-architecture, is supported, and it is explicitly exploited in matrix products for which a x1.7 speedup can be expected. | ** FMA stands for Fused-Multiple-Add. Currently, only Intel's FMA instruction set, as introduced in the Haswell micro-architecture, is supported, and it is explicitly exploited in matrix products for which a x1.7 speedup can be expected. | ||
** When AVX is enabled, Eigen automatically falls back to half register sizes for fixed-size types which are not a multiple of the full AVX register size (256bits). For instance, this concerns Vector4f and Vector2d types. | ** When AVX is enabled, Eigen automatically falls back to half register sizes for fixed-size types which are not a multiple of the full AVX register size (256bits). For instance, this concerns Vector4f and Vector2d types. | ||
+ | ** To try AVX512 explicit vectorization, in addition to compiling with -mavx512f, you need to define EIGEN_ENABLE_AVX512. | ||
* '''New in beta2:''' Eigen now enables vectorization of statically unaligned arrays through unaligned loads/stores. This concerns all architectures and all sizes. This new behavior can be disabled by defining EIGEN_UNALIGNED_VECTORIZE=0. | * '''New in beta2:''' Eigen now enables vectorization of statically unaligned arrays through unaligned loads/stores. This concerns all architectures and all sizes. This new behavior can be disabled by defining EIGEN_UNALIGNED_VECTORIZE=0. | ||
* '''New in beta1:''' Enable fixed-size alignment and vectorization on ARM. | * '''New in beta1:''' Enable fixed-size alignment and vectorization on ARM. | ||
Line 80: | Line 89: | ||
* The dense matrix-matrix product kernel has been significantly redesigned to make a best use of recent CPU architectures (i.e., wide SIMD registers, FMA). | * The dense matrix-matrix product kernel has been significantly redesigned to make a best use of recent CPU architectures (i.e., wide SIMD registers, FMA). | ||
− | |||
* The heuristic to determine the different cache-level blocking sizes has been significantly improved. | * The heuristic to determine the different cache-level blocking sizes has been significantly improved. | ||
* Reasonable defaults for cache sizes have been added for ARM, where we generally cannot query them at runtime. | * Reasonable defaults for cache sizes have been added for ARM, where we generally cannot query them at runtime. | ||
Line 86: | Line 94: | ||
** The criterion to switch to lazyProduct is: (m+n+k)<20 | ** The criterion to switch to lazyProduct is: (m+n+k)<20 | ||
* Enable Mx0 * 0xN matrix products. | * Enable Mx0 * 0xN matrix products. | ||
− | * '''New in beta2:''' | + | * '''New in beta2:''' Improved heuristics for switching between coeff-based and general matrix product implementation at compile-time. |
− | * '''New in beta2:''' Any BLAS library can now be used as a backend when compiling with EIGEN_USE_BLAS. | + | * '''New in beta2:''' Avoid dynamic memory allocation in fixed-size rank-updates, matrix products evaluated within a triangular part, and selfadjoint times matrix products. |
+ | * '''New in beta2:''' Any BLAS library can now be used as a backend when compiling with [http://eigen.tuxfamily.org/dox-devel/TopicUsingBlasLapack.html EIGEN_USE_BLAS]. | ||
=== Dense decompositions === | === Dense decompositions === | ||
− | * Eigen 3.3 includes a new '''Divide & Conquer SVD''' algorithm through the '''BDCSVD''' class. This new algorithm can be more than one order of magnitude faster than JacobiSVD for large matrices. | + | * Eigen 3.3 includes a new '''Divide & Conquer SVD''' algorithm through the [http://eigen.tuxfamily.org/dox-devel/classEigen_1_1BDCSVD.html '''BDCSVD'''] class. This new algorithm can be more than one order of magnitude faster than JacobiSVD for large matrices. |
− | * '''New in beta2:''' Eigen 3.3 includes a new [http://eigen.tuxfamily.org/dox-devel/classEigen_1_1CompleteOrthogonalDecomposition.html CompleteOrthogonalDecomposition] algorithm which can be used for minimal-norm problems (usually faster than SVD). | + | * '''New in beta2:''' Eigen 3.3 includes a new [http://eigen.tuxfamily.org/dox-devel/classEigen_1_1CompleteOrthogonalDecomposition.html '''CompleteOrthogonalDecomposition'''] algorithm which can be used for minimal-norm problems (usually faster than SVD). |
+ | * '''New in beta2:''' Add fast reciprocal condition number estimator in dense LU and Cholesky decompositions. | ||
+ | * '''New in beta2:''' Add support for [http://eigen.tuxfamily.org/dox-devel/group__InplaceDecomposition.html inplace] dense decompositions. | ||
* Various '''numerical robustness improvements''' in JacobiSVD, LDLT, LLT, 2x2 and 3x3 direct eigenvalues, ColPivHouseholderQR, FullPivHouseholderQR, and RealSchur. | * Various '''numerical robustness improvements''' in JacobiSVD, LDLT, LLT, 2x2 and 3x3 direct eigenvalues, ColPivHouseholderQR, FullPivHouseholderQR, and RealSchur. | ||
* FullPivLU: pivoting strategy can now be customized for any scalar types. | * FullPivLU: pivoting strategy can now be customized for any scalar types. | ||
* '''New in beta1:''' Add LU::transpose().solve() and LU::adjoint().solve() API ([http://eigen.tuxfamily.org/dox-devel/classEigen_1_1SolverBase.html#ac6a6b7e3ae2c0b3ca3808eba8089fa68 doc]). | * '''New in beta1:''' Add LU::transpose().solve() and LU::adjoint().solve() API ([http://eigen.tuxfamily.org/dox-devel/classEigen_1_1SolverBase.html#ac6a6b7e3ae2c0b3ca3808eba8089fa68 doc]). | ||
+ | * '''New in beta2:''' Any LAPACK library can now be used as a backend when compiling with [http://eigen.tuxfamily.org/dox-devel/TopicUsingBlasLapack.html EIGEN_USE_LAPACKE. | ||
+ | * '''New in beta2:''' An insightful doc [http://eigen.tuxfamily.org/dox-devel/group__DenseDecompositionBenchmark.html page] summarizing the true runtime performance of Eigen's dense decomposition algorithms. | ||
+ | * '''New in RC1:''' GeneralizedEigenSolver now extracts the eigenvectors. | ||
+ | * '''New in RC1:''' LDLT detects failures and report them through info(). | ||
=== Sparse matrices === | === Sparse matrices === | ||
Line 108: | Line 123: | ||
* '''New in beta2:''' Add SparseVector::conservativeResize() method. | * '''New in beta2:''' Add SparseVector::conservativeResize() method. | ||
* '''New in beta2:''' Add support for "dense +/- sparse" operations. | * '''New in beta2:''' Add support for "dense +/- sparse" operations. | ||
+ | * '''New in RC1:''' Extend sparse-selfadjoint * dense products to support scalar factor and +=/-= assignments. | ||
=== Sparse solvers === | === Sparse solvers === | ||
− | * Add a ''' | + | * Add a [http://eigen.tuxfamily.org/dox-devel/classEigen_1_1LeastSquaresConjugateGradient.html '''LeastSquaresConjugateGradient'''] solver for solving sparse problems of the form argmin_x |A x - b|^2 through the normal equation but without forming A^T A. |
− | * '''New in beta1:''' Add an '''IncompleteCholesky''' preconditioner suitable for ConjugateGradient and MINRES. | + | * '''New in beta1:''' Add an [http://eigen.tuxfamily.org/dox-devel/classEigen_1_1IncompleteCholesky.html '''IncompleteCholesky'''] preconditioner suitable for ConjugateGradient and MINRES. |
* Improve robustness of SimplicialLDLT to semidefinite problems by correctly handling structural zeros in AMD reordering. Very useful for solving SPD problems with equality constraints. | * Improve robustness of SimplicialLDLT to semidefinite problems by correctly handling structural zeros in AMD reordering. Very useful for solving SPD problems with equality constraints. | ||
* Add '''OpenMP''' support in ConjugateGradient, BiCGSTAB, and LeastSquareConjugateGradient. See the respective class's doc for details on the best use of this feature. | * Add '''OpenMP''' support in ConjugateGradient, BiCGSTAB, and LeastSquareConjugateGradient. See the respective class's doc for details on the best use of this feature. | ||
Line 123: | Line 139: | ||
* '''New in beta1:''' Add access to UmfPack return code and parameters. | * '''New in beta1:''' Add access to UmfPack return code and parameters. | ||
* '''New in beta2:''' Add logDeterminant and determinant functions to Cholmod* solvers. | * '''New in beta2:''' Add logDeterminant and determinant functions to Cholmod* solvers. | ||
+ | |||
+ | === Mixing scalar types ('''New in beta2''') === | ||
+ | |||
+ | Support for mixing scalar types in binary operators and functions has been considerably improved in Eigen 3.3. In particular, thanks to the new class [http://eigen.tuxfamily.org/dox-devel/structEigen_1_1ScalarBinaryOpTraits.html ScalarBinaryOpTraits] it is now possible to enable/disable the mixing of any pair of scalar types and to control the scalar return type. All binary functions have also been generalized to allow one of the argument to be a scalar. In this later case, the type of the scalar operand is promoted to the lightweight compatible scalar type is needed and possible. For instance, in: | ||
+ | <source lang="cpp"> | ||
+ | MatrixXcd A; | ||
+ | A = 2 * A; | ||
+ | </source> | ||
+ | the integer 2 is now nicely promoted to double instead of a complex<double>. | ||
+ | This smart type promotion can be partly controlled through the NumTraits<T>::Literal type defining the preferred type for literals. As the default is T or NumTraits<T>::Real for complexes, it is highly recommended to redefine for your custom scalar types to take full advantage of this new feature. | ||
=== Experimental CUDA support === | === Experimental CUDA support === | ||
Line 131: | Line 157: | ||
Eigen 3.3 includes a preview of a Tensor module for multi-dimensional arrays and tensors in [https://bitbucket.org/eigen/eigen/src/9b065de03d016d802a25366ff5f0055df6318121/unsupported/Eigen/CXX11/src/Tensor/README.md?at=default unsupported/Eigen/CXX11/Tensor]. | Eigen 3.3 includes a preview of a Tensor module for multi-dimensional arrays and tensors in [https://bitbucket.org/eigen/eigen/src/9b065de03d016d802a25366ff5f0055df6318121/unsupported/Eigen/CXX11/src/Tensor/README.md?at=default unsupported/Eigen/CXX11/Tensor]. | ||
− | It provides numerous features including slicing, coefficient-wise operations, reductions, contractions, convolution, multi-threading, CUDA, etc. | + | It provides numerous features including slicing, coefficient-wise operations, reductions, contractions, convolution, multi-threading, CUDA, OpenCL, etc. |
− | This module is mainly developed and used by Google. | + | This module is mainly developed by Google and used by Google's [https://www.tensorflow.org/ tensorflow] library. |
+ | Support for OpenCL has been mainly developed by CodePlay using SYCL. | ||
=== Miscellaneous === | === Miscellaneous === | ||
* Various numerical robustness improvements in stableNorm(), Hyperplane::Through(a,b,c), Quaternion::angularDistance. | * Various numerical robustness improvements in stableNorm(), Hyperplane::Through(a,b,c), Quaternion::angularDistance. | ||
− | * Add numerous coefficient-wise methods to Array: isFinite, isNaN, isInf, arg, log10, atan, tanh, sinh, cosh, round, floor, ceil, logical not, and generalize pow(x,e). '''New in beta1:''' sign, rsqrt | + | * Add numerous coefficient-wise methods to Array: isFinite, isNaN, isInf, arg, log10, atan, tanh, sinh, cosh, round, floor, ceil, logical not, and generalize pow(x,e). '''New in beta1:''' sign, rsqrt. (see the respective [http://eigen.tuxfamily.org/dox-devel/group__CoeffwiseMathFunctions.html summary table]) |
+ | * '''New in beta2:''' new unsupported/Eigen/SpecialFunctions module providing the following functions: erf, erfc, lgamma, igamma, igammac, digamma, polygamma, zeta, betainc. | ||
* '''New in beta1:''' Add support for row/col-wise lpNorm(). | * '''New in beta1:''' Add support for row/col-wise lpNorm(). | ||
* '''New in beta2:''' Add stableNormalize[d] methods: they are analogues to normalize[d] but with carefull handling of under/over-flow | * '''New in beta2:''' Add stableNormalize[d] methods: they are analogues to normalize[d] but with carefull handling of under/over-flow | ||
Line 150: | Line 178: | ||
* '''New in beta2:''' Add support for SelfadjointView::triangularView() and diagonal(). | * '''New in beta2:''' Add support for SelfadjointView::triangularView() and diagonal(). | ||
* '''New in beta2:''' Add Quaternion::UnitRandom() method. | * '''New in beta2:''' Add Quaternion::UnitRandom() method. | ||
+ | * '''New in beta2:''' Enable write access to partial reverse like: mat.rowwise().reverse() = ...; | ||
* '''New in beta2:''' Add a EIGEN_MAX_CPP_VER option to limit the C++ version to be used, as well as [http://eigen.tuxfamily.org/dox-devel/TopicPreprocessorDirectives.html#title1 fine grained options] to control individual language features. | * '''New in beta2:''' Add a EIGEN_MAX_CPP_VER option to limit the C++ version to be used, as well as [http://eigen.tuxfamily.org/dox-devel/TopicPreprocessorDirectives.html#title1 fine grained options] to control individual language features. | ||
+ | * '''New in RC1''' Improve support for custom nullary functors: now the functor only has to expose one relevant operator among f(), f(i), f(i,j). | ||
+ | * '''New in RC1''' [http://eigen.tuxfamily.org/dox-devel/classEigen_1_1DenseBase.html#aaef589c1dbd7fad93f97bd3fa1b1e768 LinSpaced] has been completely rewritten to leverage more numerical robustness and consistency for both real and integer scalar types. | ||
== Known issues == | == Known issues == | ||
Line 161: | Line 192: | ||
** For MSVC 2015 update 1: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1125 | ** For MSVC 2015 update 1: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1125 | ||
--> | --> | ||
+ | * MSVC 2013 triggers internal compilation errors if IntelliSense (/FR flag) is enabled. | ||
* MSVC+clang frontend is not supported yet, see: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1122 | * MSVC+clang frontend is not supported yet, see: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1122 | ||
<!-- | <!-- |
Latest revision as of 09:56, 6 September 2017
Eigen 3.3 has been released on November 10, 2016.
It can be downloaded from the Download section on the Main Page.
Since Eigen 3.2, the 3.3 development branch received more than 3500 commits [1] representing numerous major changes.
Release history:
- Eigen 3.3 alpha-1 was released on September 4, 2015. It includes about 1660 commits since 3.2. It includes all bug-fixes and improvements of the 3.2 branch up to the 3.2.5 version, as detailed in the respective change-logs: 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5.
- Eigen 3.3 beta-1 was released on December 16, 2015. It includes about 350 commits since 3.3 alpha1, as detailed in the respective change-log.
- Eigen 3.3 beta-2 was released on July 26, 2016. It includes more that 1000 commits since 3.3 beta1, as detailed in the respective change-log.
- Eigen 3.3 rc-1 was released on September 22, 2016. It includes more that 300 commits since 3.3 beta2, as detailed in the respective change-log.
- Eigen 3.3 rc-2 was released on November 04, 2016. It includes more that 90 commits since 3.3 rc1, as detailed in the respective change-log.
The major changes since 3.2 are summarized below.
Contents
- 1 Changes that might impact existing code
- 2 Expression evaluators
- 3 Index typedef
- 4 Vectorization
- 5 Dense products
- 6 Dense decompositions
- 7 Sparse matrices
- 8 Sparse solvers
- 9 Mixing scalar types (New in beta2)
- 10 Experimental CUDA support
- 11 Unsupported Tensor module
- 12 Miscellaneous
- 13 Known issues
Changes that might impact existing code
Eigen 3.3 fixes a few shortcomings that might impact existing code:
- Eigen 3.3 clarifies the usage of the Index type in SparseMatrix, PermutationMatrix, Transpositions, as detailed below.
- The normalize and normalized methods will now leave their input unchanged if its norm is 0 or too close to 0. Previously this resulted in a vector containing NaNs or infinities.
- internal::significant_decimals_impl is deprecated and users of custom scalar types are encouraged to overload NumTraits<>::digits10().
- Resolved in 3.3-rc2:
By default Eigen assumes that there is no aliasing between the left and right hand sides of an assignment, at the exception of matrix product factors. In Eigen 3.3, this exception is slightly relaxed to enable global optimization of the expressions. In particular, if a factor of a matrix product appears also on the left-hand-side of the assignment and that this factor gets resized by the assignment and that the product is not directly assigned, then Eigen 3.3 will produce an erroneous result, because the factor will be resized before the evaluation takes place. Here are some examples:
// Let's assume that m!=n MatrixXd A(m,n), B(n,n), X(n,k); X = (A*X).cwiseAbs(); // OK in 3.2, but not OK in 3.3 because X gets resized X.noalias() = (A*X).eval().cwiseAbs(); // Correct solution in 3.3 (also OK in 3.2) X = (B*X).cwiseAbs(); // OK in both 3.2 and 3.3 because X is not resized X = A*X; // OK in both 3.2 and 3.3 because the product is directly assigned
Expression evaluators
In Eigen 3.3, the evaluation mechanism of expressions has been completely rewritten. Even though this is really a major change, this mostly concerns internal details and most users should not notice it. In a nutshell, until Eigen 3.3, the evaluation strategy of expressions and subexpressions was decided at the construction time of expressions in a bottom-up approach. The novel strategy consists in completely deferring all these choices until the whole expression has to be evaluated. Decisions are now made in a top-down fashion allowing for more optimization opportunities, cleaner internal code, and easier extensibility.
Regarding novel expression-level optimizations, a typical example is the following:
MatrixXd A, B, C, D; A.noalias() = B + C * D;
Prior to Eigen 3.3, the "C*D" subexpression would have been evaluated into a temporary by the expression representing the addition operator. In other words, this expression would have been compiled to the following code:
tmp = C * D; A = B + tmp;
In Eigen 3.3, we can now have a view of the complete expression and generate the following temporary-free code:
A = B; A.noalias() += C * D;
This mechanism is applied to all expressions matching "d?=a+b*c" or "d?=a-b*c".
Index typedef
In Eigen 3.3, the "Index" typedef is now global and defined by default to std::ptrdiff_t:
namespace Eigen { typedef std::ptrdiff_t Index; }
This "Index" type is used throughout Eigen as the preferred type for both sizes and indices. It can be controlled globally through the EIGEN_DEFAULT_INDEX_TYPE macro. The usage of Eigen::DenseIndex and AnyExpression::Index are now deprecated. They are always equivalent to Eigen::Index.
For expressions storing an array of indices or sizes, the type for storage can be controlled per object through a template parameter. This type is consistently named "StorageIndex", and its default value is "int". See for instance the PermutationMatrix and SparseMatrix classes.
Warning: these changes might affect codes that used the SparseMatrix::Index type. In Eigen 3.2, this type was not documented and it was improperly defined as the storage index type (e.g., int), whereas it is now deprecated and always defined as Eigen::Index. Codes making use of SparseMatrix::Index, might thus likely have to be changed to use SparseMatrix::StorageIndex instead.
Vectorization
- Eigen 3.3 adds support for AVX (x86_64), FMA (x86_64), VSX (PowerPC), and New in beta2: ZVector (s390x/zEC13) and AVX512 (xeon phi) SIMD instruction sets.
- To enable AVX or FMA, you need to compile your code with these instruction sets enabled on the compiler side, for instance using the -mavx and -mfma options with gcc, clang or icc. AVX brings up to a x2 speed up for single and double precision floating point matrices by processing 8 and 4 scalar values at once respectively. Complexes are also supported. To achieve best performance, AVX requires 32 bytes aligned buffers. By default, Eigen's dense objects are thus automatically aligned on 32 bytes when AVX is enabled. Alignment behaviors can be controlled as detailed in this page.
- FMA stands for Fused-Multiple-Add. Currently, only Intel's FMA instruction set, as introduced in the Haswell micro-architecture, is supported, and it is explicitly exploited in matrix products for which a x1.7 speedup can be expected.
- When AVX is enabled, Eigen automatically falls back to half register sizes for fixed-size types which are not a multiple of the full AVX register size (256bits). For instance, this concerns Vector4f and Vector2d types.
- To try AVX512 explicit vectorization, in addition to compiling with -mavx512f, you need to define EIGEN_ENABLE_AVX512.
- New in beta2: Eigen now enables vectorization of statically unaligned arrays through unaligned loads/stores. This concerns all architectures and all sizes. This new behavior can be disabled by defining EIGEN_UNALIGNED_VECTORIZE=0.
- New in beta1: Enable fixed-size alignment and vectorization on ARM.
- New in beta1: Add vectorization of round, ceil, floor, for SSE4.1/AVX.
- New in beta2: Add vectorization of tanh for float (SSE/AVX)
- Many ARM NEON improvements, including support for ARMv8 (64-bit code), VFPv4 (fused-multiply-accumulate instruction), and correct tuning of the target number of vector registers.
- Add vectorization of exp and log for AltiVec/VSX.
- Add vectorization of Quaternion::conjugate for SSE/AVX
Dense products
- The dense matrix-matrix product kernel has been significantly redesigned to make a best use of recent CPU architectures (i.e., wide SIMD registers, FMA).
- The heuristic to determine the different cache-level blocking sizes has been significantly improved.
- Reasonable defaults for cache sizes have been added for ARM, where we generally cannot query them at runtime.
- The overhead for small products of dynamic sizes has been significantly reduced by falling-back at runtime to a coefficient-based product implementation (aka., lazyProduct).
- The criterion to switch to lazyProduct is: (m+n+k)<20
- Enable Mx0 * 0xN matrix products.
- New in beta2: Improved heuristics for switching between coeff-based and general matrix product implementation at compile-time.
- New in beta2: Avoid dynamic memory allocation in fixed-size rank-updates, matrix products evaluated within a triangular part, and selfadjoint times matrix products.
- New in beta2: Any BLAS library can now be used as a backend when compiling with EIGEN_USE_BLAS.
Dense decompositions
- Eigen 3.3 includes a new Divide & Conquer SVD algorithm through the BDCSVD class. This new algorithm can be more than one order of magnitude faster than JacobiSVD for large matrices.
- New in beta2: Eigen 3.3 includes a new CompleteOrthogonalDecomposition algorithm which can be used for minimal-norm problems (usually faster than SVD).
- New in beta2: Add fast reciprocal condition number estimator in dense LU and Cholesky decompositions.
- New in beta2: Add support for inplace dense decompositions.
- Various numerical robustness improvements in JacobiSVD, LDLT, LLT, 2x2 and 3x3 direct eigenvalues, ColPivHouseholderQR, FullPivHouseholderQR, and RealSchur.
- FullPivLU: pivoting strategy can now be customized for any scalar types.
- New in beta1: Add LU::transpose().solve() and LU::adjoint().solve() API (doc).
- New in beta2: Any LAPACK library can now be used as a backend when compiling with [http://eigen.tuxfamily.org/dox-devel/TopicUsingBlasLapack.html EIGEN_USE_LAPACKE.
- New in beta2: An insightful doc page summarizing the true runtime performance of Eigen's dense decomposition algorithms.
- New in RC1: GeneralizedEigenSolver now extracts the eigenvectors.
- New in RC1: LDLT detects failures and report them through info().
Sparse matrices
- MappedSparseMatrix is now deprecated and replaced by the more versatile Map<SparseMatrix> class. New in beta1: Ref<SparseVector> is supported too.
- Add support for Ref<SparseMatrix>.
- Add OpenMP parallelization of sparse * dense products. Currently, this is limited to row-major sparse matrices.
- New in beta1: Extend setFromTripplets API to allow passing a functor object controlling how to collapse duplicated entries.
- New in beta1: Optimise assignment into a sparse.block() such that, for instance, row-by-row filling of a row-major sparse matrix is very efficient.
- New in beta1: Add support for dense.cwiseProduct(sparse), thus enabling (dense*sparse).diagonal() expressions.
- New in beta1: Add support for the direct evaluation of the product of two sparse matrices within a dense matrix.
- New in beta2: Add SparseVector::conservativeResize() method.
- New in beta2: Add support for "dense +/- sparse" operations.
- New in RC1: Extend sparse-selfadjoint * dense products to support scalar factor and +=/-= assignments.
Sparse solvers
- Add a LeastSquaresConjugateGradient solver for solving sparse problems of the form argmin_x |A x - b|^2 through the normal equation but without forming A^T A.
- New in beta1: Add an IncompleteCholesky preconditioner suitable for ConjugateGradient and MINRES.
- Improve robustness of SimplicialLDLT to semidefinite problems by correctly handling structural zeros in AMD reordering. Very useful for solving SPD problems with equality constraints.
- Add OpenMP support in ConjugateGradient, BiCGSTAB, and LeastSquareConjugateGradient. See the respective class's doc for details on the best use of this feature.
- ConjugateGradient and BiCGSTAB now properly use a zero vector as the default guess.
- Allows Lower|Upper as a template argument of ConjugateGradient and MINRES: in this case the full matrix will be considered. This also simplifies the writing of matrix-free wrappers to ConjugateGradient.
- Improved numerical robustness in BiCGSTAB, SparseLU, SparseQR and SPQR.
- Add a determinant() method to SparseLU.
- Improved handling of generic inputs in both iterative and direct solvers.
- New in beta1: Improve support for matrix-free iterative solvers.
- New in beta1: Add access to UmfPack return code and parameters.
- New in beta2: Add logDeterminant and determinant functions to Cholmod* solvers.
Mixing scalar types (New in beta2)
Support for mixing scalar types in binary operators and functions has been considerably improved in Eigen 3.3. In particular, thanks to the new class ScalarBinaryOpTraits it is now possible to enable/disable the mixing of any pair of scalar types and to control the scalar return type. All binary functions have also been generalized to allow one of the argument to be a scalar. In this later case, the type of the scalar operand is promoted to the lightweight compatible scalar type is needed and possible. For instance, in:
MatrixXcd A; A = 2 * A;
the integer 2 is now nicely promoted to double instead of a complex<double>. This smart type promotion can be partly controlled through the NumTraits<T>::Literal type defining the preferred type for literals. As the default is T or NumTraits<T>::Real for complexes, it is highly recommended to redefine for your custom scalar types to take full advantage of this new feature.
Experimental CUDA support
Starting from Eigen 3.3, it is now possible to use Eigen's objects and algorithms within CUDA kernels. However, only a subset of features are supported to make sure that no dynamic allocation is triggered within a CUDA kernel.
Unsupported Tensor module
Eigen 3.3 includes a preview of a Tensor module for multi-dimensional arrays and tensors in unsupported/Eigen/CXX11/Tensor. It provides numerous features including slicing, coefficient-wise operations, reductions, contractions, convolution, multi-threading, CUDA, OpenCL, etc. This module is mainly developed by Google and used by Google's tensorflow library. Support for OpenCL has been mainly developed by CodePlay using SYCL.
Miscellaneous
- Various numerical robustness improvements in stableNorm(), Hyperplane::Through(a,b,c), Quaternion::angularDistance.
- Add numerous coefficient-wise methods to Array: isFinite, isNaN, isInf, arg, log10, atan, tanh, sinh, cosh, round, floor, ceil, logical not, and generalize pow(x,e). New in beta1: sign, rsqrt. (see the respective summary table)
- New in beta2: new unsupported/Eigen/SpecialFunctions module providing the following functions: erf, erfc, lgamma, igamma, igammac, digamma, polygamma, zeta, betainc.
- New in beta1: Add support for row/col-wise lpNorm().
- New in beta2: Add stableNormalize[d] methods: they are analogues to normalize[d] but with carefull handling of under/over-flow
- Add a determinant() method to PermutationMatrix.
- EIGEN_STACK_ALLOCATION_LIMIT: Raise its default value to 128KB, make use of it to assert on maximal fixed size objects, and allows it to be 0 to mean "no limit"
- Conversions from Quaternion to AngleAxis implicitly normalize the input quaternion.
- Improved support of c++11 while preserving a full compatibility with c++98.
- Eigen 2 support code has been removed. If you plan to migrate from Eigen 2 to 3, then it is recommended to first do the transition using the facilities offered in Eigen 3.2, drop all Eigen 2 deprecated features, and then move to Eigen 3.3.
- The BLAS interface library does not require a fortran compiler anymore.
- The LAPACK interface library has been extended with the following two SVD routines: ?gesdd, ? gesvd.
- New in beta2: enable zero-sized block at compile-time, e.g.: mat.block<0,0>(i,j)
- New in beta2: Add support for SelfadjointView::triangularView() and diagonal().
- New in beta2: Add Quaternion::UnitRandom() method.
- New in beta2: Enable write access to partial reverse like: mat.rowwise().reverse() = ...;
- New in beta2: Add a EIGEN_MAX_CPP_VER option to limit the C++ version to be used, as well as fine grained options to control individual language features.
- New in RC1 Improve support for custom nullary functors: now the functor only has to expose one relevant operator among f(), f(i), f(i,j).
- New in RC1 LinSpaced has been completely rewritten to leverage more numerical robustness and consistency for both real and integer scalar types.
Known issues
- MSVC 2013 triggers internal compilation errors if IntelliSense (/FR flag) is enabled.
- MSVC+clang frontend is not supported yet, see: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1122
- Unlike JacobiSVD, BDCSVD still involve heap memory allocations during the factorization which might be a problem for embedded systems.
- Slight increase of compiler memory consumption.
Footnotes
[1] $ hg log -r "aaec59ac0b4e:: and not merge() and not branch(3.2)" | grep "changeset:" | wc -l