Difference between revisions of "3.4"
From Eigen
(Created page with "Raw dump of the main novelties and improvements that will be part of the 3.4 release compared to the 3.3 branch: * Speed up evaluation of HouseholderSequence to a dense matri...") |
(→New Major Features in Core) |
||
(51 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | + | Eigen 3.4 was released on August 18, 2021. It can be downloaded from the Download section on the | |
+ | [https://eigen.tuxfamily.org/index.php?title=Main_Page Main Page] or from [https://gitlab.com/libeigen/eigen/-/releases/3.4.0 Gitlab]. | ||
− | + | '''Notice:''' 3.4.x will be the last major release series of Eigen to support C++03. The master branch will drop C++03 support after this release. | |
− | MatrixXd | + | |
+ | == Changes to supported modules == | ||
+ | |||
+ | === Changes that might break existing code === | ||
+ | |||
+ | * Using <code>float</code> or <code>double</code> values to index matrices, vectors and arrays now fails to compile, e.g.: | ||
+ | <source lang="cpp"> | ||
+ | MatrixXd A(10,10); | ||
+ | float one = 1; | ||
+ | double a11 = A(one,1.); // compilation error here | ||
</source> | </source> | ||
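Code that previously relied on the implicit conversion needs an explicit cast to an integer index. The following is a minimal, library-free sketch of how such a compile-time rejection can work and how call sites can adapt. `ToyMatrix` and `to_index` are hypothetical names introduced for illustration only; this is not Eigen's actual implementation.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical toy container (column-major), not Eigen's implementation.
class ToyMatrix {
 public:
  ToyMatrix(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

  // Only integral index types are accepted; a float or double index
  // triggers the static_assert and stops compilation.
  template <typename RowIndex, typename ColIndex>
  double& operator()(RowIndex i, ColIndex j) {
    static_assert(std::is_integral<RowIndex>::value &&
                      std::is_integral<ColIndex>::value,
                  "matrix indices must be integers");
    return data_[static_cast<std::size_t>(j) * rows_ +
                 static_cast<std::size_t>(i)];
  }

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }

 private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<double> data_;
};

// Makes the floating-point to index conversion explicit at the call site.
// Assumes the value holds a non-negative whole number.
inline std::ptrdiff_t to_index(double value) {
  return static_cast<std::ptrdiff_t>(value);
}
```

With this sketch, `A(one, 1.)` would not compile, while `A(to_index(one), 1)` would.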
− | * | + | |
− | * ... | + | === New Major Features in Core === |
+ | |||
+ | * Add c++11 '''initializer_list constructors''' to Matrix and Array [http://eigen.tuxfamily.org/dox-devel/group__TutorialMatrixClass.html#title3 [doc]]: | ||
+ | <source lang="cpp"> | ||
+ | MatrixXi a { // construct a 2x3 matrix | ||
+ | {1,2,3}, // first row | ||
+ | {4,5,6} // second row | ||
+ | }; | ||
+ | VectorXd v{{1, 2, 3, 4, 5}}; // construct a dynamic-size vector with 5 elements | ||
+ | Array<int,1,5> b{1, 2, 3, 4, 5}; // initialize a fixed-size 1D array of size 5. | ||
+ | </source> | ||
+ | |||
+ | * Add STL-compatible '''iterators''' for dense expressions [http://eigen.tuxfamily.org/dox-devel/group__TutorialSTL.html [doc]]. Some examples: | ||
+ | <source lang="cpp"> | ||
+ | VectorXd v = ...; | ||
+ | MatrixXd A = ...; | ||
+ | // range for loop over all entries of v then A | ||
+ | for(auto x : v) { cout << x << " "; } | ||
+ | for(auto x : A.reshaped()) { cout << x << " "; } | ||
+ | // sort v then each column of A | ||
+ | std::sort(v.begin(), v.end()); | ||
+ | for(auto c : A.colwise()) | ||
+ | std::sort(c.begin(), c.end()); | ||
+ | </source> | ||
+ | |||
+ | * New versatile API for sub-matrices, '''slices''', and '''indexed views''' [http://eigen.tuxfamily.org/dox-devel/group__TutorialSlicingIndexing.html [doc]]. It extends <code>A(.,.)</code> to accept anything that looks like a sequence of indices with random access. To make it usable, this new feature comes with new symbols: <code>Eigen::all</code>, <code>Eigen::last</code>, and functions generating arithmetic sequences: <code>Eigen::seq(first,last[,incr])</code>, <code>Eigen::seqN(first,size[,incr])</code>, <code>Eigen::lastN(size[,incr])</code>. Here is an example picking the even rows except the first and last ones, and a subset of indexed columns: | ||
+ | <source lang="cpp"> | ||
+ | MatrixXd A = ...; | ||
+ | std::vector<int> col_ind{7,3,4,3}; | ||
+ | MatrixXd B = A(seq(2,last-2,fix<2>), col_ind); | ||
+ | </source> | ||
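To make the row selection above concrete, here is a small library-free sketch that expands an inclusive arithmetic sequence into explicit indices. The helper names `expand_seq` and `even_inner_rows` are hypothetical and only mimic what `seq(2,last-2,fix<2>)` denotes, under the assumption that `last` stands for `rows - 1`.

```cpp
#include <vector>

// Expands the inclusive sequence first, first+incr, ... up to (at most) last.
// Assumes incr > 0.
std::vector<int> expand_seq(int first, int last, int incr = 1) {
  std::vector<int> indices;
  for (int i = first; i <= last; i += incr) indices.push_back(i);
  return indices;
}

// Row indices denoted by seq(2, last-2, fix<2>) for a matrix with `rows` rows,
// where `last` is the index of the final row.
std::vector<int> even_inner_rows(int rows) {
  const int last = rows - 1;
  return expand_seq(2, last - 2, 2);
}
```

For a matrix with 10 rows, this selects rows 2, 4 and 6.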
+ | |||
+ | * Add C++11 '''template aliases''' for Matrix, Vector, and Array of common sizes, including generic <code>Vector<Type,Size></code> and <code>RowVector<Type,Size></code> aliases [http://eigen.tuxfamily.org/dox-devel/group__matrixtypedefs.html [doc]]. | ||
+ | <source lang="cpp"> | ||
+ | MatrixX<double> M; // Instead of MatrixXd or Matrix<double, Dynamic, Dynamic> | ||
+ | Vector4<MyType> V; // Instead of Vector<MyType, 4> | ||
+ | </source> | ||
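For readers new to C++11 alias templates, the sketch below shows how aliases of this kind can be declared. Everything lives in a hypothetical `toy` namespace with an empty stand-in `Matrix` type; the names mirror Eigen's for readability, but these are not Eigen's declarations.

```cpp
#include <type_traits>

namespace toy {  // hypothetical stand-ins, not Eigen's definitions

constexpr int Dynamic = -1;

// Scalar type comes first, followed by the compile-time sizes.
template <typename Scalar, int Rows, int Cols>
struct Matrix {};

// Dynamic-size matrix for an arbitrary scalar type.
template <typename Scalar>
using MatrixX = Matrix<Scalar, Dynamic, Dynamic>;

// Generic column vector, and a fixed-size alias built on top of it.
template <typename Scalar, int Size>
using Vector = Matrix<Scalar, Size, 1>;

template <typename Scalar>
using Vector4 = Vector<Scalar, 4>;

}  // namespace toy
```

Because aliases introduce no new types, `toy::Vector4<float>` and `toy::Matrix<float, 4, 1>` are the same type and can be used interchangeably.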
+ | |||
+ | * New support for <code>bfloat16</code>. The 16-bit [https://en.wikipedia.org/wiki/Bfloat16_floating-point_format Brain floating point format] is now available as <code>Eigen::bfloat16</code>. The constructor must be called explicitly, but it can otherwise be used like any other scalar type. To convert to or from <code>uint16_t</code> and access the bit representation, use <code>Eigen::numext::bit_cast</code>. | ||
+ | <source lang="cpp"> | ||
+ | bfloat16 s(0.25); // explicit construction | ||
+ | uint16_t s_bits = numext::bit_cast<uint16_t>(s); // bit representation | ||
+ | |||
+ | using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>; | ||
+ | MatrixBf16 X = s * MatrixBf16::Random(3, 3); | ||
+ | </source> | ||
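As background on the bit representation mentioned above: a bfloat16 value has the same layout as the upper 16 bits of an IEEE 754 single-precision float. The library-free sketch below illustrates this with hypothetical helper functions. It truncates the low bits for simplicity, whereas a production conversion (including Eigen's) may round instead.

```cpp
#include <cstdint>
#include <cstring>

// Keeps the top 16 bits of the float (sign, 8 exponent bits, 7 mantissa bits).
// Simplifying assumption: truncation, no rounding.
inline std::uint16_t float_to_bf16_bits_truncate(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return static_cast<std::uint16_t>(bits >> 16);
}

// Widens bfloat16 bits back to a float by zero-filling the low 16 bits.
inline float bf16_bits_to_float(std::uint16_t bf16_bits) {
  const std::uint32_t bits = static_cast<std::uint32_t>(bf16_bits) << 16;
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Values such as 0.25 or 1.0 survive the round trip exactly because their mantissas fit into 7 bits.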
+ | |||
+ | === New backends === | ||
+ | |||
+ | * '''Arm SVE:''' Eigen now supports Arm's [https://developer.arm.com/documentation/101726/0300/Learn-about-the-Scalable-Vector-Extension--SVE-/What-is-the-Scalable-Vector-Extension- Scalable Vector Extension (SVE)]. Currently only fixed-length SVE vectors for <code>uint32_t</code> and <code>float</code> are available. | ||
+ | * '''MIPS MSA:''' Eigen now supports the [https://www.mips.com/products/architectures/ase/simd/ MIPS SIMD Architecture (MSA)]. | ||
+ | * '''AMD ROCm/HIP:''' Eigen now contains a generic GPU backend that unifies support for [https://developer.nvidia.com/cuda-toolkit NVIDIA/CUDA] and [https://rocmdocs.amd.com/en/latest/ AMD/HIP]. | ||
+ | * '''Power 10 MMA Backend:''' Eigen now has initial support for [https://arxiv.org/pdf/2104.03142.pdf Power 10 matrix multiplication assist instructions] for float32 and float64, real and complex. | ||
+ | |||
+ | === Improvements to Eigen Core === | ||
+ | * Eigen now uses the c++11 '''alignas''' keyword for static alignment. Users targeting C++17 only and recent compilers (e.g., GCC>=7, clang>=5, MSVC>=19.12) will thus be able to completely forget about all [http://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html issues] related to static alignment, including <code>EIGEN_MAKE_ALIGNED_OPERATOR_NEW</code>. | ||
+ | * Various performance improvements for products and Eigen's GEBP and GEMV kernels have been implemented: | ||
+ | ** By using half- and quarter-packets, the performance of matrix multiplications of small to medium sized matrices has been improved. | ||
+ | ** Eigen's GEMM now falls back to GEMV if it detects that a matrix is a run-time vector | ||
+ | ** The performance of matrix products using Arm Neon has been drastically improved (up to 20%) | ||
+ | ** Performance of many special cases of matrix products has been improved | ||
+ | * Large speed up from blocked algorithm for <code>.transposeInPlace</code>. | ||
+ | * Speed up misc. operations by propagating compile-time sizes (col/row-wise reverse, PartialPivLU, and others) | ||
+ | * Faster specialized SIMD kernels for small fixed-size inverse, LU decomposition, and determinant. | ||
+ | * Improved or added vectorization of partial or slice reductions along the outer-dimension, for instance: <code>colmajor_mat.rowwise().mean()</code> | ||
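To illustrate why a reduction along the outer dimension such as <code>colmajor_mat.rowwise().mean()</code> is amenable to vectorization, here is a library-free sketch. The function name and the flat `std::vector` storage are assumptions made for illustration; this is not Eigen's kernel. The point is that accumulating one whole column at a time keeps the inner loop over contiguous memory.

```cpp
#include <cstddef>
#include <vector>

// Row-wise mean of a rows x cols matrix stored column by column in `data`.
// Assumes cols > 0 and data.size() == rows * cols.
std::vector<double> rowwise_mean_colmajor(const std::vector<double>& data,
                                          std::size_t rows, std::size_t cols) {
  std::vector<double> means(rows, 0.0);
  for (std::size_t j = 0; j < cols; ++j) {
    const double* column = data.data() + j * rows;
    // Contiguous inner loop: a compiler or SIMD kernel can process
    // several rows per instruction here.
    for (std::size_t i = 0; i < rows; ++i) means[i] += column[i];
  }
  for (std::size_t i = 0; i < rows; ++i) means[i] /= static_cast<double>(cols);
  return means;
}
```

The alternative loop order (one row at a time) would read memory with a stride of `rows` elements, which is harder to vectorize.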
+ | |||
+ | === Elementwise math functions === | ||
+ | * Many functions are now implemented and vectorized in generic (backend-agnostic) form. | ||
+ | * Many improvements to correctness, accuracy, and compatibility with c++ standard library. | ||
+ | ** Much improved implementation of <code>ldexp</code>. | ||
+ | ** Misc. fixes for corner cases, NaN/Inf inputs and singular points of many functions. | ||
+ | ** New implementation of the Payne-Hanek argument reduction algorithm for <code>sin</code> and <code>cos</code> with huge arguments. | ||
+ | ** New faithfully rounded algorithm for <code>pow(x,y)</code>. | ||
+ | * Speedups from (new or improved) vectorized versions of <code>pow, log, sin, cos, arg, log2</code>, complex <code>sqrt, erf, expm1, log1p, logistic, rint, gamma</code> and <code>bessel</code> functions, and more. | ||
+ | * Improved special function support (Bessel and gamma functions, <code>ndtri, erfc</code>, inverse hyperbolic functions and more) | ||
+ | * New elementwise functions for <code>absolute_difference</code>, <code>rint</code>. | ||
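As a scalar illustration of what an absolute difference computes, and why it is useful as a dedicated operation, consider the following library-free sketch. The function name `abs_diff` is hypothetical. For unsigned types the naive `abs(a - b)` wraps around when `a < b`, whereas selecting the subtraction order first does not.

```cpp
// Absolute difference that is also safe for unsigned types:
// always subtracts the smaller value from the larger one.
template <typename T>
T abs_diff(T a, T b) {
  return a > b ? a - b : b - a;
}
```

For example, with `unsigned` arguments 3 and 5 the expression `3u - 5u` wraps to a very large value, while `abs_diff(3u, 5u)` yields 2.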
+ | |||
+ | === Dense matrix decompositions and solvers === | ||
+ | * All dense linear solvers (i.e., Cholesky, *LU, *QR, CompleteOrthogonalDecomposition, *SVD) now inherit from <code>SolverBase</code> and thus support the <code>.transpose()</code>, <code>.adjoint()</code> and <code>.solve()</code> APIs. | ||
+ | * SVD implementations now have an <code>info()</code> method for checking convergence. | ||
+ | <source lang="cpp"> | ||
+ | #include <Eigen/SVD> | ||
+ | MatrixXf m = MatrixXf::Random(3,2); | ||
+ | VectorXf b = VectorXf::Random(3); // right-hand side with m.rows() entries | ||
+ | JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV); | ||
+ | if (svd.info() == ComputationInfo::Success) { | ||
+ | // SVD computation was successful. | ||
+ | VectorXf x = svd.solve(b); | ||
+ | } | ||
+ | </source> | ||
+ | * Most decompositions now fail quickly when invalid inputs are detected. | ||
+ | * Optimized the product of a <code>HouseholderSequence</code> with the identity, as well as the evaluation of a <code>HouseholderSequence</code> to a dense matrix using faster blocked product. | ||
+ | * Fixed aliasing issues with in-place small matrix inversions. | ||
+ | * Fixed several edge-cases with empty or zero inputs. | ||
+ | |||
+ | === Sparse matrix support, decompositions and solvers === | ||
+ | * Enabled assignment and addition with diagonal matrix expressions. | ||
+ | <source lang="cpp"> | ||
+ | SparseMatrix<float> A(10, 10); | ||
+ | VectorXf x = VectorXf::Random(10); | ||
+ | A = x.asDiagonal(); | ||
+ | A += x.asDiagonal(); | ||
+ | </source> | ||
+ | * Support added for SuiteSparse KLU routines via the <code>KLUSupport</code> module. SuiteSparse must be installed to use this module. | ||
+ | <source lang="cpp"> | ||
+ | #include <Eigen/KLUSupport> | ||
+ | A.makeCompressed(); // Recommendation is to compress input before calling sparse solvers. | ||
+ | KLU<SparseMatrix<float> > klu(A); // A is a SparseMatrix<float>, b a VectorXf | ||
+ | if (klu.info() == ComputationInfo::Success) { | ||
+ | VectorXf x = klu.solve(b); | ||
+ | } | ||
+ | </source> | ||
+ | * <code>SparseCholesky</code> now works with row-major matrices. | ||
+ | * Various bug fixes and performance improvements. | ||
+ | |||
+ | === Type support === | ||
+ | * Improved support for <code>half</code> | ||
+ | ** Native support added for ARM <code>__fp16</code>, CUDA/HIP <code>__half</code>, and <code>F16C</code> conversion intrinsics. | ||
+ | ** Better vectorization support added across all backends. | ||
+ | * Improved bool support | ||
+ | ** Partial vectorization support added for boolean operations. | ||
+ | ** Significantly improved performance (x25) for logical operations with <code>Matrix</code> or <code>Tensor</code> of <code>bool</code>. | ||
+ | * Improved support for custom types | ||
+ | ** More custom types work out-of-the-box (see [https://gitlab.com/libeigen/eigen/-/issues/2201 #2201]). | ||
+ | |||
+ | === Improved Geometry Module === | ||
+ | * '''Behavioral change:''' <code>Transform::computeRotationScaling()</code> and <code>Transform::computeScalingRotation()</code> are now more continuous across degeneracies (see [https://gitlab.com/libeigen/eigen/-/merge_requests/349 !349]). | ||
+ | * New partial vectorization support added for <code>Quaternion</code>. | ||
+ | * Generic vectorized 4x4 matrix inversion. | ||
+ | |||
+ | === Backend-specific improvements === | ||
+ | * '''Arm NEON''' | ||
+ | ** Now provides vectorization for <code>uint64_t</code>, <code>int64_t</code>, <code>uint32_t</code>, <code>int16_t</code>, <code>uint16_t</code>, <code>int8_t</code>, and <code>uint8_t</code>. | ||
+ | ** Emulates <code>bfloat16</code> support when using <code>Eigen::bfloat16</code> | ||
+ | ** Supports emulated and native <code>float16</code> when using <code>Eigen::half</code> | ||
+ | * '''SSE/AVX/AVX512''' | ||
+ | ** General performance improvements and bugfixes. | ||
+ | ** Enabled AVX512 instructions by default if available. | ||
+ | ** New <code>std::complex</code>, <code>half</code>, and <code>bfloat16</code> vectorization support added. | ||
+ | ** Many missing packet functions added. | ||
+ | * '''Altivec/Power''' | ||
+ | ** General performance improvement and bugfixes. | ||
+ | ** Enhanced vectorization of real and complex scalars. | ||
+ | ** Altivec-specific changes to the <code>gebp_kernel</code>, using a VSX implementation of the MMA instructions, yield speedups of up to 4x for matrix-matrix products. | ||
+ | ** Dynamic dispatch for GCC versions greater than 10, enabling selection of MMA or VSX instructions based on <code>__builtin_cpu_supports</code>. | ||
+ | * '''GPU (CUDA and HIP)''' | ||
+ | ** Several optimized math functions added, better support for <code>std::complex</code>. | ||
+ | ** Added option to disable CUDA entirely by defining <code>EIGEN_NO_CUDA</code>. | ||
+ | ** Many more functions can now be used in device code (e.g. comparisons, small matrix inversion). | ||
+ | * '''ZVector''' | ||
+ | ** Vectorized <code>float</code> and <code>std::complex<float></code> support added. | ||
+ | ** Added z14 support. | ||
+ | * '''SYCL''' | ||
+ | ** Redesigned SYCL implementation for use with the [https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html Tensor] module, which can be enabled by defining <code>EIGEN_USE_SYCL</code>. | ||
+ | ** Introduced a new generic memory model, used by <code>TensorDeviceSycl</code>. | ||
+ | ** Better integration with OpenCL devices. | ||
+ | ** Added many math function specializations. | ||
+ | |||
+ | === Miscellaneous API Changes === | ||
+ | * New <code>setConstant(...)</code> methods for preserving one dimension of a matrix by passing in <code>NoChange</code>. | ||
+ | <source lang="cpp"> | ||
+ | MatrixXf A(10, 5); // 10x5 matrix. | ||
+ | A.setConstant(NoChange, 10, 2); // 10x10 matrix of 2s. | ||
+ | A.setConstant(5, NoChange, 3); // 5x10 matrix of 3s. | ||
+ | A.setZero(NoChange, 20); // 5x20 matrix of 0s. | ||
+ | A.setZero(20, NoChange); // 20x20 matrix of 0s. | ||
+ | A.setOnes(NoChange, 5); // 20x5 matrix of 1s. | ||
+ | A.setOnes(5, NoChange); // 5x5 matrix of 1s. | ||
+ | A.setRandom(NoChange, 10); // 5x10 random matrix. | ||
+ | A.setRandom(10, NoChange); // 10x10 random matrix. | ||
+ | </source> | ||
+ | * Added <code>setUnit(Index i)</code> for vectors, which sets the ''i''-th coefficient to one and all others to zero. | ||
+ | <source lang="cpp"> | ||
+ | VectorXf v(5); | ||
+ | v.setUnit(3); // { 0, 0, 0, 1, 0} | ||
+ | </source> | ||
+ | * Added <code>transpose()</code>, <code>adjoint()</code>, <code>conjugate()</code> methods to <code>SelfAdjointView</code>. | ||
+ | * Added <code>shiftLeft<N>()</code> and <code>shiftRight<N>()</code> coefficient-wise arithmetic shift functions to Arrays. | ||
+ | <source lang="cpp"> | ||
+ | ArrayXXi A = ArrayXXi::Random(2, 3); | ||
+ | ArrayXXi B = A.shiftRight<2>(); | ||
+ | ArrayXXi C = A.shiftLeft<6>(); | ||
+ | </source> | ||
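In scalar terms, a coefficient-wise shift by a compile-time amount applies the same `>> N` or `<< N` to every entry. The sketch below shows this on a plain `std::vector<int>`. The function names are hypothetical, and the inputs are assumed to be non-negative to stay clear of the corner cases of shifting negative integers in older C++ standards.

```cpp
#include <vector>

// Shifts every coefficient right by N bits (for non-negative values this is
// integer division by 2^N, rounding down).
template <int N>
std::vector<int> shift_right(const std::vector<int>& values) {
  std::vector<int> out;
  out.reserve(values.size());
  for (int v : values) out.push_back(v >> N);
  return out;
}

// Shifts every coefficient left by N bits (multiplication by 2^N, assuming
// the result does not overflow).
template <int N>
std::vector<int> shift_left(const std::vector<int>& values) {
  std::vector<int> out;
  out.reserve(values.size());
  for (int v : values) out.push_back(v << N);
  return out;
}
```

Passing the shift amount as a template argument makes it a compile-time constant, which lets SIMD backends use immediate-operand shift instructions.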
+ | * Enabled adding and subtracting of diagonal expressions. | ||
+ | <source lang="cpp"> | ||
+ | VectorXf x = VectorXf::Random(5); | ||
+ | VectorXf y = VectorXf::Random(5); | ||
+ | MatrixXf A = MatrixXf::Identity(5, 5); | ||
+ | A += x.asDiagonal() - y.asDiagonal(); | ||
+ | </source> | ||
+ | * Allow user-defined default cache sizes via defining <code>EIGEN_DEFAULT_L1_CACHE_SIZE</code>, ..., <code>EIGEN_DEFAULT_L3_CACHE_SIZE</code>. | ||
+ | * Added <code>EIGEN_ALIGNOF(X)</code> macro for determining alignment of a provided variable. | ||
+ | * Allow plugins for <code>VectorwiseOp</code> by defining a file <code>EIGEN_VECTORWISEOP_PLUGIN</code> (e.g. <code>-DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h</code>). | ||
+ | * Allow disabling of IO operations by defining <code>EIGEN_NO_IO</code>. | ||
+ | |||
+ | === Improvement to NaN propagation === | ||
+ | |||
+ | * Improvements to NaN correctness for elementwise functions. | ||
+ | * New <code>NaNPropagation</code> template argument to control whether NaNs are propagated or suppressed in elementwise <code>min/max</code> and corresponding reductions on <code>Array</code>, <code>Matrix</code>, and <code>Tensor</code>. Example for max: | ||
+ | <source lang="cpp"> | ||
+ | // Elementwise maximum | ||
+ | Eigen::MatrixXf left, right, r0, r1, r2; | ||
+ | r0 = left.cwiseMax(right); // Implementation defined behavior. | ||
+ | // Propagate NaN if either argument is NaN. | ||
+ | r1 = left.template cwiseMax<PropagateNaN>(right); | ||
+ | // Suppress NaN if at least one argument is not a NaN. | ||
+ | r2 = left.template cwiseMax<PropagateNumbers>(right); | ||
+ | |||
+ | // Max reductions | ||
+ | Eigen::MatrixXf m; | ||
+ | float nan_or_max = m.maxCoeff(); // Implementation defined behavior. | ||
+ | float nan_if_any_or_max = m.template maxCoeff<PropagateNaN>(); | ||
+ | float nan_if_all_or_max = m.template maxCoeff<PropagateNumbers>(); | ||
+ | </source> | ||
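The two explicit policies can be summarized on scalars. The following library-free sketch uses hypothetical function names to spell out the intended semantics described above; it is not Eigen's implementation.

```cpp
#include <cmath>

// Semantics of PropagateNaN: the result is NaN if either argument is NaN.
inline float max_propagate_nan(float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  return a > b ? a : b;
}

// Semantics of PropagateNumbers: a NaN argument is ignored in favor of the
// other one, so the result is NaN only if both arguments are NaN.
inline float max_propagate_numbers(float a, float b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a > b ? a : b;
}
```

A reduction such as `maxCoeff` follows by folding the chosen scalar function over all coefficients, which is why `PropagateNaN` returns NaN if any coefficient is NaN and `PropagateNumbers` returns NaN only if all of them are.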
+ | |||
+ | == Changes to unsupported modules == | ||
+ | === New low-latency non-blocking ThreadPool module === | ||
+ | * Originally a part of the Tensor module, <code>Eigen::ThreadPool</code> is now separate and more portable, and forms the basis for multi-threading in TensorFlow, for example. Example: | ||
+ | <source lang="cpp"> | ||
+ | #include <unsupported/Eigen/CXX11/ThreadPool> | ||
+ | |||
+ | const int num_threads = 42; | ||
+ | Eigen::ThreadPool tp(num_threads); | ||
+ | auto do_stuff = []() { ... }; | ||
+ | tp.Schedule(do_stuff); | ||
+ | </source> | ||
+ | |||
+ | === Changes to Tensor module === | ||
+ | * Support for c++03 was officially dropped in the Tensor module, since most of the code was written in c++11 anyway. This will prevent building the code for CUDA with older versions of <code>nvcc</code>. | ||
+ | * Performance optimizations of Tensor contraction | ||
+ | ** Speed up "outer-product-like" operations by parallelizing over the contraction dimension, using thread_local buffers and recursive work splitting. | ||
+ | ** Improved threading heuristics. | ||
+ | ** Support for fusing element-wise operations into contraction during evaluation. Example: | ||
+ | <source lang="cpp"> | ||
+ | // This example applies std::sqrt to all output elements from a tensor contraction. | ||
+ | // The optional OutputKernel argument to the contraction in this example is a functor over a | ||
+ | // 2-dimensional buffer. The functor is called once for each output block of the contraction | ||
+ | // result, to perform the elementwise sqrt operation while the block is hot in cache. | ||
+ | struct SqrtOutputKernel { | ||
+ | template <typename Index, typename Scalar> | ||
+ | EIGEN_ALWAYS_INLINE void operator()( | ||
+ | const internal::blas_data_mapper<Scalar, Index, ColMajor>& output_mapper, | ||
+ | const TensorContractionParams&, Index, Index, Index num_rows, | ||
+ | Index num_cols) const { | ||
+ | for (Index i = 0; i < num_rows; ++i) { | ||
+ | for (Index j = 0; j < num_cols; ++j) { | ||
+ | output_mapper(i, j) = std::sqrt(output_mapper(i, j)); | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | }; | ||
+ | |||
+ | Tensor<float, 4, DataLayout> left(30, 50, 8, 31); | ||
+ | Tensor<float, 5, DataLayout> right(8, 31, 7, 20, 10); | ||
+ | Tensor<float, 5, DataLayout> result(30, 50, 7, 20, 10); | ||
+ | Eigen::array<DimPair, 2> dims({{DimPair(2, 0), DimPair(3, 1)}}); | ||
+ | |||
+ | result = left.contract(right, dims, SqrtOutputKernel()); | ||
+ | </source> | ||
+ | |||
+ | * Performance optimizations of other Tensor operators | ||
+ | ** Speedups from improved vectorization, block evaluation, and multi-threading for most operators. | ||
+ | ** Significant speedup to broadcasting. | ||
+ | ** Reduction of index computation overhead, e.g. using fast divisors in TensorGenerator, squeezing dimensions in TensorPadding. | ||
+ | * A complete rewrite of the block (tiling) evaluation framework for tensor expressions led to significant speedups and a reduced number of memory allocations. | ||
+ | * Added new API for asynchronous evaluation of tensor expressions. Example: | ||
+ | <source lang="cpp"> | ||
+ | Tensor<float, 3> in1(200, 30, 70); | ||
+ | Tensor<float, 3> in2(200, 30, 70); | ||
+ | Tensor<float, 3> out(200, 30, 70); | ||
+ | |||
+ | Eigen::ThreadPool tp(internal::random<int>(3, 11)); | ||
+ | Eigen::ThreadPoolDevice thread_pool_device(&tp, internal::random<int>(3, 11)); | ||
+ | |||
+ | Eigen::Barrier b(1); | ||
+ | auto done = [&b]() { b.Notify(); }; | ||
+ | out.device(thread_pool_device, std::move(done)) = in1 + in2 * 3.14f; | ||
+ | b.Wait(); | ||
+ | </source> | ||
+ | * Misc. minor behavior changes & fixes: | ||
+ | ** Fix const correctness for TensorMap. | ||
+ | ** Modify tensor argmin/argmax to always return first occurrence. | ||
+ | ** More numerically stable tree reduction. | ||
+ | ** Improve randomness of the tensor random generator. | ||
+ | ** Update the padding computation for PADDING_SAME to be consistent with TensorFlow. | ||
+ | ** Support static dimensions (aka IndexList) in resizing/reshape/broadcast. | ||
+ | ** Improved accuracy of Tensor FFT. | ||
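To clarify what "first occurrence" means for argmax in the presence of ties, here is a library-free scalar sketch. The function name is hypothetical and the code is not taken from the Tensor module; it only shows that a strict comparison keeps the earliest index among equal maxima.

```cpp
#include <cstddef>
#include <vector>

// Index of the first maximum element. Assumes `values` is non-empty.
std::size_t argmax_first(const std::vector<float>& values) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < values.size(); ++i) {
    // Strict '>' means a later element that merely ties never replaces
    // the earlier one.
    if (values[i] > values[best]) best = i;
  }
  return best;
}
```

Using `>=` instead would return the last occurrence, so results on tied inputs would differ.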
+ | |||
+ | === Improvements to FFT module === | ||
+ | |||
+ | * Faster and more accurate twiddle factor computation. | ||
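For readers unfamiliar with the term, twiddle factors are the complex roots of unity exp(-2πik/N) used by FFT butterflies. The sketch below computes each factor directly from its phase, which avoids the rounding error that accumulates when factors are generated by repeated multiplication. It is only an illustration with a hypothetical function name, and is not claimed to be the method the FFT module uses.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Returns w[k] = exp(-2*pi*i*k/n) for k = 0..n-1, each computed
// independently from its own phase angle.
std::vector<std::complex<double>> make_twiddles(std::size_t n) {
  const double pi = std::acos(-1.0);
  std::vector<std::complex<double>> w(n);
  for (std::size_t k = 0; k < n; ++k) {
    const double phase =
        -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
    w[k] = std::complex<double>(std::cos(phase), std::sin(phase));
  }
  return w;
}
```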
+ | |||
+ | === Improvements to EulerAngles === | ||
+ | |||
+ | * EulerAngles can now be directly constructed from 3D vectors | ||
+ | * EulerAngles now provide <code>isApprox()</code> and <code>cast()</code> functions | ||
+ | |||
+ | === Changes to sparse iterative solvers === | ||
+ | * Added new IDRS iterative linear solver. | ||
+ | <source lang="cpp"> | ||
+ | #include <unsupported/Eigen/IterativeSolvers> | ||
+ | A.makeCompressed(); // Recommendation is to compress input before calling sparse solvers. | ||
+ | IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A); | ||
+ | VectorXf x = idrs.solve(b); | ||
+ | bool success = (idrs.info() == ComputationInfo::Success); | ||
+ | </source> | ||
+ | |||
+ | === Improvements to Polynomials === | ||
+ | |||
+ | * PolynomialSolver can now be used with complex numbers | ||
+ | * The solver will automatically choose between <code>EigenSolver</code> and <code>ComplexEigenSolver</code> depending on the scalar type used | ||
+ | |||
+ | == Other relevant changes == | ||
+ | |||
+ | * Eigen now provides an option to test with an external BLAS library | ||
+ | * Eigen can now be used with the [https://en.wikipedia.org/wiki/The_Portland_Group PGI Compiler] | ||
+ | * Printing when using GDB has been improved | ||
+ | * Eigen can now detect if a platform supports <code>int128</code> intrinsics | ||
+ | |||
+ | == Testing == | ||
+ | The full Eigen test suite was built and run successfully (in c++03 and c++11 mode) with the following compiler/platform/OS combinations: | ||
+ | |||
+ | {| class="wikitable" | ||
+ | !Compiler !! Version !! Platform !! Operating system | ||
+ | |- | ||
+ | |Microsoft Visual Studio || 2015 Update 3 || x86-64 || Windows | ||
+ | |- | ||
+ | |Microsoft Visual Studio || Community 2017 - 15.9.38 || x86-64 || Windows | ||
+ | |- | ||
+ | |Microsoft Visual Studio || Community 2019 - 16.11 || x86-64 || Windows | ||
+ | |- | ||
+ | |GCC || 4.8 || x86-64 || Linux | ||
+ | |- | ||
+ | |GCC || 9 || x86-64 || Linux | ||
+ | |- | ||
+ | |GCC || 10 || x86-64 || Linux | ||
+ | |- | ||
+ | |Clang || 6.0 || x86-64 || Linux | ||
+ | |- | ||
+ | |Clang || 10 || x86-64 || Linux | ||
+ | |- | ||
+ | |Clang || 11 || x86-64 || Linux | ||
+ | |- | ||
+ | |GCC || 10 || armv8.2-a || Linux | ||
+ | |- | ||
+ | |Clang || 6 || armv8.2-a || Linux | ||
+ | |- | ||
+ | |Clang || 9 || armv8.2-a || Linux | ||
+ | |- | ||
+ | |Clang || 10 || armv8.2-a || Linux | ||
+ | |- | ||
+ | |Clang || 11 || armv8.2-a || Linux | ||
+ | |- | ||
+ | |AppleClang || 12.0.5 || x86-64 || macOS | ||
+ | |- | ||
+ | |GCC || 10 || ppc64le || Linux | ||
+ | |- | ||
+ | |Clang || 10 || ppc64le || Linux | ||
+ | |- | ||
+ | |} | ||
+ | |||
+ | == List of issues fixed in Eigen 3.4 == | ||
+ | |||
+ | {| | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2298 Issue #2298] | ||
+ | | List of dense linear decompositions lacks completeorthogonal decomposition | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2284 Issue #2284] | ||
+ | | JacobiSVD Outputs Invalid U (Reads Past End of Array) | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2267 Issue #2267] | ||
+ | | [3.4 bug] FixedInt<0> error with gcc 4.9.3 | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2263 Issue #2263] | ||
+ | | usage of signed zeros leads to wrong results with -ffast-math | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2251 Issue #2251] | ||
+ | | Method unaryExpr() does not support function pointers in Eigen 3.4rc1 | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2242 Issue #2242] | ||
+ | | No matching function for call to "..." in 'Complex.h' and 'GenericPacketMathFunctions.h' | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2229 Issue #2229] | ||
+ | | Copies (& potentially moves?) of Eigen object with large unused MaxRows/ColAtCompileTime are slow (Regression from Eigen 3.2) | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2213 Issue #2213] | ||
+ | | template maxCoeff<PropagateNaN> compilation error with Eigen 3.4. | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2209 Issue #2209] | ||
+ | | unaryExpr deduces wrong return type on MSVC | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2157 Issue #2157] | ||
+ | | forward_adolc test fails since PR !363 | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2119 Issue #2119] | ||
+ | | Move assignment swaps even for non-dynamic storage | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2112 Issue #2112] | ||
+ | | Build failure with boost::multiprecision type | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/2093 Issue #2093] | ||
+ | | Incorrect evaluation of Ref | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1906 Issue #1906] | ||
+ | | Eigen failed with error C2440 with MSVC on windows | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1850 Issue #1850] | ||
+ | | error C4996: 'std::result_of<T>': warning STL4014: std::result_of and std::result_of_t are deprecated in C++17. They are superseded by std::invoke_result and std::invoke_result_t | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1833 Issue #1833] | ||
+ | | c++20 compilation failure | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1826 Issue #1826] | ||
+ | | -Wdeprecated-anon-enum-enum-conversion warnings (c++20) | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1815 Issue #1815] | ||
+ | | IndexedView of a vector should allow linear access | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1805 Issue #1805] | ||
+ | | Uploaded doxygen documentation does not build LaTeX formulae | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1790 Issue #1790] | ||
+ | | packetmath_1 unit test fails | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1788 Issue #1788] | ||
+ | | Rule-of-three/rule-of-five violations | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1776 Issue #1776] | ||
+ | | subvector_stl_iterator::operator-> triggers 'taking address of rvalue' warning | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1774 Issue #1774] | ||
+ | | std::cbegin() returns non-const iterator | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1752 Issue #1752] | ||
+ | | A change to the C++ Standard will break some tests | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1741 Issue #1741] | ||
+ | | Map<>.noalias()=A*B gives wrong result | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1736 Issue #1736] | ||
+ | | Column access of some IndexedView won't compile | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1718 Issue #1718] | ||
+ | | Use of builtin vec_sel is ambiguous when compiling with Clang for PowerPC | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1695 Issue #1695] | ||
+ | | Stuck in loop for a certain input when using mpreal support | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1692 Issue #1692] | ||
+ | | pass enumeration argument to constructor of VectorXd | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1684 Issue #1684] | ||
+ | | array_reverse fails with clang >=6 + AVX + -O2 | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1674 Issue #1674] | ||
+ | | SIMD sin/cos gives wrong results with -ffast-math | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1669 Issue #1669] | ||
+ | | Zero-sized matrices generate assertion failures | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1664 Issue #1664] | ||
+ | | dot product with single column block fails with new static checks | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1652 Issue #1652] | ||
+ | | Corner cases in SIMD sin/cos | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1643 Issue #1643] | ||
+ | | Compilation failure | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1637 Issue #1637] | ||
+ | | Register spilling with recent gcc & clang | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1619 Issue #1619] | ||
+ | | const_iterator vs iterator compilation error | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1615 Issue #1615] | ||
+ | | Performance of (aliased) matrix multiplication with fixed size 3x3 matrices slow | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1611 Issue #1611] | ||
+ | | NEON: plog(+/-0) should return -inf and not NaN | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1585 Issue #1585] | ||
+ | | Matrix product is repeatedly evaluated when iterating over the product expression | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1557 Issue #1557] | ||
+ | | Fail to compute eigenvalues for a simple 3x3 companion matrix for root finding | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1544 Issue #1544] | ||
+ | | SparseQR generates incorrect Q matrix in complex case | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1543 Issue #1543] | ||
+ | | "Fix linear indexing in generic block evaluation" breaks Matrix*Diagonal*Vector product | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1493 Issue #1493] | ||
+ | | dense Q extraction and solve is sometimes erroneous for complex matrices | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1453 Issue #1453] | ||
+ | | Strange behavior for Matrix::Map, if only InnerStride is provided | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1409 Issue #1409] | ||
+ | | Add support for C++17 operator new alignment | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1340 Issue #1340] | ||
+ | | Add operator + to sparse matrix iterator | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1318 Issue #1318] | ||
+ | | More robust quaternion from matrix | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1306 Issue #1306] | ||
+ | | Add support for AVX512 to Eigen | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1305 Issue #1305] | ||
+ | | Implementation of additional component-wise unary functions | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1221 Issue #1221] | ||
+ | | I get tons of error since my distribution upgraded to GCC 6.1.1 | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1195 Issue #1195] | ||
+ | | vectorization_logic fails: Matrix3().cwiseQuotient(Matrix3()) expected CompleteUnrolling, got NoUnrolling | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1194 Issue #1194] | ||
+ | | Improve det4x4 | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1049 Issue #1049] | ||
+ | | std::make_shared fails to fulfill structure aliment | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1046 Issue #1046] | ||
+ | | fixed matrix types do not report correct alignment requirements | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1014 Issue #1014] | ||
+ | | Eigenvalues 3x3 matrix | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/1001 Issue #1001] | ||
+ | | infer dimensions of Dynamic-sized temporaries from the entire expression (if possible) | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/977 Issue #977] | ||
+ | | Add stable versions of normalize() and normalized() | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/899 Issue #899] | ||
+ | | SparseQR occasionally fails for under-determined systems | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/864 Issue #864] | ||
+ | | C++11 alias templates for commonly used types | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/751 Issue #751] | ||
+ | | Make AMD Ordering numerically more robust | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/747 Issue #747] | ||
+ | | Allow for negative stride | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/720 Issue #720] | ||
+ | | Gaussian NullaryExpr | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/663 Issue #663] | ||
+ | | Permit NoChange in setZero, setOnes, setConstant, setRandom | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/645 Issue #645] | ||
+ | | GeneralizedEigenSolver: missing computation of eigenvectors | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/632 Issue #632] | ||
+ | | Optimize addition/subtraction of sparse and dense matrices/vectors | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/631 Issue #631] | ||
+ | | (Optionally) throw an exception when using an unsuccessful decomposition | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/564 Issue #564] | ||
+ | | maxCoeff() returns -nan instead of max, while maxCoeff(&maxRow, &maxCol) works | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/556 Issue #556] | ||
+ | | Matrix multiplication crashes using mingw 4.7 | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/505 Issue #505] | ||
+ | | Assert if temporary objects that are still referred to get destructed (was: Misbehaving Product on C++11) | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/445 Issue #445] | ||
+ | | ParametrizedLine should have transform method | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/437 Issue #437] | ||
+ | | [feature request] Add Reshape Operation | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/426 Issue #426] | ||
+ | | Behavior of sum() for Matrix<bool> is unexpected and confusing | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/329 Issue #329] | ||
+ | Feature request: Ability to get a "view" into a sub-matrix by indexing it with a vector or matrix of indices | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/231 Issue #231] | ||
+ | | STL compatible iterators | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/96 Issue #96] | ||
+ | | Clean internal::result_of | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/65 Issue #65] | ||
+ | | Core - optimize partial reductions | ||
+ | |- | ||
+ | | [https://gitlab.com/libeigen/eigen/-/issues/64 Issue #64] | ||
+ | | Tests : precision-oriented tests | ||
+ | |} | ||
+ | |||
+ | == Additional information == | ||
+ | * A curated list of commits, approximately organized by the same topics as the release notes above, and sorted in reverse chronological order can be found [https://docs.google.com/document/d/e/2PACX-1vSGvp4Kv9dJ-gKzJN4CBjppP46flDbe3pJtI9N3m3WkKSoLXmANXuK5gJlw1CPcpCfjAWhgXAtQNzm-/pub here]. |
Latest revision as of 15:26, 14 October 2021
Eigen 3.4 was released on August 18 2021. It can be downloaded from the Download section on the Main Page or from Gitlab.
Notice: 3.4.x will be the last major release series of Eigen that supports C++03. The master branch will drop C++03 support after this release.
Contents
- 1 Changes to supported modules
- 1.1 Changes that might break existing code
- 1.2 New Major Features in Core
- 1.3 New backends
- 1.4 Improvements to Eigen Core
- 1.5 Elementwise math functions
- 1.6 Dense matrix decompositions and solvers
- 1.7 Sparse matrix support, decompositions and solvers
- 1.8 Type support
- 1.9 Improved Geometry Module
- 1.10 Backend-specific improvements
- 1.11 Miscellaneous API Changes
- 1.12 Improvement to NaN propagation
- 2 Changes to unsupported modules
- 3 Other relevant changes
- 4 Testing
- 5 List of issues fixed in Eigen 3.4
- 6 Additional information
Changes to supported modules
Changes that might break existing code
- Using float or double for indexing matrices, vectors and arrays will now fail to compile, ex.:
    MatrixXd A(10,10);
    float one = 1;
    double a11 = A(one,1.); // compilation error here
New Major Features in Core
- Add c++11 initializer_list constructors to Matrix and Array [doc]:
    MatrixXi a {      // construct a 2x3 matrix
          {1,2,3},    // first row
          {4,5,6}     // second row
    };
    VectorXd v{{1, 2, 3, 4, 5}};     // construct a dynamic-size vector with 5 elements
    Array<int,1,5> b{1,2, 3, 4, 5};  // initialize a fixed-size 1D array of size 5.
- Add STL-compatible iterators for dense expressions [doc]. Some examples:
    VectorXd v = ...;
    MatrixXd A = ...;
    // range for loop over all entries of v then A
    for(auto x : v) { cout << x << " "; }
    for(auto x : A.reshaped()) { cout << x << " "; }
    // sort v then each column of A
    std::sort(v.begin(), v.end());
    for(auto c : A.colwise())
        std::sort(c.begin(), c.end());
- New versatile API for sub-matrices, slices, and indexed views [doc]. It basically extends A(.,.) to let it accept anything that looks like a sequence of indices with random access. To make it usable, this new feature comes with new symbols: Eigen::indexing::all, Eigen::indexing::last, and functions generating arithmetic sequences: Eigen::seq(first,last[,incr]), Eigen::seqN(first,size[,incr]), Eigen::lastN(size[,incr]). Here is an example picking even rows but the first and last ones, and a subset of indexed columns:
    MatrixXd A = ...;
    std::vector<int> col_ind{7,3,4,3};
    MatrixXd B = A(seq(2,last-2,fix<2>), col_ind);
- Add C++11 template aliases for Matrix, Vector, and Array of common sizes, including generic Vector<Type,Size> and RowVector<Type,Size> aliases [doc].
    MatrixX<double> M;  // Instead of MatrixXd or Matrix<double, Dynamic, Dynamic>
    Vector4<MyType> V;  // Instead of Vector<MyType, 4>
- New support for bfloat16. The 16-bit Brain floating point format is now available as Eigen::bfloat16. The constructor must be called explicitly, but it can otherwise be used as any other scalar type. To convert to and from uint16_t, for example to extract the bit representation, use Eigen::numext::bit_cast.
    bfloat16 s(0.25);                                 // explicit construction
    uint16_t s_bits = numext::bit_cast<uint16_t>(s);  // bit representation
    using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
    MatrixBf16 X = s * MatrixBf16::Random(3, 3);
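For readers unfamiliar with the format: a bfloat16 value has the same sign and exponent layout as an IEEE float32 and keeps only the upper 16 bits. The standalone sketch below illustrates that bit layout without using Eigen. The function names are hypothetical, and the conversion simply truncates, which is an assumption made for brevity and is not necessarily how Eigen rounds.

```cpp
#include <cstdint>
#include <cstring>

// Illustration only: take the upper 16 bits of a float32.
// Assumption: plain truncation, no rounding and no special NaN handling.
std::uint16_t float_to_bf16_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return static_cast<std::uint16_t>(u >> 16);
}

// Widen a bfloat16 bit pattern back to float32 by zero-filling the low bits.
float bf16_bits_to_float(std::uint16_t bits) {
  std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}
```

Values such as 0.25 or 1.0 fit exactly in the 8 significand bits, so they survive the round trip unchanged.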
New backends
- Arm SVE: Eigen now supports Arm's Scalable Vector Extension (SVE). Currently only fixed-length SVE vectors for uint32_t and float are available.
- MIPS MSA: Eigen now supports the MIPS SIMD Architecture (MSA)
- AMD ROCm/HIP: Eigen now contains a generic GPU backend that unifies support for NVIDIA/CUDA and AMD/HIP.
- Power 10 MMA Backend: Eigen now has initial support for Power 10 matrix multiplication assist instructions for float32 and float64, real and complex.
Improvements to Eigen Core
- Eigen now uses the c++11 alignas keyword for static alignment. Users targeting C++17 only and recent compilers (e.g., GCC>=7, clang>=5, MSVC>=19.12) will thus be able to completely forget about all issues related to static alignment, including EIGEN_MAKE_ALIGNED_OPERATOR_NEW.
- Various performance improvements for products and Eigen's GEBP and GEMV kernels have been implemented:
  - By using half- and quarter-packets, the performance of matrix multiplications of small to medium sized matrices has been improved
  - Eigen's GEMM now falls back to GEMV if it detects that a matrix is a run-time vector
  - The performance of matrix products using Arm Neon has been drastically improved (up to 20%)
  - Performance of many special cases of matrix products has been improved
- Large speed up from blocked algorithm for .transposeInPlace.
- Speed up misc. operations by propagating compile-time sizes (col/row-wise reverse, PartialPivLU, and others)
- Faster specialized SIMD kernels for small fixed-size inverse, LU decomposition, and determinant.
- Improved or added vectorization of partial or slice reductions along the outer-dimension, for instance:
    colmajor_mat.rowwise().mean()
Elementwise math functions
- Many functions are now implemented and vectorized in generic (backend-agnostic) form.
- Many improvements to correctness, accuracy, and compatibility with c++ standard library.
  - Much improved implementation of ldexp.
  - Misc. fixes for corner cases, NaN/Inf inputs and singular points of many functions.
  - New implementation of the Payne-Hanek argument reduction algorithm for sin and cos with huge arguments.
  - New faithfully rounded algorithm for pow(x,y).
- Speedups from (new or improved) vectorized versions of pow, log, sin, cos, arg, log2, complex sqrt, erf, expm1, log1p, logistic, rint, gamma and bessel functions, and more.
- Improved special function support (Bessel and gamma functions, ndtri, erfc, inverse hyperbolic functions and more)
- New elementwise functions for absolute_difference, rint.
Dense matrix decompositions and solvers
- All dense linear solvers (i.e., Cholesky, *LU, *QR, CompleteOrthogonalDecomposition, *SVD) now inherit SolverBase and thus support .transpose(), .adjoint() and .solve() APIs.
- SVD implementations now have an info() method for checking convergence.
    #include <Eigen/SVD>
    MatrixXf m = MatrixXf::Random(3,2);
    VectorXf b = VectorXf::Random(3);  // right-hand side with one entry per row of m
    JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
    if (svd.info() == ComputationInfo::Success) {
      // SVD computation was successful.
      VectorXf x = svd.solve(b);
    }
- Most decompositions now fail quickly when invalid inputs are detected.
- Optimized the product of a HouseholderSequence with the identity, as well as the evaluation of a HouseholderSequence to a dense matrix using faster blocked product.
- Fixed aliasing issues with in-place small matrix inversions.
- Fixed several edge-cases with empty or zero inputs.
Sparse matrix support, decompositions and solvers
- Enabled assignment and addition with diagonal matrix expressions.
    SparseMatrix<float> A(10, 10);
    VectorXf x = VectorXf::Random(10);
    A = x.asDiagonal();
    A += x.asDiagonal();
- Support added for SuiteSparse KLU routines via the KLUSupport module. SuiteSparse must be installed to use this module.
    #include <Eigen/KLUSupport>
    A.makeCompressed();  // Recommendation is to compress input before calling sparse solvers.
    KLU<SparseMatrix<T> > klu(A);
    if (klu.info() == ComputationInfo::Success) {
      VectorXf x = klu.solve(b);
    }
- SparseCholesky now works with row-major matrices.
- Various bug fixes and performance improvements.
Type support
- Improved support for half
  - Native support added for ARM __fp16, CUDA/HIP __half, and F16C conversion intrinsics.
  - Better vectorization support added across all backends.
- Improved bool support
  - Partial vectorization support added for boolean operations.
  - Significantly improved performance (x25) for logical operations with Matrix or Tensor of bool.
- Improved support for custom types
  - More custom types work out-of-the-box (see #2201).
Improved Geometry Module
- Behavioral change: Transform::computeRotationScaling() and Transform::computeScalingRotation() are now more continuous across degeneracies (see !349).
- New partial vectorization support added for Quaternion.
- Generic vectorized 4x4 matrix inversion.
Backend-specific improvements
- Arm NEON
  - Now provides vectorization for uint64_t, int64_t, uint32_t, int16_t, uint16_t, int8_t, and uint8_t
  - Emulates bfloat16 support when using Eigen::bfloat16
  - Supports emulated and native float16 when using Eigen::half
- SSE/AVX/AVX512
  - General performance improvements and bugfixes.
  - Enabled AVX512 instructions by default if available.
  - New std::complex, half, and bfloat16 vectorization support added.
  - Many missing packet functions added.
- Altivec/Power
  - General performance improvement and bugfixes.
  - Enhanced vectorization of real and complex scalars.
  - Changes to the gebp_kernel specific to Altivec, using VSX implementation of the MMA instructions that gain speed improvements up to 4x for matrix-matrix products.
  - Dynamic dispatch for GCC greater than 10 enabling selection of MMA or VSX instructions based on __builtin_cpu_supports.
- GPU (CUDA and HIP)
  - Several optimized math functions added, better support for std::complex.
  - Added option to disable CUDA entirely by defining EIGEN_NO_CUDA.
  - Many more functions can now be used in device code (e.g. comparisons, small matrix inversion).
- ZVector
  - Vectorized float and std::complex<float> support added.
  - Added z14 support.
- SYCL
  - Redesigned SYCL implementation for use with the Tensor module, which can be enabled by defining EIGEN_USE_SYCL.
  - New generic memory model introduced, used by TensorDeviceSycl.
  - Better integration with OpenCL devices.
  - Added many math function specializations.
Miscellaneous API Changes
- New setConstant(...) methods for preserving one dimension of a matrix by passing in NoChange.
    MatrixXf A(10, 5);               // 10x5 matrix.
    A.setConstant(NoChange, 10, 2);  // 10x10 matrix of 2s.
    A.setConstant(5, NoChange, 3);   // 5x10 matrix of 3s.
    A.setZero(NoChange, 20);         // 5x20 matrix of 0s.
    A.setZero(20, NoChange);         // 20x20 matrix of 0s.
    A.setOnes(NoChange, 5);          // 20x5 matrix of 1s.
    A.setOnes(5, NoChange);          // 5x5 matrix of 1s.
    A.setRandom(NoChange, 10);       // 5x10 random matrix.
    A.setRandom(10, NoChange);       // 10x10 random matrix.
- Added setUnit(Index i) for vectors that sets the i-th coefficient to one and all others to zero.
    VectorXf v(5);
    v.setUnit(3);  // { 0, 0, 0, 1, 0}
- Added transpose(), adjoint(), conjugate() methods to SelfAdjointView.
- Added shiftLeft<N>() and shiftRight<N>() coefficient-wise arithmetic shift functions to Arrays.
    ArrayXXi A = ArrayXXi::Random(2, 3);
    ArrayXXi B = A.shiftRight<2>();
    ArrayXXi C = A.shiftLeft<6>();
- Enabled adding and subtracting of diagonal expressions.
    VectorXf x = VectorXf::Random(5);
    VectorXf y = VectorXf::Random(5);
    MatrixXf A = MatrixXf::Identity(5, 5);
    A += x.asDiagonal() - y.asDiagonal();
- Allow user-defined default cache sizes via defining EIGEN_DEFAULT_L1_CACHE_SIZE, ..., EIGEN_DEFAULT_L3_CACHE_SIZE.
- Added EIGEN_ALIGNOF(X) macro for determining alignment of a provided variable.
- Allow plugins for VectorwiseOp by defining a file EIGEN_VECTORWISEOP_PLUGIN (e.g. -DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h).
- Allow disabling of IO operations by defining EIGEN_NO_IO.
Improvement to NaN propagation
- Improvements to NaN correctness for elementwise functions.
- New NaNPropagation template argument to control whether NaNs are propagated or suppressed in elementwise min/max and corresponding reductions on Array, Matrix, and Tensor. Example for max:
    // Elementwise maximum
    Eigen::MatrixXf left, right, r0, r1, r2;
    r0 = left.cwiseMax(right);  // Implementation defined behavior.
    // Propagate NaN if either argument is NaN.
    r1 = left.template cwiseMax<PropagateNaN>(right);
    // Suppress NaN if at least one argument is not a NaN.
    r2 = left.template cwiseMax<PropagateNumbers>(right);
    // Max reductions
    Eigen::MatrixXf m;
    float nan_or_max = m.maxCoeff();  // Implementation defined behavior.
    float nan_if_any_or_max = m.template maxCoeff<PropagateNaN>();
    float nan_if_all_or_max = m.template maxCoeff<PropagateNumbers>();
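The two policies can be summarized on scalars. The standalone functions below are a sketch of the semantics described in the comments above, written without Eigen; the function names are hypothetical and this is not Eigen's implementation.

```cpp
#include <cmath>

// Semantics of PropagateNaN: the result is NaN if either input is NaN.
float max_propagate_nan(float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  return a < b ? b : a;
}

// Semantics of PropagateNumbers: a NaN input is ignored when the other
// input is a number; the result is NaN only if both inputs are NaN.
float max_propagate_numbers(float a, float b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a < b ? b : a;
}
```

A reduction such as maxCoeff can be read as folding one of these binary operations over all coefficients, which is why PropagateNaN yields NaN if any coefficient is NaN and PropagateNumbers yields NaN only if all are.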
Changes to unsupported modules
New low-latency non-blocking ThreadPool module
- Originally a part of the Tensor module, Eigen::ThreadPool is now separate and more portable, and forms the basis for multi-threading in TensorFlow, for example. Example:
    #include <Eigen/CXX11/ThreadPool>
    const int num_threads = 42;
    Eigen::ThreadPool tp(num_threads);
    auto do_stuff = []() { ... };
    tp.Schedule(do_stuff);
Changes to Tensor module
- Support for c++03 was officially dropped in the Tensor module, since most of the code was written in c++11 anyway. This will prevent building the code for CUDA with older versions of nvcc.
- Performance optimizations of Tensor contraction
  - Speed up "outer-product-like" operations by parallelizing over the contraction dimension, using thread_local buffers and recursive work splitting.
  - Improved threading heuristics.
  - Support for fusing element-wise operations into contraction during evaluation. Example:
    // This example applies std::sqrt to all output elements from a tensor contraction.
    // The optional OutputKernel argument to the contraction in this example is a functor over a
    // 2-dimensional buffer. The functor is called once for each output block of the contraction
    // result, to perform the elementwise sqrt operation while the block is hot in cache.
    struct SqrtOutputKernel {
      template <typename Index, typename Scalar>
      EIGEN_ALWAYS_INLINE void operator()(
          const internal::blas_data_mapper<Scalar, Index, ColMajor>& output_mapper,
          const TensorContractionParams&, Index, Index,
          Index num_rows, Index num_cols) const {
        for (int i = 0; i < num_rows; ++i) {
          for (int j = 0; j < num_cols; ++j) {
            output_mapper(i, j) = std::sqrt(output_mapper(i, j));
          }
        }
      }
    };
    Tensor<float, 4, DataLayout> left(30, 50, 8, 31);
    Tensor<float, 5, DataLayout> right(8, 31, 7, 20, 10);
    Tensor<float, 5, DataLayout> result(30, 50, 7, 20, 10);
    Eigen::array<DimPair, 2> dims({{DimPair(2, 0), DimPair(3, 1)}});
    result = left.contract(right, dims, SqrtOutputKernel());
- Performance optimizations of other Tensor operators
  - Speedups from improved vectorization, block evaluation, and multi-threading for most operators.
  - Significant speedup to broadcasting.
  - Reduction of index computation overhead, e.g. using fast divisors in TensorGenerator, squeezing dimensions in TensorPadding.
- A complete rewrite of the block (tiling) evaluation framework for tensor expressions led to significant speedups and a reduced number of memory allocations.
- Added new API for asynchronous evaluation of tensor expressions. Example:
    Tensor<float, 3> in1(200, 30, 70);
    Tensor<float, 3> in2(200, 30, 70);
    Tensor<float, 3> out(200, 30, 70);
    Eigen::ThreadPool tp(internal::random<int>(3, 11));
    Eigen::ThreadPoolDevice thread_pool_device(&tp, internal::random<int>(3, 11));
    Eigen::Barrier b(1);
    auto done = [&b]() { b.Notify(); };
    out.device(thread_pool_device, std::move(done)) = in1 + in2 * 3.14f;
    b.Wait();
- Misc. minor behavior changes & fixes:
  - Fix const correctness for TensorMap.
  - Modify tensor argmin/argmax to always return first occurrence.
  - More numerically stable tree reduction.
  - Improve randomness of the tensor random generator.
  - Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
  - Support static dimensions (aka IndexList) in resizing/reshape/broadcast.
  - Improved accuracy of Tensor FFT.
Improvements to FFT module
- Faster and more accurate twiddle factor computation.
Improvements to EulerAngles
- EulerAngles can now be directly constructed from 3D vectors
- EulerAngles now provide isApprox() and cast() functions
Changes to sparse iterative solvers
- Added new IDRS iterative linear solver.
    #include <unsupported/Eigen/IterativeSolvers>
    A.makeCompressed();  // Recommendation is to compress input before calling sparse solvers.
    IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A);
    VectorXf x = idrs.solve(b);
    bool success = (idrs.info() == ComputationInfo::Success);
Improvements to Polynomials
- PolynomialSolver can now be used with complex numbers
- The solver will automatically choose between EigenSolver and ComplexEigenSolver depending on the scalar type used
Other relevant changes
- Eigen now provides an option to test with an external BLAS library
- Eigen can now be used with the PGI Compiler
- Printing when using GDB has been improved
- Eigen can now detect if a platform supports int128 intrinsics
Testing
The full Eigen test suite was built and run successfully (in c++03 and c++11 mode) with the following compiler/platform/OS combinations:
Compiler | Version | Platform | Operating system |
---|---|---|---|
Microsoft Visual Studio | 2015 Update 3 | x86-64 | Windows |
Microsoft Visual Studio | Community 2017 - 15.9.38 | x86-64 | Windows |
Microsoft Visual Studio | Community 2019 - 16.11 | x86-64 | Windows |
GCC | 4.8 | x86-64 | Linux |
GCC | 9 | x86-64 | Linux |
GCC | 10 | x86-64 | Linux |
Clang | 6.0 | x86-64 | Linux |
Clang | 10 | x86-64 | Linux |
Clang | 11 | x86-64 | Linux |
GCC | 10 | armv8.2-a | Linux |
Clang | 6 | armv8.2-a | Linux |
Clang | 9 | armv8.2-a | Linux |
Clang | 10 | armv8.2-a | Linux |
Clang | 11 | armv8.2-a | Linux |
AppleClang | 12.0.5 | x86-64 | macOS |
GCC | 10 | ppc64le | Linux |
Clang | 10 | ppc64le | Linux |
List of issues fixed in Eigen 3.4
Issue #2298 | List of dense linear decompositions lacks completeorthogonal decomposition |
Issue #2284 | JacobiSVD Outputs Invalid U (Reads Past End of Array) |
Issue #2267 | [3.4 bug] FixedInt<0> error with gcc 4.9.3 |
Issue #2263 | usage of signed zeros leads to wrong results with -ffast-math |
Issue #2251 | Method unaryExpr() does not support function pointers in Eigen 3.4rc1 |
Issue #2242 | No matching function for call to "..." in 'Complex.h' and 'GenericPacketMathFunctions.h' |
Issue #2229 | Copies (& potentially moves?) of Eigen object with large unused MaxRows/ColAtCompileTime are slow (Regression from Eigen 3.2) |
Issue #2213 | template maxCoeff<PropagateNaN> compilation error with Eigen 3.4. |
Issue #2209 | unaryExpr deduces wrong return type on MSVC |
Issue #2157 | forward_adolc test fails since PR !363 |
Issue #2119 | Move assignment swaps even for non-dynamic storage |
Issue #2112 | Build failure with boost::multiprecision type |
Issue #2093 | Incorrect evaluation of Ref |
Issue #1906 | Eigen failed with error C2440 with MSVC on windows |
Issue #1850 | error C4996: 'std::result_of<T>': warning STL4014: std::result_of and std::result_of_t are deprecated in C++17. They are superseded by std::invoke_result and std::invoke_result_t |
Issue #1833 | c++20 compilation failure |
Issue #1826 | -Wdeprecated-anon-enum-enum-conversion warnings (c++20) |
Issue #1815 | IndexedView of a vector should allow linear access |
Issue #1805 | Uploaded doxygen documentation does not build LaTeX formulae |
Issue #1790 | packetmath_1 unit test fails |
Issue #1788 | Rule-of-three/rule-of-five violations |
Issue #1776 | subvector_stl_iterator::operator-> triggers 'taking address of rvalue' warning |
Issue #1774 | std::cbegin() returns non-const iterator |
Issue #1752 | A change to the C++ Standard will break some tests |
Issue #1741 | Map<>.noalias()=A*B gives wrong result |
Issue #1736 | Column access of some IndexedView won't compile |
Issue #1718 | Use of builtin vec_sel is ambiguous when compiling with Clang for PowerPC |
Issue #1695 | Stuck in loop for a certain input when using mpreal support |
Issue #1692 | pass enumeration argument to constructor of VectorXd |
Issue #1684 | array_reverse fails with clang >=6 + AVX + -O2 |
Issue #1674 | SIMD sin/cos gives wrong results with -ffast-math |
Issue #1669 | Zero-sized matrices generate assertion failures |
Issue #1664 | dot product with single column block fails with new static checks |
Issue #1652 | Corner cases in SIMD sin/cos |
Issue #1643 | Compilation failure |
Issue #1637 | Register spilling with recent gcc & clang |
Issue #1619 | const_iterator vs iterator compilation error |
Issue #1615 | Performance of (aliased) matrix multiplication with fixed size 3x3 matrices slow |
Issue #1611 | NEON: plog(+/-0) should return -inf and not NaN |
Issue #1585 | Matrix product is repeatedly evaluated when iterating over the product expression |
Issue #1557 | Fail to compute eigenvalues for a simple 3x3 companion matrix for root finding |
Issue #1544 | SparseQR generates incorrect Q matrix in complex case |
Issue #1543 | "Fix linear indexing in generic block evaluation" breaks Matrix*Diagonal*Vector product |
Issue #1493 | dense Q extraction and solve is sometimes erroneous for complex matrices |
Issue #1453 | Strange behavior for Matrix::Map, if only InnerStride is provided |
Issue #1409 | Add support for C++17 operator new alignment |
Issue #1340 | Add operator + to sparse matrix iterator |
Issue #1318 | More robust quaternion from matrix |
Issue #1306 | Add support for AVX512 to Eigen |
Issue #1305 | Implementation of additional component-wise unary functions |
Issue #1221 | I get tons of error since my distribution upgraded to GCC 6.1.1 |
Issue #1195 | vectorization_logic fails: Matrix3().cwiseQuotient(Matrix3()) expected CompleteUnrolling, got NoUnrolling |
Issue #1194 | Improve det4x4 |
Issue #1049 | std::make_shared fails to fulfill structure aliment |
Issue #1046 | fixed matrix types do not report correct alignment requirements |
Issue #1014 | Eigenvalues 3x3 matrix |
Issue #1001 | infer dimensions of Dynamic-sized temporaries from the entire expression (if possible) |
Issue #977 | Add stable versions of normalize() and normalized() |
Issue #899 | SparseQR occasionally fails for under-determined systems |
Issue #864 | C++11 alias templates for commonly used types |
Issue #751 | Make AMD Ordering numerically more robust |
Issue #747 | Allow for negative stride |
Issue #720 | Gaussian NullaryExpr |
Issue #663 | Permit NoChange in setZero, setOnes, setConstant, setRandom |
Issue #645 | GeneralizedEigenSolver: missing computation of eigenvectors |
Issue #632 | Optimize addition/subtraction of sparse and dense matrices/vectors |
Issue #631 | (Optionally) throw an exception when using an unsuccessful decomposition |
Issue #564 | maxCoeff() returns -nan instead of max, while maxCoeff(&maxRow, &maxCol) works |
Issue #556 | Matrix multiplication crashes using mingw 4.7 |
Issue #505 | Assert if temporary objects that are still referred to get destructed (was: Misbehaving Product on C++11) |
Issue #445 | ParametrizedLine should have transform method |
Issue #437 | [feature request] Add Reshape Operation |
Issue #426 | Behavior of sum() for Matrix<bool> is unexpected and confusing |
Issue #329 | Feature request: Ability to get a "view" into a sub-matrix by indexing it with a vector or matrix of indices |
Issue #231 | STL compatible iterators |
Issue #96 | Clean internal::result_of |
Issue #65 | Core - optimize partial reductions |
Issue #64 | Tests : precision-oriented tests |
Additional information
- A curated list of commits, approximately organized by the same topics as the release notes above, and sorted in reverse chronological order can be found here.