Difference between revisions of "3.4"

From Eigen
(New Major Features in Core)
 
(40 intermediate revisions by 3 users not shown)
Raw dump of the main novelties and improvements that will be part of the 3.4 release compared to the 3.3 branch:
+
Eigen 3.4 was released on August 18, 2021. It can be downloaded from the Download section on the
 +
[https://eigen.tuxfamily.org/index.php?title=Main_Page Main Page] or from [https://gitlab.com/libeigen/eigen/-/releases/3.4.0 Gitlab].
  
=== New features ===
+
'''Notice:''' 3.4.x will be the last major release series of Eigen to support C++03. The master branch will drop C++03 support after this release.
  
* New versatile API for sub-matrices, '''slices''', and '''indexed views''' [http://eigen.tuxfamily.org/dox-devel/group__TutorialSlicingIndexing.html [doc]]. It extends <code>A(.,.)</code> to accept anything that looks like a random-access sequence of indices. To make this usable, the feature comes with new symbols, <code>Eigen::all</code> and <code>Eigen::last</code>, and functions generating arithmetic sequences: <code>Eigen::seq(first,last[,incr])</code>, <code>Eigen::seqN(first,size[,incr])</code>, <code>Eigen::lastN(size[,incr])</code>. Here is an example picking the even rows, excluding the first and last ones, and a subset of indexed columns:
<source lang="cpp">
MatrixXd A = ...;
std::vector<int> col_ind{7,3,4,3};
MatrixXd B = A(seq(2,last-2,fix<2>), col_ind);
</source>

== Changes to supported modules ==

=== Changes that might break existing code ===

* Using float or double for indexing matrices, vectors and arrays will now fail to compile, e.g.:
<source lang="cpp">
MatrixXd A(10,10);
float one = 1;
double a11 = A(one,1.); // compilation error here
</source>
  
* '''Reshaped''' views through the new members <code>reshaped()</code> and <code>reshaped(rows,cols)</code>. This feature also comes with new symbols: <code>Eigen::AutoOrder</code>, <code>Eigen::AutoSize</code>.  [http://eigen.tuxfamily.org/dox-devel/group__TutorialReshape.html [doc]]
+
=== New Major Features in Core ===
  
* Add c++11 '''initializer_list constructors''' to Matrix and Array  [http://eigen.tuxfamily.org/dox-devel/group__TutorialMatrixClass.html#title3 [doc]]:
<source lang="cpp">
MatrixXi a {     // construct a 2x3 matrix
      {1,2,3},   // first row
      {4,5,6}    // second row
};
VectorXd v{{1, 2, 3, 4, 5}};    // construct a dynamic-size vector with 5 elements
Array<int,1,5> b{1,2, 3, 4, 5}; // initialize a fixed-size 1D array of size 5.
</source>

* A new helper <code>Eigen::fix<N></code> to pass compile-time integer values to Eigen's functions [http://eigen.tuxfamily.org/dox-devel/group__Core__Module.html#title6 [doc]]. It can be used to pass compile-time sizes to <code>.block(...)</code>, <code>.segment(...)</code>, and all variants, as well as the first, size and increment parameters of the seq, seqN, and lastN functions. You can also pass "possibly compile-time values" through <code>Eigen::fix<N>(n)</code>. Here is an example comparing the old and new way to call <code>.block</code> with fixed sizes:
<source lang="cpp">
template<typename MatrixType,int N>
void foo(const MatrixType &A, int i, int j, int n) {
    A.block(i,j,2,3);                        // runtime sizes
    // compile-time nb rows and columns:
    A.template block<2,3>(i,j);              // 3.3 way
    A.block(i,j,fix<2>,fix<3>);              // new 3.4 way
    // compile-time nb rows only:
    A.template block<2,Dynamic>(i,j,2,n);    // 3.3 way
    A.block(i,j,fix<2>,n);                   // new 3.4 way
    // possibly compile-time nb columns
    // (use n if N==Dynamic, otherwise we must have n==N):
    A.template block<2,N>(i,j,2,n);          // 3.3 way
    A.block(i,j,fix<2>,fix<N>(n));           // new 3.4 way
}
</source>
  
* Add STL-compatible iterators for dense expressions. Some examples:
+
* Add STL-compatible '''iterators''' for dense expressions [http://eigen.tuxfamily.org/dox-devel/group__TutorialSTL.html [doc]]. Some examples:
 
<source lang="cpp">
 
<source lang="cpp">
 
VectorXd v = ...;
 
VectorXd v = ...;
Line 43: Line 40:
 
</source>
 
</source>
  
* A new '''namespace indexing''' that lets you import only the subset of functions and symbols typically used within <code>A(.,.)</code>, that is: all, seq, seqN, lastN, last, lastp1. [http://eigen.tuxfamily.org/dox-devel/namespaceEigen_1_1indexing.html [doc]]
+
* New versatile API for sub-matrices, '''slices''', and '''indexed views''' [http://eigen.tuxfamily.org/dox-devel/group__TutorialSlicingIndexing.html [doc]]. It extends <code>A(.,.)</code> to accept anything that looks like a random-access sequence of indices. To make this usable, the feature comes with new symbols, <code>Eigen::indexing::all</code> and <code>Eigen::indexing::last</code>, and functions generating arithmetic sequences: <code>Eigen::seq(first,last[,incr])</code>, <code>Eigen::seqN(first,size[,incr])</code>, <code>Eigen::lastN(size[,incr])</code>. Here is an example picking the even rows, excluding the first and last ones, and a subset of indexed columns:
 +
<source lang="cpp">
 +
MatrixXd A = ...;
 +
std::vector<int> col_ind{7,3,4,3};
 +
MatrixXd B = A(seq(2,last-2,fix<2>), col_ind);
 +
</source>
  
* Misc
** Add templated <code>subVector<Vertical/Horizontal>(Index)</code> aliases to <code>col/row(Index)</code> methods, and <code>subVectors<>()</code> aliases to <code>rows()/cols()</code>.
** Add diagmat +/- diagmat operators (bug 520)
** Add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense. (bug 632)

* Add C++11 '''template aliases''' for Matrix, Vector, and Array of common sizes, including generic <code>Vector<Type,Size></code> and <code>RowVector<Type,Size></code> aliases [http://eigen.tuxfamily.org/dox-devel/group__matrixtypedefs.html [doc]].
<source lang="cpp">
MatrixX<double> M;  // Instead of MatrixXd or Matrix<double, Dynamic, Dynamic>
Vector4<MyType> V;  // Instead of Vector<MyType, 4>
</source>
  
* New support for <code>bfloat16</code>.  The 16-bit [https://en.wikipedia.org/wiki/Bfloat16_floating-point_format Brain floating point format] is now available as <code>Eigen::bfloat16</code>.  The constructor must be called explicitly, but it can otherwise be used as any other scalar type.  To convert to and from <code>uint16_t</code>, for instance to extract the bit representation, use <code>Eigen::numext::bit_cast</code>.
<source lang="cpp">
  bfloat16 s(0.25);                                // explicit construction
  uint16_t s_bits = numext::bit_cast<uint16_t>(s); // bit representation

  using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
  MatrixBf16 X = s * MatrixBf16::Random(3, 3);
</source>

=== Performance optimizations ===

* Vectorization of partial reductions along the outer dimension, e.g.: <code>colmajor.rowwise().mean()</code>
* Speed up evaluation of HouseholderSequence to a dense matrix, e.g.,<source lang="cpp">
MatrixXd Q = A.qr().householderQ();
</source>
* Various optimizations of matrix products for small and medium-sized matrices when using large SIMD registers (e.g., AVX and AVX512).
* Improve slice-vectorization logic for redux (significant speed-up for reductions of blocks)

=== New backends ===

* '''Arm SVE:''' Eigen now supports Arm's [https://developer.arm.com/documentation/101726/0300/Learn-about-the-Scalable-Vector-Extension--SVE-/What-is-the-Scalable-Vector-Extension-  Scalable Vector Extension (SVE)]. Currently only fixed-length SVE vectors for <code>uint32_t</code> and <code>float</code> are available.
* '''MIPS MSA:''' Eigen now supports the [https://www.mips.com/products/architectures/ase/simd/ MIPS SIMD Architecture (MSA)]
* '''AMD ROCm/HIP:''' Eigen now contains a generic GPU backend that unifies support for [https://developer.nvidia.com/cuda-toolkit NVIDIA/CUDA] and [https://rocmdocs.amd.com/en/latest/ AMD/HIP].
* '''Power 10 MMA Backend:''' Eigen now has initial support for [https://arxiv.org/pdf/2104.03142.pdf Power 10 matrix multiplication assist instructions] for float32 and float64, real and complex.
 +
 
 +
=== Improvements to Eigen Core ===
 +
* Eigen now uses the c++11 '''alignas''' keyword for static alignment. Users targeting C++17 only and recent compilers (e.g., GCC>=7, clang>=5, MSVC>=19.12) will thus be able to completely forget about all [http://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html issues] related to static alignment, including <code>EIGEN_MAKE_ALIGNED_OPERATOR_NEW</code>.
 +
* Various performance improvements for products and Eigen's GEBP and GEMV kernels have been implemented:
 +
** By using half- and quarter-packets, the performance of matrix multiplications of small to medium sized matrices has been improved
 +
** Eigen's GEMM now falls back to GEMV if it detects that a matrix is a run-time vector
 +
** The performance of matrix products using Arm Neon has been drastically improved (up to 20%)
 +
** Performance of many special cases of matrix products has been improved
 +
* Large speed up from blocked algorithm for <code>.transposeInPlace</code>.
 +
* Speed up misc. operations by propagating compile-time sizes (col/row-wise reverse, PartialPivLU, and others)
 +
* Faster specialized SIMD kernels for small fixed-size inverse, LU decomposition, and determinant.
 +
* Improved or added vectorization of partial or slice reductions along the outer-dimension, for instance: <code>colmajor_mat.rowwise().mean()</code>
 +
 
 +
=== Elementwise math functions ===
 +
* Many functions are now implemented and vectorized in generic (backend-agnostic) form.
 +
* Many improvements to correctness, accuracy, and compatibility with c++ standard library.
 +
** Much improved implementation of <code>ldexp</code>.
 +
** Misc. fixes for corner cases, NaN/Inf inputs and singular points of many functions.
 +
** New implementation of the Payne-Hanek argument reduction algorithm for <code>sin</code> and <code>cos</code> with huge arguments.
 +
** New faithfully rounded algorithm for <code>pow(x,y)</code>.
 +
* Speedups from (new or improved) vectorized versions of <code>pow, log, sin, cos, arg, log2</code>, complex <code>sqrt, erf, expm1, log1p, logistic, rint, gamma</code> and <code>bessel</code> functions, and more.
 +
* Improved special function support (Bessel and gamma functions, <code>ndtri, erfc</code>, inverse hyperbolic functions and more)
 +
* New elementwise functions for <code>absolute_difference</code>, <code>rint</code>.
 +
 
 +
=== Dense matrix decompositions and solvers ===
 +
* All dense linear solvers (i.e., Cholesky, *LU, *QR, CompleteOrthogonalDecomposition, *SVD) now inherit SolverBase and thus support <code>.transpose()</code>, <code>.adjoint()</code> and <code>.solve()</code> APIs.
 +
* SVD implementations now have an <code>info()</code> method for checking convergence.
 +
<source lang="cpp">
 +
  #include <Eigen/SVD>
 +
  MatrixXf m = MatrixXf::Random(3,2);
  VectorXf b = VectorXf::Random(3);    // right-hand side for solve() below
 +
  JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
 +
  if (svd.info() == ComputationInfo::Success) {
 +
    // SVD computation was successful.
 +
    VectorXf x = svd.solve(b);
 +
  }
 
</source>
 
</source>
 +
* Most decompositions now fail quickly when invalid inputs are detected.
 +
* Optimized the product of a <code>HouseholderSequence</code> with the identity, as well as the evaluation of a <code>HouseholderSequence</code> to a dense matrix using faster blocked product.
 +
* Fixed aliasing issues with in-place small matrix inversions.
 +
* Fixed several edge-cases with empty or zero inputs.
 +
 +
=== Sparse matrix support, decompositions and solvers ===
 +
* Enabled assignment and addition with diagonal matrix expressions.
 +
<source lang="cpp">
 +
  SparseMatrix<float> A(10, 10);
 +
  VectorXf x = VectorXf::Random(10);
 +
  A = x.asDiagonal();
 +
  A += x.asDiagonal();
 +
</source>
 +
* Support added for SuiteSparse KLU routines via the <code>KLUSupport</code> module.  SuiteSparse must be installed to use this module.
 +
<source lang="cpp">
 +
  #include <Eigen/KLUSupport>
 +
  A.makeCompressed();  // Recommendation is to compress input before calling sparse solvers.
 +
  KLU<SparseMatrix<T> > klu(A);
 +
  if (klu.info() == ComputationInfo::Success) {
 +
    VectorXf x = klu.solve(b);
 +
  }
 +
</source>
 +
* <code>SparseCholesky</code> now works with row-major matrices.
 +
* Various bug fixes and performance improvements.
 +
 +
=== Type support ===
 +
* Improved support for <code>half</code>
 +
** Native support added for ARM <code>__fp16</code>, CUDA/HIP <code>__half</code>, and <code>F16C</code> conversion intrinsics.
 +
** Better vectorization support added across all backends.
 +
* Improved bool support
 +
** Partial vectorization support added for boolean operations.
 +
** Significantly improved performance (x25) for logical operations with <code>Matrix</code> or <code>Tensor</code> of <code>bool</code>.
 +
* Improved support for custom types
 +
** More custom types work out-of-the-box (see [https://gitlab.com/libeigen/eigen/-/issues/2201 #2201]).
 +
 +
=== Improved Geometry Module ===
 +
* '''Behavioral change:''' <code>Transform::computeRotationScaling()</code> and <code>Transform::computeScalingRotation()</code> are now more continuous across degeneracies (see [https://gitlab.com/libeigen/eigen/-/merge_requests/349 !349]).
 +
* New partial vectorization support added for <code>Quaternion</code>.
 +
* Generic vectorized 4x4 matrix inversion.
 +
 +
=== Backend-specific improvements ===
 +
* '''Arm NEON'''
 +
** Now provides vectorization for <code>uint64_t</code>, <code>int64_t</code>, <code>uint32_t</code>, <code>uint16_t</code>, <code>int16_t</code>, <code>int8_t</code>, and <code>uint8_t</code>
 +
** Emulates <code>bfloat16</code> support when using <code>Eigen::bfloat16</code>
 +
** Supports emulated and native <code>float16</code> when using <code>Eigen::half</code>
 +
* '''SSE/AVX/AVX512'''
 +
** General performance improvements and bugfixes.
 +
** Enabled AVX512 instructions by default if available.
 +
** New <code>std::complex</code>, <code>half</code>, and <code>bfloat16</code> vectorization support added.
 +
** Many missing packet functions added.
 +
* '''Altivec/Power'''
 +
** General performance improvement and bugfixes.
 +
** Enhanced vectorization of real and complex scalars.
 +
** Altivec-specific changes to the <code>gebp_kernel</code>, using a VSX implementation of the MMA instructions, yield speed improvements of up to 4x for matrix-matrix products.
 +
** Dynamic dispatch for GCC greater than 10 enabling selection of MMA or VSX instructions based on <code>__builtin_cpu_supports</code>.
 +
* '''GPU (CUDA and HIP)'''
 +
** Several optimized math functions added, better support for <code>std::complex</code>.
 +
** Added option to disable CUDA entirely by defining <code>EIGEN_NO_CUDA</code>.
 +
** Many more functions can now be used in device code (e.g. comparisons, small matrix inversion).
 +
* '''ZVector'''
 +
** Vectorized <code>float</code> and <code>std::complex<float></code> support added.
 +
** Added z14 support.
 +
* '''SYCL'''
 +
** Redesigned SYCL implementation for use with the [https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html Tensor] module, which can be enabled by defining <code>EIGEN_USE_SYCL</code>.
 +
** New generic memory model introduced, used by <code>TensorDeviceSycl</code>.
 +
** Better integration with OpenCL devices.
 +
** Added many math function specializations.
 +
 +
=== Miscellaneous API Changes ===
 +
* New <code>setConstant(...)</code> methods for preserving one dimension of a matrix by passing in <code>NoChange</code>.
 +
<source lang="cpp">
 +
  MatrixXf A(10, 5);              // 10x5  matrix.
 +
  A.setConstant(NoChange, 10, 2);  // 10x10 matrix of 2s.
 +
  A.setConstant(5, NoChange, 3);  //  5x10 matrix of 3s.
 +
  A.setZero(NoChange, 20);        //  5x20 matrix of 0s.
 +
  A.setZero(20, NoChange);        // 20x20 matrix of 0s.
 +
  A.setOnes(NoChange, 5);          // 20x5  matrix of 1s.
 +
  A.setOnes(5, NoChange);          //  5x5  matrix of 1s.
 +
  A.setRandom(NoChange, 10);      //  5x10 random matrix.
 +
  A.setRandom(10, NoChange);      // 10x10 random matrix.
 +
</source>
 +
* Added <code>setUnit(Index i)</code> for vectors, which sets the ''i''-th coefficient to one and all others to zero.
 +
<source lang="cpp">
 +
  VectorXf v(5);
 +
  v.setUnit(3);  // { 0, 0, 0, 1, 0}
 +
</source>
 +
* Added <code>transpose()</code>, <code>adjoint()</code>, <code>conjugate()</code> methods to <code>SelfAdjointView</code>.
 +
* Added <code>shiftLeft<N>()</code> and <code>shiftRight<N>()</code> coefficient-wise arithmetic shift functions to Arrays.
 +
<source lang="cpp">
 +
  ArrayXXi A = ArrayXXi::Random(2, 3);
 +
  ArrayXXi B = A.shiftRight<2>();
 +
  ArrayXXi C = A.shiftLeft<6>();
 +
</source>
 +
* Enabled adding and subtracting of diagonal expressions.
 +
<source lang="cpp">
 +
  VectorXf x = VectorXf::Random(5);
 +
  VectorXf y = VectorXf::Random(5);
 +
  MatrixXf A = MatrixXf::Identity(5, 5);
 +
  A += x.asDiagonal() - y.asDiagonal();
 +
</source>
 +
* Allow user-defined default cache sizes via defining <code>EIGEN_DEFAULT_L1_CACHE_SIZE</code>, ..., <code>EIGEN_DEFAULT_L3_CACHE_SIZE</code>.
 +
* Added <code>EIGEN_ALIGNOF(X)</code> macro for determining alignment of a provided variable.
 +
* Allow plugins for <code>VectorwiseOp</code> by defining a file <code>EIGEN_VECTORWISEOP_PLUGIN</code> (e.g. <code>-DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h</code>).
 +
* Allow disabling of IO operations by defining <code>EIGEN_NO_IO</code>.
 +
 +
=== Improvements to NaN propagation ===
 +
 +
* Improvements to NaN correctness for elementwise functions.
 +
* New <code>NaNPropagation</code> template argument to control whether NaNs are propagated or suppressed in elementwise <code>min/max</code> and corresponding reductions on <code>Array</code>, <code>Matrix</code>, and <code>Tensor</code>. Example for max:
 +
<source lang="cpp">
 +
// Elementwise maximum
 +
Eigen::MatrixXf left, right, r0, r1, r2;
 +
r0 = left.cwiseMax(right); // Implementation defined behavior.
 +
// Propagate NaN if either argument is NaN.
 +
r1 = left.template cwiseMax<PropagateNaN>(right);
 +
// Suppress NaN if at least one argument is not a NaN.
 +
r2 = left.template cwiseMax<PropagateNumbers>(right);
 +
 +
// Max reductions
 +
Eigen::MatrixXf m;
 +
float nan_or_max = m.maxCoeff(); // Implementation defined behavior.
 +
float nan_if_any_or_max = m.template maxCoeff<PropagateNaN>();
 +
float nan_if_all_or_max = m.template maxCoeff<PropagateNumbers>();
 +
</source>
 +
 +
== Changes to unsupported modules ==
 +
=== New low-latency non-blocking ThreadPool module ===
 +
* Originally a part of the Tensor module, <code>Eigen::ThreadPool</code> is now separate and more portable, and forms the basis for multi-threading in TensorFlow, for example. Example:
 +
<source lang="cpp">
 +
  #include <unsupported/Eigen/CXX11/ThreadPool>
 +
 +
  const int num_threads = 42;
 +
  Eigen::ThreadPool tp(num_threads);
 +
  auto do_stuff = []() { ... };
 +
  tp.Schedule(do_stuff);
 +
</source>
 +
 +
=== Changes to Tensor module ===
 +
* Support for c++03 was officially dropped in the Tensor module, since most of the code was written in c++11 anyway. This will prevent building the code for CUDA with older versions of <code>nvcc</code>.
 +
* Performance optimizations of Tensor contraction
 +
** Speed up "outer-product-like" operations by parallelizing over the contraction dimension, using thread_local buffers and recursive work splitting.
 +
** Improved threading heuristics.
 +
** Support for fusing element-wise operations into contraction during evaluation. Example:
 +
<source lang="cpp">
 +
// This example applies std::sqrt to all output elements from a tensor contraction.
 +
// The optional OutputKernel argument to the contraction in this example is a functor over a
 +
// 2-dimensional buffer. The functor is called once for each output block of the contraction
 +
// result, to perform the elementwise sqrt operation while the block is hot in cache.
 +
struct SqrtOutputKernel {
 +
  template <typename Index, typename Scalar>
 +
  EIGEN_ALWAYS_INLINE void operator()(
 +
      const internal::blas_data_mapper<Scalar, Index, ColMajor>& output_mapper,
 +
      const TensorContractionParams&, Index, Index, Index num_rows,
 +
      Index num_cols) const {
 +
    for (int i = 0; i < num_rows; ++i) {
 +
      for (int j = 0; j < num_cols; ++j) {
 +
        output_mapper(i, j) = std::sqrt(output_mapper(i, j));
 +
      }
 +
    }
 +
  }
 +
};
 +
 +
Tensor<float, 4, DataLayout> left(30, 50, 8, 31);
 +
Tensor<float, 5, DataLayout> right(8, 31, 7, 20, 10);
 +
Tensor<float, 5, DataLayout> result(30, 50, 7, 20, 10);
 +
Eigen::array<DimPair, 2> dims({{DimPair(2, 0), DimPair(3, 1)}});
 +
 +
result = left.contract(right, dims, SqrtOutputKernel());
 +
</source>
 +
 +
* Performance optimizations of other Tensor operators
 +
** Speedups from improved vectorization, block evaluation, and multi-threading for most operators.
 +
** Significant speedup to broadcasting.
 +
** Reduction of index computation overhead, e.g. using fast divisors in TensorGenerator, squeezing dimensions in TensorPadding.
 +
* A complete rewrite of the block (tiling) evaluation framework for tensor expressions led to significant speedups and a reduced number of memory allocations.
 +
* Added new API for asynchronous evaluation of tensor expressions. Example:
 +
<source lang="cpp">
 +
  Tensor<float, 3> in1(200, 30, 70);
 +
  Tensor<float, 3> in2(200, 30, 70);
 +
  Tensor<float, 3> out(200, 30, 70);
 +
 +
  Eigen::ThreadPool tp(internal::random<int>(3, 11));
 +
  Eigen::ThreadPoolDevice thread_pool_device(&tp, internal::random<int>(3, 11));
 +
 +
  Eigen::Barrier b(1);
 +
  auto done = [&b]() { b.Notify(); };
 +
  out.device(thread_pool_device, std::move(done)) = in1 + in2 * 3.14f;
 +
  b.Wait();
 +
</source>
 +
* Misc. minor behavior changes & fixes:
 +
** Fix const correctness for TensorMap.
 +
** Modify tensor argmin/argmax to always return first occurrence.
 +
** More numerically stable tree reduction.
 +
** Improve randomness of the tensor random generator.
 +
** Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
 +
** Support static dimensions (aka IndexList) in resizing/reshape/broadcast.
 +
** Improved accuracy of Tensor FFT.
 +
 +
=== Improvements to FFT module ===
 +
 +
* Faster and more accurate twiddle factor computation.
 +
 +
=== Improvements to EulerAngles ===
 +
 +
* EulerAngles can now be directly constructed from 3D vectors
 +
* EulerAngles now provide <code>isApprox()</code> and <code>cast()</code> functions
 +
 +
=== Changes to sparse iterative solvers ===
 +
* Added new IDRS iterative linear solver.
 +
<source lang="cpp">
 +
  #include <unsupported/Eigen/IterativeSolvers>
 +
  A.makeCompressed();  // Recommendation is to compress input before calling sparse solvers.
 +
  IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A);
 +
  VectorXf x = idrs.solve(b);
 +
  bool success = (idrs.info() == ComputationInfo::Success);
 +
</source>
 +
 +
=== Improvements to Polynomials ===
 +
 +
* PolynomialSolver can now be used with complex numbers
 +
* The solver will automatically choose between <code>EigenSolver</code> and <code>ComplexEigenSolver</code> depending on the scalar type used
 +
 +
== Other relevant changes ==
 +
 +
* Eigen now provides an option to test with an external BLAS library
 +
* Eigen can now be used with the [https://en.wikipedia.org/wiki/The_Portland_Group PGI Compiler]
 +
* Printing when using GDB has been improved
 +
* Eigen can now detect if a platform supports <code>int128</code> intrinsics
 +
 +
== Testing ==
 +
The full Eigen test suite was built and run successfully (in c++03 and c++11 mode) with the following compiler/platform/OS combinations:
 +
 +
{| class="wikitable"
 +
!Compiler  !! Version                            !! Platform !! Operating system
 +
|-
 +
|Microsoft Visual Studio || 2015 Update 3 || x86-64 || Windows
 +
|-
 +
|Microsoft Visual Studio || Community 2017 - 15.9.38 || x86-64  || Windows
 +
|-
 +
|Microsoft Visual Studio || Community 2019 - 16.11 || x86-64  || Windows
 +
|-
 +
|GCC || 4.8 || x86-64 || Linux
 +
|-
 +
|GCC || 9 || x86-64 || Linux
 +
|-
 +
|GCC || 10 ||  x86-64 || Linux
 +
|-
 +
|Clang || 6.0 ||  x86-64 || Linux
 +
|-
 +
|Clang || 10 ||  x86-64 || Linux
 +
|-
 +
|Clang || 11 || x86-64 || Linux
 +
|-
 +
|GCC || 10 ||  armv8.2-a || Linux
 +
|-
 +
|Clang || 6 ||  armv8.2-a || Linux
 +
|-
 +
|Clang || 9 ||  armv8.2-a || Linux
 +
|-
 +
|Clang || 10 ||  armv8.2-a || Linux
 +
|-
 +
|Clang || 11 ||  armv8.2-a || Linux
 +
|-
 +
|AppleClang || 12.0.5 ||  x86-64 || macOS
 +
|-
 +
|GCC || 10 ||  ppc64le || Linux
 +
|-
 +
|Clang || 10 || ppc64le || Linux
 +
|-
 +
|}
 +
 +
== List of issues fixed in Eigen 3.4 ==
  
===  Hardware support ===
+
{|
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2298 Issue #2298]
 +
| List of dense linear decompositions lacks completeorthogonal decomposition
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2284 Issue #2284]
 +
| JacobiSVD Outputs Invalid U (Reads Past End of Array)
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2267 Issue #2267]
 +
| [3.4 bug] FixedInt<0> error with gcc 4.9.3
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2263 Issue #2263]
 +
| usage of signed zeros leads to wrong results with -ffast-math
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2251 Issue #2251]
 +
| Method unaryExpr() does not support function pointers in Eigen 3.4rc1
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2242 Issue #2242]
 +
| No matching function for call to "..." in 'Complex.h' and 'GenericPacketMathFunctions.h'
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2229 Issue #2229]
 +
| Copies (& potentially moves?) of Eigen object with large unused MaxRows/ColAtCompileTime are slow (Regression from Eigen 3.2)
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2213 Issue #2213]
 +
| template maxCoeff<PropagateNaN> compilation error with Eigen 3.4.
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2209 Issue #2209]
 +
| unaryExpr deduces wrong return type on MSVC
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2157 Issue #2157]
 +
| forward_adolc test fails since PR !363
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2119 Issue #2119]
 +
| Move assignment swaps even for non-dynamic storage
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2112 Issue #2112]
 +
| Build failure with boost::multiprecision type
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/2093 Issue #2093]
 +
| Incorrect evaluation of Ref
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1906 Issue #1906]
 +
| Eigen failed with error C2440 with MSVC on windows
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1850 Issue #1850]
 +
| error C4996: 'std::result_of<T>': warning STL4014: std::result_of and std::result_of_t are deprecated in C++17. They are superseded by std::invoke_result and std::invoke_result_t
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1833 Issue #1833]
 +
| c++20 compilation failure
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1826 Issue #1826]
 +
| -Wdeprecated-anon-enum-enum-conversion warnings (c++20)
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1815 Issue #1815]
 +
| IndexedView of a vector should allow linear access
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1805 Issue #1805]
 +
| Uploaded doxygen documentation does not build LaTeX formulae
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1790 Issue #1790]
 +
| packetmath_1 unit test fails
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1788 Issue #1788]
 +
| Rule-of-three/rule-of-five violations
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1776 Issue #1776]
 +
| subvector_stl_iterator::operator-> triggers 'taking address of rvalue' warning
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1774 Issue #1774]
 +
| std::cbegin() returns non-const iterator
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1752 Issue #1752]
 +
| A change to the C++ Standard will break some tests
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1741 Issue #1741]
 +
| Map<>.noalias()=A*B gives wrong result
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1736 Issue #1736]
 +
| Column access of some IndexedView won't compile
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1718 Issue #1718]
 +
| Use of builtin vec_sel is ambiguous when compiling with Clang for PowerPC
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1695 Issue #1695]
 +
| Stuck in loop for a certain input when using mpreal support
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1692 Issue #1692]
 +
| pass enumeration argument to constructor of VectorXd
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1684 Issue #1684]
 +
| array_reverse fails with clang >=6 + AVX + -O2
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1674 Issue #1674]
 +
| SIMD sin/cos gives wrong results with -ffast-math
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1669 Issue #1669]
 +
| Zero-sized matrices generate assertion failures
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1664 Issue #1664]
 +
| dot product with single column block fails with new static checks
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1652 Issue #1652]
 +
| Corner cases in SIMD sin/cos
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1643 Issue #1643]
 +
| Compilation failure
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1637 Issue #1637]
 +
| Register spilling with recent gcc & clang
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1619 Issue #1619]
 +
| const_iterator vs iterator compilation error
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1615 Issue #1615]
 +
| Performance of (aliased) matrix multiplication with fixed size 3x3 matrices slow
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1611 Issue #1611]
 +
| NEON: plog(+/-0) should return -inf and not NaN
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1585 Issue #1585]
 +
| Matrix product is repeatedly evaluated when iterating over the product expression
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1557 Issue #1557]
 +
| Fail to compute eigenvalues for a simple 3x3 companion matrix for root finding
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1544 Issue #1544]
 +
| SparseQR generates incorrect Q matrix in complex case
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1543 Issue #1543]
 +
| "Fix linear indexing in generic block evaluation" breaks Matrix*Diagonal*Vector product
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1493 Issue #1493]
 +
| dense Q extraction and solve is sometimes erroneous for complex matrices
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1453 Issue #1453]
 +
| Strange behavior for Matrix::Map, if only InnerStride is provided
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1409 Issue #1409]
 +
| Add support for C++17 operator new alignment
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1340 Issue #1340]
 +
| Add operator + to sparse matrix iterator
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1318 Issue #1318]
 +
| More robust quaternion from matrix
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1306 Issue #1306]
 +
| Add support for AVX512 to Eigen
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1305 Issue #1305]
 +
| Implementation of additional component-wise unary functions
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1221 Issue #1221]
 +
| I get tons of error since my distribution upgraded to GCC 6.1.1
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1195 Issue #1195]
 +
| vectorization_logic fails: Matrix3().cwiseQuotient(Matrix3()) expected CompleteUnrolling, got NoUnrolling
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1194 Issue #1194]
 +
| Improve det4x4
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1049 Issue #1049]
 +
| std::make_shared fails to fulfill structure aliment
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1046 Issue #1046]
 +
| fixed matrix types do not report correct alignment requirements
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1014 Issue #1014]
 +
| Eigenvalues 3x3 matrix
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/1001 Issue #1001]
 +
| infer dimensions of Dynamic-sized temporaries from the entire expression (if possible)
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/977 Issue #977]
 +
| Add stable versions of normalize() and normalized()
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/899 Issue #899]
 +
| SparseQR occasionally fails for under-determined systems
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/864 Issue #864]
 +
| C++11 alias templates for commonly used types
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/751 Issue #751]
 +
| Make AMD Ordering numerically more robust
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/747 Issue #747]
 +
| Allow for negative stride
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/720 Issue #720]
 +
| Gaussian NullaryExpr
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/663 Issue #663]
 +
| Permit NoChange in setZero, setOnes, setConstant, setRandom
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/645 Issue #645]
 +
| GeneralizedEigenSolver: missing computation of eigenvectors
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/632 Issue #632]
 +
| Optimize addition/subtraction of sparse and dense matrices/vectors
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/631 Issue #631]
 +
| (Optionally) throw an exception when using an unsuccessful decomposition
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/564 Issue #564]
 +
| maxCoeff() returns -nan instead of max, while maxCoeff(&maxRow, &maxCol) works
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/556 Issue #556]
 +
| Matrix multiplication crashes using mingw 4.7
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/505 Issue #505]
 +
| Assert if temporary objects that are still referred to get destructed (was: Misbehaving Product on C++11)
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/445 Issue #445]
 +
| ParametrizedLine should have transform method
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/437 Issue #437]
 +
| [feature request] Add Reshape Operation
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/426 Issue #426]
 +
| Behavior of sum() for Matrix<bool> is unexpected and confusing
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/329 Issue #329]
 +
| Feature request: Ability to get a "view" into a sub-matrix by indexing it with a vector or matrix of indices
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/231 Issue #231]
 +
| STL compatible iterators
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/96 Issue #96]
 +
| Clean internal::result_of
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/65 Issue #65]
 +
| Core - optimize partial reductions
 +
|-
 +
| [https://gitlab.com/libeigen/eigen/-/issues/64 Issue #64]
 +
| Tests : precision-oriented tests
 +
|}
  
* Generalization of the CUDA support to CUDA/HIP for AMD GPUs.
+
== Additional information ==
* Add explicit support for MSA vectorization engine (MIPS).
+
* A curated list of commits, approximately organized by the same topics as the release notes above, and sorted in reverse chronological order can be found [https://docs.google.com/document/d/e/2PACX-1vSGvp4Kv9dJ-gKzJN4CBjppP46flDbe3pJtI9N3m3WkKSoLXmANXuK5gJlw1CPcpCfjAWhgXAtQNzm-/pub here].
* AVX512 is enabled by default when enabled on compiler side.
+

Latest revision as of 15:26, 14 October 2021

Eigen 3.4 was released on August 18 2021. It can be downloaded from the Download section on the Main Page or from Gitlab.

Notice: 3.4.x will be the last major release series of Eigen to support C++03. The master branch will drop C++03 support after this release.

Changes to supported modules

Changes that might break existing code

  • Using float or double for indexing matrices, vectors and arrays will now fail to compile, ex.:
MatrixXd A(10,10);
float one = 1;
double a11 = A(one,1.); // compilation error here

New Major Features in Core

  • Add c++11 initializer_list constructors to Matrix and Array [doc]:
MatrixXi a {      // construct a 2x3 matrix
      {1,2,3},    // first row
      {4,5,6}     // second row
};
VectorXd v{{1, 2, 3, 4, 5}};    // construct a dynamic-size vector with 5 elements
Array<int,1,5> arr{1, 2, 3, 4, 5}; // initialize a fixed-size 1D array of size 5.
  • Add STL-compatible iterators for dense expressions [doc]. Some examples:
VectorXd v = ...;
MatrixXd A = ...;
// range for loop over all entries of v then A
for(auto x : v) { cout << x << " "; }
for(auto x : A.reshaped()) { cout << x << " "; }
// sort v then each column of A
std::sort(v.begin(), v.end());
for(auto c : A.colwise())
    std::sort(c.begin(), c.end());
  • New versatile API for sub-matrices, slices, and indexed views [doc]. It extends A(.,.) to accept anything that looks like a random-access sequence of indices. To make it usable, this feature comes with new symbols: Eigen::indexing::all, Eigen::indexing::last, and functions generating arithmetic sequences: Eigen::seq(first,last[,incr]), Eigen::seqN(first,size[,incr]), Eigen::lastN(size[,incr]). Here is an example picking the even rows except the first and last ones, and a subset of indexed columns:
MatrixXd A = ...;
std::vector<int> col_ind{7,3,4,3};
MatrixXd B = A(seq(2,last-2,fix<2>), col_ind);
  • Add C++11 template aliases for Matrix, Vector, and Array of common sizes, including generic Vector<Type,Size> and RowVector<Type,Size> aliases [doc].
MatrixX<double> M;  // Instead of MatrixXd or Matrix<double, Dynamic, Dynamic>
Vector4<MyType> V;  // Instead of Vector<MyType, 4>
  • New support for bfloat16. The 16-bit Brain floating point format is now available as Eigen::bfloat16. The constructor must be called explicitly, but it can otherwise be used like any other scalar type. To convert to and from uint16_t, for example to extract the bit representation, use Eigen::numext::bit_cast.
  bfloat16 s(0.25);                                 // explicit construction
  uint16_t s_bits = numext::bit_cast<uint16_t>(s);  // bit representation
 
  using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
  MatrixBf16 X = s * MatrixBf16::Random(3, 3);

New backends

Improvements to Eigen Core

  • Eigen now uses the c++11 alignas keyword for static alignment. Users targeting C++17 only and recent compilers (e.g., GCC>=7, clang>=5, MSVC>=19.12) will thus be able to completely forget about all issues related to static alignment, including EIGEN_MAKE_ALIGNED_OPERATOR_NEW.
  • Various performance improvements for products and Eigen's GEBP and GEMV kernels have been implemented:
    • By using half- and quarter-packets, the performance of matrix multiplications of small to medium sized matrices has been improved
    • Eigen's GEMM now falls back to GEMV if it detects that a matrix is a run-time vector
    • The performance of matrix products using Arm Neon has been drastically improved (up to 20%)
    • Performance of many special cases of matrix products has been improved
  • Large speed up from blocked algorithm for .transposeInPlace.
  • Speed up misc. operations by propagating compile-time sizes (col/row-wise reverse, PartialPivLU, and others)
  • Faster specialized SIMD kernels for small fixed-size inverse, LU decomposition, and determinant.
  • Improved or added vectorization of partial or slice reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean()

Elementwise math functions

  • Many functions are now implemented and vectorized in generic (backend-agnostic) form.
  • Many improvements to correctness, accuracy, and compatibility with c++ standard library.
    • Much improved implementation of ldexp.
    • Misc. fixes for corner cases, NaN/Inf inputs and singular points of many functions.
    • New implementation of the Payne-Hanek argument reduction algorithm for sin and cos with huge arguments.
    • New faithfully rounded algorithm for pow(x,y).
  • Speedups from (new or improved) vectorized versions of pow, log, sin, cos, arg, log2, complex sqrt, erf, expm1, log1p, logistic, rint, gamma and Bessel functions, and more.
  • Improved special function support (Bessel and gamma functions, ndtri, erfc, inverse hyperbolic functions and more)
  • New elementwise functions for absolute_difference, rint.

Dense matrix decompositions and solvers

  • All dense linear solvers (i.e., Cholesky, *LU, *QR, CompleteOrthogonalDecomposition, *SVD) now inherit from SolverBase and thus support the .transpose(), .adjoint() and .solve() APIs.
  • SVD implementations now have an info() method for checking convergence.
  #include <Eigen/SVD>
  MatrixXf m = MatrixXf::Random(3,2);
  VectorXf b = VectorXf::Random(3);   // right-hand side
  JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
  if (svd.info() == ComputationInfo::Success) {
    // SVD computation was successful.
    VectorXf x = svd.solve(b);
  }
  • Most decompositions now fail quickly when invalid inputs are detected.
  • Optimized the product of a HouseholderSequence with the identity, as well as the evaluation of a HouseholderSequence to a dense matrix using faster blocked product.
  • Fixed aliasing issues with in-place small matrix inversions.
  • Fixed several edge-cases with empty or zero inputs.

Sparse matrix support, decompositions and solvers

  • Enabled assignment and addition with diagonal matrix expressions.
  SparseMatrix<float> A(10, 10);
  VectorXf x = VectorXf::Random(10);
  A = x.asDiagonal();
  A += x.asDiagonal();
  • Support added for SuiteSparse KLU routines via the KLUSupport module. SuiteSparse must be installed to use this module.
  #include <Eigen/KLUSupport>
  A.makeCompressed();   // Recommendation is to compress input before calling sparse solvers.
  KLU<SparseMatrix<T> > klu(A);
  if (klu.info() == ComputationInfo::Success) {
    VectorXf x = klu.solve(b);
  }
  • SparseCholesky now works with row-major matrices.
  • Various bug fixes and performance improvements.

Type support

  • Improved support for half
    • Native support added for ARM __fp16, CUDA/HIP __half, and F16C conversion intrinsics.
    • Better vectorization support added across all backends.
  • Improved bool support
    • Partial vectorization support added for boolean operations.
    • Significantly improved performance (x25) for logical operations with Matrix or Tensor of bool.
  • Improved support for custom types
    • More custom types work out-of-the-box (see #2201).

Improved Geometry Module

  • Behavioral change: Transform::computeRotationScaling() and Transform::computeScalingRotation() are now more continuous across degeneracies (see !349).
  • New partial vectorization support added for Quaternion.
  • Generic vectorized 4x4 matrix inversion.

Backend-specific improvements

  • Arm NEON
    • Now provides vectorization for uint64_t, int64_t, uint32_t, int16_t, uint16_t, int8_t, and uint8_t
    • Emulates bfloat16 support when using Eigen::bfloat16
    • Supports emulated and native float16 when using Eigen::half
  • SSE/AVX/AVX512
    • General performance improvements and bugfixes.
    • Enabled AVX512 instructions by default if available.
    • New std::complex, half, and bfloat16 vectorization support added.
    • Many missing packet functions added.
  • Altivec/Power
    • General performance improvement and bugfixes.
    • Enhanced vectorization of real and complex scalars.
    • Altivec-specific changes to gebp_kernel, using a VSX implementation of the MMA instructions, yield speedups of up to 4x for matrix-matrix products.
    • Dynamic dispatch for GCC versions newer than 10, enabling selection of MMA or VSX instructions based on __builtin_cpu_supports.
  • GPU (CUDA and HIP)
    • Several optimized math functions added, better support for std::complex.
    • Added option to disable CUDA entirely by defining EIGEN_NO_CUDA.
    • Many more functions can now be used in device code (e.g. comparisons, small matrix inversion).
  • ZVector
    • Vectorized float and std::complex<float> support added.
    • Added z14 support.
  • SYCL
    • Redesigned SYCL implementation for use with the Tensor module, which can be enabled by defining EIGEN_USE_SYCL.
    • Introduced a new generic memory model used by TensorDeviceSycl.
    • Better integration with OpenCL devices.
    • Added many math function specializations.

Miscellaneous API Changes

  • New setConstant(...) methods for preserving one dimension of a matrix by passing in NoChange.
  MatrixXf A(10, 5);               // 10x5  matrix.
  A.setConstant(NoChange, 10, 2);  // 10x10 matrix of 2s.
  A.setConstant(5, NoChange, 3);   //  5x10 matrix of 3s.
  A.setZero(NoChange, 20);         //  5x20 matrix of 0s.
  A.setZero(20, NoChange);         // 20x20 matrix of 0s.
  A.setOnes(NoChange, 5);          // 20x5  matrix of 1s.
  A.setOnes(5, NoChange);          //  5x5  matrix of 1s.
  A.setRandom(NoChange, 10);       //  5x10 random matrix.
  A.setRandom(10, NoChange);       // 10x10 random matrix.
  • Added setUnit(Index i) for vectors, which sets the i-th coefficient to one and all others to zero.
  VectorXf v(5);
  v.setUnit(3);   // { 0, 0, 0, 1, 0}
  • Added transpose(), adjoint(), conjugate() methods to SelfAdjointView.
  • Added shiftLeft<N>() and shiftRight<N>() coefficient-wise arithmetic shift functions to Arrays.
  ArrayXXi A = ArrayXXi::Random(2, 3);
  ArrayXXi B = A.shiftRight<2>();
  ArrayXXi C = A.shiftLeft<6>();
  • Enabled adding and subtracting of diagonal expressions.
  VectorXf x = VectorXf::Random(5);
  VectorXf y = VectorXf::Random(5);
  MatrixXf A = MatrixXf::Identity(5, 5);
  A += x.asDiagonal() - y.asDiagonal();
  • Allow user-defined default cache sizes via defining EIGEN_DEFAULT_L1_CACHE_SIZE, ..., EIGEN_DEFAULT_L3_CACHE_SIZE.
  • Added EIGEN_ALIGNOF(X) macro for determining alignment of a provided variable.
  • Allow plugins for VectorwiseOp by defining a file EIGEN_VECTORWISEOP_PLUGIN (e.g. -DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h).
  • Allow disabling of IO operations by defining EIGEN_NO_IO.

Improvement to NaN propagation

  • Improvements to NaN correctness for elementwise functions.
  • New NaNPropagation template argument to control whether NaNs are propagated or suppressed in elementwise min/max and corresponding reductions on Array, Matrix, and Tensor. Example for max:
// Elementwise maximum
Eigen::MatrixXf left, right, r0, r1, r2;
r0 = left.cwiseMax(right); // Implementation defined behavior.
// Propagate NaN if either argument is NaN.
r1 = left.template cwiseMax<PropagateNaN>(right);
// Suppress NaN if at least one argument is not a NaN.
r2 = left.template cwiseMax<PropagateNumbers>(right);
 
// Max reductions
Eigen::MatrixXf m;
float nan_or_max = m.maxCoeff(); // Implementation defined behavior.
float nan_if_any_or_max = m.template maxCoeff<PropagateNaN>();
float nan_if_all_or_max = m.template maxCoeff<PropagateNumbers>();

Changes to unsupported modules

New low-latency non-blocking ThreadPool module

  • Originally a part of the Tensor module, Eigen::ThreadPool is now separate and more portable, and forms the basis for multi-threading in TensorFlow, for example. Example:
  #include <unsupported/Eigen/CXX11/ThreadPool>
 
  const int num_threads = 42;
  Eigen::ThreadPool tp(num_threads);
  auto do_stuff = []() { ... };
  tp.Schedule(do_stuff);

Changes to Tensor module

  • Support for C++03 was officially dropped in the Tensor module, since most of the code was written in C++11 anyway. This will prevent building the code for CUDA with older versions of nvcc.
  • Performance optimizations of Tensor contraction
    • Speed up "outer-product-like" operations by parallelizing over the contraction dimension, using thread_local buffers and recursive work splitting.
    • Improved threading heuristics.
    • Support for fusing element-wise operations into contraction during evaluation. Example:
// This example applies std::sqrt to all output elements from a tensor contraction. 
// The optional OutputKernel argument to the contraction in this example is a functor over a 
// 2-dimensional buffer. The functor is called once for each output block of the contraction 
// result, to perform the elementwise sqrt operation while the block is hot in cache.
struct SqrtOutputKernel {
  template <typename Index, typename Scalar>
  EIGEN_ALWAYS_INLINE void operator()(
      const internal::blas_data_mapper<Scalar, Index, ColMajor>& output_mapper,
      const TensorContractionParams&, Index, Index, Index num_rows,
      Index num_cols) const {
    for (Index i = 0; i < num_rows; ++i) {
      for (Index j = 0; j < num_cols; ++j) {
        output_mapper(i, j) = std::sqrt(output_mapper(i, j));
      }
    }
  }
};
 
Tensor<float, 4, DataLayout> left(30, 50, 8, 31);
Tensor<float, 5, DataLayout> right(8, 31, 7, 20, 10);
Tensor<float, 5, DataLayout> result(30, 50, 7, 20, 10);
Eigen::array<DimPair, 2> dims({{DimPair(2, 0), DimPair(3, 1)}});
 
result = left.contract(right, dims, SqrtOutputKernel());
  • Performance optimizations of other Tensor operators
    • Speedups from improved vectorization, block evaluation, and multi-threading for most operators.
    • Significant speedup to broadcasting.
    • Reduction of index computation overhead, e.g. using fast divisors in TensorGenerator, squeezing dimensions in TensorPadding.
  • A complete rewrite of the block (tiling) evaluation framework for tensor expressions led to significant speedups and a reduced number of memory allocations.
  • Added new API for asynchronous evaluation of tensor expressions. Example:
  Tensor<float, 3> in1(200, 30, 70);
  Tensor<float, 3> in2(200, 30, 70);
  Tensor<float, 3> out(200, 30, 70);
 
  Eigen::ThreadPool tp(internal::random<int>(3, 11));
  Eigen::ThreadPoolDevice thread_pool_device(&tp, internal::random<int>(3, 11));
 
  Eigen::Barrier b(1);
  auto done = [&b]() { b.Notify(); };
  out.device(thread_pool_device, std::move(done)) = in1 + in2 * 3.14f;
  b.Wait();
  • Misc. minor behavior changes & fixes:
    • Fix const correctness for TensorMap.
    • Modify tensor argmin/argmax to always return first occurrence.
    • More numerically stable tree reduction.
    • Improve randomness of the tensor random generator.
    • Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
    • Support static dimensions (aka IndexList) in resizing/reshape/broadcast.
    • Improved accuracy of Tensor FFT.

Improvements to FFT module

  • Faster and more accurate twiddle factor computation.

Improvements to EulerAngles

  • EulerAngles can now be directly constructed from 3D vectors
  • EulerAngles now provide isApprox() and cast() functions

Changes to sparse iterative solvers

  • Added new IDRS iterative linear solver.
  #include <unsupported/Eigen/IterativeSolvers>
  A.makeCompressed();   // Recommendation is to compress input before calling sparse solvers.
  IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A);
  VectorXf x = idrs.solve(b);
  bool success = (idrs.info() == ComputationInfo::Success);

Improvements to Polynomials

  • PolynomialSolver can now be used with complex numbers
  • The solver will automatically choose between EigenSolver and ComplexEigenSolver depending on the scalar type used

Other relevant changes

  • Eigen now provides an option to test with an external BLAS library
  • Eigen can now be used with the PGI Compiler
  • Printing when using GDB has been improved
  • Eigen can now detect if a platform supports int128 intrinsics

Testing

The full Eigen test suite was built and run successfully (in c++03 and c++11 mode) with the following compiler/platform/OS combinations:

Compiler | Version | Platform | Operating system
Microsoft Visual Studio | 2015 Update 3 | x86-64 | Windows
Microsoft Visual Studio Community | 2017 - 15.9.38 | x86-64 | Windows
Microsoft Visual Studio Community | 2019 - 16.11 | x86-64 | Windows
GCC | 4.8 | x86-64 | Linux
GCC | 9 | x86-64 | Linux
GCC | 10 | x86-64 | Linux
Clang | 6.0 | x86-64 | Linux
Clang | 10 | x86-64 | Linux
Clang | 11 | x86-64 | Linux
GCC | 10 | armv8.2-a | Linux
Clang | 6 | armv8.2-a | Linux
Clang | 9 | armv8.2-a | Linux
Clang | 10 | armv8.2-a | Linux
Clang | 11 | armv8.2-a | Linux
AppleClang | 12.0.5 | x86-64 | macOS
GCC | 10 | ppc64le | Linux
Clang | 10 | ppc64le | Linux

List of issues fixed in Eigen 3.4

Issue #2298 List of dense linear decompositions lacks completeorthogonal decomposition
Issue #2284 JacobiSVD Outputs Invalid U (Reads Past End of Array)
Issue #2267 [3.4 bug] FixedInt<0> error with gcc 4.9.3
Issue #2263 usage of signed zeros leads to wrong results with -ffast-math
Issue #2251 Method unaryExpr() does not support function pointers in Eigen 3.4rc1
Issue #2242 No matching function for call to "..." in 'Complex.h' and 'GenericPacketMathFunctions.h'
Issue #2229 Copies (& potentially moves?) of Eigen object with large unused MaxRows/ColAtCompileTime are slow (Regression from Eigen 3.2)
Issue #2213 template maxCoeff<PropagateNaN> compilation error with Eigen 3.4.
Issue #2209 unaryExpr deduces wrong return type on MSVC
Issue #2157 forward_adolc test fails since PR !363
Issue #2119 Move assignment swaps even for non-dynamic storage
Issue #2112 Build failure with boost::multiprecision type
Issue #2093 Incorrect evaluation of Ref
Issue #1906 Eigen failed with error C2440 with MSVC on windows
Issue #1850 error C4996: 'std::result_of<T>': warning STL4014: std::result_of and std::result_of_t are deprecated in C++17. They are superseded by std::invoke_result and std::invoke_result_t
Issue #1833 c++20 compilation failure
Issue #1826 -Wdeprecated-anon-enum-enum-conversion warnings (c++20)
Issue #1815 IndexedView of a vector should allow linear access
Issue #1805 Uploaded doxygen documentation does not build LaTeX formulae
Issue #1790 packetmath_1 unit test fails
Issue #1788 Rule-of-three/rule-of-five violations
Issue #1776 subvector_stl_iterator::operator-> triggers 'taking address of rvalue' warning
Issue #1774 std::cbegin() returns non-const iterator
Issue #1752 A change to the C++ Standard will break some tests
Issue #1741 Map<>.noalias()=A*B gives wrong result
Issue #1736 Column access of some IndexedView won't compile
Issue #1718 Use of builtin vec_sel is ambiguous when compiling with Clang for PowerPC
Issue #1695 Stuck in loop for a certain input when using mpreal support
Issue #1692 pass enumeration argument to constructor of VectorXd
Issue #1684 array_reverse fails with clang >=6 + AVX + -O2
Issue #1674 SIMD sin/cos gives wrong results with -ffast-math
Issue #1669 Zero-sized matrices generate assertion failures
Issue #1664 dot product with single column block fails with new static checks
Issue #1652 Corner cases in SIMD sin/cos
Issue #1643 Compilation failure
Issue #1637 Register spilling with recent gcc & clang
Issue #1619 const_iterator vs iterator compilation error
Issue #1615 Performance of (aliased) matrix multiplication with fixed size 3x3 matrices slow
Issue #1611 NEON: plog(+/-0) should return -inf and not NaN
Issue #1585 Matrix product is repeatedly evaluated when iterating over the product expression
Issue #1557 Fail to compute eigenvalues for a simple 3x3 companion matrix for root finding
Issue #1544 SparseQR generates incorrect Q matrix in complex case
Issue #1543 "Fix linear indexing in generic block evaluation" breaks Matrix*Diagonal*Vector product
Issue #1493 dense Q extraction and solve is sometimes erroneous for complex matrices
Issue #1453 Strange behavior for Matrix::Map, if only InnerStride is provided
Issue #1409 Add support for C++17 operator new alignment
Issue #1340 Add operator + to sparse matrix iterator
Issue #1318 More robust quaternion from matrix
Issue #1306 Add support for AVX512 to Eigen
Issue #1305 Implementation of additional component-wise unary functions
Issue #1221 I get tons of error since my distribution upgraded to GCC 6.1.1
Issue #1195 vectorization_logic fails: Matrix3().cwiseQuotient(Matrix3()) expected CompleteUnrolling, got NoUnrolling
Issue #1194 Improve det4x4
Issue #1049 std::make_shared fails to fulfill structure aliment
Issue #1046 fixed matrix types do not report correct alignment requirements
Issue #1014 Eigenvalues 3x3 matrix
Issue #1001 infer dimensions of Dynamic-sized temporaries from the entire expression (if possible)
Issue #977 Add stable versions of normalize() and normalized()
Issue #899 SparseQR occasionally fails for under-determined systems
Issue #864 C++11 alias templates for commonly used types
Issue #751 Make AMD Ordering numerically more robust
Issue #747 Allow for negative stride
Issue #720 Gaussian NullaryExpr
Issue #663 Permit NoChange in setZero, setOnes, setConstant, setRandom
Issue #645 GeneralizedEigenSolver: missing computation of eigenvectors
Issue #632 Optimize addition/subtraction of sparse and dense matrices/vectors
Issue #631 (Optionally) throw an exception when using an unsuccessful decomposition
Issue #564 maxCoeff() returns -nan instead of max, while maxCoeff(&maxRow, &maxCol) works
Issue #556 Matrix multiplication crashes using mingw 4.7
Issue #505 Assert if temporary objects that are still referred to get destructed (was: Misbehaving Product on C++11)
Issue #445 ParametrizedLine should have transform method
Issue #437 [feature request] Add Reshape Operation
Issue #426 Behavior of sum() for Matrix<bool> is unexpected and confusing
Issue #329 Feature request: Ability to get a "view" into a sub-matrix by indexing it with a vector or matrix of indices
Issue #231 STL compatible iterators
Issue #96 Clean internal::result_of
Issue #65 Core - optimize partial reductions
Issue #64 Tests : precision-oriented tests

Additional information

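  • A curated list of commits, approximately organized by the same topics as the release notes above and sorted in reverse chronological order, can be found at https://docs.google.com/document/d/e/2PACX-1vSGvp4Kv9dJ-gKzJN4CBjppP46flDbe3pJtI9N3m3WkKSoLXmANXuK5gJlw1CPcpCfjAWhgXAtQNzm-/pub.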