Eigen 3.4 was released on August 18, 2021. It can be downloaded from the Download section on the Main Page.

Changes to supported modules

Changes that might break existing code

  • Using float or double to index matrices, vectors, and arrays will now fail to compile, e.g.:
MatrixXd A(10,10);
float one = 1;
double a11 = A(one,1.); // compilation error here

New Major Features in Core

  • Add C++11 initializer_list constructors to Matrix and Array [doc]:
MatrixXi a {      // construct a 2x3 matrix
      {1,2,3},    // first row
      {4,5,6}     // second row
};
VectorXd v{{1, 2, 3, 4, 5}};    // construct a dynamic-size vector with 5 elements
Array<int,1,5> arr{1, 2, 3, 4, 5}; // initialize a fixed-size 1D array of size 5.
  • Add STL-compatible iterators for dense expressions [doc]. Some examples:
VectorXd v = ...;
MatrixXd A = ...;
// range for loop over all entries of v then A
for(auto x : v) { cout << x << " "; }
for(auto x : A.reshaped()) { cout << x << " "; }
// sort v then each column of A
std::sort(v.begin(), v.end());
for(auto c : A.colwise())
    std::sort(c.begin(), c.end());
  • Add C++11 template aliases for Matrix, Vector, and Array of common sizes, including generic Vector<Type,Size> and RowVector<Type,Size> aliases [doc].
MatrixX<double> M;  // Instead of MatrixXd or Matrix<double, Dynamic, Dynamic>
Vector4<MyType> V;  // Instead of Matrix<MyType, 4, 1>
  • New support for bfloat16. The 16-bit Brain floating point format is now available as Eigen::bfloat16. The constructor must be called explicitly, but it can otherwise be used like any other scalar type. To convert to and from uint16_t (e.g., to extract the bit representation), use Eigen::numext::bit_cast.
  bfloat16 s(0.25);                                 // explicit construction
  uint16_t s_bits = numext::bit_cast<uint16_t>(s);  // bit representation
 
  using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
  MatrixBf16 X = s * MatrixBf16::Random(3, 3);

New backends

Improvements to Eigen Core

  • Eigen now uses the C++11 alignas keyword for static alignment. Users targeting C++17 only with recent compilers (e.g., GCC >= 7, Clang >= 5, MSVC >= 19.12) can thus completely forget about all issues related to static alignment, including EIGEN_MAKE_ALIGNED_OPERATOR_NEW.
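  For instance, a fixed-size vectorizable member no longer needs special treatment when heap-allocating the enclosing type (a minimal sketch; Foo is a hypothetical user type):
  struct Foo {
    Eigen::Vector4f v;  // 16-byte-aligned fixed-size member
    // No EIGEN_MAKE_ALIGNED_OPERATOR_NEW needed in C++17 mode.
  };
  Foo* foo = new Foo;   // C++17 aligned operator new honors alignof(Foo)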
  • Various performance improvements for products and Eigen's GEBP and GEMV kernels have been implemented:
    • By using half- and quarter-packets, the performance of matrix multiplications of small- to medium-sized matrices has been improved
    • Eigen's GEMM now falls back to GEMV if it detects that a matrix is a run-time vector
    • The performance of matrix products using Arm Neon has been drastically improved (up to 20%)
    • Performance of many special cases of matrix products has been improved
  • Large speedup from a blocked algorithm for .transposeInPlace().
  • Speed up misc. operations by propagating compile-time sizes (col/row-wise reverse, PartialPivLU, and others)
  • Faster specialized SIMD kernels for small fixed-size inverse, LU decomposition, and determinant.
  • Improved or added vectorization of partial or slice reductions along the outer dimension, for instance: colmajor_mat.rowwise().mean()
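  For example, the row-wise mean of a column-major matrix is now computed with vectorized code:
  MatrixXf colmajor_mat = MatrixXf::Random(1000, 1000);
  VectorXf row_means = colmajor_mat.rowwise().mean();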

Elementwise math functions

  • Many functions are now implemented and vectorized in generic (backend-agnostic) form.
  • Many improvements to correctness, accuracy, and compatibility with the C++ standard library.
    • Much improved implementation of ldexp.
    • Misc. fixes for corner cases, NaN/Inf inputs and singular points of many functions.
    • New Payne-Hanek argument reduction algorithm for sin and cos with huge arguments.
    • New faithfully rounded algorithm for pow(x,y).
  • Speedups from (new or improved) vectorized versions of pow, log, sin, cos, arg, log2, complex sqrt, erf, expm1, log1p, logistic, rint, gamma and bessel functions, and more.
  • Improved special function support (Bessel and gamma functions, ndtri, erfc, inverse hyperbolic functions and more)
  • New elementwise functions for absolute_difference, rint.
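  A minimal sketch of the coefficient-wise API (assuming rint and logistic are exposed as Array methods in 3.4):
  ArrayXf x = ArrayXf::Random(1024).abs() + 0.5f;  // positive inputs for pow
  ArrayXf y = ArrayXf::Constant(1024, 2.5f);
  ArrayXf p = x.pow(y);      // vectorized, faithfully rounded pow(x,y)
  ArrayXf s = x.sin() + x.cos();
  ArrayXf r = x.rint();      // round to nearest integer
  ArrayXf l = x.logistic();  // 1 / (1 + exp(-x))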

Dense matrix decompositions and solvers

  • All dense linear solvers (i.e., Cholesky, *LU, *QR, CompleteOrthogonalDecomposition, *SVD) now inherit SolverBase and thus support .transpose(), .adjoint() and .solve() APIs.
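  For example (using PartialPivLU; the other listed decompositions work the same way):
  MatrixXd A = MatrixXd::Random(4, 4);
  VectorXd b = VectorXd::Random(4);
  PartialPivLU<MatrixXd> lu(A);
  VectorXd x1 = lu.solve(b);             // solves A x = b
  VectorXd x2 = lu.transpose().solve(b); // solves A^T x = b
  VectorXd x3 = lu.adjoint().solve(b);   // solves A^* x = b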
  • SVD implementations now have an info() method for checking convergence.
  #include <Eigen/SVD>
  MatrixXf m = MatrixXf::Random(3, 2);
  VectorXf b = VectorXf::Random(3);
  JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
  if (svd.info() == ComputationInfo::Success) {
    // SVD computation was successful.
    VectorXf x = svd.solve(b);
  }
  • Most decompositions now fail quickly when invalid inputs are detected.
  • Optimized the product of a HouseholderSequence with the identity, as well as the evaluation of a HouseholderSequence to a dense matrix, using a faster blocked product.
  • Fixed aliasing issues with in-place small matrix inversions.
  • Fixed several edge-cases with empty or zero inputs.

Sparse matrix support, decompositions and solvers

  • Enabled assignment and addition with diagonal matrix expressions.
  SparseMatrix<float> A(10, 10);
  VectorXf x = VectorXf::Random(10);
  A = x.asDiagonal();
  A += x.asDiagonal();
  • Support added for SuiteSparse KLU routines via the KLUSupport module. SuiteSparse must be installed to use this module.
  #include <Eigen/KLUSupport>
  // A is a SparseMatrix<float> and b a VectorXf, assembled beforehand.
  A.makeCompressed();   // It is recommended to compress the input before calling sparse solvers.
  KLU<SparseMatrix<float> > klu(A);
  if (klu.info() == ComputationInfo::Success) {
    VectorXf x = klu.solve(b);
  }
  • SparseCholesky now works with row-major matrices.
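  A minimal sketch, assuming SimplicialLLT as the Cholesky backend:
  SparseMatrix<double, RowMajor> A(10, 10);
  // ... fill A with a symmetric positive-definite pattern ...
  A.makeCompressed();
  SimplicialLLT<SparseMatrix<double, RowMajor> > llt(A);  // row-major input now supported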
  • Various bug fixes and performance improvements.

Type support

  • Improved support for half
    • Native support added for ARM __fp16, CUDA/HIP __half, F16C.
    • Better vectorization support added across all backends.
  • Improved bool support
    • Partial vectorization support added for boolean operations.
    • Significantly improved performance (up to 25x) for logical operations with Matrix or Tensor of bool.
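  For example, comparisons and boolean reductions now use vectorized kernels where possible:
  ArrayXXf a = ArrayXXf::Random(100, 100);
  Array<bool, Dynamic, Dynamic> mask = (a > 0.0f) && (a < 0.5f);
  bool any_in_range = mask.any();  // vectorized reduction
  Index hits = mask.count();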
  • Improved support for custom types
    • More custom types work out-of-the-box (see #2201).

Improved Geometry Module

  • Behavioral change: Transform::computeRotationScaling() and Transform::computeScalingRotation() are now more continuous across degeneracies (see !349).
  • New partial vectorization support added for Quaternion.
  • Generic vectorized 4x4 matrix inversion.
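  For instance:
  Quaternionf q1 = Quaternionf::UnitRandom();
  Quaternionf q2 = Quaternionf::UnitRandom();
  Quaternionf q12 = q1 * q2;    // partially vectorized quaternion product
  Matrix4f M = Matrix4f::Random();
  Matrix4f Minv = M.inverse();  // generic vectorized 4x4 inversion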

Backend-specific improvements

  • Arm NEON
    • Now provides vectorization for uint64_t, int64_t, uint32_t, int32_t, uint16_t, int16_t, uint8_t, and int8_t
    • Emulates bfloat16 support when using Eigen::bfloat16
    • Supports emulated and native float16 when using Eigen::half
  • SSE/AVX/AVX512
    • General performance improvements and bugfixes.
    • Enabled AVX512 instructions by default if available.
    • New std::complex, half, and bfloat16 vectorization support added.
    • Many missing packet functions added.
  • Altivec/Power
    • General performance improvement and bugfixes.
    • Enhanced vectorization of current real and complex scalars.
    • Altivec-specific changes to the gebp_kernel, using VSX implementations of the MMA instructions, yield speed improvements of up to 4x for matrix-matrix products.
    • Dynamic dispatch for GCC > 10, enabling run-time selection of MMA or VSX instructions based on __builtin_cpu_supports.
  • GPU (CUDA and HIP)
    • Several optimized math functions added, better support for std::complex.
    • Added option to disable CUDA entirely by defining EIGEN_NO_CUDA.
    • Many more functions can now be used in device code (e.g. comparisons, small matrix inversion).
  • ZVector
    • Vectorized float and std::complex<float> support added.
    • Added z14 support.
  • SYCL
    • Redesigned SYCL implementation for use with the Tensor module, which can be enabled by defining EIGEN_USE_SYCL.
    • Introduced a new generic memory model used by TensorDeviceSycl.
    • Better integration with OpenCL devices.
    • Added many math function specializations.

Miscellaneous API Changes

  • New setConstant(...) methods for preserving one dimension of a matrix by passing in NoChange.
  MatrixXf A(10, 5);               // 10x5  matrix.
  A.setConstant(NoChange, 10, 2);  // 10x10 matrix of 2s.
  A.setConstant(5, NoChange, 3);   //  5x10 matrix of 3s.
  A.setZero(NoChange, 20);         //  5x20 matrix of 0s.
  A.setZero(20, NoChange);         // 20x20 matrix of 0s.
  A.setOnes(NoChange, 5);          // 20x5  matrix of 1s.
  A.setOnes(5, NoChange);          //  5x5  matrix of 1s.
  A.setRandom(NoChange, 10);       //  5x10 random matrix.
  A.setRandom(10, NoChange);       // 10x10 random matrix.
  • Added setUnit(Index i) for vectors, which sets the i-th coefficient to one and all others to zero.
  VectorXf v(5);
  v.setUnit(3);   // {0, 0, 0, 1, 0}
  • Added transpose(), adjoint(), conjugate() methods to SelfAdjointView.
  • Added shiftLeft<N>() and shiftRight<N>() coefficient-wise arithmetic shift functions to Arrays.
  ArrayXXi A = ArrayXXi::Random(2, 3);
  ArrayXXi B = A.shiftRight<2>();
  ArrayXXi C = A.shiftLeft<6>();
  • Enabled adding and subtracting of diagonal expressions.
  VectorXf x = VectorXf::Random(5);
  VectorXf y = VectorXf::Random(5);
  MatrixXf A = MatrixXf::Identity(5, 5);
  A += x.asDiagonal() - y.asDiagonal();
  • Allow user-defined default cache sizes via defining EIGEN_DEFAULT_L1_CACHE_SIZE, ..., EIGEN_DEFAULT_L3_CACHE_SIZE.
  • Added EIGEN_ALIGNOF(X) macro for determining alignment of a provided variable.
  • Allow plugins for VectorwiseOp by defining a file EIGEN_VECTORWISEOP_PLUGIN (e.g. -DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h).
  • Allow disabling of IO operations by defining EIGEN_NO_IO.

Improvement to NaN propagation

  • Improvements to NaN correctness for elementwise functions.
  • New NaNPropagation template argument to control whether NaNs are propagated or suppressed in elementwise min/max and corresponding reductions on Array, Matrix, and Tensor. Example for max:
// Elementwise maximum
Eigen::MatrixXf left, right, r0, r1, r2;
r0 = left.cwiseMax(right); // Implementation defined behavior.
// Propagate NaN if either argument is NaN.
r1 = left.template cwiseMax<PropagateNaN>(right);
// Suppress NaN if at least one argument is not a NaN.
r2 = left.template cwiseMax<PropagateNumbers>(right);
 
// Max reductions
Eigen::MatrixXf m;
float nan_or_max = m.maxCoeff(); // Implementation defined behavior.
float nan_if_any_or_max = m.template maxCoeff<PropagateNaN>();
float nan_if_all_or_max = m.template maxCoeff<PropagateNumbers>();

Changes to unsupported modules

New low-latency non-blocking ThreadPool module

  • Originally part of the Tensor module, Eigen::ThreadPool is now separate and more portable; it forms the basis for multi-threading in, e.g., TensorFlow. Example:
  #include <Eigen/CXX11/ThreadPool>
 
  const int num_threads = 42;
  Eigen::ThreadPool tp(num_threads);
  auto do_stuff = []() { /* work to run on the pool */ };
  tp.Schedule(do_stuff);

Changes to Tensor module

  • Support for C++03 was officially dropped in the Tensor module, since most of the code was written in C++11 anyway. This prevents building the Tensor code for CUDA with older versions of nvcc.
  • Performance optimizations of Tensor contraction
    • Speed up "outer-product-like" operations by parallelizing over the contraction dimension, using thread_local buffers and recursive work splitting.
    • Improved threading heuristics.
    • Support for fusing element-wise operations into contraction during evaluation. Example:
// This example applies std::sqrt to all output elements from a tensor contraction. 
// The optional OutputKernel argument to the contraction in this example is a functor over a 
// 2-dimensional buffer. The functor is called once for each output block of the contraction 
// result, to perform the elementwise sqrt operation while the block is hot in cache.
struct SqrtOutputKernel {
  template <typename Index, typename Scalar>
  EIGEN_ALWAYS_INLINE void operator()(
      const internal::blas_data_mapper<Scalar, Index, ColMajor>& output_mapper,
      const TensorContractionParams&, Index, Index, Index num_rows,
      Index num_cols) const {
    for (int i = 0; i < num_rows; ++i) {
      for (int j = 0; j < num_cols; ++j) {
        output_mapper(i, j) = std::sqrt(output_mapper(i, j));
      }
    }
  }
};
 
typedef Tensor<float, 1>::DimensionPair DimPair;
Tensor<float, 4> left(30, 50, 8, 31);
Tensor<float, 5> right(8, 31, 7, 20, 10);
Tensor<float, 5> result(30, 50, 7, 20, 10);
left.setRandom();
right.setRandom();
Eigen::array<DimPair, 2> dims({{DimPair(2, 0), DimPair(3, 1)}});
 
result = left.contract(right, dims, SqrtOutputKernel());
  • Performance optimizations of other Tensor operators
    • Speedups from improved vectorization, block evaluation, and multi-threading for most operators.
    • Significant speedup to broadcasting.
    • Reduction of index computation overhead, e.g. using fast divisors in TensorGenerator, squeezing dimensions in TensorPadding.
  • A complete rewrite of the block (tiling) evaluation framework for tensor expressions led to significant speedups and a reduced number of memory allocations.
  • Added new API for asynchronous evaluation of tensor expressions. Example:
  Tensor<float, 3> in1(200, 30, 70);
  Tensor<float, 3> in2(200, 30, 70);
  Tensor<float, 3> out(200, 30, 70);
 
  Eigen::ThreadPool tp(4);
  Eigen::ThreadPoolDevice thread_pool_device(&tp, 4);
 
  Eigen::Barrier b(1);
  auto done = [&b]() { b.Notify(); };
  out.device(thread_pool_device, std::move(done)) = in1 + in2 * 3.14f;
  b.Wait();
  • Misc. minor behavior changes & fixes:
    • Fix const correctness for TensorMap.
    • Modify tensor argmin/argmax to always return the first occurrence.
    • More numerically stable tree reduction.
    • Improve randomness of the tensor random generator.
    • Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
    • Support static dimensions (aka IndexList) in resizing/reshape/broadcast.
    • Improved accuracy of Tensor FFT.

Improvements to FFT module

  • Faster and more accurate twiddle factor computation.

Improvements to EulerAngles

  • EulerAngles can now be directly constructed from 3D vectors
  • EulerAngles now provide isApprox() and cast() functions
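  A short sketch (assuming the EulerAnglesXYZd/EulerAnglesXYZf typedefs provided by the module):
  #include <unsupported/Eigen/EulerAngles>
  Vector3d angles(0.1, 0.2, 0.3);
  EulerAnglesXYZd ea(angles);              // construct directly from a 3D vector
  EulerAnglesXYZf eaf = ea.cast<float>();  // new cast() support
  bool ok = ea.isApprox(EulerAnglesXYZd(ea.toRotationMatrix()));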

Changes to sparse iterative solvers

  • Added the new IDRS (induced dimension reduction) iterative linear solver.
  #include <unsupported/Eigen/IterativeSolvers>
  // A is a SparseMatrix<float> and b a VectorXf, assembled beforehand.
  A.makeCompressed();   // It is recommended to compress the input before calling sparse solvers.
  IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A);
  if (idrs.info() == ComputationInfo::Success) {
    VectorXf x = idrs.solve(b);
  }

Improvements to Polynomials

  • PolynomialSolver can now be used with complex numbers
  • The solver will automatically choose between EigenSolver and ComplexEigenSolver depending on the scalar type used
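  A sketch with complex coefficients (lowest-degree coefficient first, per the module's convention):
  #include <unsupported/Eigen/Polynomials>
  typedef std::complex<double> Cplx;
  Matrix<Cplx, 3, 1> coeffs;                // p(x) = c0 + c1*x + c2*x^2
  coeffs << Cplx(1, 0), Cplx(0, 2), Cplx(3, 0);
  PolynomialSolver<Cplx, 2> solver(coeffs); // picks ComplexEigenSolver for complex scalars
  const auto& roots = solver.roots();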

Other relevant changes

  • Eigen now provides an option to test with an external BLAS library
  • Eigen can now be used with the PGI Compiler
  • Printing when using GDB has been improved
  • Eigen can now detect if a platform supports int128 intrinsics

Testing

The full Eigen test suite was built and run successfully (in C++03 and C++11 mode) with the following compiler/platform/OS combinations:

Compiler                             Version          Platform   Operating system
Microsoft Visual Studio              2015 Update 3    x86-64     Windows
Microsoft Visual Studio Community    2017 (15.9.38)   x86-64     Windows
Microsoft Visual Studio Community    2019 (16.11)     x86-64     Windows
GCC                                  4.8              x86-64     Linux
GCC                                  9                x86-64     Linux
GCC                                  10               x86-64     Linux
Clang                                6.0              x86-64     Linux
Clang                                10               x86-64     Linux
Clang                                11               x86-64     Linux
GCC                                  10               armv8.2-a  Linux
Clang                                6                armv8.2-a  Linux
Clang                                9                armv8.2-a  Linux
Clang                                10               armv8.2-a  Linux
Clang                                11               armv8.2-a  Linux
AppleClang                           12.0.5           x86-64     macOS
GCC                                  10               ppc64le    Linux
Clang                                10               ppc64le    Linux

List of issues fixed in Eigen 3.4

Issue #2298 List of dense linear decompositions lacks CompleteOrthogonalDecomposition
Issue #2284 JacobiSVD Outputs Invalid U (Reads Past End of Array)
Issue #2267 [3.4 bug] FixedInt<0> error with gcc 4.9.3
Issue #2263 usage of signed zeros leads to wrong results with -ffast-math
Issue #2251 Method unaryExpr() does not support function pointers in Eigen 3.4rc1
Issue #2242 No matching function for call to "..." in 'Complex.h' and 'GenericPacketMathFunctions.h'
Issue #2229 Copies (& potentially moves?) of Eigen object with large unused MaxRows/ColAtCompileTime are slow (Regression from Eigen 3.2)
Issue #2213 template maxCoeff<PropagateNaN> compilation error with Eigen 3.4.
Issue #2209 unaryExpr deduces wrong return type on MSVC
Issue #2157 forward_adolc test fails since PR !363
Issue #2119 Move assignment swaps even for non-dynamic storage
Issue #2112 Build failure with boost::multiprecision type
Issue #2093 Incorrect evaluation of Ref
Issue #1906 Eigen failed with error C2440 with MSVC on windows
Issue #1850 error C4996: 'std::result_of<T>': warning STL4014: std::result_of and std::result_of_t are deprecated in C++17. They are superseded by std::invoke_result and std::invoke_result_t
Issue #1833 c++20 compilation failure
Issue #1826 -Wdeprecated-anon-enum-enum-conversion warnings (c++20)
Issue #1815 IndexedView of a vector should allow linear access
Issue #1805 Uploaded doxygen documentation does not build LaTeX formulae
Issue #1790 packetmath_1 unit test fails
Issue #1788 Rule-of-three/rule-of-five violations
Issue #1776 subvector_stl_iterator::operator-> triggers 'taking address of rvalue' warning
Issue #1774 std::cbegin() returns non-const iterator
Issue #1752 A change to the C++ Standard will break some tests
Issue #1741 Map<>.noalias()=A*B gives wrong result
Issue #1736 Column access of some IndexedView won't compile
Issue #1718 Use of builtin vec_sel is ambiguous when compiling with Clang for PowerPC
Issue #1695 Stuck in loop for a certain input when using mpreal support
Issue #1692 pass enumeration argument to constructor of VectorXd
Issue #1684 array_reverse fails with clang >=6 + AVX + -O2
Issue #1674 SIMD sin/cos gives wrong results with -ffast-math
Issue #1669 Zero-sized matrices generate assertion failures
Issue #1664 dot product with single column block fails with new static checks
Issue #1652 Corner cases in SIMD sin/cos
Issue #1643 Compilation failure
Issue #1637 Register spilling with recent gcc & clang
Issue #1619 const_iterator vs iterator compilation error
Issue #1615 Performance of (aliased) matrix multiplication with fixed size 3x3 matrices slow
Issue #1611 NEON: plog(+/-0) should return -inf and not NaN
Issue #1585 Matrix product is repeatedly evaluated when iterating over the product expression
Issue #1557 Fail to compute eigenvalues for a simple 3x3 companion matrix for root finding
Issue #1544 SparseQR generates incorrect Q matrix in complex case
Issue #1543 "Fix linear indexing in generic block evaluation" breaks Matrix*Diagonal*Vector product
Issue #1493 dense Q extraction and solve is sometimes erroneous for complex matrices
Issue #1453 Strange behavior for Matrix::Map, if only InnerStride is provided
Issue #1409 Add support for C++17 operator new alignment
Issue #1340 Add operator + to sparse matrix iterator
Issue #1318 More robust quaternion from matrix
Issue #1306 Add support for AVX512 to Eigen
Issue #1305 Implementation of additional component-wise unary functions
Issue #1221 I get tons of error since my distribution upgraded to GCC 6.1.1
Issue #1195 vectorization_logic fails: Matrix3().cwiseQuotient(Matrix3()) expected CompleteUnrolling, got NoUnrolling
Issue #1194 Improve det4x4
Issue #1049 std::make_shared fails to fulfill structure alignment
Issue #1046 fixed matrix types do not report correct alignment requirements
Issue #1014 Eigenvalues 3x3 matrix
Issue #1001 infer dimensions of Dynamic-sized temporaries from the entire expression (if possible)
Issue #977 Add stable versions of normalize() and normalized()
Issue #899 SparseQR occasionally fails for under-determined systems
Issue #864 C++11 alias templates for commonly used types
Issue #751 Make AMD Ordering numerically more robust
Issue #747 Allow for negative stride
Issue #720 Gaussian NullaryExpr
Issue #663 Permit NoChange in setZero, setOnes, setConstant, setRandom
Issue #645 GeneralizedEigenSolver: missing computation of eigenvectors
Issue #632 Optimize addition/subtraction of sparse and dense matrices/vectors
Issue #631 (Optionally) throw an exception when using an unsuccessful decomposition
Issue #564 maxCoeff() returns -nan instead of max, while maxCoeff(&maxRow, &maxCol) works
Issue #556 Matrix multiplication crashes using mingw 4.7
Issue #505 Assert if temporary objects that are still referred to get destructed (was: Misbehaving Product on C++11)
Issue #445 ParametrizedLine should have transform method
Issue #437 [feature request] Add Reshape Operation
Issue #426 Behavior of sum() for Matrix<bool> is unexpected and confusing
Issue #329 Feature request: Ability to get a "view" into a sub-matrix by indexing it with a vector or matrix of indices
Issue #231 STL compatible iterators
Issue #96 Clean internal::result_of
Issue #65 Core - optimize partial reductions
Issue #64 Tests : precision-oriented tests

Additional information

  • A curated list of commits, approximately organized by the same topics as the release notes above, and sorted in reverse chronological order can be found here.