Eigen 3.4
Contents
 1 Changes to Supported Modules
 1.1 Changes that might impact existing code
 1.2 New Major Features in Core
 1.3 New backends
 1.4 Improvements to Eigen Core
 1.5 Elementwise math functions
 1.6 Dense matrix decompositions and solvers
 1.7 Sparse matrix support, decompositions and solvers
 1.8 Type support
 1.9 Improved Geometry Module
 1.10 Backend-specific improvements
 1.11 Miscellaneous API Changes
 1.12 Improvement to NaN propagation
 2 Changes to unsupported modules
 3 Other relevant changes
Changes to Supported Modules
Changes that might impact existing code
Using float or double for indexing matrices, vectors, and arrays will now fail to compile. For example:

MatrixXd A(10,10);
float one = 1;
double a11 = A(one, 1.);   // compilation error here
New Major Features in Core
Add C++11 initializer_list constructors to Matrix and Array [doc]:

MatrixXi a {      // construct a 2x3 matrix
  {1, 2, 3},      // first row
  {4, 5, 6}       // second row
};
VectorXd v{{1, 2, 3, 4, 5}};          // construct a dynamic-size vector with 5 elements
Array<int, 1, 5> arr{1, 2, 3, 4, 5};  // initialize a fixed-size 1D array of size 5
Add STL-compatible iterators for dense expressions [doc]. Some examples:

VectorXd v = ...;
MatrixXd A = ...;
// range for loop over all entries of v, then A
for(auto x : v) { cout << x << " "; }
for(auto x : A.reshaped()) { cout << x << " "; }
// sort v, then each column of A
std::sort(v.begin(), v.end());
for(auto c : A.colwise())
  std::sort(c.begin(), c.end());
Add C++11 template aliases for Matrix, Vector, and Array of common sizes, including generic Vector<Type,Size> and RowVector<Type,Size> aliases [doc].

MatrixX<double> M;   // Instead of MatrixXd or Matrix<double, Dynamic, Dynamic>
Vector4<MyType> V;   // Instead of Matrix<MyType, 4, 1>
New support for bfloat16. The 16-bit Brain floating point format[1] is now available as Eigen::bfloat16. The constructor must be called explicitly, but it can otherwise be used as any other scalar type. To convert back and forth to uint16_t, e.g. to extract the bit representation, use Eigen::numext::bit_cast.

bfloat16 s(0.25);                                  // explicit construction
uint16_t s_bits = numext::bit_cast<uint16_t>(s);   // bit representation

using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
MatrixBf16 X = s * MatrixBf16::Random(3, 3);
New backends
Arm SVE: Eigen now supports Arm's Scalable Vector Extension (SVE). Currently only fixed-length SVE vectors for uint32_t and float are available.
MIPS MSA: Eigen now supports the MIPS SIMD Architecture (MSA).
 AMD ROCm HIP: Eigen now contains a generic GPU backend for NVIDIA/AMD that unifies support for CUDA and HIP.
Power 10 MMA Backend: Eigen now has initial support for Power 10 matrix multiplication assist (MMA) instructions, for real and complex float32 and float64.
Improvements to Eigen Core
Eigen now uses the C++11 alignas keyword for static alignment. Users targeting C++17 only and recent compilers (e.g., GCC>=7, clang>=5, MSVC>=19.12) will thus be able to completely forget about all issues related to static alignment, including EIGEN_MAKE_ALIGNED_OPERATOR_NEW.
Various performance improvements for products and Eigen's GEBP and GEMV kernels have been implemented:
By using half- and quarter-packets, the performance of matrix multiplications of small to medium-sized matrices has been improved.
 Eigen's GEMM now falls back to GEMV if it detects that a matrix is a runtime vector
 The performance of matrix products using Arm Neon has been drastically improved (up to 20%)
 Performance of many special cases of matrix products has been improved
Large speed-up from a blocked algorithm for transposeInPlace.
Speed-up of misc. operations by propagating compile-time sizes (col/row-wise reverse, PartialPivLU, and others).
Faster specialized SIMD kernels for small fixed-size inverse, LU decomposition, and determinant.
Improve or add vectorization of partial or slice reductions along the outer dimension, for instance colmajor_mat.rowwise().mean().
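For illustration, a minimal sketch (not from the release notes; the matrix size is an arbitrary example) of a partial reduction along the outer dimension of a column-major matrix that benefits from this vectorization:

MatrixXf colmajor_mat = MatrixXf::Random(1024, 1024);   // column-major by default
VectorXf row_means = colmajor_mat.rowwise().mean();     // partial reduction along the outer dimension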
Elementwise math functions
Many functions are now implemented and vectorized in generic (backend-agnostic) form.
Many improvements to correctness, accuracy, and compatibility with the C++ standard library.
Much improved implementation of ldexp.
Misc. fixes for corner cases, NaN/Inf inputs, and singular points of many functions.
New Payne-Hanek argument reduction algorithm for sin and cos with huge arguments.
New vectorized faithfully rounded algorithm for pow(x,y).
Speedups from (new or improved) vectorized versions of pow, log, sin, cos, arg, log2, complex sqrt, erf, expm1, log1p, logistic, rint, gamma and bessel functions, and more.
Improved special function support (Bessel and gamma functions, ndtri, erfc, inverse hyperbolic functions, and more).
New elementwise functions absolute_difference and rint.
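As an illustrative sketch (not part of the release notes; sizes and expressions are arbitrary), a few of these vectorized elementwise functions used through the Array API:

ArrayXf x = ArrayXf::Random(1000);
ArrayXf y = ArrayXf::Random(1000);
ArrayXf p = x.abs().pow(y);       // vectorized, faithfully rounded pow(x,y)
ArrayXf s = x.sin() + y.cos();    // vectorized sin/cos
ArrayXf r = x.rint();             // new elementwise rint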
Dense matrix decompositions and solvers
SVD implementations now have an info() method for checking convergence.

#include <Eigen/SVD>

MatrixXf m = MatrixXf::Random(3,2);
JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
if (svd.info() == ComputationInfo::Success) {
  // SVD computation was successful.
  VectorXf x = svd.solve(b);
}
 Decompositions now fail quickly when invalid inputs are detected.
Optimized the product of a HouseholderSequence with the identity, and optimized the evaluation of a HouseholderSequence to a dense matrix using a faster blocked product.
Fixed aliasing issues with in-place small matrix inversions.
Fixed several edge cases with empty or zero inputs.
Sparse matrix support, decompositions and solvers
 Enabled assignment and addition with diagonal matrices.
SparseMatrix<float> A(10, 10);
VectorXf x = VectorXf::Random(10);
A = x.asDiagonal();
A += x.asDiagonal();
Support added for SuiteSparse KLU routines via the KLUSupport module.

#include <Eigen/KLUSupport>

A.makeCompressed();   // Recommendation is to compress input before calling sparse solvers.
KLU<SparseMatrix<T> > klu(A);
if (klu.info() == ComputationInfo::Success) {
  VectorXf x = klu.solve(b);
}

SparseCholesky now works with row-major matrices.
Various bug fixes and performance improvements.
Type support
Improved support for half:
Native support added for ARM __fp16, CUDA/HIP __half, and F16C.
Better vectorization support added across all backends.
 Improved bool support
 Partial vectorization support added for boolean operations.
Significantly improved performance (25x) for logical operations with Matrix or Tensor of bool (a small sketch follows at the end of this section).
Improved support for custom types
More custom types work out-of-the-box (see #2201[2]).
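As an illustrative sketch (not from the release notes; sizes and values are arbitrary), the kind of coefficient-wise boolean operations that benefit from the improved bool support:

Array<bool, Dynamic, Dynamic> a = Array<bool, Dynamic, Dynamic>::Constant(512, 512, true);
Array<bool, Dynamic, Dynamic> b = Array<bool, Dynamic, Dynamic>::Constant(512, 512, false);
Array<bool, Dynamic, Dynamic> c = a && b;    // coefficient-wise logical AND, now partially vectorized
Array<bool, Dynamic, Dynamic> d = a || !b;   // coefficient-wise OR / NOT
bool any_true = c.any();                     // boolean reduction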
Improved Geometry Module
Behavioral change: Transform::computeRotationScaling() and Transform::computeScalingRotation() are now more continuous across degeneracies (see !349). A usage sketch follows this list.
New minimal vectorization support added for Quaternion.
Vectorized 4x4 matrix inversion.
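A minimal usage sketch of Transform::computeRotationScaling() (illustrative only; the transform built here is an arbitrary example):

#include <Eigen/Geometry>

Affine3d t = Affine3d::Identity();
t.rotate(AngleAxisd(0.3, Vector3d::UnitZ())).scale(Vector3d(1.0, 2.0, 3.0));
Matrix3d rotation, scaling;
t.computeRotationScaling(&rotation, &scaling);   // t.linear() == rotation * scaling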
Backend-specific improvements
 The Arm NEON backend has been significantly improved:
It now provides vectorization for uint64_t, int64_t, uint32_t, int16_t, uint16_t, int8_t, and uint8_t.
It can now emulate bfloat16 support when using Eigen::bfloat16.
It now supports emulated and native float16 when using Eigen::half.
 SSE/AVX/AVX512
 Enabled AVX512 instructions by default if available.
New std::complex, half, and bfloat16 vectorization support added.
Many missing packet functions added.
 Altivec/Power
 General performance improvement and bugfixes.
 Enhanced vectorization of current real and complex scalars.
Changes to the gebp_kernel specific to Altivec, using the VSX implementation of the MMA instructions, which gains speed improvements of up to 4x for matrix-matrix products.
 Dynamic dispatch for GCC greater than 10 enabling selection of MMA or VSX instructions based on __builtin_cpu_supports.
 GPU (CUDA and HIP)
Several optimized math functions added, better support for std::complex.
Added option to disable CUDA entirely by defining EIGEN_NO_CUDA.
Many more functions can now be used in device code (e.g. comparisons, small matrix inversion).
 ZVector
Vectorized float and std::complex<float> support added.
Added z14 support.
 SYCL
Redesigned SYCL implementation for use with the Tensor module, which can be enabled by defining EIGEN_USE_SYCL.
New generic memory model introduced, used by TensorDeviceSycl.
Better integration with OpenCL devices.
Added many math function specializations.
Miscellaneous API Changes
New setConstant(...) methods for preserving one dimension of a matrix by passing in NoChange_t.

MatrixXf A(10, 5);                    // 10x5 matrix.
A.setConstant(NoChange_t(), 10, 2);   // 10x10 matrix of 2s.
A.setConstant(5, NoChange_t(), 3);    // 5x10 matrix of 3s.
A.setZero(NoChange_t(), 20);          // 5x20 matrix of 0s.
A.setZero(20, NoChange_t());          // 20x20 matrix of 0s.
A.setOnes(NoChange_t(), 5);           // 20x5 matrix of 1s.
A.setOnes(5, NoChange_t());           // 5x5 matrix of 1s.
A.setRandom(NoChange_t(), 10);        // 5x10 random matrix.
A.setRandom(10, NoChange_t());        // 10x10 random matrix.
Added setUnit(Index i) for vectors, which sets the i-th coefficient to one and all others to zero.

VectorXf v(5);
v.setUnit(3);   // {0, 0, 0, 1, 0}
Added transpose(), adjoint(), and conjugate() methods to SelfAdjointView.
Added shiftLeft<N>() and shiftRight<N>() coefficient-wise arithmetic shift functions to Arrays.
MatrixXi A = MatrixXi::Random(2, 3);
MatrixXi B = A.array().shiftRight<2>();
MatrixXi C = A.array().shiftLeft<6>();
 Enabled adding and subtracting of diagonal expressions.
VectorXf x = VectorXf::Random(5);
VectorXf y = VectorXf::Random(5);
MatrixXf A = MatrixXf::Identity(5, 5);
A += x.asDiagonal() - y.asDiagonal();
Allow user-defined default cache sizes by defining EIGEN_DEFAULT_L1_CACHE_SIZE, ..., EIGEN_DEFAULT_L3_CACHE_SIZE (a configuration sketch follows below).
Added EIGEN_ALIGNOF(X) macro for determining the alignment of a provided variable.
Allow plugins for VectorwiseOp by defining a file EIGEN_VECTORWISEOP_PLUGIN (e.g. -DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h).
Allow disabling of IO operations by defining EIGEN_NO_IO.
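A minimal configuration sketch; the cache-size value is an arbitrary assumption, and such macros must be defined before any Eigen header is included (for example on the compiler command line):

#define EIGEN_DEFAULT_L1_CACHE_SIZE (32 * 1024)   // assume a 32 KiB L1 data cache
#define EIGEN_NO_IO                               // disable Eigen's stream-based IO operators
#include <Eigen/Dense>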
Improvement to NaN propagation
 Improvements to NaN correctness for elementwise functions.
New NaNPropagation template argument to control whether NaNs are propagated or suppressed in elementwise min/max and corresponding reductions on Array, Matrix, and Tensor. Example for max:
// Elementwise maximum
Eigen::MatrixXf left, right, r1, r2;
// Propagate NaN if either argument is NaN.
r1 = left.cwiseMax<PropagateNaN>(right);
// Suppress NaN if at least one argument is non NaN.
r2 = left.cwiseMax<PropagateNumbers>(right);

// Max reductions
Eigen::MatrixXf m;
float nan_if_any_or_max = m.template maxCoeff<PropagateNaN>();
float nan_if_all_or_max = m.template maxCoeff<PropagateNumbers>();
Changes to unsupported modules
New low-latency non-blocking ThreadPool module
Originally a part of the Tensor module, Eigen::ThreadPool is now separate and more portable, and forms the basis for multithreading in TensorFlow, for example:
#include <Eigen/CXX11/ThreadPool>

const int num_threads = 42;
Eigen::ThreadPool tp(num_threads);
auto do_stuff = []() { ... };
tp.Schedule(do_stuff);
Changes to Tensor module
 Performance optimizations of Tensor contraction
 Speed up "outerproductlike" operations by parallelizing over the contraction dimension, using thread_local buffers and recursive work splitting.
 Improved threading heuristics.
 Support for fusing elementwise operations into contraction during evaluation. Example:
// Apply Sqrt to all output elements. The optional OutputKernel argument to
// contraction in this example is a functor over 2-dimensional output blocks.
// The functor is called for each output block of the results, to perform the
// elementwise sqrt operation while the block is hot in cache.
struct SqrtOutputKernel {
  template <typename Index, typename Scalar>
  EIGEN_ALWAYS_INLINE void operator()(
      const internal::blas_data_mapper<Scalar, Index, ColMajor>& output_mapper,
      const TensorContractionParams&, Index, Index,
      Index num_rows, Index num_cols) const {
    for (int i = 0; i < num_rows; ++i) {
      for (int j = 0; j < num_cols; ++j) {
        output_mapper(i, j) = std::sqrt(output_mapper(i, j));
      }
    }
  }
};

Tensor<float, 4, DataLayout> left(30, 50, 8, 31);
Tensor<float, 5, DataLayout> right(8, 31, 7, 20, 10);
Tensor<float, 5, DataLayout> result(30, 50, 7, 20, 10);
Eigen::array<DimPair, 2> dims({{DimPair(2, 0), DimPair(3, 1)}});

result = left.contract(right, dims, SqrtOutputKernel());
Performance optimizations of other Tensor operators
 Added vectorization, block evaluation, and multithreading for most operators.
 Significant speedup to broadcasting.
 Reduction of index computation overhead, e.g. using fast divisors in TensorGenerator, squeezing dimensions in TensorPadding.
Complete rewrite of the block (tiling) evaluation framework for tensor expressions, leading to significant speedups and a reduced number of memory allocations.
 Added new API for asynchronous evaluation of tensor expressions. Example:
Tensor<float, 3> in1(200, 30, 70);
Tensor<float, 3> in2(200, 30, 70);
Tensor<float, 3> out(200, 30, 70);

Eigen::ThreadPool tp(internal::random<int>(3, 11));
Eigen::ThreadPoolDevice thread_pool_device(&tp, internal::random<int>(3, 11));

Eigen::Barrier b(1);
auto done = [&b]() { b.Notify(); };
out.device(thread_pool_device, std::move(done)) = in1 + in2 * 3.14f;
b.Wait();
Support for C++03 was officially dropped in the Tensor module, since most of the code was written in C++11 anyway.
 Misc. minor behavior changes & fixes:
 Fix const correctness for TensorMap.
 Modify tensor argmin/argmax to always return first occurrence.
 More numerically stable tree reduction.
 Improve randomness of the tensor random generator.
 Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
 Support static dimensions (aka IndexList) in resizing/reshape/broadcast.
 Improved accuracy of Tensor FFT.
Changes to FFT module
 Faster and more accurate twiddle factor computation.
Improvements to EulerAngles
 EulerAngles can now be directly constructed from 3D vectors
EulerAngles now provides isApprox() and cast() functions (a brief usage sketch follows).
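A brief usage sketch (illustrative only; the ZYX convention and the angle values are arbitrary assumptions):

#include <unsupported/Eigen/EulerAngles>

EulerAnglesZYXd ea(Vector3d(0.1, 0.2, 0.3));   // construct directly from a 3D vector of angles
EulerAnglesZYXf eaf = ea.cast<float>();        // cast to a different scalar type
bool close = ea.isApprox(eaf.cast<double>());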
Changes to sparse iterative solvers
Added new IDRS iterative linear solver.
#include <unsupported/Eigen/IterativeSolvers>

A.makeCompressed();   // Recommendation is to compress input before calling sparse solvers.
IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A);
if (idrs.info() == ComputationInfo::Success) {
  VectorXf x = idrs.solve(b);
}
Improvements to Polynomials
 PolynomialSolver can now be used with complex numbers
The solver will automatically choose between EigenSolver and ComplexEigenSolver depending on the scalar type used (a brief sketch follows).
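An illustrative sketch (not from the release notes) of solving a polynomial with complex coefficients; the coefficients are arbitrary and given in increasing order of degree, as expected by the Polynomials module:

#include <unsupported/Eigen/Polynomials>

// p(x) = (1 + 0i) + (0 + 2i)*x + (1 + 0i)*x^2
Vector3cd coeffs;
coeffs << std::complex<double>(1, 0), std::complex<double>(0, 2), std::complex<double>(1, 0);
PolynomialSolver<std::complex<double>, 2> solver(coeffs);
const auto& roots = solver.roots();   // ComplexEigenSolver is used for the complex scalar type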
Other relevant changes
 Eigen now provides an option to test with an external BLAS library
 Eigen can now be used with the PGI Compiler
 Printing when using GDB has been improved
Eigen can now detect if a platform supports int128 intrinsics.