User:Rmlarsen/3.4
From Eigen
Contents
- 1 Changes to Supported Modules
- 1.1 Changes that might break existing code
- 1.2 New Major Features in Core
- 1.3 New backends
- 1.4 Improvements to Eigen Core
- 1.5 Elementwise math functions
- 1.6 Dense matrix decompositions and solvers
- 1.7 Sparse matrix support, decompositions and solvers
- 1.8 Type support
- 1.9 Improved Geometry Module
- 1.10 Backend-specific improvements
- 1.11 Miscellaneous API Changes
- 1.12 Improvement to NaN propagation
- 2 Changes to unsupported modules
- 3 Other relevant changes
Changes to Supported Modules
Changes that might break existing code
- Using float or double for indexing matrices, vectors and arrays will now fail to compile, e.g.:
  MatrixXd A(10, 10);
  float one = 1;
  double a11 = A(one, 1.);  // compilation error here
New Major Features in Core
- Add C++11 initializer_list constructors to Matrix and Array [doc]:
  MatrixXi a {    // construct a 2x3 matrix
    {1, 2, 3},    // first row
    {4, 5, 6}     // second row
  };
  VectorXd v{{1, 2, 3, 4, 5}};          // construct a dynamic-size vector with 5 elements
  Array<int, 1, 5> arr{1, 2, 3, 4, 5};  // initialize a fixed-size 1D array of size 5.
- Add STL-compatible iterators for dense expressions [doc]. Some examples:
  VectorXd v = ...;
  MatrixXd A = ...;
  // range for loop over all entries of v then A
  for (auto x : v) { cout << x << " "; }
  for (auto x : A.reshaped()) { cout << x << " "; }
  // sort v then each column of A
  std::sort(v.begin(), v.end());
  for (auto c : A.colwise())
    std::sort(c.begin(), c.end());
- Add C++11 template aliases for Matrix, Vector, and Array of common sizes, including generic Vector<Type,Size> and RowVector<Type,Size> aliases [doc].
  MatrixX<double> M;  // Instead of MatrixXd or Matrix<double, Dynamic, Dynamic>
  Vector4<MyType> V;  // Instead of Vector<MyType, 4> or Matrix<MyType, 4, 1>
- New support for bfloat16. The 16-bit Brain floating point format is now available as Eigen::bfloat16. The constructor must be called explicitly, but it can otherwise be used as any other scalar type. To convert back and forth between bfloat16 and uint16_t (to extract the bit representation), use Eigen::numext::bit_cast.
  bfloat16 s(0.25);                                 // explicit construction
  uint16_t s_bits = numext::bit_cast<uint16_t>(s);  // bit representation
  using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
  MatrixBf16 X = s * MatrixBf16::Random(3, 3);
New backends
- Arm SVE: Eigen now supports Arm's Scalable Vector Extension (SVE). Currently only fixed-length SVE vectors for uint32_t and float are available.
- MIPS MSA: Eigen now supports the MIPS SIMD Architecture (MSA).
- AMD ROCm HIP: Eigen now contains a generic GPU backend for NVIDIA/AMD that unifies support for CUDA and HIP.
- Power 10 MMA Backend: Eigen now has initial support for the Power 10 matrix multiplication assist instructions for float32 and float64, real and complex.
Improvements to Eigen Core
- Eigen now uses the C++11 alignas keyword for static alignment. Users targeting C++17 only and recent compilers (e.g., GCC>=7, clang>=5, MSVC>=19.12) will thus be able to completely forget about all issues related to static alignment, including EIGEN_MAKE_ALIGNED_OPERATOR_NEW (a minimal illustration follows this list).
- Various performance improvements for products and Eigen's GEBP and GEMV kernels have been implemented:
  - By using half- and quarter-packets, the performance of matrix multiplications of small to medium-sized matrices has been improved.
  - Eigen's GEMM now falls back to GEMV if it detects that a matrix is a run-time vector.
  - The performance of matrix products using Arm Neon has been drastically improved (up to 20%).
  - Performance of many special cases of matrix products has been improved.
- Large speedup from a blocked algorithm for transposeInPlace.
- Speed up of misc. operations by propagating compile-time sizes (col/row-wise reverse, PartialPivLU, and others).
- Faster specialized SIMD kernels for small fixed-size inverse, LU decomposition, and determinant.
- Improved or added vectorization of partial or slice reductions along the outer dimension, for instance: colmajor_mat.rowwise().mean()
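As a minimal illustration of the static-alignment point above (the Pose type is a hypothetical example, not from Eigen): before C++17, a class holding a fixed-size vectorizable Eigen member needed EIGEN_MAKE_ALIGNED_OPERATOR_NEW to be safely allocated with new; with C++17 and the compilers listed above the macro is no longer needed.
  #include <Eigen/Core>
  struct Pose {                        // hypothetical user type with a fixed-size, vectorizable Eigen member
    Eigen::Vector4f position;
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW    // only needed when targeting pre-C++17 standards
  };
  Pose* p = new Pose;                  // heap allocation is correctly aligned in either case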
Elementwise math functions
- Many functions are now implemented and vectorized in generic (backend-agnostic) form.
- Many improvements to correctness, accuracy, and compatibility with the C++ standard library.
  - Much improved implementation of ldexp.
  - Misc. fixes for corner cases, NaN/Inf inputs and singular points of many functions.
  - New Payne-Hanek argument reduction algorithm for sin and cos with huge arguments.
  - New vectorized faithfully rounded algorithm for pow(x,y).
- Speedups from (new or improved) vectorized versions of pow, log, sin, cos, arg, log2, complex sqrt, erf, expm1, log1p, logistic, rint, gamma and bessel functions, and more.
- Improved special function support (Bessel and gamma functions, ndtri, erfc, inverse hyperbolic functions and more).
- New elementwise functions for absolute_difference and rint.
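For illustration, a short sketch (not taken from the release notes; variable names are arbitrary, and the erf call assumes the unsupported SpecialFunctions header is available) using a few of the functions listed above on Arrays:
  #include <Eigen/Core>
  #include <unsupported/Eigen/SpecialFunctions>  // free functions for erf and other special functions
  using namespace Eigen;
  ArrayXf x = ArrayXf::Random(8);
  ArrayXf y = ArrayXf::Random(8);
  ArrayXf p = pow(x.abs(), y);    // vectorized, faithfully rounded pow(x, y)
  ArrayXf s = x.sin() + x.cos();  // vectorized sin/cos
  ArrayXf e = Eigen::erf(x);      // special function from the SpecialFunctions module
  ArrayXf r = x.rint();           // round to nearest integer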
Dense matrix decompositions and solvers
- SVD implementations now have an info() method for checking convergence.
  #include <Eigen/SVD>
  MatrixXf m = MatrixXf::Random(3, 2);
  VectorXf b = VectorXf::Random(3);
  JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
  if (svd.info() == ComputationInfo::Success) {
    // SVD computation was successful.
    VectorXf x = svd.solve(b);
  }
- Decompositions now fail quickly when invalid inputs are detected.
- Optimized the product of a Householder sequence with the identity, and optimized the evaluation of a HouseholderSequence to a dense matrix using a faster blocked product.
- Fixed aliasing issues with in-place small matrix inversions.
- Fixed several edge-cases with empty or zero inputs.
Sparse matrix support, decompositions and solvers
- Enabled assignment and addition with diagonal matrices.
  SparseMatrix<float> A(10, 10);
  VectorXf x = VectorXf::Random(10);
  A = x.asDiagonal();
  A += x.asDiagonal();
- Support added for SuiteSparse KLU routines via the KLUSupport module.
  #include <Eigen/KLUSupport>
  A.makeCompressed();   // Recommendation is to compress input before calling sparse solvers.
  KLU<SparseMatrix<T> > klu(A);
  if (klu.info() == ComputationInfo::Success) {
    VectorXf x = klu.solve(b);
  }
- SparseCholesky now works with row-major matrices (see the sketch below).
- Various bug fixes and performance improvements.
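A minimal sketch of the row-major case, assuming a symmetric positive definite matrix A_rm and a right-hand side b (both illustrative, not from the release notes):
  #include <Eigen/SparseCholesky>
  SparseMatrix<double, RowMajor> A_rm = ...;   // symmetric positive definite, row-major
  VectorXd b = ...;
  SimplicialLLT<SparseMatrix<double, RowMajor> > llt(A_rm);
  if (llt.info() == ComputationInfo::Success) {
    VectorXd x = llt.solve(b);
  }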
Type support
- Improved support for half (a short usage sketch follows this list)
  - Native support added for ARM __fp16, CUDA/HIP __half, and F16C.
  - Better vectorization support added across all backends.
- Improved bool support
  - Partial vectorization support added for boolean operations.
  - Significantly improved performance (x25) for logical operations with Matrix or Tensor of bool.
- Improved support for custom types
  - More custom types work out-of-the-box (see #2201[1]).
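An illustrative sketch of half usage (matrix sizes and names are arbitrary): Eigen::half is constructed explicitly but otherwise behaves like any other scalar type.
  #include <Eigen/Core>
  using MatrixH = Eigen::Matrix<Eigen::half, Eigen::Dynamic, Eigen::Dynamic>;
  Eigen::half h(0.5f);                                             // explicit construction from float
  MatrixH H = Eigen::MatrixXf::Random(4, 4).cast<Eigen::half>();   // convert a float matrix to half
  H *= h;                                                          // scalar arithmetic works as usual
  float f = static_cast<float>(H(0, 0));                           // convert an entry back to float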
Improved Geometry Module
- Behavioral change: Transform::computeRotationScaling() and Transform::computeScalingRotation() are now more continuous across degeneracies (see !349; a minimal sketch of this API follows this list).
- New minimal vectorization support added for Quaternion.
- Vectorized 4x4 matrix inversion.
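For reference, a minimal sketch of the decomposition API whose behavior changed (the transform T is illustrative): it splits the linear part of an affine transform into a rotation and a scaling.
  #include <Eigen/Geometry>
  Eigen::Affine3d T = Eigen::Affine3d::Identity();  // some affine transform
  Eigen::Matrix3d R, S;
  T.computeRotationScaling(&R, &S);   // linear part ~= R * S, now more continuous across degeneracies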
Backend-specific improvements
- Arm NEON
  - Now provides vectorization for uint64_t, int64_t, uint32_t, int16_t, uint16_t, int8_t, and uint8_t.
  - Emulates bfloat16 support when using Eigen::bfloat16.
  - Supports emulated and native float16 when using Eigen::half.
- SSE/AVX/AVX512
  - Enabled AVX512 instructions by default if available.
  - New std::complex, half, and bfloat16 vectorization support added.
  - Many missing packet functions added.
- Altivec/Power
  - General performance improvements and bug fixes.
  - Enhanced vectorization of the currently supported real and complex scalars.
  - Changes to the gebp_kernel specific to Altivec, using the VSX implementation of the MMA instructions, yielding speed improvements of up to 4x for matrix-matrix products.
  - Dynamic dispatch for GCC greater than 10, enabling selection of MMA or VSX instructions based on __builtin_cpu_supports.
- GPU (CUDA and HIP)
  - Several optimized math functions added; better support for std::complex.
  - Added option to disable CUDA entirely by defining EIGEN_NO_CUDA.
  - Many more functions can now be used in device code (e.g. comparisons, small matrix inversion).
- ZVector
  - Vectorized float and std::complex<float> support added.
  - Added z14 support.
- SYCL
  - Redesigned SYCL implementation for use with the Tensor module, which can be enabled by defining EIGEN_USE_SYCL.
  - New generic memory model introduced, used by TensorDeviceSycl.
  - Better integration with OpenCL devices.
  - Added many math function specializations.
Miscellaneous API Changes
- New setConstant(...) methods for preserving one dimension of a matrix by passing in NoChange.
  MatrixXf A(10, 5);              // 10x5 matrix.
  A.setConstant(NoChange, 10, 2); // 10x10 matrix of 2s.
  A.setConstant(5, NoChange, 3);  // 5x10 matrix of 3s.
  A.setZero(NoChange, 20);        // 5x20 matrix of 0s.
  A.setZero(20, NoChange);        // 20x20 matrix of 0s.
  A.setOnes(NoChange, 5);         // 20x5 matrix of 1s.
  A.setOnes(5, NoChange);         // 5x5 matrix of 1s.
  A.setRandom(NoChange, 10);      // 5x10 random matrix.
  A.setRandom(10, NoChange);      // 10x10 random matrix.
- Added setUnit(Index i) for vectors; it sets the i-th coefficient to one and all others to zero.
  VectorXf v(5);
  v.setUnit(3);   // {0, 0, 0, 1, 0}
- Added transpose(), adjoint(), and conjugate() methods to SelfAdjointView.
- Added shiftLeft<N>() and shiftRight<N>() coefficient-wise arithmetic shift functions to Arrays.
  ArrayXXi A = ArrayXXi::Random(2, 3);
  ArrayXXi B = A.shiftRight<2>();
  ArrayXXi C = A.shiftLeft<6>();
- Enabled adding and subtracting of diagonal expressions.
  VectorXf x = VectorXf::Random(5);
  VectorXf y = VectorXf::Random(5);
  MatrixXf A = MatrixXf::Identity(5, 5);
  A += x.asDiagonal() - y.asDiagonal();
- Allow user-defined default cache sizes via defining EIGEN_DEFAULT_L1_CACHE_SIZE, ..., EIGEN_DEFAULT_L3_CACHE_SIZE.
- Added EIGEN_ALIGNOF(X) macro for determining the alignment of a provided variable.
- Allow plugins for VectorwiseOp by defining the file EIGEN_VECTORWISEOP_PLUGIN (e.g. -DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h).
- Allow disabling of IO operations by defining EIGEN_NO_IO.
Improvement to NaN propagation
- Improvements to NaN correctness for elementwise functions.
- New NaNPropagation template argument to control whether NaNs are propagated or suppressed in elementwise min/max and the corresponding reductions on Array, Matrix, and Tensor. Example for max:
  // Elementwise maximum
  Eigen::MatrixXf left, right, r1, r2;
  // Propagate NaN if either argument is NaN.
  r1 = left.cwiseMax<PropagateNaN>(right);
  // Suppress NaN if at least one argument is not NaN.
  r2 = left.cwiseMax<PropagateNumbers>(right);
  // Max reductions
  Eigen::MatrixXf m;
  float nan_if_any_or_max = m.template maxCoeff<PropagateNaN>();
  float nan_if_all_or_max = m.template maxCoeff<PropagateNumbers>();
Changes to unsupported modules
New low-latency non-blocking ThreadPool module
- Originally a part of the Tensor module, Eigen::ThreadPool is now separate and more portable, and forms the basis for multi-threading in TensorFlow, for example. Example:
  #include <Eigen/CXX11/ThreadPool>
  const int num_threads = 42;
  Eigen::ThreadPool tp(num_threads);
  auto do_stuff = []() { ... };
  tp.Schedule(do_stuff);
Changes to Tensor module
- Performance optimizations of Tensor contraction
- Speed up "outer-product-like" operations by parallelizing over the contraction dimension, using thread_local buffers and recursive work splitting.
- Improved threading heuristics.
- Support for fusing element-wise operations into contraction during evaluation. Example:
  // Apply Sqrt to all output elements. The optional OutputKernel argument to
  // contraction in this example is a functor over 2-dimensional output blocks.
  // The functor is called for each output block of the results, to perform the
  // elementwise sqrt operation while the block is hot in cache.
  struct SqrtOutputKernel {
    template <typename Index, typename Scalar>
    EIGEN_ALWAYS_INLINE void operator()(
        const internal::blas_data_mapper<Scalar, Index, ColMajor>& output_mapper,
        const TensorContractionParams&, Index, Index, Index num_rows,
        Index num_cols) const {
      for (int i = 0; i < num_rows; ++i) {
        for (int j = 0; j < num_cols; ++j) {
          output_mapper(i, j) = std::sqrt(output_mapper(i, j));
        }
      }
    }
  };
  typedef Tensor<float, 1>::DimensionPair DimPair;  // contraction dimension pairs
  Tensor<float, 4, DataLayout> left(30, 50, 8, 31);
  Tensor<float, 5, DataLayout> right(8, 31, 7, 20, 10);
  Tensor<float, 5, DataLayout> result(30, 50, 7, 20, 10);
  Eigen::array<DimPair, 2> dims({{DimPair(2, 0), DimPair(3, 1)}});
  result = left.contract(right, dims, SqrtOutputKernel());
- Performance optimizations of other Tensor operators
- Added vectorization, block evaluation, and multi-threading for most operators.
- Significant speedup to broadcasting.
- Reduction of index computation overhead, e.g. using fast divisors in TensorGenerator, squeezing dimensions in TensorPadding.
- Complete rewrite of the block (tiling) evaluation framework for tensor expressions led to significant speedups and a reduced number of memory allocations.
- Added new API for asynchronous evaluation of tensor expressions. Example:
  Tensor<float, 3> in1(200, 30, 70);
  Tensor<float, 3> in2(200, 30, 70);
  Tensor<float, 3> out(200, 30, 70);
  Eigen::ThreadPool tp(internal::random<int>(3, 11));
  Eigen::ThreadPoolDevice thread_pool_device(&tp, internal::random<int>(3, 11));
  Eigen::Barrier b(1);
  auto done = [&b]() { b.Notify(); };
  out.device(thread_pool_device, std::move(done)) = in1 + in2 * 3.14f;
  b.Wait();
- Support for C++03 was officially dropped in the Tensor module, since most of the code was written in C++11 anyway.
- Misc. minor behavior changes & fixes:
- Fix const correctness for TensorMap.
- Modify tensor argmin/argmax to always return first occurrence.
- More numerically stable tree reduction.
- Improve randomness of the tensor random generator.
- Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
- Support static dimensions (aka IndexList) in resizing/reshape/broadcast.
- Improved accuracy of Tensor FFT.
Changes to FFT module
- Faster and more accurate twiddle factor computation.
Improvements to EulerAngles
- EulerAngles can now be directly constructed from 3D vectors.
- EulerAngles now provides isApprox() and cast() functions.
Changes to sparse iterative solvers
- Added new IDR(s) (IDRS) iterative linear solver.
  #include <unsupported/Eigen/IterativeSolvers>
  A.makeCompressed();   // Recommendation is to compress input before calling sparse solvers.
  IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A);
  if (idrs.info() == ComputationInfo::Success) {
    VectorXf x = idrs.solve(b);
  }
Improvements to Polynomials
- PolynomialSolver can now be used with complex numbers (see the sketch below).
- The underlying solver is automatically chosen between EigenSolver and ComplexEigenSolver depending on the scalar type used.
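A small illustrative sketch (coefficient values are arbitrary; coefficients are passed in ascending order of powers, as elsewhere in the Polynomials module):
  #include <unsupported/Eigen/Polynomials>
  #include <complex>
  typedef std::complex<double> C;
  Eigen::Matrix<C, 4, 1> coeffs;   // p(x) = c[0] + c[1]*x + c[2]*x^2 + c[3]*x^3
  coeffs << C(1, 0), C(0, 1), C(-2, 0), C(0.5, 0.5);
  Eigen::PolynomialSolver<C, 3> solver(coeffs);   // uses ComplexEigenSolver for complex scalars
  const auto& roots = solver.roots();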
Other relevant changes
- Eigen now provides an option to test with an external BLAS library
- Eigen can now be used with the PGI Compiler
- Printing when using GDB has been improved
- Eigen can now detect if a platform supports int128 intrinsics.