User:Cantonios/3.4
From Eigen
Contents
New Major Features in Core
 New support for
bfloat16
The 16bit Brain floating point format[1] is now available as Eigen::bfloat16
. The constructor must be called explicitly, but it can otherwise be used as any other scalar type. To convert backandforth between uint16_t
to extract the bit representation, use Eigen::numext::bit_cast
.
bfloat16 s(0.25); // explicit construction uint16_t s_bits = numext::bit_cast<uint16_t>(s); // bit representation using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>; MatrixBf16 X = s * MatrixBf16::Random(3, 3);
New backends
 AMD ROCm HIP:
 Unified with CUDA to create a generic GPU backend for NVIDIA/AMD.
Improvements/Cleanups to Core modules
 Dense matrix decompositions and solvers
 SVD implementations now have an
info()
method for checking convergence.
 SVD implementations now have an
MatrixXf m = MatrixXf::Random(3,2); JacobiSVD<MatrixXf> svd(m, ComputeThinU  ComputeThinV); if (svd.info() == ComputationInfo::Success) { // SVD computation was successful. }
 Decompositions now fail quickly for detected invalid inputs.
 Fixed aliasing issues with inplace small matrix inversions.
 Fixed several edgecases with empty or zero inputs.
 Sparse matrix support, decompositions and solvers
 Enable assignment and addition with diagonal matrices.
SparseMatrix<float> A(10, 10); VectorXf x = VectorXf::Random(10); A = x.asDiagonal(); A += x.asDiagonal();
 Added new IRDS iterative linear solver.
A.makeCompressed(); // Recommendation is to compress input before calling sparse solvers. IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A); if (idrs.info() == ComputationInfo::Success) { VectorXf x = idrs.solve(b); }
 Support added for SuiteSparse KLU routines.
A.makeCompressed(); // Recommendation is to compress input before calling sparse solvers. KLU<SparseMatrix<T> > klu(A); if (klu.info() == ComputationInfo::Success) { VectorXf x = klu.solve(b); }
 Various bug fixes and performance improvements in
SparseLU
,SparseQR
,SimplicialLDLT
.
 Various bug fixes and performance improvements in
 Improved support for
half
 Native support for ARM
__fp16
, CUDA/HIP__half
, ClangF16C
.  Better vectorization support across backends.
 Native support for ARM
 Improved bool support
 Partial vectorization support for boolean operations.
 Significantly improved performance (x25) for logical operations with
Matrix
orTensor
ofbool
.
 Improved support for custom types
 More custom types work outofthebox (see #2201[2]).
 Improved Geometry Module

Transform::computeRotationScaling()
andTransform::computeScalingRotation()
are now more continuous across degeneracies (see !349[3]).  New minimal vectorization support.

Backendspecific improvements
 SSE/AVX/AVX512
 Enable AVX512 instructions by default if available.
 New
std::complex
,half
,bfloat16
vectorization support.  Better accuracy for several vectorized math functions including
exp
,log
,pow
,sqrt
.  Many missing packet functions added.
 GPU (CUDA and HIP)
 Several optimized math functions added, better support for
std::complex
.  Option to disable CUDA entirely by defining
EIGEN_NO_CUDA
.  Many more functions can now be used in device code (e.g. comparisons, matrix inversion).
 Several optimized math functions added, better support for
 SYCL
 Redesigned SYCL implementation for use with the Tensor[4] module, which can be enabled by defining
EIGEN_USE_SYCL
.  New generic memory model used by
TensorDeviceSycl
.  Better integration with OpenCL devices.
 Added many math function specializations.
 Redesigned SYCL implementation for use with the Tensor[4] module, which can be enabled by defining