Difference between revisions of "User:Cantonios/3.4"

 
(One intermediate revision by the same user not shown)
Line 28: Line 28:
 
   }
 
   }
 
</source>
 
</source>
** Decompositions now fail quickly for detected invalid inputs.
+
** Decompositions now fail quickly when invalid inputs are detected.
 
** Fixed aliasing issues with in-place small matrix inversions.
 
** Fixed aliasing issues with in-place small matrix inversions.
 
** Fixed several edge-cases with empty or zero inputs.
 
** Fixed several edge-cases with empty or zero inputs.
 
* Sparse matrix support, decompositions and solvers
 
* Sparse matrix support, decompositions and solvers
** Enable assignment and addition with diagonal matrices.
+
** Enabled assignment and addition with diagonal matrices.
 
<source lang="cpp">
 
<source lang="cpp">
 
   SparseMatrix<float> A(10, 10);
 
   SparseMatrix<float> A(10, 10);
Line 59: Line 59:
  
 
* Improved support for <code>half</code>
 
* Improved support for <code>half</code>
** Native support for ARM <code>__fp16</code>, CUDA/HIP <code>__half</code>, Clang <code>F16C</code>.
+
** Native support added for ARM <code>__fp16</code>, CUDA/HIP <code>__half</code>, Clang <code>F16C</code>.
** Better vectorization support across backends.
+
** Better vectorization support added across all backends.
 
* Improved bool support
 
* Improved bool support
** Partial vectorization support for boolean operations.
+
** Partial vectorization support added for boolean operations.
 
** Significantly improved performance (x25) for logical operations with <code>Matrix</code> or <code>Tensor</code> of <code>bool</code>.
 
** Significantly improved performance (x25) for logical operations with <code>Matrix</code> or <code>Tensor</code> of <code>bool</code>.
 
* Improved support for custom types
 
* Improved support for custom types
Line 68: Line 68:
 
* Improved Geometry Module
 
* Improved Geometry Module
 
** <code>Transform::computeRotationScaling()</code> and <code>Transform::computeScalingRotation()</code> are now more continuous across degeneracies (see !349[https://gitlab.com/libeigen/eigen/-/merge_requests/349]).
 
** <code>Transform::computeRotationScaling()</code> and <code>Transform::computeScalingRotation()</code> are now more continuous across degeneracies (see !349[https://gitlab.com/libeigen/eigen/-/merge_requests/349]).
** New minimal vectorization support.
+
** New minimal vectorization support added for <code>Quaternion</code>.
  
 
=== Backend-specific improvements ===
 
=== Backend-specific improvements ===
 
* SSE/AVX/AVX512
 
* SSE/AVX/AVX512
** Enable AVX512 instructions by default if available.
+
** Enabled AVX512 instructions by default if available.
** New <code>std::complex</code>, <code>half</code>, <code>bfloat16</code> vectorization support.
+
** New <code>std::complex</code>, <code>half</code>, and <code>bfloat16</code> vectorization support added.
 
** Better accuracy for several vectorized math functions including <code>exp</code>, <code>log</code>, <code>pow</code>, <code>sqrt</code>.
 
** Better accuracy for several vectorized math functions including <code>exp</code>, <code>log</code>, <code>pow</code>, <code>sqrt</code>.
 
** Many missing packet functions added.
 
** Many missing packet functions added.
 
* GPU (CUDA and HIP)
 
* GPU (CUDA and HIP)
 
** Several optimized math functions added, better support for <code>std::complex</code>.
 
** Several optimized math functions added, better support for <code>std::complex</code>.
** Option to disable CUDA entirely by defining <code>EIGEN_NO_CUDA</code>.
+
** Added option to disable CUDA entirely by defining <code>EIGEN_NO_CUDA</code>.
 
** Many more functions can now be used in device code (e.g. comparisons, matrix inversion).
 
** Many more functions can now be used in device code (e.g. comparisons, matrix inversion).
 
* ZVector
 
* ZVector
** Vectorized <code>float</code> and <code>std::complex<float></code> support.
+
** Vectorized <code>float</code> and <code>std::complex<float></code> support added.
 
** Added z14 support.
 
** Added z14 support.
 
* SYCL
 
* SYCL
 
** Redesigned SYCL implementation for use with the Tensor[https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html] module, which can be enabled by defining <code>EIGEN_USE_SYCL</code>.
 
** Redesigned SYCL implementation for use with the Tensor[https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html] module, which can be enabled by defining <code>EIGEN_USE_SYCL</code>.
** New generic memory model used by <code>TensorDeviceSycl</code>.
+
** New generic memory model introduced, used by <code>TensorDeviceSycl</code>.
 
** Better integration with OpenCL devices.
 
** Better integration with OpenCL devices.
 
** Added many math function specializations.
 
** Added many math function specializations.
Line 98: Line 98:
 
A.setConstant(5, NoChange_t(), 3);  //  5x10 matrix of 3s.
 
A.setConstant(5, NoChange_t(), 3);  //  5x10 matrix of 3s.
 
</source>
 
</source>
* Added <code>setUnit(Index i)</code> for vectors that sets the ''i''th coefficient to one, and all others to zero.
+
* Added <code>setUnit(Index i)</code> for vectors that sets the ''i'' th coefficient to one and all others to zero.
 
* Added <code>transpose()</code>, <code>adjoint()</code>, <code>conjugate()</code> methods to <code>SelfAdjointView</code>.
 
* Added <code>transpose()</code>, <code>adjoint()</code>, <code>conjugate()</code> methods to <code>SelfAdjointView</code>.
* Added <code>shift_left<N></code> and <code>shift_right<N></code> coefficient-wise array functions.
+
* Added <code>shift_left<N>()</code> and <code>shift_right<N>()</code> coefficient-wise array functions.
* Allow adding and subtracting of diagonal matrices.
+
* Enabled adding and subtracting of diagonal matrices.
 
* Allow user-defined default cache sizes via defining <code>EIGEN_DEFAULT_L1_CACHE_SIZE</code>, ..., <code>EIGEN_DEFAULT_L3_CACHE_SIZE</code>.
 
* Allow user-defined default cache sizes via defining <code>EIGEN_DEFAULT_L1_CACHE_SIZE</code>, ..., <code>EIGEN_DEFAULT_L3_CACHE_SIZE</code>.
 
* Added <code>EIGEN_ALIGNOF(X)</code> macro for determining alignment of a provided variable.
 
* Added <code>EIGEN_ALIGNOF(X)</code> macro for determining alignment of a provided variable.
 
* Allow plugins for <code>VectorwiseOp</code> by defining a file <code>EIGEN_VECTORWISEOP_PLUGIN</code> (e.g. <code>-DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h</code>).
 
* Allow plugins for <code>VectorwiseOp</code> by defining a file <code>EIGEN_VECTORWISEOP_PLUGIN</code> (e.g. <code>-DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h</code>).
 
* Allow disabling of IO operations by defining <code>EIGEN_NO_IO</code>.
 
* Allow disabling of IO operations by defining <code>EIGEN_NO_IO</code>.

Latest revision as of 21:51, 17 August 2021

New Major Features in Core

  • New support for bfloat16

The 16-bit Brain floating point format[1] is now available as Eigen::bfloat16. The constructor must be called explicitly, but otherwise it can be used like any other scalar type. To convert to or from uint16_t in order to access the bit representation, use Eigen::numext::bit_cast.

  bfloat16 s(0.25);                                 // explicit construction
  uint16_t s_bits = numext::bit_cast<uint16_t>(s);  // bit representation
 
  using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
  MatrixBf16 X = s * MatrixBf16::Random(3, 3);
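
To make the "bit representation" above more concrete, here is a minimal sketch in plain C++ (no Eigen required) of how a bfloat16 value relates to a 32-bit float: it keeps the sign, the 8 exponent bits, and the top 7 mantissa bits. The helper names float_to_bfloat16_bits and bfloat16_bits_to_float are hypothetical, and the round-to-nearest-even step is an assumption of this sketch rather than a statement about Eigen's internals.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an Eigen API): converts a 32-bit float to the
// 16-bit bfloat16 pattern, rounding to nearest with ties to even.
std::uint16_t float_to_bfloat16_bits(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
  if (std::isnan(value)) return 0x7FC0;  // one quiet-NaN pattern; sign is dropped
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  // Add a rounding bias that depends on the lowest bit that will be kept.
  const std::uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7FFFu + lsb;
  return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widens a bfloat16 pattern back to float by placing
// it in the upper 16 bits and zero-filling the lower 16 bits.
float bfloat16_bits_to_float(std::uint16_t b) {
  const std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

With these helpers, float_to_bfloat16_bits(0.25f) yields 0x3E80, which is the upper half of the float pattern 0x3E800000.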

New backends

  • AMD ROCm HIP:
    • Unified with CUDA to create a generic GPU backend for NVIDIA/AMD.

Improvements/Cleanups to Core modules

  • Dense matrix decompositions and solvers
    • SVD implementations now have an info() method for checking convergence.
  MatrixXf m = MatrixXf::Random(3,2);
  VectorXf b = VectorXf::Random(3);  // right-hand side, one entry per row of m
  JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
  if (svd.info() == ComputationInfo::Success) {
    // SVD computation was successful.
    VectorXf x = svd.solve(b);
  }
    • Decompositions now fail quickly when invalid inputs are detected.
    • Fixed aliasing issues with in-place small matrix inversions.
    • Fixed several edge-cases with empty or zero inputs.
  • Sparse matrix support, decompositions and solvers
    • Enabled assignment and addition with diagonal matrices.
  SparseMatrix<float> A(10, 10);
  VectorXf x = VectorXf::Random(10);
  A = x.asDiagonal();
  A += x.asDiagonal();
    • Added new IDRS iterative linear solver.
  A.makeCompressed();   // Recommendation is to compress input before calling sparse solvers.
  IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A);
  if (idrs.info() == ComputationInfo::Success) {
    VectorXf x = idrs.solve(b);
  }
    • Support added for SuiteSparse KLU routines.
  A.makeCompressed();   // Recommendation is to compress input before calling sparse solvers.
  KLU<SparseMatrix<T> > klu(A);
  if (klu.info() == ComputationInfo::Success) {
    VectorXf x = klu.solve(b);
  }
    • SparseCholesky now works with row-major matrices.
    • Various bug fixes and performance improvements.
  • Improved support for half
    • Native support added for ARM __fp16, CUDA/HIP __half, Clang F16C.
    • Better vectorization support added across all backends.
  • Improved bool support
    • Partial vectorization support added for boolean operations.
    • Significantly improved performance (x25) for logical operations with Matrix or Tensor of bool.
  • Improved support for custom types
    • More custom types work out-of-the-box (see #2201[2]).
  • Improved Geometry Module
    • Transform::computeRotationScaling() and Transform::computeScalingRotation() are now more continuous across degeneracies (see !349[3]).
    • New minimal vectorization support added for Quaternion.

Backend-specific improvements

  • SSE/AVX/AVX512
    • Enabled AVX512 instructions by default if available.
    • New std::complex, half, and bfloat16 vectorization support added.
    • Better accuracy for several vectorized math functions including exp, log, pow, sqrt.
    • Many missing packet functions added.
  • GPU (CUDA and HIP)
    • Several optimized math functions added, better support for std::complex.
    • Added option to disable CUDA entirely by defining EIGEN_NO_CUDA.
    • Many more functions can now be used in device code (e.g. comparisons, matrix inversion).
  • ZVector
    • Vectorized float and std::complex<float> support added.
    • Added z14 support.
  • SYCL
    • Redesigned SYCL implementation for use with the Tensor[4] module, which can be enabled by defining EIGEN_USE_SYCL.
    • New generic memory model introduced, used by TensorDeviceSycl.
    • Better integration with OpenCL devices.
    • Added many math function specializations.
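
As a small configuration sketch for the EIGEN_NO_CUDA option listed above: the macro is assumed to take effect only if it is defined before the first Eigen header is included, either in the source file or on the compiler command line (for example -DEIGEN_NO_CUDA). The choice of Eigen/Core as the included header is only for illustration.

```cpp
// Opt out of Eigen's CUDA code paths for this translation unit.
// Assumption: the define must come before every Eigen include.
#define EIGEN_NO_CUDA
#include <Eigen/Core>
```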

Miscellaneous API Changes

  • New setOnes() method for filling a dense matrix with ones.
  • New setConstant(NoChange_t, Index, T) methods for preserving one dimension of a matrix.
MatrixXf A(10, 5);
A.setOnes();                         // 10x5  matrix of 1s
A.setConstant(NoChange_t(), 10, 2);  // 10x10 matrix of 2s.
A.setConstant(5, NoChange_t(), 3);   //  5x10 matrix of 3s.
  • Added setUnit(Index i) for vectors, which sets the i-th coefficient to one and all others to zero.
  • Added transpose(), adjoint(), conjugate() methods to SelfAdjointView.
  • Added shift_left<N>() and shift_right<N>() coefficient-wise array functions.
  • Enabled adding and subtracting of diagonal matrices.
  • Allow user-defined default cache sizes via defining EIGEN_DEFAULT_L1_CACHE_SIZE, ..., EIGEN_DEFAULT_L3_CACHE_SIZE.
  • Added EIGEN_ALIGNOF(X) macro for determining alignment of a provided variable.
  • Allow plugins for VectorwiseOp by defining a file EIGEN_VECTORWISEOP_PLUGIN (e.g. -DEIGEN_VECTORWISEOP_PLUGIN=my_vectorwise_op_plugins.h).
  • Allow disabling of IO operations by defining EIGEN_NO_IO.
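
The behaviour described for setUnit(Index i) and for the coefficient-wise shift_left<N>() and shift_right<N>() functions can be sketched in plain C++ without Eigen. The helpers below (unit_vector, coeff_shift_left, coeff_shift_right) are hypothetical stand-ins written for illustration, not Eigen's API. They use unsigned 32-bit coefficients so that both shift directions are well defined, so the treatment of negative values is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a length-n vector with a one at index i and zeros
// elsewhere, mirroring what setUnit(i) is described to produce.
std::vector<int> unit_vector(std::size_t n, std::size_t i) {
  assert(i < n && "index must lie inside the vector");
  std::vector<int> v(n, 0);
  v[i] = 1;
  return v;
}

// Hypothetical helper: shifts every coefficient left by the compile-time
// amount N and returns the result.
template <int N>
std::vector<std::uint32_t> coeff_shift_left(std::vector<std::uint32_t> v) {
  static_assert(N >= 0 && N < 32, "shift amount must be in [0, 32)");
  for (std::uint32_t& c : v) c <<= N;
  return v;
}

// Hypothetical helper: shifts every coefficient right by the compile-time
// amount N (a logical shift, since the coefficients are unsigned).
template <int N>
std::vector<std::uint32_t> coeff_shift_right(std::vector<std::uint32_t> v) {
  static_assert(N >= 0 && N < 32, "shift amount must be in [0, 32)");
  for (std::uint32_t& c : v) c >>= N;
  return v;
}
```

For example, unit_vector(4, 2) gives {0, 0, 1, 0}, and coeff_shift_left<2> applied to {1, 2, 3} gives {4, 8, 12}. Taking the shift amount as a template parameter, as the Eigen functions do, lets the compiler check it at compile time.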