=== New Major Features in Core ===

* New support for <code>bfloat16</code>

The 16-bit Brain floating point format[https://en.wikipedia.org/wiki/Bfloat16_floating-point_format] is now available as <code>Eigen::bfloat16</code>. The constructor must be called explicitly, but it can otherwise be used like any other scalar type. To convert to and from <code>uint16_t</code> and extract the bit representation, use <code>Eigen::numext::bit_cast</code>.
 
<source lang="cpp">
  bfloat16 s(0.25);                                 // explicit construction
  uint16_t s_bits = numext::bit_cast<uint16_t>(s);  // bit representation

  using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
  MatrixBf16 X = s * MatrixBf16::Random(3, 3);
</source>
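
For the other direction, the same <code>bit_cast</code> reinterprets the bits as a <code>bfloat16</code> again (a minimal sketch reusing <code>s_bits</code> from above):

<source lang="cpp">
  bfloat16 t = numext::bit_cast<bfloat16>(s_bits);  // reconstruct the value from its bits
</source>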
 
=== New backends ===

* AMD ROCm HIP:
** Unified with CUDA to create a generic GPU backend for NVIDIA/AMD.

=== Improvements/Cleanups to Core modules ===
 
* Dense matrix decompositions and solvers
** SVD implementations now have an <code>info()</code> method for checking convergence.
<source lang="cpp">
  MatrixXf m = MatrixXf::Random(3,2);
  JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
  if (svd.info() == ComputationInfo::Success) {
    // SVD computation was successful.
  }
</source>
** Decompositions now fail quickly for detected invalid inputs.
** Fixed aliasing issues with in-place small matrix inversions.
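A minimal sketch of the pattern this fix addresses (illustrative; the right-hand side aliases the destination):
<source lang="cpp">
  Matrix3f M = Matrix3f::Random();
  M = M.inverse();  // in-place inversion of a small matrix now yields the correct result
</source>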
** Fixed several edge cases with empty or zero inputs.
* Sparse matrix support, decompositions and solvers
** Enable assignment and addition with diagonal matrices.
<source lang="cpp">
  SparseMatrix<float> A(10, 10);
  VectorXf x = VectorXf::Random(10);
  A = x.asDiagonal();
  A += x.asDiagonal();
</source>
** Added the new IDRS iterative linear solver.
<source lang="cpp">
  A.makeCompressed();  // Recommended: compress the input before calling sparse solvers.
  VectorXf b = VectorXf::Random(10);  // example right-hand side
  IDRS<SparseMatrix<float>, DiagonalPreconditioner<float> > idrs(A);
  if (idrs.info() == ComputationInfo::Success) {
    VectorXf x = idrs.solve(b);
  }
</source>
** Support added for SuiteSparse KLU routines.
<source lang="cpp">
  A.makeCompressed();  // Recommended: compress the input before calling sparse solvers.
  VectorXf b = VectorXf::Random(10);  // example right-hand side
  KLU<SparseMatrix<float> > klu(A);
  if (klu.info() == ComputationInfo::Success) {
    VectorXf x = klu.solve(b);
  }
</source>
** Various bug fixes and performance improvements in <code>SparseLU</code>, <code>SparseQR</code>, and <code>SimplicialLDLT</code>.
  
 
* Improved support for <code>half</code>
** Native support for ARM <code>__fp16</code>, CUDA/HIP <code>__half</code>, Clang <code>F16C</code>.
** Better vectorization support across backends.
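A minimal sketch of mixed-precision use of <code>half</code> (illustrative; conversion from <code>float</code> goes through <code>cast()</code>):
<source lang="cpp">
  Matrix<half, Dynamic, Dynamic> H = MatrixXf::Random(4, 4).cast<half>();
  half s(0.5f);
  H *= s;                           // arithmetic performed in half precision
  MatrixXf back = H.cast<float>();  // convert back to float
</source>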
* Improved <code>bool</code> support
** Partial vectorization support for boolean operations.
** Significantly improved performance (25x) for logical operations with <code>Matrix</code> or <code>Tensor</code> of <code>bool</code>.
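A minimal sketch of the kind of expression that benefits (illustrative, using the <code>Array</code> interface for element-wise logic):
<source lang="cpp">
  Array<bool, Dynamic, Dynamic> a = Array<bool, Dynamic, Dynamic>::Constant(100, 100, true);
  Array<bool, Dynamic, Dynamic> b = Array<bool, Dynamic, Dynamic>::Constant(100, 100, false);
  Array<bool, Dynamic, Dynamic> c = a && b;  // element-wise logical AND
  bool any_true = c.any();                   // reduction over the boolean array
</source>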
 
* Improved support for custom types
** More custom types work out-of-the-box (see #2201[https://gitlab.com/libeigen/eigen/-/issues/2201]).

* Improved Geometry module
** <code>Transform::computeRotationScaling()</code> and <code>Transform::computeScalingRotation()</code> are now more continuous across degeneracies (see !349[https://gitlab.com/libeigen/eigen/-/merge_requests/349]).
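A minimal sketch of the rotation/scaling extraction (illustrative; assumes an affine transform whose linear part combines a rotation with an axis-aligned scaling):
<source lang="cpp">
  Affine3d t = Affine3d::Identity();
  t.rotate(AngleAxisd(0.5, Vector3d::UnitZ())).scale(Vector3d(1.0, 2.0, 3.0));
  Matrix3d rotation, scaling;
  t.computeRotationScaling(&rotation, &scaling);  // linear part == rotation * scaling
</source>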
** New minimal vectorization support.

=== Backend-specific improvements ===

* SSE/AVX/AVX512
** Enable AVX512 instructions by default if available.
** New <code>std::complex</code>, <code>half</code>, <code>bfloat16</code> vectorization support.
** Better accuracy for several vectorized math functions, including <code>exp</code>, <code>log</code>, <code>pow</code>, <code>sqrt</code>.
** Many missing packet functions added.
* GPU (CUDA and HIP)
** Several optimized math functions added, better support for <code>std::complex</code>.
** Option to disable CUDA entirely by defining <code>EIGEN_NO_CUDA</code>.
** Many more functions can now be used in device code (e.g. comparisons, matrix inversion).
* SYCL
** Redesigned SYCL implementation for use with the Tensor module, which can be enabled by defining <code>EIGEN_USE_SYCL</code>.
** New generic memory model used by <code>TensorDeviceSycl</code>.
** Better integration with OpenCL devices.
** Added many math function specializations.
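
A minimal sketch of enabling the SYCL path for the Tensor module (illustrative; assumes a SYCL-capable toolchain, and the macro must be defined before the header is included):
<source lang="cpp">
  #define EIGEN_USE_SYCL
  #include <unsupported/Eigen/CXX11/Tensor>
</source>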