User:Cantonios/3.4

Revision as of 20:09, 17 August 2021

New Major Features in Core

  • New support for bfloat16

The 16-bit brain floating-point format is now available as Eigen::bfloat16. The constructor must be called explicitly, but the type can otherwise be used like any other scalar type. To convert back and forth between bfloat16 and uint16_t (e.g. to extract the bit representation), use Eigen::numext::bit_cast.

 #include <Eigen/Core>  // Eigen::bfloat16, Eigen::numext::bit_cast
 #include <cstdint>
 using namespace Eigen;
 
 bfloat16 s(0.25);                                 // explicit construction
 uint16_t s_bits = numext::bit_cast<uint16_t>(s);  // bit representation
 
 using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
 MatrixBf16 X = s * MatrixBf16::Random(3, 3);
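
The cast works in both directions, so a bfloat16 can also be reconstructed from a raw bit pattern; a small addition to the example above (0x3E80 is the bfloat16 encoding of 0.25f):

 uint16_t bits = 0x3E80;                           // bit pattern of 0.25f
 bfloat16 t = numext::bit_cast<bfloat16>(bits);    // bits back to bfloat16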

New backends

  • AMD ROCm HIP:
    • Unified with CUDA to create a generic GPU backend for NVIDIA/AMD; a device-code sketch follows below.
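
Since both vendors now share one backend, device-side helpers can be written once. A minimal sketch, assuming the file is compiled by nvcc (CUDA) or hipcc (HIP); EIGEN_DEVICE_FUNC is Eigen's existing host/device annotation, while the helper function itself is hypothetical:

 #include <Eigen/Core>
 
 // EIGEN_DEVICE_FUNC expands to __host__ __device__ when compiling for a
 // GPU, so the same helper is callable from CUDA and HIP kernels alike.
 EIGEN_DEVICE_FUNC Eigen::Vector3f scaled(const Eigen::Vector3f& v, float s) {
   return s * v;
 }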

Improvements/Cleanups to Core modules

  • Improved support for half
    • Native support for ARM __fp16, CUDA/HIP __half, Clang F16C.
    • Better vectorization support across backends.
  • Improved support for custom types
    • More custom types work out-of-the-box (see issue #2201, https://gitlab.com/libeigen/eigen/-/issues/2201).
  • Improved Geometry Module
    • Transform::computeRotationScaling() and Transform::computeScalingRotation() are now more continuous across degeneracies (see merge request !349, https://gitlab.com/libeigen/eigen/-/merge_requests/349); a sketch follows this list.
    • New minimal vectorization support.
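
A minimal sketch of two of the items above; the sizes and values are illustrative, and computeRotationScaling is the Geometry-module method named in the list:

 #include <Eigen/Geometry>  // also pulls in Eigen/Core
 using namespace Eigen;
 
 int main() {
   // half mirrors the bfloat16 example: explicit construction, then
   // use like any other scalar.
   half h(0.5f);
   Matrix<half, 2, 2> H = h * Matrix<half, 2, 2>::Constant(h);
 
   // Split the linear part of a transform into rotation * scaling.
   Affine2f T = Translation2f(1.0f, 2.0f) * Rotation2Df(0.3f) * Scaling(2.0f, 3.0f);
   Matrix2f R, S;
   T.computeRotationScaling(&R, &S);  // T.linear() == R * S, with R a rotation
   return 0;
 }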

Backend-specific improvements

  • SSE/AVX/AVX512
    • Enable AVX512 instructions by default if available.
    • New std::complex, half, bfloat16 vectorization support.
    • Better accuracy for several vectorized math functions including exp, log, pow, sqrt (see the usage sketch after this list).
    • Many missing packet functions added.
  • GPU (CUDA and HIP)
    • Several optimized math functions added, better support for std::complex.
    • Option to disable CUDA entirely by defining EIGEN_NO_CUDA.
    • Many more functions can now be used in device code (e.g. comparisons, matrix inversion).
  • SYCL
    • Redesigned SYCL implementation for use with the Tensor module (https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html), which can be enabled by defining EIGEN_USE_SYCL.
    • New generic memory model used by TensorDeviceSycl.
    • Better integration with OpenCL devices.
    • Added many math function specializations.
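
The improved math functions are reached through the usual array API; a minimal usage sketch (values illustrative):

 #include <Eigen/Core>
 #include <iostream>
 using namespace Eigen;
 
 int main() {
   // exp, log, pow and sqrt are among the functions whose SSE/AVX/AVX512
   // vectorized implementations gained accuracy in this release.
   ArrayXf x = ArrayXf::LinSpaced(8, 0.1f, 2.0f);
   std::cout << x.exp().transpose() << "\n"
             << x.log().transpose() << "\n"
             << x.pow(2.5f).transpose() << "\n"
             << x.sqrt().transpose() << "\n";
   return 0;
 }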