User:Cantonios/3.4
From Eigen
Revision as of 20:09, 17 August 2021
=== New Major Features in Core ===

* New support for <code>bfloat16</code>

The 16-bit Brain floating point format is now available as <code>Eigen::bfloat16</code>. The constructor must be called explicitly, but it can otherwise be used like any other scalar type. To convert back and forth with <code>uint16_t</code> and extract the bit representation, use <code>Eigen::numext::bit_cast</code>.

<source lang="cpp">
bfloat16 s(0.25);                                 // explicit construction
uint16_t s_bits = numext::bit_cast<uint16_t>(s);  // bit representation

using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
MatrixBf16 X = s * MatrixBf16::Random(3, 3);
</source>
=== New backends ===

* AMD ROCm HIP:
** Unified with CUDA to create a generic GPU backend for NVIDIA/AMD.
=== Improvements/Cleanups to Core modules ===

* Improved support for <code>half</code>
** Native support for ARM <code>__fp16</code>, CUDA/HIP <code>__half</code>, Clang <code>F16C</code>.
** Better vectorization support across backends.
* Improved support for custom types
** More custom types work out-of-the-box (see [https://gitlab.com/libeigen/eigen/-/issues/2201 #2201]).
* Improved Geometry Module
** <code>Transform::computeRotationScaling()</code> and <code>Transform::computeScalingRotation()</code> are now more continuous across degeneracies (see [https://gitlab.com/libeigen/eigen/-/merge_requests/349 !349]).
** New minimal vectorization support.

=== Backend-specific improvements ===

* SSE/AVX/AVX512
** Enable AVX512 instructions by default if available.
** New <code>std::complex</code>, <code>half</code>, <code>bfloat16</code> vectorization support.
** Better accuracy for several vectorized math functions, including <code>exp</code>, <code>log</code>, <code>pow</code>, <code>sqrt</code>.
** Many missing packet functions added.
* GPU (CUDA and HIP)
** Several optimized math functions added; better support for <code>std::complex</code>.
** Option to disable CUDA entirely by defining <code>EIGEN_NO_CUDA</code>.
** Many more functions can now be used in device code (e.g. comparisons, matrix inversion).
* SYCL
** Redesigned SYCL implementation for use with the [https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html Tensor] module, which can be enabled by defining <code>EIGEN_USE_SYCL</code>.
** New generic memory model used by <code>TensorDeviceSycl</code>.
** Better integration with OpenCL devices.
** Added many math function specializations.
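As a usage note for the <code>EIGEN_NO_CUDA</code> option above: like Eigen's other configuration macros, it must be defined before the first Eigen header is included. A minimal configuration sketch (assumes Eigen headers are on the include path):

```cpp
// Define before including any Eigen header to compile Eigen as host-only,
// even when this translation unit is built with nvcc or hipcc.
#define EIGEN_NO_CUDA
#include <Eigen/Dense>
```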