User:Cantonios/3.4
=== New Major Features in Core ===

* New support for <code>bfloat16</code>

The 16-bit Brain floating-point format[https://en.wikipedia.org/wiki/Bfloat16_floating-point_format] is now available as <code>Eigen::bfloat16</code>. The constructor must be called explicitly, but it can otherwise be used like any other scalar type. To convert back and forth with <code>uint16_t</code> and extract the bit representation, use <code>Eigen::numext::bit_cast</code>.

<source lang="cpp">
bfloat16 s(0.25);                                 // explicit construction
uint16_t s_bits = numext::bit_cast<uint16_t>(s);  // bit representation

using MatrixBf16 = Matrix<bfloat16, Dynamic, Dynamic>;
MatrixBf16 X = s * MatrixBf16::Random(3, 3);
</source>
=== New backends ===

* AMD ROCm HIP:
** Unified with CUDA to create a generic GPU backend for NVIDIA/AMD.
=== Improvements/Cleanups to Core modules ===

* Improved support for <code>half</code>
** Native support for ARM <code>__fp16</code>, CUDA/HIP <code>__half</code>, Clang <code>F16C</code>.
** Better vectorization support across backends.
* Improved support for custom types
** More custom types work out-of-the-box (see #2201[https://gitlab.com/libeigen/eigen/-/issues/2201], and the first sketch after this list).
* Improved Geometry Module
** <code>Transform::computeRotationScaling()</code> and <code>Transform::computeScalingRotation()</code> are now more continuous across degeneracies (see !349[https://gitlab.com/libeigen/eigen/-/merge_requests/349], and the second sketch after this list).
** New minimal vectorization support.
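To illustrate the custom-type item, here is a minimal sketch in the style of Eigen's documented custom-scalar pattern; <code>MyScalar</code> and its values are hypothetical, not part of the release.

<source lang="cpp">
#include <Eigen/Core>
#include <iostream>

// Hypothetical custom scalar wrapping a double (illustration only).
struct MyScalar {
  double v;
  MyScalar(double x = 0) : v(x) {}
  MyScalar operator+(const MyScalar& o) const { return MyScalar(v + o.v); }
  MyScalar operator-(const MyScalar& o) const { return MyScalar(v - o.v); }
  MyScalar operator*(const MyScalar& o) const { return MyScalar(v * o.v); }
  MyScalar& operator+=(const MyScalar& o) { v += o.v; return *this; }
};

namespace Eigen {
// The documented hook for custom scalars: specialize NumTraits.
template<> struct NumTraits<MyScalar> : NumTraits<double> {
  typedef MyScalar Real;
  typedef MyScalar NonInteger;
  typedef MyScalar Nested;
};
}  // namespace Eigen

int main() {
  Eigen::Matrix<MyScalar, 2, 2> a;
  a << MyScalar(1), MyScalar(2),
       MyScalar(3), MyScalar(4);
  Eigen::Matrix<MyScalar, 2, 2> b = a * a;  // expression templates just work
  std::cout << b(0, 0).v << "\n";           // prints 7
  return 0;
}
</source>

And a usage sketch for the Geometry decomposition; the transform values are illustrative assumptions.

<source lang="cpp">
#include <Eigen/Geometry>
#include <iostream>

int main() {
  // An affine transform whose linear part combines a rotation with a
  // degenerate (rank-deficient) scaling.
  Eigen::Affine3d t = Eigen::Affine3d::Identity();
  t.rotate(Eigen::AngleAxisd(0.5, Eigen::Vector3d::UnitZ()));
  t.scale(Eigen::Vector3d(2.0, 1.0, 0.0));

  // Decompose the linear part as rotation * scaling; 3.4 makes the
  // result more continuous near degeneracies such as zero scale factors.
  Eigen::Matrix3d rotation, scaling;
  t.computeRotationScaling(&rotation, &scaling);

  std::cout << "rotation:\n" << rotation << "\n";
  std::cout << "scaling:\n" << scaling << "\n";
  return 0;
}
</source>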
=== Backend-specific improvements ===

* SSE/AVX/AVX512
** Enable AVX512 instructions by default if available.
** New <code>std::complex</code>, <code>half</code>, <code>bfloat16</code> vectorization support.
** Better accuracy for several vectorized math functions including <code>exp</code>, <code>log</code>, <code>pow</code>, <code>sqrt</code> (see the first sketch after this list).
** Many missing packet functions added.
* GPU (CUDA and HIP)
** Several optimized math functions added, better support for <code>std::complex</code>.
** Option to disable CUDA entirely by defining <code>EIGEN_NO_CUDA</code>.
** Many more functions can now be used in device code (e.g. comparisons, matrix inversion; see the second sketch after this list).
* SYCL
** Redesigned SYCL implementation for use with the Tensor[https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html] module, which can be enabled by defining <code>EIGEN_USE_SYCL</code>.
** New generic memory model used by <code>TensorDeviceSycl</code>.
** Better integration with OpenCL devices.
** Added many math function specializations.
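The improved math kernels are used automatically through Eigen's array API; a minimal host-side sketch (the values are arbitrary):

<source lang="cpp">
#include <Eigen/Core>
#include <iostream>

int main() {
  // Array-wise exp/log/sqrt dispatch to the vectorized (SSE/AVX/AVX512)
  // kernels when the corresponding instruction sets are enabled.
  Eigen::ArrayXf x = Eigen::ArrayXf::LinSpaced(16, 0.1f, 4.0f);
  Eigen::ArrayXf y = x.exp() + x.log() + x.sqrt();
  std::cout << y.transpose() << "\n";
  return 0;
}
</source>

And a rough sketch of the expanded device-side support (CUDA shown; the kernel is illustrative, and assumes an nvcc build with Eigen on the include path):

<source lang="cpp">
#include <Eigen/Dense>

// Fixed-size matrix inversion can now be called from device code.
__global__ void invert3x3(const Eigen::Matrix3f* in, Eigen::Matrix3f* out,
                          int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i].inverse();
}
</source>

Conversely, builds that must avoid CUDA entirely can define <code>EIGEN_NO_CUDA</code> before including any Eigen header.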