Difference between revisions of "3.4"
From Eigen
Revision as of 11:37, 11 November 2018
Raw dump of the main novelties and improvements that will be part of the 3.4 release compared to the 3.3 branch:
New features
- New versatile API for sub-matrices, slices, and indexed views [doc]. It extends A(.,.) to accept anything that looks like a sequence of indices with random access. To make it usable, this feature comes with new symbols, Eigen::all and Eigen::last, and functions generating arithmetic sequences: Eigen::seq(first,last[,incr]), Eigen::seqN(first,size[,incr]), and Eigen::lastN(size[,incr]). Here is an example picking even rows except the first and last ones, and a subset of indexed columns:
  MatrixXd A = ...;
  std::vector<int> col_ind{7,3,4,3};
  MatrixXd B = A(seq(2,last-2,fix<2>), col_ind);
- Reshaped views through the new members reshaped() and reshaped(rows,cols). This feature also comes with new symbols: Eigen::AutoOrder and Eigen::AutoSize. [doc]
- A new helper Eigen::fix<N> to pass compile-time integer values to Eigen's functions [doc]. It can be used to pass compile-time sizes to .block(...), .segment(...), and all variants, as well as the first, size and increment parameters of the seq, seqN, and lastN functions introduced above. You can also pass "possibly compile-time values" through Eigen::fix<N>(n). Here is an example comparing the old and new way to call .block with fixed sizes:
  template<typename MatrixType,int N>
  void foo(const MatrixType &A, int i, int j, int n)
  {
    A.block(i,j,2,3);                     // runtime sizes
    // compile-time nb rows and columns:
    A.template block<2,3>(i,j);           // 3.3 way
    A.block(i,j,fix<2>,fix<3>);           // new 3.4 way
    // compile-time nb rows only:
    A.template block<2,Dynamic>(i,j,2,n); // 3.3 way
    A.block(i,j,fix<2>,n);                // new 3.4 way
    // possibly compile-time nb columns
    // (use n if N==Dynamic, otherwise we must have n==N):
    A.template block<2,N>(i,j,2,n);       // 3.3 way
    A.block(i,j,fix<2>,fix<N>(n));        // new 3.4 way
  }
- A new namespace, indexing, that lets you import only the subset of functions and symbols typically used within A(.,.), that is: all, seq, seqN, lastN, last, lastp1. [doc]
Performance optimizations
- Vectorization of partial reductions along the outer dimension, e.g., colmajor.rowwise().mean()
- Speed up evaluation of HouseholderSequence to a dense matrix, e.g.,
  MatrixXd Q = A.householderQr().householderQ();
- Various optimizations of matrix products for small and medium-sized matrices when using large SIMD registers (e.g., AVX and AVX512).
Hardware support
- Generalization of the CUDA support to CUDA/HIP for AMD GPUs.
- Add explicit support for MSA vectorization engine (MIPS).
- AVX512 is enabled by default when enabled on the compiler side.