Difference between revisions of "3.4"
From Eigen
Revision as of 08:33, 7 December 2018
Raw dump of the main novelties and improvements that will be part of the 3.4 release compared to the 3.3 branch:
New features
- New versatile API for sub-matrices, slices, and indexed views [doc]. It basically extends A(.,.) to let it accept anything that looks like a sequence of indices with random access. To make it usable, this new feature comes with new symbols, Eigen::all and Eigen::last, and functions generating arithmetic sequences: Eigen::seq(first,last[,incr]), Eigen::seqN(first,size[,incr]), and Eigen::lastN(size[,incr]). Here is an example picking the even rows except the first and last ones, and a subset of indexed columns:

    MatrixXd A = ...;
    std::vector<int> col_ind{7,3,4,3};
    MatrixXd B = A(seq(2,last-2,fix<2>), col_ind);
- Reshaped views through the new members reshaped() and reshaped(rows,cols). This feature also comes with new symbols: Eigen::AutoOrder, Eigen::AutoSize. [doc]
- A new helper Eigen::fix<N> to pass compile-time integer values to Eigen's functions [doc]. It can be used to pass compile-time sizes to .block(...), .segment(...), and all their variants, as well as the first, size, and increment parameters of the seq, seqN, and lastN functions introduced above. You can also pass "possibly compile-time values" through Eigen::fix<N>(n). Here is an example comparing the old and new ways to call .block with fixed sizes:

    template<typename MatrixType,int N>
    void foo(const MatrixType &A, int i, int j, int n) {
      A.block(i,j,2,3);                      // runtime sizes
      // compile-time nb rows and columns:
      A.template block<2,3>(i,j);            // 3.3 way
      A.block(i,j,fix<2>,fix<3>);            // new 3.4 way
      // compile-time nb rows only:
      A.template block<2,Dynamic>(i,j,2,n);  // 3.3 way
      A.block(i,j,fix<2>,n);                 // new 3.4 way
      // possibly compile-time nb columns
      // (use n if N==Dynamic, otherwise we must have n==N):
      A.template block<2,N>(i,j,2,n);        // 3.3 way
      A.block(i,j,fix<2>,fix<N>(n));         // new 3.4 way
    }
- Add STL-compatible iterators for dense expressions [doc]. Some examples:

    VectorXd v = ...;
    MatrixXd A = ...;
    // range for loop over all entries of v then A
    for(auto x : v) { cout << x << " "; }
    for(auto x : A.reshaped()) { cout << x << " "; }
    // sort v then each column of A
    std::sort(v.begin(), v.end());
    for(auto c : A.colwise())
      std::sort(c.begin(), c.end());
- A new namespace, indexing, that lets you import exclusively the subset of functions and symbols typically used within A(.,.), namely: all, seq, seqN, lastN, last, lastp1. [doc]
- Misc
  - Add templated subVector<Vertical/Horizontal>(Index) aliases to col/row(Index) methods, and subVectors<>() aliases to rows()/cols().
  - Add innerVector() and innerVectors() methods.
  - Add diagmat +/- diagmat operators (bug 520).
  - Add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense, where ?= stands for an assignment operator such as =, += or -= (see bug 632).
  - Add support for SuiteSparse's KLU sparse direct solver (an LU-based solver tailored for problems coming from circuit simulation).
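The seq and seqN functions from the indexed-views item above describe arithmetic index sequences. As a rough illustration of their semantics (not Eigen's implementation, which keeps sequences symbolic rather than materializing them), the plain C++ sketch below builds the index lists explicitly and copies a sub-matrix out of a row-major buffer. The helpers make_seq, make_seqN and pick are hypothetical names, and the sketch assumes a positive increment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: first, first+incr, ... up to and including last,
// mirroring the inclusive-end convention of Eigen::seq (assumes incr > 0).
std::vector<int> make_seq(int first, int last, int incr = 1) {
    std::vector<int> idx;
    for (int i = first; i <= last; i += incr) idx.push_back(i);
    return idx;
}

// Hypothetical helper: exactly `size` indices starting at first, like Eigen::seqN.
std::vector<int> make_seqN(int first, int size, int incr = 1) {
    std::vector<int> idx;
    for (int k = 0; k < size; ++k) idx.push_back(first + k * incr);
    return idx;
}

// Copy A(rows, cols) out of a row-major buffer that has `ncols` columns.
// Indices may repeat or be unordered, as with col_ind{7,3,4,3} above.
std::vector<double> pick(const std::vector<double>& A, int ncols,
                         const std::vector<int>& rows, const std::vector<int>& cols) {
    std::vector<double> out;
    out.reserve(rows.size() * cols.size());
    for (int r : rows)
        for (int c : cols)
            out.push_back(A[static_cast<std::size_t>(r) * ncols + c]);
    return out;
}
```

For a matrix with 10 rows, last is 9, so seq(2,last-2,fix<2>) corresponds to make_seq(2, 7, 2), i.e. rows 2, 4 and 6.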
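The reshaped() item above adds views that reinterpret indices rather than copying data. Assuming column-major traversal (Eigen's documented default order for reshaped), entry (i,j) of an r x c reshape is the (i + j*r)-th coefficient in column-major order. The sketch below shows that mapping over a column-major buffer; ReshapedView and kAuto are hypothetical names (kAuto loosely plays the role of Eigen::AutoSize), and this is an illustration, not Eigen's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view reinterpreting a column-major buffer as rows x cols.
class ReshapedView {
public:
    static constexpr int kAuto = -1;  // stand-in for Eigen::AutoSize (assumption)

    ReshapedView(const std::vector<double>& data, int rows, int cols)
        : data_(data), rows_(rows), cols_(cols) {
        const int total = static_cast<int>(data.size());
        if (rows_ == kAuto) rows_ = total / cols_;  // deduce the missing dimension
        if (cols_ == kAuto) cols_ = total / rows_;
        assert(rows_ * cols_ == total);  // a reshape must preserve the number of coefficients
    }

    int rows() const { return rows_; }
    int cols() const { return cols_; }

    // Since the buffer is column-major, linear index i + j*rows addresses it directly.
    double operator()(int i, int j) const {
        return data_[static_cast<std::size_t>(i) + static_cast<std::size_t>(j) * rows_];
    }

private:
    const std::vector<double>& data_;
    int rows_;
    int cols_;
};
```

For example, the column-major 2x3 buffer {1,2,3,4,5,6} viewed as 3x2 gives (0,1) == 4 and (2,0) == 3, without copying anything.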
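The idea behind the fix<N> helper described above is that the integer travels in a type, so overload resolution can select a fixed-size code path, while a plain int selects the runtime path. The plain C++ sketch below uses std::integral_constant as a hypothetical stand-in for Eigen::fix; the names fixed and head are ours, and Eigen's own fix<N> type is different.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for Eigen::fix<N>: the value is encoded in the type.
template <int N>
constexpr std::integral_constant<int, N> fixed{};

// Compile-time size: the length of the result is part of its type.
template <int N>
std::array<double, N> head(const std::vector<double>& v, std::integral_constant<int, N>) {
    std::array<double, N> out{};
    for (std::size_t k = 0; k < out.size(); ++k) out[k] = v[k];
    return out;
}

// Runtime size: the length is only known when the call executes.
inline std::vector<double> head(const std::vector<double>& v, int n) {
    return std::vector<double>(v.begin(), v.begin() + n);
}
```

Calling head(v, fixed<2>) returns a std::array<double, 2>, whereas head(v, 2) returns a std::vector<double>; the template overload wins for fixed<2> because it matches exactly, while the int overload would require a user-defined conversion.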
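The column-sorting loop in the iterators item above works because each column exposes begin()/end(). A minimal sketch of why that is cheap for a column-major matrix, using a hypothetical ColMajorMatrix type rather than Eigen's actual iterator classes: a column is contiguous in memory, so a raw pointer already satisfies the random-access iterator requirements of std::sort. Rows of a column-major matrix would instead need a strided iterator, which this sketch does not show.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix used only for illustration.
struct ColMajorMatrix {
    int rows, cols;
    std::vector<double> data;  // column j starts at data[j*rows]

    // A contiguous range [first, last) usable in range-for and std algorithms.
    struct ColumnView {
        double* first;
        double* last;
        double* begin() const { return first; }
        double* end() const { return last; }
    };

    ColumnView col(int j) {
        double* p = data.data() + static_cast<std::size_t>(j) * rows;
        return {p, p + rows};
    }
};

// Sort each column, in the spirit of `for(auto c : A.colwise()) std::sort(...)`.
// A is taken by value, so the caller's matrix is left untouched.
std::vector<double> sorted_columns(ColMajorMatrix A) {
    for (int j = 0; j < A.cols; ++j) {
        ColMajorMatrix::ColumnView c = A.col(j);
        std::sort(c.begin(), c.end());
    }
    return A.data;
}
```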
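The indexing namespace item above relies on a standard C++ idiom: a nested namespace that re-exports a chosen subset of names through using-declarations, so that a single using-directive brings in only those names. The sketch below shows the idiom with a hypothetical mylib namespace standing in for Eigen; seq_size and unrelated_helper are made-up functions.

```cpp
#include <cassert>

namespace mylib {  // hypothetical library namespace standing in for Eigen

// Number of indices first, first+incr, ... that are <= last (assumes incr > 0).
inline int seq_size(int first, int last, int incr = 1) {
    return (last - first + incr) / incr;
}

// Part of the library, but not meant to be pulled into indexing code.
inline int unrelated_helper() { return 42; }

// Re-export only the indexing-related names.
namespace indexing {
using mylib::seq_size;
}  // namespace indexing

}  // namespace mylib

// Client code imports just that subset instead of all of mylib.
int client_count() {
    using namespace mylib::indexing;
    return seq_size(2, 7, 2);  // found unqualified; unrelated_helper would not be
}
```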
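As we understand it, the point of the res ?= dense +/- sparse specializations in the Misc list is to avoid expanding the sparse operand into a dense temporary. A minimal sketch of that idea, assuming a hypothetical compressed-sparse-column type CscMatrix and a dense column-major result (this is neither Eigen's SparseMatrix nor its actual kernel): only the stored nonzeros are visited, so res = D - S costs one dense copy plus O(nnz) updates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed-sparse-column (CSC) matrix.
struct CscMatrix {
    int rows, cols;
    std::vector<int> outer;      // size cols+1; column j occupies [outer[j], outer[j+1])
    std::vector<int> inner;      // row index of each stored coefficient
    std::vector<double> values;  // value of each stored coefficient
};

// res += sign * S, where res is a dense column-major matrix with S's shape.
void add_sparse(std::vector<double>& res, const CscMatrix& S, double sign = 1.0) {
    for (int j = 0; j < S.cols; ++j)
        for (int p = S.outer[j]; p < S.outer[j + 1]; ++p)
            res[static_cast<std::size_t>(S.inner[p]) + static_cast<std::size_t>(j) * S.rows] +=
                sign * S.values[p];
}

// res = D - S: copy the dense operand once, then subtract only the nonzeros of S.
std::vector<double> dense_minus_sparse(const std::vector<double>& D, const CscMatrix& S) {
    std::vector<double> res = D;
    add_sparse(res, S, -1.0);
    return res;
}
```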
Performance optimizations
- Vectorization of partial reductions along the outer dimension, e.g., colmajor.rowwise().mean()
- Speed up evaluation of HouseholderSequence to a dense matrix, e.g., MatrixXd Q = A.qr().householderQ();
- Various optimizations of matrix products for small and medium sizes when using large SIMD registers (e.g., AVX and AVX512).
- Optimize evaluation of small products of the form s*A*B by rewriting them as s*(A.lazyProduct(B)) to save a costly temporary. Measured speedup from 2x to 5x (see bug 1562).
- Improve multi-threading heuristic for matrix products with a small number of columns.
- 20% speedup of matrix products on ARM64
- Speed up reductions of sub-matrices.
- Optimize extraction of factor Q in SparseQR.
- SIMD implementations of math functions (exp,log,sin,cos) have been unified as a generic implementation compatible over all supported SIMD engines (SSE,AVX,AVX512,NEON,Altivec,VSX,MSA).
- Work around a performance regression in matrix products with gcc>=6.0 and SSE/AVX only (no FMA). We are still working on a similar issue with clang>=6.0 and AVX+FMA.
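The first item of this list mentions vectorizing partial reductions along the outer dimension, such as colmajor.rowwise().mean(). For a column-major matrix, one way to do this is to add whole columns, which are contiguous, into an accumulator vector instead of walking each row with a stride; the inner loop then becomes an element-wise vector addition that maps naturally onto SIMD. The scalar sketch below shows only that loop ordering; rowwise_mean is a hypothetical name, and this is not Eigen's kernel (it uses no intrinsics).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean of each row of a column-major rows x cols matrix (assumes cols > 0).
// The outer loop visits one contiguous column at a time and adds it into acc,
// so the innermost loop is an element-wise vector addition.
std::vector<double> rowwise_mean(const std::vector<double>& A, int rows, int cols) {
    std::vector<double> acc(static_cast<std::size_t>(rows), 0.0);
    for (int j = 0; j < cols; ++j) {
        const double* col = A.data() + static_cast<std::size_t>(j) * rows;
        for (int i = 0; i < rows; ++i) acc[i] += col[i];  // contiguous in both arrays
    }
    for (double& x : acc) x /= cols;
    return acc;
}
```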
Hardware support
- AVX512 support is now complete (including complex scalars) and enabled by default when it is enabled on the compiler side.
- Generalization of the CUDA support to CUDA/HIP for AMD GPUs.
- Add explicit SIMD support for MSA instruction set (MIPS).
- Add explicit SIMD support for ZVector instruction set (IBM).