Difference between revisions of "3.4"
From Eigen
Revision as of 20:29, 12 November 2018
Raw dump of the main novelties and improvements that will be part of the 3.4 release compared to the 3.3 branch:
New features
- New versatile API for sub-matrices, slices, and indexed views [doc]. It basically extends <code>A(.,.)</code> to let it accept anything that looks like a sequence of indices with random access. To make it usable, this new feature comes with new symbols: <code>Eigen::all</code>, <code>Eigen::last</code>, and functions generating arithmetic sequences: <code>Eigen::seq(first,last[,incr])</code>, <code>Eigen::seqN(first,size[,incr])</code>, <code>Eigen::lastN(size[,incr])</code>. Here is an example picking the even rows except the first and last ones, and a subset of indexed columns:
<source lang="cpp">
MatrixXd A = ...;
std::vector<int> col_ind{7,3,4,3};
MatrixXd B = A(seq(2,last-2,fix<2>), col_ind);
</source>
- Reshaped views through the new members <code>reshaped()</code> and <code>reshaped(rows,cols)</code>. This feature also comes with new symbols: <code>Eigen::AutoOrder</code>, <code>Eigen::AutoSize</code>. [doc]
- A new helper <code>Eigen::fix<N></code> to pass compile-time integer values to Eigen's functions [doc]. It can be used to pass compile-time sizes to <code>.block(...)</code>, <code>.segment(...)</code>, and all variants, as well as the first, size, and increment parameters of the <code>seq</code>, <code>seqN</code>, and <code>lastN</code> functions introduced above. You can also pass "possibly compile-time values" through <code>Eigen::fix<N>(n)</code>. Here is an example comparing the old and new way to call <code>.block</code> with fixed sizes:
<source lang="cpp">
template<typename MatrixType,int N>
void foo(const MatrixType &A, int i, int j, int n) {
  A.block(i,j,2,3);                     // runtime sizes
  // compile-time nb rows and columns:
  A.template block<2,3>(i,j);           // 3.3 way
  A.block(i,j,fix<2>,fix<3>);           // new 3.4 way
  // compile-time nb rows only:
  A.template block<2,Dynamic>(i,j,2,n); // 3.3 way
  A.block(i,j,fix<2>,n);                // new 3.4 way
  // possibly compile-time nb columns
  // (use n if N==Dynamic, otherwise we must have n==N):
  A.template block<2,N>(i,j,2,n);       // 3.3 way
  A.block(i,j,fix<2>,fix<N>(n));        // new 3.4 way
}
</source>
- Add STL-compatible iterators for dense expressions. Some examples:
<source lang="cpp">
VectorXd v = ...;
MatrixXd A = ...;
// range for loop over all entries of v then A
for(auto x : v) { cout << x << " "; }
for(auto x : A.reshaped()) { cout << x << " "; }
// sort v then each column of A
std::sort(v.begin(), v.end());
for(auto c : A.colwise())
  std::sort(c.begin(), c.end());
</source>
- A new namespace <code>indexing</code> allowing one to exclusively import the subset of functions and symbols that are typically used within <code>A(.,.)</code>, that is: <code>all</code>, <code>seq</code>, <code>seqN</code>, <code>lastN</code>, <code>last</code>, <code>lastp1</code>. [doc]
- Misc
  - Add templated <code>subVector<Vertical/Horizontal>(Index)</code> aliases to the <code>col/row(Index)</code> methods, and <code>subVectors<>()</code> aliases to <code>rows()/cols()</code>.
  - Add <code>innerVector()</code> and <code>innerVectors()</code> methods.
  - Add diagmat +/- diagmat operators (bug 520).
  - Add specializations for <code>res ?= dense +/- sparse</code> and <code>res ?= sparse +/- dense</code> (see bug 632).
  - Add support for SuiteSparse's KLU sparse direct solver (an LU-based solver tailored for problems coming from circuit simulation).
Performance optimizations
- Vectorization of partial reductions along the outer dimension, e.g., <code>colmajor.rowwise().mean()</code>.
- Speed up evaluation of a HouseholderSequence to a dense matrix, e.g.:
<source lang="cpp">
MatrixXd Q = A.householderQr().householderQ();
</source>
- Various optimizations of matrix products for small and medium sizes when using large SIMD registers (e.g., AVX and AVX512).
- Optimize evaluation of small products of the form <code>s*A*B</code> by rewriting them as <code>s*(A.lazyProduct(B))</code> to save a costly temporary. Measured speedup from 2x to 5x (see bug 1562).
- Improve the multi-threading heuristic for matrix products with a small number of columns.
- Speed-up reductions of sub-matrices.
- Optimize extraction of factor Q in SparseQR.
Hardware support
- Generalization of the CUDA support to CUDA/HIP for AMD GPUs.
- Add explicit SIMD support for MSA instruction set (MIPS).
- Add explicit SIMD support for ZVector instruction set (IBM).
- AVX512 is enabled by default when it is enabled on the compiler side.