Eigen 3.4

Revision as of 20:29, 12 November 2018

Raw dump of the main novelties and improvements that will be part of the 3.4 release compared to the 3.3 branch:

New features

  • New versatile API for sub-matrices, slices, and indexed views [doc]. It basically extends A(.,.) to let it accept anything that looks like a sequence of indices with random access. To make it usable, this new feature comes with new symbols: Eigen::all, Eigen::last, and functions generating arithmetic sequences: Eigen::seq(first,last[,incr]), Eigen::seqN(first,size[,incr]), Eigen::lastN(size[,incr]). Here is an example picking the even rows, excluding the first and last ones, together with a subset of columns given by an index list:
MatrixXd A = ...;
std::vector<int> col_ind{7,3,4,3};
MatrixXd B = A(seq(2,last-2,fix<2>), col_ind);
  • Reshaped views through the new members reshaped() and reshaped(rows,cols). This feature also comes with new symbols: Eigen::AutoOrder, Eigen::AutoSize. [doc]
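For instance, a sketch based on the description above (see the [doc] link for the exact set of overloads):
MatrixXd A(4,4);
A.setRandom();
VectorXd v = A.reshaped();            // linear, column-major view of all 16 coefficients
MatrixXd B = A.reshaped(2, 8);        // 2x8 view of the same coefficients
MatrixXd C = A.reshaped(AutoSize, 8); // the number of rows is deduced automatically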
  • A new helper Eigen::fix<N> to pass compile-time integer values to Eigen's functions [doc]. It can be used to pass compile-time sizes to .block(...), .segment(...), and all variants, as well as the first, size, and increment parameters of the seq, seqN, and lastN functions introduced above. You can also pass "possibly compile-time" values through Eigen::fix<N>(n). Here is an example comparing the old and new ways of calling .block with fixed sizes:
template<typename MatrixType,int N>
void foo(const MatrixType &A, int i, int j, int n) {
    A.block(i,j,2,3);                         // runtime sizes
    // compile-time number of rows and columns:
    A.template block<2,3>(i,j);               // 3.3 way
    A.block(i,j,fix<2>,fix<3>);               // new 3.4 way
    // compile-time number of rows only:
    A.template block<2,Dynamic>(i,j,2,n);     // 3.3 way
    A.block(i,j,fix<2>,n);                    // new 3.4 way
    // possibly compile-time number of columns
    // (use n if N==Dynamic, otherwise we must have n==N):
    A.template block<2,N>(i,j,2,n);           // 3.3 way
    A.block(i,j,fix<2>,fix<N>(n));            // new 3.4 way
}
  • Add STL-compatible iterators for dense expressions. Some examples:
VectorXd v = ...;
MatrixXd A = ...;
// range-based for loop over all entries of v, then of A
for(auto x : v) { cout << x << " "; }
for(auto x : A.reshaped()) { cout << x << " "; }
// sort v then each column of A
std::sort(v.begin(), v.end());
for(auto c : A.colwise())
    std::sort(c.begin(), c.end());
  • A new namespace indexing allowing one to import only the subset of functions and symbols that are typically used within A(.,.), that is: all, seq, seqN, lastN, last, lastp1. [doc]
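For instance (a sketch, reusing only the symbols listed above):
MatrixXd A = ...;
{
    using namespace Eigen::indexing; // imports all, seq, seqN, lastN, last, lastp1 only
    MatrixXd B = A(lastN(3), all);   // last 3 rows, all columns
}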
  • Misc
    • Add templated subVector<Vertical/Horizontal>(Index) aliases to the col/row(Index) methods, and subVectors<>() aliases to rows()/cols().
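A minimal usage sketch, assuming the template argument is one of the existing Eigen::Vertical/Eigen::Horizontal direction tags:
MatrixXd A(4,4);
auto c = A.subVector<Vertical>(1);    // same as A.col(1)
auto r = A.subVector<Horizontal>(2);  // same as A.row(2)
Index n = A.subVectors<Vertical>();   // same as A.cols()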
    • Add innerVector() and innerVectors() methods.
    • Add diagmat +/- diagmat operators (see bug 520).
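A sketch of what this enables, assuming the resulting diagonal expression can be assigned to a dense matrix just like a single asDiagonal():
VectorXd d1(3), d2(3);
d1 << 1, 2, 3;
d2 << 4, 5, 6;
MatrixXd M = d1.asDiagonal() + d2.asDiagonal(); // diagonal +/- diagonal stays diagonal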
    • Add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense. (see bug 632)
    • Add support for SuiteSparse's KLU sparse direct solver (LU-based solver tailored for problems coming from circuit simulation).
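A usage sketch, assuming the solver follows the pattern of the other SuiteSparse backends (e.g., UmfPackLU) and requires linking against KLU:
#include <Eigen/KLUSupport>
SparseMatrix<double> A = ...;
VectorXd b = ...;
KLU<SparseMatrix<double>> solver;  // LU factorization via SuiteSparse's KLU
solver.compute(A);
VectorXd x = solver.solve(b);      // solve A x = b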

Performance optimizations

  • Vectorization of partial reductions along the outer dimension, e.g.: colmajor.rowwise().mean()
  • Speed up evaluation of HouseholderSequence to a dense matrix, e.g.,
    MatrixXd Q = A.qr().householderQ();
  • Various optimizations of matrix products for small and medium sizes when using large SIMD registers (e.g., AVX and AVX512).
  • Optimize evaluation of small products of the form s*A*B by rewriting them as s*(A.lazyProduct(B)), saving a costly temporary. Measured speedups range from 2x to 5x (see bug 1562).
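This rewrite is applied automatically; conceptually, it amounts to (a sketch):
MatrixXd A(4,4), B(4,4);
double s = 0.5;
MatrixXd C = s * A * B;            // in 3.4, evaluated as...
MatrixXd D = s * A.lazyProduct(B); // ...which skips the temporary for A*B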
  • Improve multi-threading heuristic for matrix products with a small number of columns.
  • Speed up reductions of sub-matrices.
  • Optimize extraction of factor Q in SparseQR.

Hardware support

  • Generalization of the CUDA support to CUDA/HIP, adding support for AMD GPUs.
  • Add explicit SIMD support for MSA instruction set (MIPS).
  • Add explicit SIMD support for ZVector instruction set (IBM).
  • AVX512 is enabled by default whenever it is enabled on the compiler side.