Eigen2 benchmark Intel

Out of curiosity, I ran the BTL tests with Eigen2 compiled with 4 different compilers on an Intel Pentium D CPU:

  • GCC 4.3.3: -O3 -march=native -DNDEBUG
  • GCC 4.1.3: -O3 -march=nocona -msse2 -msse3 -DNDEBUG
  • GCC 4.4.0: -O3 -march=native -DNDEBUG
  • Intel(R) C++ 11.0: -O3 -DNDEBUG -no-ipo -xHOST -ip -static -no-prec-div

Although in my experience the -ipo option (interprocedural optimization) provides good performance benefits, it was explicitly disabled for the Intel compiler because the resulting build failed to work (numerically).

Rookie conclusions:

  1. The benefit of using newer GCC versions is pretty clear.
  2. In most cases gcc 4.4 is comparable with gcc 4.3, but in some it is almost 2 times faster. (In my experience gcc 4.2 performs as well as 4.4, and gcc 4.3 is known to miss an optimization in some matrix-scalar products: the copy of the scalar into a four-scalar register is not moved out of the inner loop.)
  3. Except for the (anomalous) LU decomposition, gcc 4.1 is nowhere near the newer versions of gcc. This is in part because Eigen automatically disables vectorization for gcc < 4.2, but even without that the difference is still huge as soon as complex expressions are involved.
  4. Intel C++ does not provide any performance benefit here. This is somewhat surprising, as I was expecting at least some advantage on this CPU. It could be due to the disabled IPO: in my experience with Intel Fortran, -ipo gives about a 10-15% speedup. But that may be totally unrelated to C++.
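
To make the gcc 4.3 remark in point 2 more concrete, here is a minimal sketch of the pattern involved, using an axpy-style kernel (y += a * x). It is not Eigen's code: Packet4f, broadcast, load, store and madd are hypothetical stand-ins for a four-float SIMD register and its operations, written in plain C++ so the example stays portable. The first function repeats the scalar broadcast on every iteration, which is roughly what is left behind when the compiler fails to move it out of the loop. The second does the broadcast once, before the loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a four-float SIMD register (not an Eigen type).
struct Packet4f {
    float v[4];
};

// Copy one scalar into all four lanes of a packet.
inline Packet4f broadcast(float a) {
    return Packet4f{{a, a, a, a}};
}

// Load four consecutive floats into a packet.
inline Packet4f load(const float* p) {
    return Packet4f{{p[0], p[1], p[2], p[3]}};
}

// Store the four lanes of a packet to consecutive floats.
inline void store(float* p, const Packet4f& x) {
    for (int k = 0; k < 4; ++k) p[k] = x.v[k];
}

// Lane-wise a * x + y.
inline Packet4f madd(const Packet4f& a, const Packet4f& x, const Packet4f& y) {
    Packet4f r;
    for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] * x.v[k] + y.v[k];
    return r;
}

// y += a * x with the broadcast redone on every iteration.
// Assumes y.size() >= x.size().
void axpy_broadcast_in_loop(float a, const std::vector<float>& x,
                            std::vector<float>& y) {
    const std::size_t n4 = x.size() / 4 * 4;
    for (std::size_t i = 0; i < n4; i += 4) {
        Packet4f pa = broadcast(a);  // loop-invariant work, repeated each time
        store(&y[i], madd(pa, load(&x[i]), load(&y[i])));
    }
    // Scalar tail for the remaining elements.
    for (std::size_t i = n4; i < x.size(); ++i) y[i] += a * x[i];
}

// Same computation with the broadcast done once, before the loop.
// Assumes y.size() >= x.size().
void axpy_broadcast_hoisted(float a, const std::vector<float>& x,
                            std::vector<float>& y) {
    const std::size_t n4 = x.size() / 4 * 4;
    const Packet4f pa = broadcast(a);  // done once
    for (std::size_t i = 0; i < n4; i += 4) {
        store(&y[i], madd(pa, load(&x[i]), load(&y[i])));
    }
    // Scalar tail for the remaining elements.
    for (std::size_t i = n4; i < x.size(); ++i) y[i] += a * x[i];
}
```

Both functions compute the same result. With real SIMD registers the only difference is the extra per-iteration broadcast, which matters most in short, memory-light inner loops such as axpy.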

Axpy compare intel.png

Axpby compare intel.png

Atv compare intel.png

Matrix vector compare intel.png

Matrix matrix compare intel.png

Symv compare intel.png

Syr2 compare intel.png

Aat compare intel.png

Ata compare intel.png

Trisolve compare intel.png

Cholesky compare intel.png

Hessenberg compare intel.png

Tridiagonalization compare intel.png

Lu decomp compare intel.png