Eigen2 benchmark Intel


Out of curiosity, I ran the BTL tests with Eigen2 compiled with four different compilers on an Intel Pentium D CPU:

  • GCC 4.3.3: -O3 -march=native -DNDEBUG
  • GCC 4.1.3: -O3 -march=nocona -msse2 -msse3 -DNDEBUG
  • GCC 4.4.0: -O3 -march=native -DNDEBUG
  • Intel(R) C++ 11.0: -O3 -DNDEBUG -no-ipo -xHOST -ip -static -no-prec-div

Although in my experience the -ipo option (interprocedural optimization) provides a good performance benefit, it was explicitly disabled for the Intel compiler (-no-ipo), because with it enabled the tests failed numerically.


Rookie conclusions:

  1. The benefit of using newer GCC versions is pretty clear.
  2. In most cases gcc 4.4 is comparable with gcc 4.3, but in some it is almost 2 times faster. (In my experience gcc 4.2 performs as well as 4.4, and gcc 4.3 is known to miss an optimization in some matrix-scalar products: the copy of the scalar into a four-scalar register is not hoisted out of the inner loop.)
  3. Except for the (anomalous) LU decomposition, gcc 4.1 is nowhere near the newer versions of gcc. This is in part because Eigen automatically disables vectorization for gcc < 4.2, but the difference is still huge without that as soon as complex expressions are involved.
  4. Intel C++ does not provide any performance benefit here. This is somewhat surprising, as I was expecting at least some advantage on this CPU. It could be due to the disabled IPO: in my experience with Intel Fortran, -ipo gives about a 10-15% speedup, although that may be totally unrelated to C++.
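
To illustrate the missed optimization mentioned in point 2, here is a minimal sketch of an axpy-style kernel (y += a*x) written two ways. It is not Eigen's code: Packet4, broadcast and madd4 are hypothetical stand-ins for a four-float SIMD register and its operations, written in portable C++ so the example builds without SSE headers. The first variant repeats the scalar-to-register copy on every iteration; the second does it once before the loop, which is what an optimizing compiler is expected to do on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 4-float SIMD register (not Eigen's packet type).
struct Packet4 {
    float v[4];
};

// Copy one scalar into all four lanes, as a SIMD broadcast would.
inline Packet4 broadcast(float a) {
    return Packet4{{a, a, a, a}};
}

// y[0..3] += p * x[0..3], lane by lane.
inline void madd4(const Packet4& p, const float* x, float* y) {
    for (int k = 0; k < 4; ++k) y[k] += p.v[k] * x[k];
}

// Variant 1: the loop-invariant broadcast stays inside the inner loop.
void axpy_broadcast_in_loop(float a, const std::vector<float>& x,
                            std::vector<float>& y) {
    assert(y.size() >= x.size());  // compiled out when built with -DNDEBUG
    const std::size_t n4 = (x.size() / 4) * 4;
    for (std::size_t i = 0; i < n4; i += 4) {
        Packet4 pa = broadcast(a);  // redone on every iteration
        madd4(pa, &x[i], &y[i]);
    }
    for (std::size_t i = n4; i < x.size(); ++i) y[i] += a * x[i];  // scalar tail
}

// Variant 2: the broadcast is hoisted out of the loop and done once.
void axpy_broadcast_hoisted(float a, const std::vector<float>& x,
                            std::vector<float>& y) {
    assert(y.size() >= x.size());
    const Packet4 pa = broadcast(a);  // done once
    const std::size_t n4 = (x.size() / 4) * 4;
    for (std::size_t i = 0; i < n4; i += 4) {
        madd4(pa, &x[i], &y[i]);
    }
    for (std::size_t i = n4; i < x.size(); ++i) y[i] += a * x[i];  // scalar tail
}
```

Both functions compute the same result; they differ only in how much work sits in the inner loop. With real SIMD code the broadcast is a register shuffle, so leaving it in the loop costs the most on short, memory-light kernels.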



Axpy compare intel.png


Axpby compare intel.png


Atv compare intel.png


Matrix vector compare intel.png


Matrix matrix compare intel.png


Symv compare intel.png


Syr2 compare intel.png


Aat compare intel.png


Ata compare intel.png


Trisolve compare intel.png


Cholesky compare intel.png


Hessenberg compare intel.png


Tridiagonalization compare intel.png


Lu decomp compare intel.png