Benchmark


This page is rather out of date. Please help us to generate and maintain up-to-date benchmarks.

See also the performance monitoring page for benchmarks of Eigen over time.

The following benchmark results were generated using a heavily modified version of the Benchmark for Templated Libraries (BTL) by Laurent Plagne. Our modified version can be found in the mercurial repository under eigen/bench/btl. We did our best to get the most out of each library; however, any hints on making a library perform better are welcome. All libraries were configured to use dynamic-size column-major matrices and only one thread. Try it yourself.

Higher is better. By MFLOPS we mean millions of (effective) arithmetic operations per second. Values are typically low for small sizes because this benchmark uses dynamic-size matrices, which are relatively inefficient at small sizes. Some libraries/benchmarks show a decline for large sizes because, for such large matrices, CPU cache friendliness becomes the dominant factor.
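To make the MFLOPS definition concrete, here is a minimal sketch of how such a figure could be computed for the axpy kernel (y <- a*x + y). It is not the BTL code: the function names (axpy_flops, mflops, measure_axpy_mflops), the use of float, and the operation count of 2n per call (one multiply and one add per element) are assumptions made for illustration only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Assumed effective operation count for y += a*x on vectors of length n:
// one multiply and one add per element (hypothetical helper, not from BTL).
double axpy_flops(std::size_t n) { return 2.0 * static_cast<double>(n); }

// Convert an operation count and an elapsed time in seconds into MFLOPS.
// Returns 0 if the elapsed time is too small to be measured.
double mflops(double flops, double seconds) {
    if (seconds <= 0.0) return 0.0;
    return flops / seconds / 1e6;
}

// Plain scalar axpy on dynamic-size vectors: y <- a*x + y.
void axpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
}

// Time `repeats` calls of axpy on vectors of length n, single-threaded,
// and report the rate in MFLOPS.
double measure_axpy_mflops(std::size_t n, int repeats) {
    std::vector<float> x(n, 1.0f), y(n, 0.0f);
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) axpy(0.5f, x, y);
    auto t1 = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(t1 - t0).count();
    // Read one result so the compiler cannot discard the timed loop.
    volatile float sink = y[0];
    (void)sink;
    return mflops(axpy_flops(n) * repeats, seconds);
}
```

Calling measure_axpy_mflops for a range of n and plotting the results gives a curve of the same kind as the ones on this page. With small n the fixed per-call overhead dominates, and once the two vectors no longer fit in the CPU cache the rate tends to be limited by memory bandwidth, which matches the behaviour described above.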

Previous benchmarks:

  • August 2008: Eigen 2, includes Eigen w/o vectorization, MKL, Goto, Atlas, ublas, mtl4, blitz, and gmm++.
  • March 2009: Early version of eigen3, includes Eigen w/o vectorization, MKL, Goto, Atlas, and ACML.

Here is the list of the libraries included in the following benchmarks:

  • eigen3: ourselves, with the default options (SSE2 vectorization enabled).
  • eigen2: the previous stable version of Eigen, with the default options (SSE2 vectorization enabled).
  • INTEL_MKL: The Intel Math Kernel Library, which includes a BLAS/LAPACK (11.0). Closed-source.
  • ACML: The AMD Core Math Library, which includes a BLAS/LAPACK (4.2.0). Closed-source.
  • GOTO: The GOTO BLAS library (2-1.13). This library has been compiled by hand specifically for the penryn architecture.
  • ATLAS: The math-atlas BLAS library (3.8.3). This library has been compiled by hand specifically for the penryn architecture.

23 March 2011

Configuration

  • model name : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz ( x86_64 )
  • compiler: c++ (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292]


axpy

axpby

matrix_vector

atv

matrix_matrix

aat

trisolve_vector

trisolve_matrix

cholesky

partial_lu_decomp

tridiagonalization

hessenberg

symv

syr2

ger

rot

complete_lu_decomp