Benchmark
The following benchmark results have been generated using a (heavily) modified version of the Benchmark for Templated Libraries (BTL) from Laurent Plagne. Our modified version can be found in the mercurial repository under eigen/bench/btl. We did our best to make the best use of each library; however, any hints on making a library work better are welcome. All libraries have been configured to use dynamic-size column-major matrices and only one thread. Try it yourself.
Higher is better. By MFLOPS we mean millions of (effective) arithmetic operations per second. Values are typically low for small sizes because this benchmark uses dynamic-size matrices, which are relatively inefficient at small sizes. Some libraries/benchmarks show a decline for large sizes because, for such large matrices, CPU cache friendliness becomes the predominant issue.
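To make the MFLOPS definition concrete, here is a minimal sketch of how such a number could be computed for a square matrix product. It is not the BTL code: the function names (gemm_colmajor, gemm_flops, mflops, bench_gemm) are hypothetical, the kernel is a naive single-threaded loop over column-major storage, and the operation count of 2n^3 (one multiply and one add per inner iteration) is the usual convention, assumed here to be what "effective" operations means for this product.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Column-major storage: element (i, j) of an n x n matrix is at index i + j * n.
// Computes C += A * B in a single thread. The inner loop walks down a column,
// so memory accesses to A and C are contiguous.
void gemm_colmajor(std::size_t n, const std::vector<double>& a,
                   const std::vector<double>& b, std::vector<double>& c) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < n; ++k) {
            const double bkj = b[k + j * n];
            for (std::size_t i = 0; i < n; ++i) {
                c[i + j * n] += a[i + k * n] * bkj;
            }
        }
    }
}

// Assumed operation count for an n x n product: n^3 multiplies plus n^3 adds.
double gemm_flops(std::size_t n) {
    const double d = static_cast<double>(n);
    return 2.0 * d * d * d;
}

// MFLOPS = operations / elapsed seconds / 10^6.
double mflops(double flops, double seconds) {
    return flops / seconds / 1e6;
}

// Runs the product `repeats` times and returns the best observed rate.
// Returns 0 if every run was too short for the clock to resolve.
double bench_gemm(std::size_t n, int repeats) {
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);
    double best = 0.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        gemm_colmajor(n, a, b, c);
        const auto t1 = std::chrono::steady_clock::now();
        const double seconds = std::chrono::duration<double>(t1 - t0).count();
        if (seconds > 0.0) {
            const double rate = mflops(gemm_flops(n), seconds);
            if (rate > best) best = rate;
        }
    }
    return best;
}
```

Calling bench_gemm over a range of sizes would give a curve of the same kind as the plots, although a naive kernel like this one will be far slower than any of the libraries compared here. Taking the best of several runs is one common way to reduce timing noise; it is a choice of this sketch, not a description of what BTL does.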
Previous benchmarks:
- August 2008: Eigen 2, includes Eigen w/o vectorization, MKL, Goto, Atlas, ublas, mtl4, blitz, and gmm++.
- March 2009: Early version of eigen3, includes Eigen w/o vectorization, MKL, Goto, Atlas, and ACML.
Here is the list of the libraries included in the following benchmarks:
- eigen3: ourselves, with the default options (SSE2 vectorization enabled).
- eigen2: the previous stable version of Eigen, with the default options (SSE2 vectorization enabled).
- INTEL_MKL: The Intel Math Kernel Library, which includes a BLAS/LAPACK (11.0). Closed-source.
- ACML: The AMD Core Math Library, which includes a BLAS/LAPACK (4.2.0). Closed-source.
- GOTO: The GOTO BLAS library (2-1.13). This library has been compiled by hand specifically for the penryn architecture.
- ATLAS: The math-atlas BLAS library (3.8.3). This library has been compiled by hand specifically for the penryn architecture.
23 March 2011
Configuration
- model name : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz ( x86_64 )
- compiler: c++ (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292]