# Talk:Benchmark

# Comment by Tvisitor

A few observations:

- some (if not all?) of the benchmark calculations are done in single precision; ideally that should be changed to double
- to calculate flops I would not count + and - operations but only * and /

So for a mat*mat operation we would get n*n*n and not 2*n*n*n as used in the source code of the benchmark:

`double nb_op_base( void ){ return 2.0*_size*_size*_size; }`

- ATLAS outperforms Eigen on level 3 BLAS: on my system (Core i5 750) I get the following for the mat*mat operation (n*n*n flops, double prec)

n=2000: ATLAS 5250 mflops, eigen 4840 mflops, non-optimised BLAS 860 mflops

- Eigen is **extremely slow** for solving equation systems using LU (2/3*n*n*n flops, double prec)

n=2000: ATLAS threaded 12000 mflops, ATLAS 8870 mflops, eigen 840 mflops, non-optimised BLAS 1960 mflops

n=5000: ATLAS threaded 21100 mflops, ATLAS 9700 mflops, eigen 800 mflops

So I'm very disappointed with the LU decomposition performance, but otherwise it's a brilliant package!

# Comment by Bjacob

First of all, I don't check wiki updates very often so it's only by chance that I found this! Please use the mailing list or forum.

Are you talking about Eigen 2 or 3 (the development branch)? This benchmark refers to an old state of the development branch, so it's somewhere halfway between Eigen 2 and 3.

To answer your points:

- Lots of people use single precision, so I don't know that it should be changed to double. Ideally we'd benchmark all types, but that would take a long time.
- About the flop count, + and - are not any cheaper than * on modern CPUs. / is more expensive, but there are far fewer / operations, so almost every algorithm spends most of its time in + and *.
- About ATLAS outperforming Eigen on level 3 BLAS, please try the development branch (which this benchmark is about). These days, both ATLAS and Eigen are very fast, i.e. very close to the fastest libs (MKL, Goto).
- About LU performance, are you talking about full-pivoting or partial-pivoting LU? Don't compare apples and oranges. If you want partial-pivoting LU (it's much faster but less general/reliable), you have it in the development branch.
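To make the flop-count point concrete, here is a small sketch that counts the operations in a naive n x n matrix product. It is plain C++ written for illustration only (the `OpCount` struct and `counted_matmul` are made-up names, and this is not how Eigen or the benchmark computes products). Each of the n*n output entries accumulates n products, so there are n*n*n multiplications and n*n*n additions, which is where the 2*n*n*n figure comes from.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: tallies of the two operation kinds.
struct OpCount {
    unsigned long long mul = 0;
    unsigned long long add = 0;
};

// Naive product c = a * b for n x n row-major matrices, counting every
// multiplication and addition performed in the inner loop.
OpCount counted_matmul(const std::vector<double>& a,
                       const std::vector<double>& b,
                       std::vector<double>& c,
                       std::size_t n) {
    OpCount ops;
    c.assign(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];  // one * and one +
                ++ops.mul;
                ++ops.add;
            }
            c[i * n + j] = sum;
        }
    }
    return ops;
}
```

With n = 4 this reports 64 multiplications and 64 additions, i.e. 2*n*n*n operations in total.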

Please take this discussion to the mailing list if you want to continue it.

# Comment by Tvisitor

Thanks, you are right. I used dgesv(), which does partial pivoting, and compared it against version 2 of Eigen with A.lu().solve, which probably did full pivoting, so my mflops calculation for Eigen was indeed incorrect. With the current development version and A.lu().solve, which now does partial pivoting if I understand correctly, I get very fast results indeed:

n=2000: eigen 7700 mflops

n=5000: eigen 8700 mflops

Great work!

And sorry, you're right as far as FLOP counts are concerned as well, my fault.

# Comment by Bjacob

Indeed, lu() does full-pivoting in Eigen2 while it does partial-pivoting in Eigen3. In Eigen3 we also have more explicitly named variants fullPivLu() and partialPivLu().
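For readers unfamiliar with the distinction, here is a minimal sketch of a solve with partial pivoting, where the pivot search looks only down the current column and swaps rows. Full pivoting would instead search the whole remaining submatrix and swap columns as well, which costs more but is more robust. This is plain C++ for illustration, with made-up names (`Matrix`, `solve_partial_pivot`); it is not Eigen's or LAPACK's implementation, and it eliminates on the right-hand side directly rather than storing the L and U factors.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve a * x = b by Gaussian elimination with partial (row) pivoting.
// The arguments are taken by value because they are modified in place.
std::vector<double> solve_partial_pivot(Matrix a, std::vector<double> b) {
    const std::size_t n = a.size();
    for (std::size_t k = 0; k < n; ++k) {
        // Partial pivoting: search only column k, rows k..n-1.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i) {
            if (std::fabs(a[i][k]) > std::fabs(a[p][k])) p = i;
        }
        if (a[p][k] == 0.0) throw std::runtime_error("matrix is singular");
        std::swap(a[k], a[p]);
        std::swap(b[k], b[p]);

        // Eliminate the entries below the pivot.
        for (std::size_t i = k + 1; i < n; ++i) {
            const double f = a[i][k] / a[k][k];
            for (std::size_t j = k; j < n; ++j) a[i][j] -= f * a[k][j];
            b[i] -= f * b[k];
        }
    }

    // Back substitution on the resulting upper-triangular system.
    std::vector<double> x(n);
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }
    return x;
}
```

In Eigen3 itself you would not write this by hand; the corresponding calls are `A.partialPivLu().solve(b)` and `A.fullPivLu().solve(b)`.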