Summary: | bigMat.block(...).array()*mat.array() is slower than mat.array()*mat.array() | ||||||
---|---|---|---|---|---|---|---|
Product: | Eigen | Reporter: | Philippe Marti <philippe.marti> | ||||
Component: | Core - expression templates | Assignee: | Nobody <eigen.nobody> | ||||
Status: | NEW --- | ||||||
Severity: | Unknown | CC: | gael.guennebaud, jacob.benoit.1 | ||||
Priority: | --- | ||||||
Version: | 3.0 | ||||||
Hardware: | x86 - 64-bit | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Attachments: |
|
Description
Philippe Marti
2011-06-23 14:36:02 UTC
Created attachment 208
Program to get timings
On my computer (Intel Core i5-2410M, 2.30GHz) and using gcc with -O2, the following timings are produced by the attached program:
Multiplication 1 takes 2.82 usec
Multiplication 2 takes 1.25 usec
Multiplication 3 takes 2.73 usec
Here, 'multiplication 1' refers to the multiplication with block(), 'multiplication 2' refers to the multiplication without block() and 'multiplication 3' refers to the multiplication with block() and transpose().
It's not surprising to me that the second formulation (without the block) is faster. The second formulation compiles to something like this (ignoring vectorization and some additional optimizations):
for (int i = 0; i < 100*20; ++i)
*(sol + i) = *(mat1 + i) * *(mat2 + i);
The first formulation yields something like this:
for (int row = 0; row < 100; ++row)
for (int col = 0; col < 20; ++col)
*(sol + row * 20 + col) = *(bigMat + row * 20 + col) * *(mat2 + row * 20 + col);
The point is that the first formulation requires a double loop, while the second one is translated into a single loop. The single loop has fewer branches, so it's faster.
This also explains why the difference gets (relatively) smaller if the matrices get bigger.
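To make the comparison concrete, here is a minimal sketch of the two loop shapes in plain C++, without Eigen. It is not what Eigen actually generates. The function names and the bigStride parameter are hypothetical, and the row-major indexing simply follows the pseudocode above. The sketch assumes that a block of a larger matrix is not contiguous in memory: consecutive rows of the block are bigStride elements apart, with bigStride >= cols.

```cpp
#include <cstddef>
#include <vector>

// Coefficient-wise product of two contiguous buffers of n elements.
// One flat loop, as in the formulation without block().
void multiplyFlat(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}

// Coefficient-wise product where the left operand is a rows x cols window
// into a larger buffer. bigStride (hypothetical name) is the distance in
// elements between the starts of two consecutive rows of that larger buffer.
void multiplyBlock(const double* big, std::size_t bigStride,
                   const double* b, double* out,
                   std::size_t rows, std::size_t cols) {
    for (std::size_t row = 0; row < rows; ++row)
        for (std::size_t col = 0; col < cols; ++col)
            out[row * cols + col] = big[row * bigStride + col] * b[row * cols + col];
}

// Runs both loops on the same data and reports whether the results agree.
// Requires bigStride >= cols.
bool blockMatchesFlat(std::size_t rows, std::size_t cols, std::size_t bigStride) {
    std::vector<double> big(rows * bigStride), b(rows * cols);
    for (std::size_t i = 0; i < big.size(); ++i) big[i] = static_cast<double>(i);
    for (std::size_t i = 0; i < b.size(); ++i) b[i] = 0.5 * static_cast<double>(i);

    // Copy the window into a contiguous buffer so the flat loop can be used.
    std::vector<double> packed(rows * cols);
    for (std::size_t row = 0; row < rows; ++row)
        for (std::size_t col = 0; col < cols; ++col)
            packed[row * cols + col] = big[row * bigStride + col];

    std::vector<double> flat(rows * cols), blocked(rows * cols);
    multiplyFlat(packed.data(), b.data(), flat.data(), rows * cols);
    multiplyBlock(big.data(), bigStride, b.data(), blocked.data(), rows, cols);
    return flat == blocked;
}
```

Calling blockMatchesFlat(100, 20, 50) exercises both loops on a 100 x 20 window of a buffer with 50 elements per row. Because of the stride, the window cannot be traversed with a single flat index, which is why the block version keeps the inner loop and its extra loop overhead.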
That being said, the difference is bigger than I'd expected. I'll leave it open for our performance gurus to decide whether there is an issue here that needs to be fixed or not.
-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/303.