| Summary: | Poor fixed-size vectorizable performance with MSVC 2010 | | |
|---|---|---|---|
| Product: | Eigen | Reporter: | Colm <metmywaterloo> |
| Component: | Core - vectorization | Assignee: | Nobody <eigen.nobody> |
| Status: | NEW --- | | |
| Severity: | Unknown | CC: | gael.guennebaud, hauke.heibel, jacob.benoit.1 |
| Priority: | --- | | |
| Version: | 3.0 | | |
| Hardware: | x86 - 64-bit | | |
| OS: | Windows | | |
| Whiteboard: | | | |
Description
Colm
2011-10-04 18:06:42 UTC
Be careful with this kind of simple benchmark, where the compiler could optimize your code too aggressively. For instance, the compiler could remove the for loop completely, or take advantage of the fact that the values of the objects are known at compile time. It is better to write a small function with EIGEN_DONT_INLINE:

EIGEN_DONT_INLINE void foo(Matrix4d& A, ....) { m_tmp2 = b[7]*A + b[5]*A4 + b[3]*A2 + b[1]*A_Id; }

and then call this function multiple times to benchmark it. However, I'm not sure EIGEN_DONT_INLINE does anything with MSVC, so perhaps implement the function in a separate .cpp file to make sure it won't be inlined.

OK, ICC seems to perform poorly as well. I checked the assembly, and the reason is poor inlining. I bet MSVC has the same problem. To check the asm, I add an enclosing pair:

EIGEN_ASM_COMMENT("mybegin");
...
EIGEN_ASM_COMMENT("myend");

around the critical expression to make the relevant asm lines easier to find. You could try to figure out which functions are poorly inlined with MSVC, declare them with EIGEN_STRONG_INLINE, and get back to us.

EDIT: actually, ICC does not inline the assignment, but the rest is properly inlined, so the performance issue is not there for ICC. The reason is an abusive use of the movddup instruction, which is called multiple times (4) on the same variable while it should be called only once per b[i].

Could you check the assembly produced by MSVC?

Created attachment 216
GCC Assembler output for test case.
Created attachment 217
Portion of MSVC assembler output
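The benchmarking advice in the first comment (keep the kernel out of line, call it repeatedly, and bracket the critical expression with asm markers) can be sketched without Eigen as follows. This is a minimal sketch under stated assumptions, not Eigen's code: plain `double[4][4]` arrays stand in for `Matrix4d`, and the names `BENCH_NOINLINE`, `BENCH_ASM_COMMENT`, `Mat4`, `kernel`, `mul` and `run_benchmark` are hypothetical. The marker only emits anything for GCC/Clang on x86, where `#` starts an assembler comment. Because `noinline` does not necessarily stop other interprocedural optimizations such as constant propagation, the inputs are derived from a `volatile` read here; the separate .cpp file suggested above is the more robust option.

```cpp
#include <chrono>

// Hypothetical portable "never inline" macro, standing in for EIGEN_DONT_INLINE.
#if defined(_MSC_VER)
#define BENCH_NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define BENCH_NOINLINE __attribute__((noinline))
#else
#define BENCH_NOINLINE
#endif

// Hypothetical marker standing in for EIGEN_ASM_COMMENT: on GCC/Clang for x86 it
// emits "#mybegin" / "#myend" as comments in the assembly produced with -S.
// Everywhere else it expands to nothing.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define BENCH_ASM_COMMENT(X) __asm__ __volatile__("#" X)
#else
#define BENCH_ASM_COMMENT(X)
#endif

// Plain 4x4 matrix of doubles, a stand-in for Eigen's Matrix4d.
struct Mat4 {
  double v[4][4];
};

// out = b[7]*A + b[5]*A4 + b[3]*A2 + b[1]*I, the expression from the report.
BENCH_NOINLINE void kernel(const Mat4& A, const Mat4& A2, const Mat4& A4,
                           const double* b, Mat4& out) {
  BENCH_ASM_COMMENT("mybegin");
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j)
      out.v[i][j] = b[7] * A.v[i][j] + b[5] * A4.v[i][j] +
                    b[3] * A2.v[i][j] + (i == j ? b[1] : 0.0);
  BENCH_ASM_COMMENT("myend");
}

// Naive 4x4 product, used only to set up A2 = A*A and A4 = A2*A2.
Mat4 mul(const Mat4& x, const Mat4& y) {
  Mat4 r{};
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j)
      for (int k = 0; k < 4; ++k) r.v[i][j] += x.v[i][k] * y.v[k][j];
  return r;
}

// Calls kernel() `iterations` times and stores the elapsed wall-clock time in
// *seconds. Returns the sum of all entries of the last result so the work
// stays observable.
double run_benchmark(int iterations, double* seconds) {
  volatile double seed = 0.5;  // read at run time, so inputs are not compile-time constants
  Mat4 A{};
  for (int i = 0; i < 4; ++i) A.v[i][i] = seed;  // A = 0.5 * I
  const Mat4 A2 = mul(A, A);
  const Mat4 A4 = mul(A2, A2);
  double b[8];
  for (int k = 0; k < 8; ++k) b[k] = k * (seed * 2.0);  // b[k] = k

  Mat4 out{};
  const auto start = std::chrono::steady_clock::now();
  for (int it = 0; it < iterations; ++it) kernel(A, A2, A4, b, out);
  const auto stop = std::chrono::steady_clock::now();
  *seconds = std::chrono::duration<double>(stop - start).count();

  double sum = 0.0;
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j) sum += out.v[i][j];
  return sum;
}
```

With these inputs each diagonal entry is 7*0.5 + 5*0.0625 + 3*0.25 + 1 = 5.5625, so `run_benchmark(n, &secs)` returns 22.25 for any n > 0. Compiling with `g++ -O2 -S` and searching the `.s` file for `mybegin` should locate the kernel's instructions. MSVC can emit an assembly listing with `/FA`, but the marker is empty there in this sketch, since MSVC does not support inline assembly on x64.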
(In reply to comment #3)
> EDIT:
>
> actually ICC does not inline the assignment but the rest is properly inlined,
> the performance issue is not here for ICC. The reason is an abusive use of the
> movddup instruction which is called multiple times (4) on the same variable
> while it should be called only once per b[i].
>
> could you check the assembler produced by MSVC?

Wow, thanks for the fast response. Sorry, I'm not very proficient at reading assembly. I spent some time poking around, and the best I can come up with is that GCC is creating these nice "assign_LinearTraversal_CompleteUnrolling" functions whereas MSVC is not. However, I have no idea why. Looking through Assign.h, everything seems to be declared EIGEN_STRONG_INLINE.

-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/357.