This Bugzilla service is closed. All entries have been migrated to https://gitlab.com/libeigen/eigen

Bug 357

Summary: Poor fixed-size vectorizable performance with MSVC 2010
Product: Eigen
Reporter: Colm <metmywaterloo>
Component: Core - vectorization
Assignee: Nobody <eigen.nobody>
Status: NEW
Severity: Unknown
CC: gael.guennebaud, hauke.heibel, jacob.benoit.1
Priority: ---
Version: 3.0
Hardware: x86 - 64-bit
OS: Windows
Whiteboard:
Attachments:
  Description                            Flags
  GCC Assembler output for test case.    none
  Portion of MSVC assembler output       none
Portion of MSVC assembler output none

Description Colm 2011-10-04 18:06:42 UTC
I was having trouble with the matrix exponential function on Windows with Visual C++: it did not seem to produce optimized code. After some experimenting I came up with this simple test, which seems to show a huge performance difference between gcc and MSVC for fixed-size vectorizable classes.

For example:

#include <Eigen/Dense>
#include <ctime>
#include <iostream>

using Eigen::Matrix4d;

int main()
{
	Matrix4d A = Matrix4d::Random();
	Matrix4d A_Id = Matrix4d::Identity();
	Matrix4d A2 = A * A;
	Matrix4d A4 = A2 * A2;
	Matrix4d m_tmp2;

	// degree-7 Pade coefficients, as in the matrix exponential
	const double b[] = {17297280., 8648640., 1995840., 277200., 25200., 1512., 56., 1.};

	std::clock_t start = std::clock();

	// evaluate the same fixed-size expression 10^7 times
	for (int ct = 0; ct < 10000000; ++ct)
	{
		m_tmp2 = b[7]*A + b[5]*A4 + b[3]*A2 + b[1]*A_Id;
	}
	std::clock_t stop = std::clock();
	std::cout << "Time taken: " << (stop-start)/(double)CLOCKS_PER_SEC << std::endl;

	return 0;
}

Using gcc 4.6.1 (TDM-GCC) as 
>> g++ -s -O3 -DNDEBUG -Ieigen 
gives 
Time taken: 0.03

Using MSVC 2010 64bit as
>> cl /O2 /D"NDEBUG" /I"eigen"
gives
Time taken: 0.213

The difference vanishes (gcc is actually a little slower) with dynamic-size matrices.
Comment 1 Gael Guennebaud 2011-10-04 18:31:44 UTC
Be careful with simple benchmarks like this one, where the compiler may optimize your code too aggressively. For instance, the compiler could remove the for loop entirely, or take advantage of the fact that the values of the objects are known at compile time.

It is better to write a small function with EIGEN_DONT_INLINE:

EIGEN_DONT_INLINE void foo(Matrix4d& A, ....) {
 m_tmp2 = b[7]*A + b[5]*A4 + b[3]*A2 + b[1]*A_Id;
}

and then call this function many times to benchmark it.

However, I'm not sure EIGEN_DONT_INLINE does anything with MSVC, so perhaps implement the function in a separate .cpp file to make sure it won't be inlined.
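A minimal sketch of that benchmarking pattern, written in plain C++ so it does not depend on Eigen. Assumptions: the struct Mat4 (16 doubles) stands in for Matrix4d, and the hypothetical macro NOINLINE_SKETCH plays the role of EIGEN_DONT_INLINE, mapping to __declspec(noinline) on MSVC and __attribute__((noinline)) on GCC/Clang. The names Mat4, bench_kernel and run_bench are illustrative only.

```cpp
#include <chrono>

// Hypothetical stand-in for EIGEN_DONT_INLINE.
#if defined(_MSC_VER)
#define NOINLINE_SKETCH __declspec(noinline)
#elif defined(__GNUC__)
#define NOINLINE_SKETCH __attribute__((noinline))
#else
#define NOINLINE_SKETCH
#endif

// Plain stand-in for Eigen::Matrix4d: 16 doubles.
struct Mat4 {
  double v[16];
};

// Kernel under test: out = b[7]*A + b[5]*A4 + b[3]*A2 + b[1]*I.
// Kept out of line so the compiler cannot fold it into the timing loop.
NOINLINE_SKETCH void bench_kernel(Mat4& out, const Mat4& A, const Mat4& A2,
                                  const Mat4& A4, const Mat4& I,
                                  const double b[8]) {
  for (int i = 0; i < 16; ++i)
    out.v[i] = b[7] * A.v[i] + b[5] * A4.v[i] + b[3] * A2.v[i] + b[1] * I.v[i];
}

// Calls the kernel `iters` times and returns the elapsed wall-clock seconds.
double run_bench(int iters, Mat4& out, const Mat4& A, const Mat4& A2,
                 const Mat4& A4, const Mat4& I, const double b[8]) {
  auto start = std::chrono::steady_clock::now();
  for (int k = 0; k < iters; ++k)
    bench_kernel(out, A, A2, A4, I, b);
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

After the timed loop, printing something that depends on `out` (for example `out.v[0]`) keeps the result observable. Putting bench_kernel in its own .cpp file remains the more robust option when the attribute is ignored, though link-time optimization (/GL, -flto) can still inline across files.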
Comment 2 Gael Guennebaud 2011-10-04 18:42:31 UTC
OK, ICC seems to perform poorly as well. I checked the assembly, and the reason is poor inlining. I bet the reason is the same with MSVC.

To check the asm, I enclosed the critical expression in a pair of markers:

EIGEN_ASM_COMMENT("mybegin");
...
EIGEN_ASM_COMMENT("myend");

which makes the relevant asm lines easy to find.
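A minimal sketch of the marker technique, assuming GCC or Clang targeting x86/x86-64. ASM_MARK_SKETCH is a hypothetical macro that, similar to EIGEN_ASM_COMMENT on those compilers, emits an assembler comment line, so compiling with `-S` and searching the output for `mybegin`/`myend` locates the kernel. On other compilers it expands to nothing; MSVC x64 has no inline assembly, so there the listing produced by `cl /FAs` is the usual alternative. The markers are only a search aid: the optimizer may still move some instructions across them.

```cpp
// Hypothetical marker macro: emits "#mybegin"-style comment lines into the
// generated assembly on GCC/Clang targeting x86, and nothing elsewhere.
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define ASM_MARK_SKETCH(X) __asm__("#" X)
#else
#define ASM_MARK_SKETCH(X)
#endif

// Same expression as the test case, on plain 16-element arrays
// (a 4x4 matrix of doubles).
void marked_kernel(double* out, const double* A, const double* A2,
                   const double* A4, const double* I, const double b[8]) {
  ASM_MARK_SKETCH("mybegin");
  for (int i = 0; i < 16; ++i)
    out[i] = b[7] * A[i] + b[5] * A4[i] + b[3] * A2[i] + b[1] * I[i];
  ASM_MARK_SKETCH("myend");
}
```

For example, `g++ -O3 -DNDEBUG -S kernel.cpp` (file name hypothetical) writes kernel.s, in which the two comment lines bracket the code generated for the loop.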

You could try to figure out which functions are poorly inlined with MSVC, declare them EIGEN_STRONG_INLINE, and get back to us.
Comment 3 Gael Guennebaud 2011-10-04 18:49:18 UTC
EDIT:

Actually, ICC does not inline the assignment, but the rest is properly inlined, so inlining is not the performance issue for ICC. The reason is an abusive use of the movddup instruction, which is executed multiple times (4) on the same variable when it should be executed only once per b[i].

Could you check the assembly produced by MSVC?
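To illustrate the movddup point, here is a sketch (not Eigen's actual code) of the intended pattern using SSE2 intrinsics: each scalar coefficient is broadcast into a register once with `_mm_set1_pd`, which compilers typically lower to movddup when SSE3 is enabled, and that register is reused for all eight 2-double packets of the 4x4 matrix. The function name weighted_sum4 and its argument layout are assumptions; a scalar fallback with the same summation order is used when SSE2 is unavailable.

```cpp
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define WS4_USE_SSE2 1
#endif

// out[i] = c[0]*M[0][i] + c[1]*M[1][i] + c[2]*M[2][i] + c[3]*M[3][i]
// for the 16 coefficients of a 4x4 matrix of doubles.
void weighted_sum4(const double c[4], const double* const M[4], double out[16]) {
#ifdef WS4_USE_SSE2
  // Broadcast each scalar exactly once, before the packet loop...
  __m128d k[4];
  for (int j = 0; j < 4; ++j) k[j] = _mm_set1_pd(c[j]);
  // ...then reuse the broadcast registers for all 8 packets of 2 doubles.
  for (int i = 0; i < 16; i += 2) {
    __m128d acc = _mm_mul_pd(k[0], _mm_loadu_pd(M[0] + i));
    for (int j = 1; j < 4; ++j)
      acc = _mm_add_pd(acc, _mm_mul_pd(k[j], _mm_loadu_pd(M[j] + i)));
    _mm_storeu_pd(out + i, acc);
  }
#else
  // Scalar fallback with the same summation order.
  for (int i = 0; i < 16; ++i) {
    double acc = c[0] * M[0][i];
    for (int j = 1; j < 4; ++j) acc += c[j] * M[j][i];
    out[i] = acc;
  }
#endif
}
```

Calling it with c = {b[7], b[5], b[3], b[1]} and M = {A, A4, A2, A_Id} computes the test-case expression; the inefficiency described above corresponds to re-executing the broadcast inside the packet loop instead of once per coefficient.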
Comment 4 Colm 2011-10-04 20:48:33 UTC
Created attachment 216 [details]
GCC Assembler output for test case.
Comment 5 Colm 2011-10-04 20:49:02 UTC
Created attachment 217 [details]
Portion of MSVC assembler output
Comment 6 Colm 2011-10-04 20:50:12 UTC
(In reply to comment #3)
> EDIT:
> 
> Actually, ICC does not inline the assignment, but the rest is properly inlined,
> so inlining is not the performance issue for ICC. The reason is an abusive use
> of the movddup instruction, which is executed multiple times (4) on the same
> variable when it should be executed only once per b[i].
> 
> Could you check the assembly produced by MSVC?

Wow, thanks for the fast response.  

Sorry, I'm not much use with assembly. I spent some time poking around, and the best I can come up with is that gcc generates these nice "assign_LinearTraversal_CompleteUnrolling" functions whereas MSVC does not. However, I have no idea why: looking through Assign.h, everything seems to be declared EIGEN_STRONG_INLINE.
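For context, that name comes from the compile-time loop unrolling in Eigen's Assign.h. The sketch below is a simplified, hypothetical analogue rather than Eigen's code: a template recursion that the compiler can flatten into straight-line code, but only if every step is actually inlined. FORCE_INLINE_SKETCH mirrors the idea behind EIGEN_STRONG_INLINE, which in Eigen 3.x expands to __forceinline on MSVC; on GCC/Clang this sketch uses always_inline, whereas Eigen's macro there is plain `inline`. When MSVC declines a __forceinline request it emits warning C4714 at /W4, which may help locate the calls that stay out of line.

```cpp
// Hypothetical force-inline macro (in the spirit of EIGEN_STRONG_INLINE).
#if defined(_MSC_VER)
#define FORCE_INLINE_SKETCH __forceinline
#elif defined(__GNUC__)
#define FORCE_INLINE_SKETCH inline __attribute__((always_inline))
#else
#define FORCE_INLINE_SKETCH inline
#endif

// dst[Index..Stop) = s * src[Index..Stop), unrolled at compile time:
// each instantiation handles one coefficient and recurses to the next.
template <int Index, int Stop>
struct UnrolledScale {
  static FORCE_INLINE_SKETCH void run(double* dst, const double* src, double s) {
    dst[Index] = s * src[Index];
    UnrolledScale<Index + 1, Stop>::run(dst, src, s);
  }
};

// Recursion terminator: nothing left to assign.
template <int Stop>
struct UnrolledScale<Stop, Stop> {
  static FORCE_INLINE_SKETCH void run(double*, const double*, double) {}
};

// Scales a 4x4 matrix (16 doubles). If every run() is inlined, this becomes
// straight-line code with no loop and no calls; otherwise each step is a call.
void scale16(double* dst, const double* src, double s) {
  UnrolledScale<0, 16>::run(dst, src, s);
}
```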
Comment 7 Nobody 2019-12-04 11:10:12 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/357.