It turns out that Intel Composer 14 does NOT respect the __forceinline directive when it appears only on out-of-class member function definitions; it honors it only on member declarations. For example, DenseBase::lazyAssign, defined separately in assign.h, is NOT inlined for any but the simplest expression templates (which wrecks performance for expressions with a large number of scalars, as they are all pushed onto the stack). Visual Studio, on the other hand, requires __forceinline at definitions (but has no problem with it also being present at declarations), so please consider duplicating EIGEN_STRONG_INLINE into member declarations.
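To make the request concrete, here is a minimal sketch of the marker duplicated on both the in-class declaration and the out-of-class definition. The names SKETCH_STRONG_INLINE, SketchBase and SketchVec are made up for illustration and are not Eigen's actual code; the macro is only assumed to behave like EIGEN_STRONG_INLINE.

```cpp
// Hypothetical stand-in for EIGEN_STRONG_INLINE (assumption, not Eigen's definition).
#if defined(_MSC_VER) || defined(__INTEL_COMPILER)
#  define SKETCH_STRONG_INLINE __forceinline
#else
#  define SKETCH_STRONG_INLINE inline
#endif

// Hypothetical CRTP base, loosely modelled on the DenseBase::lazyAssign situation.
template <typename Derived>
struct SketchBase {
  // Marker on the member declaration: the place ICC 14 is reported to look at.
  template <typename Other>
  SKETCH_STRONG_INLINE Derived& lazyAssign(const Other& other);
};

// Marker repeated on the out-of-class definition: the place Visual Studio
// is reported to require it.
template <typename Derived>
template <typename Other>
SKETCH_STRONG_INLINE Derived& SketchBase<Derived>::lazyAssign(const Other& other) {
  Derived& self = static_cast<Derived&>(*this);
  self.value = other.value;  // trivial body, just to have something to inline
  return self;
}

// Hypothetical derived type holding a single scalar.
struct SketchVec : SketchBase<SketchVec> {
  double value = 0.0;
};
```

With this layout, a call such as a.lazyAssign(b) on two SketchVec objects carries the marker at both places, so neither compiler should have a reason to ignore it. Whether ICC then actually inlines the call still has to be checked in the generated assembly.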
Sorry for not looking at this issue earlier. ICC is indeed that stupid, and pretty bad at inlining in general. I have examples where it fails to inline the trivial copy-constructor that it generated itself. For instance for CwiseUnaryOp, it introduces calls to functions with a body as trivial as:
movq (%rsi), %rax
movq %rax, (%rdi)
movq 8(%rsi), %rdx
movq %rdx, 8(%rdi)
I'll try to fix as many of them as possible, but I guess that we should also recommend that users compile with -inline-forceinline (or use gcc or clang ;).
Regarding the discrepancies between declarations and definitions: since there are more than 2000 occurrences of EIGEN_STRONG_INLINE, we would need an automatic way to detect them... any ideas?
Here is a first batch of fixes to limit the damage:
I haven't included the explicit copy-ctor because I'd prefer to find another workaround, hopefully...
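For reference, a rough sketch of what an explicit copy-ctor workaround could look like. The types ImplicitCopy and ExplicitCopy and the macro SKETCH_STRONG_INLINE are hypothetical, not the patch that was left out; the two 8-byte members are only chosen to mirror the 16-byte copy in the assembly above.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for EIGEN_STRONG_INLINE (assumption, not Eigen's definition).
#if defined(_MSC_VER) || defined(__INTEL_COMPILER)
#  define SKETCH_STRONG_INLINE __forceinline
#else
#  define SKETCH_STRONG_INLINE inline
#endif

// Relies on the compiler-generated copy constructor, i.e. the case where
// ICC is reported to emit an out-of-line call.
struct ImplicitCopy {
  const double* data;
  std::ptrdiff_t size;
};

// Same layout, but with a user-provided copy constructor that carries
// the forced-inline marker.
struct ExplicitCopy {
  const double* data;
  std::ptrdiff_t size;

  ExplicitCopy(const double* d, std::ptrdiff_t n) : data(d), size(n) {}

  SKETCH_STRONG_INLINE ExplicitCopy(const ExplicitCopy& other)
      : data(other.data), size(other.size) {}
};

// A user-provided copy constructor makes the type non-trivially-copyable.
static_assert(std::is_trivially_copyable<ImplicitCopy>::value,
              "implicit copy keeps the type trivially copyable");
static_assert(!std::is_trivially_copyable<ExplicitCopy>::value,
              "user-provided copy constructor removes trivial copyability");
```

One possible downside, shown by the static_asserts, is that the explicit version is no longer trivially copyable, which may be a reason to look for a different workaround.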
-- GitLab Migration Automatic Message --
This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/667.