This bug is both in Eigen 3.2.1 and today's unstable release (f240bbe84f94).
ei_declare_aligned_stack_constructed_variable (defined in Core/util/Memory.h) must return a 16 byte aligned pointer for loadLhs in Core/products/GeneralBlockPanelKernel.h to work (otherwise there is a segfault).
In clang, I have code where ei_declare_aligned_stack_constructed_variable returns a pointer that is not 16 byte aligned (only 8 byte aligned). I couldn't reduce this to a simple test case, but I did discover that my code is calling EIGEN_ALIGNED_ALLOCA and that EIGEN_ALIGN_BYTES=16. This means my call to ei_declare_aligned_stack is just a simple call to EIGEN_ALLOCA (no allocating extra space or anything).
Memory.h has a comment that LLVM aligns buffer on 16 bytes, but this contradicts that. I couldn't find any evidence that LLVM aligns on 16 bytes, and found this bug indicating the alignment for clang is actually 8 bytes: https://code.google.com/p/nativeclient/issues/detail?id=3795
change the EIGEN_ALIGN_BYTES > 16 check to EIGEN_ALIGN_BYTES > 8, or better yet (if you don't want to rely on undocumented features of clang), just always take that branch.
Also, the indentation is misleading in this header file and should be fixed.
I can confirm this using clang3.1, on Linux 32bit, but not clang3.3 on Linux 64bit. Can you tell what clang version and what bitness you have? (I assume 32bit)
I never ran the unit tests on my 32bit machine with clang (it would take hours ...) otherwise, this bug should have appeared already.
EIGEN_ALIGN_BYTES will always be 16 or 32, so checking for >8 would be equivalent to always using the alternative code path.
In Bug 779 we have a related discussion if we ever should rely on alignedness of malloc.
I ran all product-related tests using clang on 32bit linux, and all passed.
I experienced that clang's alloca seems to be 16 byte aligned, if there happens to be an aligned variable declared "nearby". This could explain why all unit tests pass, but that solution is of course quite fragile.
The question is if we should always use the save variant, or only on 32bit systems, or only with clang on 32bit systems. I found no documentation that gcc's alloca is guaranteed to be aligned on 32bit systems (I haven't searched much, though).
(In reply to comment #2)
> I ran all product-related tests using clang on 32bit linux, and all passed.
> I experienced that clang's alloca seems to be 16 byte aligned, if there happens
> to be an aligned variable declared "nearby". This could explain why all unit
> tests pass, but that solution is of course quite fragile.
> The question is if we should always use the save variant, or only on 32bit
> systems, or only with clang on 32bit systems. I found no documentation that
> gcc's alloca is guaranteed to be aligned on 32bit systems (I haven't searched
> much, though).
I'm using 64bit Linux and a forked version of clang (I don't know at which version the fork diverged from the main clang/LLVM project), but I doubt the fork changed the alignment of alloca calls. This means that we definitely need to use the save variant (assuming this means the branch of the code that allocates the extra space and manually fixes alignment) for both 64 bit and 32 bit clang.
I did a bit of searching about gcc alloca alignment. I didn't find anything too definitive, but these 2 bug reports indicate that gcc's alloca is 16 byte aligned:
The first bug report looks like 32 bit assembly, but that report is old (from 2004), so I think we know that gcc's alloca was 16 byte aligned in 2004. The second bug report affirms that gcc's alloca is 16 byte aligned for 64 bit systems.
Based off of this, I think we could do something like:
if not gcc or align_bytes > 16:
alloca extra space and align manually
use gcc alloca
I have no idea about alignment on CPUs that aren't x86 or x86_64 though, and I haven't looked into how other compilers (such as icc) behave. The non-fragile thing to do would just always use the "alloca extra space and manually align" method.
> The non-fragile thing to do would just always use the "alloca extra space
> and manually align" method.
Indeed, this seems to be the most reasonable choice.
I experimented a bit, and found that if we replace (simplified):
(alloca(size+16) &~15) + 16;
(alloca(size+15) + 15) & ~15;
then GCC is smart enough to produce no extra overhead (except for very few additional bytes on the stack). I.e., if x is already aligned, (x + 15)&~15 is a NOP.
Given that, I agree on always using the safe path.
If that works on your system(s) as well, we can back-port to 3.2
Unfortunately, with EIGEN_ALIGN_BYTES=32, GCC still produces a slightly sub-optimal:
leal 15(%esp), %edx
andl $-16, %edx
addl $31, %edx
andl $-32, %edx
instead of the equivalent:
leal 31(%esp), %edx
andl $-32, %edx
But that is only a minor gcc-optimization problem, we can hardly circumvent.
This works on my system and seems like it should work on every system as a method of getting aligned pointers.
I personally don't need a back-port to 3.2, so its completely up to you if this bug is serious enough to warrant a back-port.