New user self-registration is disabled due to spam. Please email eigen-core-team @ lists.tuxfamily.org if you need an account.
Before reporting a bug, please make sure that your Eigen version is up-to-date!
Bug 837 - ei_declare_aligned_stack_constructed_variable alignment wrong with clang
Summary: ei_declare_aligned_stack_constructed_variable alignment wrong with clang
Status: RESOLVED FIXED
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - general (show other bugs)
Version: 3.2
Hardware: x86 - SSE Linux
: Normal Crash
Assignee: Nobody
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 3.3
  Show dependency treegraph
 
Reported: 2014-07-03 03:03 UTC by Eric Martin
Modified: 2014-07-08 20:15 UTC (History)
4 users (show)



Attachments

Description Eric Martin 2014-07-03 03:03:34 UTC
This bug is both in Eigen 3.2.1 and today's unstable release (f240bbe84f94).

ei_declare_aligned_stack_constructed_variable (defined in Core/util/Memory.h) must return a 16 byte aligned pointer for loadLhs in Core/products/GeneralBlockPanelKernel.h to work (otherwise there is a segfault).

In clang, I have code where ei_declare_aligned_stack_constructed_variable returns a pointer that is not 16 byte aligned (only 8 byte aligned). I couldn't reduce this to a simple test case, but I did discover that my code is calling EIGEN_ALIGNED_ALLOCA and that EIGEN_ALIGN_BYTES=16. This means my call to ei_declare_aligned_stack is just a simple call to EIGEN_ALLOCA (no allocating extra space or anything).

Memory.h has a comment that LLVM aligns buffer on 16 bytes, but this contradicts that. I couldn't find any evidence that LLVM aligns on 16 bytes, and found this bug indicating the alignment for clang is actually 8 bytes: https://code.google.com/p/nativeclient/issues/detail?id=3795

Suggested fix:
change the EIGEN_ALIGN_BYTES > 16 check to EIGEN_ALIGN_BYTES > 8, or better yet (if you don't want to rely on undocumented features of clang), just always take that branch.

Also, the indentation is misleading in this header file and should be fixed.

Cheers,
Eric
Comment 1 Christoph Hertzberg 2014-07-03 15:04:42 UTC
I can confirm this using clang3.1, on Linux 32bit, but not clang3.3 on Linux 64bit. Can you tell what clang version and what bitness you have? (I assume 32bit)
I never ran the unit tests on my 32bit machine with clang (it would take hours ...) otherwise, this bug should have appeared already.

EIGEN_ALIGN_BYTES will always be 16 or 32, so checking for >8 would be equivalent to always using the alternative code path.

In Bug 779 we have a related discussion if we ever should rely on alignedness of malloc.
Comment 2 Christoph Hertzberg 2014-07-03 16:45:06 UTC
I ran all product-related tests using clang on 32bit linux, and all passed.
I experienced that clang's alloca seems to be 16 byte aligned, if there happens to be an aligned variable declared "nearby". This could explain why all unit tests pass, but that solution is of course quite fragile.

The question is if we should always use the save variant, or only on 32bit systems, or only with clang on 32bit systems. I found no documentation that gcc's alloca is guaranteed to be aligned on 32bit systems (I haven't searched much, though).
Comment 3 Eric Martin 2014-07-07 20:10:34 UTC
(In reply to comment #2)
> I ran all product-related tests using clang on 32bit linux, and all passed.
> I experienced that clang's alloca seems to be 16 byte aligned, if there happens
> to be an aligned variable declared "nearby". This could explain why all unit
> tests pass, but that solution is of course quite fragile.
> 
> The question is if we should always use the save variant, or only on 32bit
> systems, or only with clang on 32bit systems. I found no documentation that
> gcc's alloca is guaranteed to be aligned on 32bit systems (I haven't searched
> much, though).

I'm using 64bit Linux and a forked version of clang (I don't know at which version the fork diverged from the main clang/LLVM project), but I doubt the fork changed the alignment of alloca calls. This means that we definitely need to use the save variant (assuming this means the branch of the code that allocates the extra space and manually fixes alignment) for both 64 bit and 32 bit clang.

I did a bit of searching about gcc alloca alignment. I didn't find anything too definitive, but these 2 bug reports indicate that gcc's alloca is 16 byte aligned:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19131
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55945

The first bug report looks like 32 bit assembly, but that report is old (from 2004), so I think we know that gcc's alloca was 16 byte aligned in 2004. The second bug report affirms that gcc's alloca is 16 byte aligned for 64 bit systems.

Based off of this, I think we could do something like:
if not gcc or align_bytes > 16:
    alloca extra space and align manually
else:
    use gcc alloca

I have no idea about alignment on CPUs that aren't x86 or x86_64 though, and I haven't looked into how other compilers (such as icc) behave. The non-fragile thing to do would just always use the "alloca extra space and manually align" method.
Comment 4 Gael Guennebaud 2014-07-08 11:26:57 UTC
> The non-fragile thing to do would just always use the "alloca extra space
> and manually align" method.

Indeed, this seems to be the most reasonable choice.
Comment 5 Christoph Hertzberg 2014-07-08 14:01:14 UTC
I experimented a bit, and found that if we replace (simplified):
    (alloca(size+16) &~15) + 16;
by
    (alloca(size+15) + 15) & ~15;
then GCC is smart enough to produce no extra overhead (except for very few additional bytes on the stack). I.e., if x is already aligned, (x + 15)&~15 is a NOP.
Given that, I agree on always using the safe path.
Pushed here:
https://bitbucket.org/eigen/eigen/commits/f3b634b0f
If that works on your system(s) as well, we can back-port to 3.2


Unfortunately, with EIGEN_ALIGN_BYTES=32, GCC still produces a slightly sub-optimal:
	leal	15(%esp), %edx
	andl	$-16, %edx
	addl	$31, %edx
	andl	$-32, %edx
instead of the equivalent:
	leal	31(%esp), %edx
	andl	$-32, %edx

But that is only a minor gcc-optimization problem, we can hardly circumvent.
Comment 6 Eric Martin 2014-07-08 20:15:14 UTC
This works on my system and seems like it should work on every system as a method of getting aligned pointers.

I personally don't need a back-port to 3.2, so its completely up to you if this bug is serious enough to warrant a back-port.

Note You need to log in before you can comment on or make changes to this bug.