We have experienced crashes with Eigen 3.2.1 (and as far back as Eigen 3.1.2) under 64-bit OS variants, including Windows 7 and 8 (VS 2010 and VS 2012) and Red Hat Enterprise Linux 6.5 (glibc-2.12, standard GCC 4.4, devtoolset-1.1/GCC 4.7, and devtoolset-2/GCC 4.8). We are using Intel Parallel Studio XE 2013 SP1 on all platforms. The header file Eigen/Core/util/Memory.h incorrectly assumes that memory returned by std::malloc, std::realloc, etc. under 64-bit Windows and 64-bit Linux is always properly aligned. This simply is not true, as can be demonstrated by the supplied patch. The same assumption is made about Apple platforms, but we have not verified this.
If you replace Eigen/Core/util/Memory.h with the version provided, or if you apply the provided patch, you will see warnings about std::malloc not being aligned. Clearly, the only solution on these systems is to always call posix_memalign (Linux) or _aligned_malloc (Windows).
Created attachment 427
Patch for Eigen/Core/util/Memory.h (wraps malloc and realloc to test allocated memory for proper alignment)
Wraps std::malloc and std::realloc to test allocated memory for proper alignment.
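For readers without access to the attachment, the idea of wrapping the allocator can be sketched roughly as follows. This is not the code from the patch: the names `is_aligned`, `checked_malloc`, and `checked_realloc` are hypothetical, and the 16-byte boundary is assumed from the discussion in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Returns true when ptr is a multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline bool is_aligned(const void* ptr, std::size_t alignment = 16) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical wrapper (not the attached patch): forwards to std::malloc
// and prints a warning when the result is not 16-byte aligned.
inline void* checked_malloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr && !is_aligned(p)) {
        std::fprintf(stderr, "warning: std::malloc(%zu) returned unaligned pointer %p\n",
                     size, p);
    }
    return p;
}

// Same check applied to std::realloc.
inline void* checked_realloc(void* old_ptr, std::size_t new_size) {
    void* p = std::realloc(old_ptr, new_size);
    if (p != nullptr && !is_aligned(p)) {
        std::fprintf(stderr, "warning: std::realloc(%zu) returned unaligned pointer %p\n",
                     new_size, p);
    }
    return p;
}
```

Whether any warning appears depends entirely on the allocator that is actually linked in, so silence on one machine says nothing about another.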
Created attachment 428
Patched Eigen/Core/util/Memory.h for Eigen 3.2.1
For your convenience, my already patched version of the header file.
Hm, you're the first one to encounter such an issue. The problem probably comes from Intel Parallel Studio, which likely generates code that bypasses the system's malloc.
"The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). "
and for windows:
For Windows, alignment is only guaranteed to be "fundamental" (8-byte or 16-byte boundary) for VS 2013. However, if you look at the documentation for VS 2010 or 2012, this is not the case:
"malloc is guaranteed to return memory that's aligned on a boundary that's suitable for storing any object that could fit in the amount of memory that's allocated. For example, a four-byte allocation would be aligned on a boundary that supports any four-byte or smaller object. Memory alignment on a boundary that's suitable for a larger object than will fit in the allocation is not guaranteed."
Therefore, allocating an odd number of single-precision floats may not necessarily be 16-byte aligned on a 64-bit processor when compiling with earlier versions of MSVC.
For Visual Studio 2008, we have:
"malloc is required to return memory on a 16-byte boundary."
I doubt they changed this behaviour for 64-bit systems, otherwise all our unit tests would fail.
Which compiler flag are you using?
What about the following fix (lines 51-58):
#if (defined(__APPLE__) \
  || defined(_WIN64) \
  || EIGEN_GLIBC_MALLOC_ALREADY_ALIGNED \
  || EIGEN_FREEBSD_MALLOC_ALREADY_ALIGNED) \
  && !defined(__INTEL_CXXLIB_ICC)
  #define EIGEN_MALLOC_ALREADY_ALIGNED 1
#else
  #define EIGEN_MALLOC_ALREADY_ALIGNED 0
#endif
It basically bypasses malloc when Intel's C++ runtime library is in use. Does it work for you?
Your suspicion about Intel Parallel Studio drove me to do some more digging. On the Linux side, I have potentially traced the problem to libtbbmalloc_proxy.so from the Intel TBB library. The same may apply to Windows, but unless you are using VS 2013, I don't find the documentation for VS 2012 and below to be very comforting. If this turns out to be an Intel problem, I don't think you should have to patch anything by default. Rather, people should probably define a macro to force Eigen to assume that memory allocation will not be properly aligned. Is there such a macro at present?
To answer your question, we are using Intel MKL and TBB libraries, but we prefer to use the standard, native compilers on each system (MSVC or GCC) so __INTEL_CXXLIB_ICC will not be defined. I'd say let's leave well alone, but perhaps devise a way to force posix_memalign or _aligned_malloc to be called when the user knows that his application will clash with your otherwise correct #ifdef logic.
Thank you. You guys are very responsive, and Eigen rocks!
You can compile with
to force posix_memalign/_aligned_malloc
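As an illustration of what "forcing" the aligned allocators amounts to, here is a minimal portable sketch. The function names `aligned_alloc16` and `aligned_free16` are hypothetical and are not part of Eigen; the sketch only assumes that `posix_memalign` is available on POSIX systems and `_aligned_malloc`/`_aligned_free` on Windows.

```cpp
#include <cstddef>
#include <stdlib.h>      // posix_memalign, free
#if defined(_WIN32)
  #include <malloc.h>    // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: always request a 16-byte aligned block from the
// platform's aligned allocator instead of relying on plain malloc.
inline void* aligned_alloc16(std::size_t size) {
#if defined(_WIN32)
    return _aligned_malloc(size, 16);
#else
    void* p = nullptr;
    // posix_memalign returns 0 on success and leaves p untouched on failure.
    if (posix_memalign(&p, 16, size) != 0) {
        return nullptr;
    }
    return p;
#endif
}

// Memory from _aligned_malloc must be released with _aligned_free;
// memory from posix_memalign is released with free.
inline void aligned_free16(void* p) {
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Note the asymmetry on Windows: passing a pointer from `_aligned_malloc` to plain `free` is an error, which is one reason allocation and deallocation are best kept behind a matching pair of functions.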
I wonder, is there a measurable advantage of using malloc instead of one of the aligned mallocs?
I wish I could answer that, but I simply do not know.
(In reply to comment #10)
> I wonder, is there a measurable advantage of using malloc instead of one of the
> aligned mallocs?
The assumption is that standard malloc can only be better or in the worst case equivalent to the aligned versions.
Moreover, a few libraries bypass standard malloc calls for memory debugging or performance purposes; libtbbmalloc_proxy is one example. Therefore, using malloc instead of aligned_malloc or posix_memalign makes this process easier on systems where malloc is already aligned.
That being said, with this last argument, we should probably enable only two paths: standard malloc and the handmade one....
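For context, a "handmade" aligned allocator is usually built on top of plain malloc by over-allocating and stashing the original pointer just in front of the aligned block. The sketch below illustrates that general technique under stated assumptions; it is not Eigen's implementation, and the names `sketch_aligned_malloc` and `sketch_aligned_free` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Illustrative sketch only: returns a 16-byte aligned block built on top of
// std::malloc, whatever alignment std::malloc itself happens to deliver.
inline void* sketch_aligned_malloc(std::size_t size) {
    const std::size_t alignment = 16;
    const std::size_t extra = alignment + sizeof(void*);
    if (size > SIZE_MAX - extra) {
        return nullptr;  // size + extra would overflow
    }
    void* original = std::malloc(size + extra);
    if (original == nullptr) {
        return nullptr;
    }
    // Skip past room for one pointer, then round up to the next 16-byte boundary.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(original);
    std::uintptr_t aligned_addr =
        (addr + sizeof(void*) + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
    void* aligned = reinterpret_cast<void*>(aligned_addr);
    // Remember the pointer malloc really returned, just before the aligned block.
    *(reinterpret_cast<void**>(aligned) - 1) = original;
    return aligned;
}

// Releases a block obtained from sketch_aligned_malloc.
inline void sketch_aligned_free(void* aligned) {
    if (aligned != nullptr) {
        std::free(*(reinterpret_cast<void**>(aligned) - 1));
    }
}
```

Because this path only ever calls std::malloc and std::free, a library that replaces those two functions still intercepts every allocation, which is the property the comment above is concerned with.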
Won't fix, but at least now the remedy should be easier to find: