This bugzilla service is closed. All entries have been migrated to https://gitlab.com/libeigen/eigen

Bug 907

Summary: ARMv8 NEON compile errors (using both Android NDK & Xcode)
Product: Eigen Reporter: Michael Hofmann <kmhofmann>
Component: Core - vectorizationAssignee: Konstantinos Margaritis <markos>
Status: RESOLVED WORKSFORME    
Severity: Compilation Problem CC: chtz, gael.guennebaud, jacob.benoit.1, jesus.nuevochiquero, markos, timmurray
Priority: Normal    
Version: 3.3 (current stable)   
Hardware: ARM - NEON   
OS: Android   
Whiteboard:
Bug Depends on:    
Bug Blocks: 558    
Attachments:
Description Flags
Two patches--new gcc version macro and NEON workaround none

Description Michael Hofmann 2014-11-19 11:48:01 UTC
When trying to compile a project using Eigen including NEON support on ARM64-v8a, I am encountering a whole bunch of compilation errors. These occur both when compiling with the Android NDK (for Android devices) as well as when compiling with Apple's Xcode (for iOS devices).

As I'm not sure how to fix these (several, seemingly unrelated?) errors, I can just list the partial compiler errors I have encountered so far:

When using Xcode 6.1 (Clang 3.5) / arm64-v8a:
------------
In file included from ../External/src/Eigen/Eigen/Core:304:
../External/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:549:42: error: use of undeclared identifier 'vreinterpretq_u64_f64'; did you mean 'vreinterpretq_u64_s64'?
  return vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
                                         ^~~~~~~~~~~~~~~~~~~~~
                                         vreinterpretq_u64_s64
In file included from ../External/src/Eigen/Eigen/Core:186:
In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.0/include/arm_neon.h:28:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.0/include/arm64_neon_internal.h:5176:17: note: 'vreinterpretq_u64_s64' declared here
__ai uint64x2_t vreinterpretq_u64_s64(int64x2_t __a) {
                ^
In file included from ../External/src/Eigen/Eigen/Core:304:
../External/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:549:67: error: use of undeclared identifier 'vreinterpretq_u64_f64'; did you mean 'vreinterpretq_u64_s64'?
  return vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
                                                                  ^~~~~~~~~~~~~~~~~~~~~
                                                                  vreinterpretq_u64_s64
[……]

When using Android NDK r10c / arm64-v8a / GCC 4.9:
------------
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h: In function 'Packet Eigen::internal::pand(const Packet&, const Packet&) [with Packet = __vector(2) __builtin_aarch64_simd_df]':
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:549:65: error: 'vreinterpretq_u64_f64' was not declared in this scope
   return vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
                                                                 ^
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:549:92: error: 'vreinterpretq_f64_u64' was not declared in this scope
   return vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
                                                                                            ^
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h: In function 'Packet Eigen::internal::por(const Packet&, const Packet&) [with Packet = __vector(2) __builtin_aarch64_simd_df]':
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:554:65: error: 'vreinterpretq_u64_f64' was not declared in this scope
   return vreinterpretq_f64_u64(vorrq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
                                                                 ^
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:554:92: error: 'vreinterpretq_f64_u64' was not declared in this scope
   return vreinterpretq_f64_u64(vorrq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
[……]

When using Android NDK r10c / arm64-v8a / Clang 3.5:
------------
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:245:78: error: variable 'res' is uninitialized when used here [-Werror,-Wuninitialized]
  res = __extension__ ({ float32_t __s0 = from[0*stride]; float32x4_t __s1 = res; float32x4_t __ret; __ret = (float32x4_t) __builtin_neon_vsetq_lane_f32(__s0, (int8x16_t)__s1, 0); __ret; });
                                                                             ^~~
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:244:3: note: variable 'res' is declared here
  Packet4f res;
  ^
[……]
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:600:71: error: cannot initialize return object of type 'double' with an rvalue of type 'float64x1_t' (vector of 1 'float64_t' value)
template<> inline double predux<Packet2d>(const Packet2d& a) { return vget_low_f64(a) + vget_high_f64(a); }
                                                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../external/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:616:75: error: cannot initialize return object of type 'double' with an rvalue of type 'float64x1_t' (vector of 1 'float64_t' value)
template<> inline double predux_mul<Packet2d>(const Packet2d& a) { return vget_low_f64(a) * vget_high_f64(a); }
                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[……]


Note that there is a difference when compiling using Clang vs. GCC in the Android NDK; the former does seem to have intrinsic declarations that neither the NDK GCC toolchain nor the Apple Clang toolchain seem to have. Indeed, when doing a grep on 'vreinterpretq_u64_f64' in the Android NDK file structure, I can see matches for Clang-only:
android-ndk-r10c $ find . -type f -name "*.h*" -exec grep -Hni "vreinterpretq_u64_f64" {} \;
./toolchains/llvm-3.4/prebuilt/darwin-x86_64/lib/clang/3.4/include/arm_neon.h:6721:__ai uint64x2_t vreinterpretq_u64_f64(float64x2_t __a) {
./toolchains/llvm-3.5/prebuilt/darwin-x86_64/lib/clang/3.5/include/arm_neon.h:35922:__ai uint64x2_t vreinterpretq_u64_f64(float64x2_t __p0) {
./toolchains/llvm-3.5/prebuilt/darwin-x86_64/lib/clang/3.5/include/arm_neon.h:35928:__ai uint64x2_t vreinterpretq_u64_f64(float64x2_t __p0) {

My version of Eigen is 167ce78594dc4e7a4b9ca27fc745e674300e85ff (default, 6 Nov 2014).
Comment 1 Konstantinos Margaritis 2014-11-20 11:46:49 UTC
according to:

http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/aarch64-neon-intrinsics.c?view=markup&pathrev=194954

LLVM does seem to support this particular intrinsic and it's been included in GCC a while longer. My guess is that both the Android GCC and Clang that you use are based off an older version which has incomplete support for Aarch64 intrinsics.

From Debian gcc(arm64), version 4.9.2-2:

/usr/lib/gcc/aarch64-linux-gnu/4.9/include/arm_neon.h: line 3364

__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
vreinterpretq_u64_f64 (float64x2_t __a)
{
  return (uint64x2_t) __a;
}

My guess is that this is a bug/omission in the compiler: 

https://android.googlesource.com/toolchain/gcc/+/master/gcc-4.9/gcc/config/aarch64/arm_neon.h

Shows it as missing entirely. My suggestion would be to file bugs against both of those compilers, seems a weird coincidence that they miss this intrinsic, maybe there is a logic behind this, and I'll try to figure out a way to maybe define it for Clang/NDK cases explicitly, could you try defining it yourself and see how it goes?
Comment 2 Christoph Hertzberg 2014-11-20 16:57:35 UTC
(In reply to Konstantinos Margaritis from comment #1)
> From Debian gcc(arm64), version 4.9.2-2:
> 
> /usr/lib/gcc/aarch64-linux-gnu/4.9/include/arm_neon.h: line 3364
> 
> __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
> vreinterpretq_u64_f64 (float64x2_t __a)
> {
>   return (uint64x2_t) __a;
> }

Does `__extension__` indicate that this function is indeed only an extension of gcc/clang? If so, and if that function is really just that one-liner, I think the easiest solution would be to simply implement it ourself for all compilers (under a different name, perhaps as templated function), instead of adding compiler specific implementations.
Comment 3 Konstantinos Margaritis 2014-11-20 17:44:38 UTC
According to

https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html

__extension__ is just to suppress compiler warnings if -pedantic is used.

I agree that we could just implement it ourselves at least until all compilers support the proper intrinsic (we could add an #ifdef and print a #warning), at least that should let the compilation proceed further to more errors :)
Comment 4 Michael Hofmann 2014-11-22 19:54:01 UTC
1)
After inserting the following definitions to PacketMath.h (I did it directly after #if EIGEN_ARCH_ARM64):

__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
vreinterpretq_u64_f64 (float64x2_t __a)
{
  return (uint64x2_t) __a;
}

__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
vreinterpretq_f64_u64 (uint64x2_t __a)
{
  return (float64x2_t) __a;
}

everything continues to compile just fine using GCC 4.9/Android NDK.

2)
Clang/Android NDK seems to have the above intrinsics defined already. (See $NDK_DIR/toolchains/llvm-3.5/prebuilt/darwin-x86_64/lib/clang/3.5/include/arm_neon.h.)
However, there is a different error:

error: cannot initialize return object of type 'double' with an rvalue of type 'float64x1_t' (vector of 1 'float64_t' value)
template<> inline double predux<Packet2d>(const Packet2d& a) { return vget_low_f64(a) + vget_high_f64(a); }

It seems that I cannot just cast float64x1_t to double, since it is defined as its own vector type in the Clang arm_neon.h:
typedef __attribute__((neon_vector_type(1))) float64_t float64x1_t;

(I cannot test compilation using Xcode/Clang at the moment.)
Comment 5 J Nuevo 2014-12-02 03:34:48 UTC
Michael, adding those definitions made the errors go away on Xcode with Clang 6.0, but the same error you mention in 2) appears. Did you find a workaround for that? Thank you.
Comment 6 Konstantinos Margaritis 2014-12-02 10:20:00 UTC
I will add those definitions so that at least the problem with gcc goes away, but I'd leave the bug open until we fix clang as well.
Comment 7 J Nuevo 2014-12-02 11:04:00 UTC
Thanks Konstantinos for your great work.

The problem you mentioned above in #2 is correct. GCC 4.9 on Android is an older version and it doesn't fail with the other errors because float64x1_t is defined as double. There are a few messages on GCC mailing list (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60825) about that and since then it has changed to being a vector type. The fix may be short lived, only until the NDK is updated :(.
Comment 8 Michael Hofmann 2014-12-02 11:23:54 UTC
I have tried fixing the errors using Clang/NDK, and I can get everything to compile using Clang by adding an indexed access ([0]) to the following two functions, like this:

template<> EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a) { return (vget_low_f64(a) + vget_high_f64(a))[0]; }

template<> EIGEN_STRONG_INLINE double predux_mul<Packet2d>(const Packet2d& a) { return (vget_low_f64(a) * vget_high_f64(a))[0]; }

Both vget_low_f64() and vget_high_f64() return a float64x1_t, which in Clang's arm_neon.h is defined as follows:
typedef __attribute__((neon_vector_type(1))) float64_t float64x1_t;

This apparently implies stricter type checking and prevents implicit or explicit conversions from float64x1_t to float64_t/double. I am not sure if there is an intrinsic that does an equivalent thing (did not find a respective vreinterpret*() one).  Note that float64_t is defined as a double.
I'm *assuming* that adding the index [0] does the right thing, but I did not check for correctness.

In GCC's arm_neon.h, float64x1_t is just the same as a double (and float64_t does not exist at all):
typedef double float64x1_t;

This in turn makes it clear that my above change (adding the indexed access) does break compilation under GCC!
I'm not quite sure what the best solution is. Add #ifdefs for the respective compilers?
Comment 9 Michael Hofmann 2014-12-02 11:30:25 UTC
J Nuevo, that's interesting. Comment #2 in the bug report you linked says:
"For intrinsics code to be portable, conversion between neon intrinsics vector types and the scalar types should be done via the corresponding vcreat and vget_lane intrinsics."

Seems like that might be the right way to do it. And Konstantinos might need to maintain a workaround for older(?) GCC versions.
Comment 10 Tim Murray 2015-01-06 00:57:32 UTC
Created attachment 514 [details]
Two patches--new gcc version macro and NEON workaround

I encountered this today while building with clang. According to the gcc bug linked above, 4.9.3 and above plus clang should work with float64x1_t as a vector type, while gcc 4.9.0 through 4.9.2 are broken. I'm attaching two patches: one to add a macro to check gcc version at the patch level and one to add the actual workaround to use the vector syntax with non-gcc compilers and gcc 4.9.3+. I've confirmed that this fixes compilation with clang.
Comment 11 Gael Guennebaud 2015-01-07 09:51:17 UTC
https://bitbucket.org/eigen/eigen/commits/ef4a8dc42592/
Changeset:   ef4a8dc42592
User:        ggael
Date:        2015-01-07 08:41:56+00:00
Summary:     Bug 907: fix compilation with ARM64


https://bitbucket.org/eigen/eigen/commits/b9210aebb4dd/
Changeset:   b9210aebb4dd
User:        ggael
Date:        2015-01-07 08:44:25+00:00
Summary:     Big 907: workaround some missing intrinsics in current NDK's gcc version (ARM64)
Comment 12 Michael Hofmann 2015-01-07 15:39:05 UTC
Commit b9210aebb4dd does not fix compilation for Xcode 6.1/Clang, which is also missing the function definitions of vreinterpretq_u64_f64 and vreinterpretq_f64_u64. (Note that they are present in Android NDK r10d/Clang.) Extending the conditional compilation to include Xcode 6.1/Clang fixes this issue.

Commit ef4a8dc42592 is correct and therefore a good fix, but unfortunately it triggers an internal compiler error in the Xcode 6.1/Clang backend when compiling for arm64:

fatal error: error in backend: Cannot select: 0x18f536210: v1f64 = fadd 0x179f3cb10, 0x18f53aa10 [ORD=25] [ID=34] dbg:../External/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:618
  0x179f3cb10: v1f64 = EXTRACT_SUBREG 0x18f44ff10, 0x18f526b10 [ORD=23] [ID=29]
    0x18f44ff10: v2f64,ch = CopyFromReg 0x1968c3918, 0x18f504310 [ORD=23] [ID=21]
      0x18f504310: v2f64 = Register %vreg160 [ID=11]
    0x18f526b10: i32 = TargetConstant<2> [ID=14]
  0x18f53aa10: v1f64 = extract_subvector 0x18f44ff10, 0x179dfba10 [ORD=24] [ID=30]
    0x18f44ff10: v2f64,ch = CopyFromReg 0x1968c3918, 0x18f504310 [ORD=23] [ID=21]
      0x18f504310: v2f64 = Register %vreg160 [ID=11]
    0x179dfba10: i64 = Constant<1> [ID=2]
In function: _ZN5Eigen8internal29general_matrix_vector_productIldLi1ELb0EdLb0ELi1EE3runEllPKdlS4_lPdld
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
Target: arm64-apple-darwin13.4.0
Thread model: posix
Comment 13 Gael Guennebaud 2015-01-13 09:55:54 UTC
Indeed, I only tested with NDK/clang and expected that xcode/clang would behave the same. That's really annoying.

Do you know which predefined macro we could use to reliably detect Xcode? Would the following condition work:

#if (EIGEN_COMP_GNUC_STRICT && defined(__ANDROID__)) || defined(__apple_build_version__)



Regarding the ICE, the only solution I see is to use your (...)[0] suggestion for clang.
Comment 14 Gael Guennebaud 2015-01-13 10:29:26 UTC
I pushed the proposed changes by mistake:

https://bitbucket.org/eigen/eigen/commits/c401a7c2da4e/
Changeset:   c401a7c2da4e
User:        ggael
Date:        2015-01-13 08:57:37+00:00
Summary:     Bug 907, ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang

https://bitbucket.org/eigen/eigen/commits/c85347b1546d/
Changeset:   c85347b1546d
User:        ggael
Date:        2015-01-13 09:03:00+00:00
Summary:     Bug 907, ARM64: workaround ICE in xcode/clang

Let's hope they are working for you, and if that's the case, feel-free to close the bug entry.
Comment 15 Michael Hofmann 2015-01-13 10:36:58 UTC
Not sure what the best macro would be to specifically detect Xcode. We use
  #if defined(__APPLE__) && TARGET_OS_IPHONE
to reliably detect iOS targets, and indeed, on the Mac/iOS platforms, __APPLE__ is always defined. Clang can be detected by __clang__, so the combination:
  #if defined(__APPLE__) && defined(__clang__)
might get us most of the way. I'm not sure if the current behavior will be fixed in a later version (that's always hard to tell), but for sake of completeness, "/usr/bin/clang -dM -E -x c /dev/null" with Xcode 6.1 installed gives me:
  #define __clang__ 1
  #define __clang_major__ 6
  #define __clang_minor__ 0
  #define __clang_patchlevel__ 0
  #define __clang_version__ "6.0 (clang-600.0.56)"
  (...)


Regarding the ICE, I guess my first suggestion using (...)[0] is not really portable, so we should restrict it to Clang/Xcode as much as possible.
(At some point I want to try and build Clang from their SVN repo - it might be fixed after all - but I haven't figured out how to properly build it as cross-compiler yet...)
Comment 16 Michael Hofmann 2015-01-13 11:44:43 UTC
Commit c401a7c2da4e solved that particular issue for me. That one's good to keep!

Commit c85347b1546d unfortunately does not seem to fix the ICE - just realizing it still crashes. :( Not sure why it didn't do that before. I would suggest to back out that one until I have further insight into what's actually happening. Which I will try to dig deeper into, but it won't be immediately due to time constraints.


By the way, it would be great if you could add

  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wuninitialized"

to the beginning (e.g. after the include guard), and

  #pragma GCC diagnostic pop

to the end (e.g. before the include guard) of both Eigen/src/Core/arch/NEON/Complex.h and Eigen/src/Core/arch/NEON/PacketMath.h. This fixes ugly "variable 'res' is uninitialized when used here" warnings that are emitted by both(!) GCC and Clang.
Comment 17 Benoit Jacob 2015-03-03 15:19:15 UTC
I just ran into the ICE too. I am told it is a known bug in Clang 3.5, fixed in Clang 3.6. For now, I'll push a cset disabling double vectorization on current values of __apple_build_version__.
Comment 20 Benoit Jacob 2015-12-17 00:51:32 UTC
Looks like this issue can be closed? --- can anyone still reproduce a problem after the above two changesets?
Comment 21 Michael Hofmann 2015-12-18 20:23:06 UTC
From my side (bug reporter) this can be closed; cannot reproduce the problem anymore.
Comment 22 Benoit Jacob 2016-03-30 13:51:24 UTC
In bug 1186 I'm proposing a change to the vreinterpretq_f64_u64 workaround to support newer Android/Aarch64/Clang toolchains also missing this intrinsic.
Comment 23 Nobody 2019-12-04 13:54:49 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/907.