Summary: | ARMv8 NEON compile errors (using both Android NDK & Xcode) | ||||||
---|---|---|---|---|---|---|---|
Product: | Eigen | Reporter: | Michael Hofmann <kmhofmann> | ||||
Component: | Core - vectorization | Assignee: | Konstantinos Margaritis <markos> | ||||
Status: | RESOLVED WORKSFORME | ||||||
Severity: | Compilation Problem | CC: | chtz, gael.guennebaud, jacob.benoit.1, jesus.nuevochiquero, markos, timmurray | ||||
Priority: | Normal | ||||||
Version: | 3.3 (current stable) | ||||||
Hardware: | ARM - NEON | ||||||
OS: | Android | ||||||
Whiteboard: | |||||||
Bug Depends on: | |||||||
Bug Blocks: | 558 | ||||||
Attachments: |
|
Description
Michael Hofmann
2014-11-19 11:48:01 UTC
according to: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/aarch64-neon-intrinsics.c?view=markup&pathrev=194954 LLVM does seem to support this particular intrinsic and it's been included in GCC a while longer. My guess is that both the Android GCC and Clang that you use are based off an older version which has incomplete support for Aarch64 intrinsics. From Debian gcc(arm64), version 4.9.2-2: /usr/lib/gcc/aarch64-linux-gnu/4.9/include/arm_neon.h: line 3364 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) vreinterpretq_u64_f64 (float64x2_t __a) { return (uint64x2_t) __a; } My guess is that this is a bug/omission in the compiler: https://android.googlesource.com/toolchain/gcc/+/master/gcc-4.9/gcc/config/aarch64/arm_neon.h Shows it as missing entirely. My suggestion would be to file bugs against both of those compilers, seems a weird coincidence that they miss this intrinsic, maybe there is a logic behind this, and I'll try to figure out a way to maybe define it for Clang/NDK cases explicitly, could you try defining it yourself and see how it goes? (In reply to Konstantinos Margaritis from comment #1) > From Debian gcc(arm64), version 4.9.2-2: > > /usr/lib/gcc/aarch64-linux-gnu/4.9/include/arm_neon.h: line 3364 > > __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) > vreinterpretq_u64_f64 (float64x2_t __a) > { > return (uint64x2_t) __a; > } Does `__extension__` indicate that this function is indeed only an extension of gcc/clang? If so, and if that function is really just that one-liner, I think the easiest solution would be to simply implement it ourself for all compilers (under a different name, perhaps as templated function), instead of adding compiler specific implementations. According to https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html __extension__ is just to suppress compiler warnings if -pedantic is used. I agree that we could just implement it ourselves at least until all compilers support the proper intrinsic (we could add an #ifdef and print a #warning), at least that should let the compilation proceed further to more errors :) 1) After inserting the following definitions to PacketMath.h (I did it directly after #if EIGEN_ARCH_ARM64): __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) vreinterpretq_u64_f64 (float64x2_t __a) { return (uint64x2_t) __a; } __extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) vreinterpretq_f64_u64 (uint64x2_t __a) { return (float64x2_t) __a; } everything continues to compile just fine using GCC 4.9/Android NDK. 2) Clang/Android NDK seems to have the above intrinsics defined already. (See $NDK_DIR/toolchains/llvm-3.5/prebuilt/darwin-x86_64/lib/clang/3.5/include/arm_neon.h.) However, there is a different error: error: cannot initialize return object of type 'double' with an rvalue of type 'float64x1_t' (vector of 1 'float64_t' value) template<> inline double predux<Packet2d>(const Packet2d& a) { return vget_low_f64(a) + vget_high_f64(a); } It seems that I cannot just cast float64x1_t to double, since it is defined as its own vector type in the Clang arm_neon.h: typedef __attribute__((neon_vector_type(1))) float64_t float64x1_t; (I cannot test compilation using Xcode/Clang at the moment.) Michael, adding those definitions made the errors go away on Xcode with Clang 6.0, but the same error you mention in 2) appears. Did you find a workaround for that? Thank you. I will add those definitions so that at least the problem with gcc goes away, but I'd leave the bug open until we fix clang as well. Thanks Konstantinos for your great work. The problem you mentioned above in #2 is correct. GCC 4.9 on Android is an older version and it doesn't fail with the other errors because float64x1_t is defined as double. There are a few messages on GCC mailing list (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60825) about that and since then it has changed to being a vector type. The fix may be short lived, only until the NDK is updated :(. I have tried fixing the errors using Clang/NDK, and I can get everything to compile using Clang by adding an indexed access ([0]) to the following two functions, like this: template<> EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a) { return (vget_low_f64(a) + vget_high_f64(a))[0]; } template<> EIGEN_STRONG_INLINE double predux_mul<Packet2d>(const Packet2d& a) { return (vget_low_f64(a) * vget_high_f64(a))[0]; } Both vget_low_f64() and vget_high_f64() return a float64x1_t, which in Clang's arm_neon.h is defined as follows: typedef __attribute__((neon_vector_type(1))) float64_t float64x1_t; This apparently implies stricter type checking and prevents implicit or explicit conversions from float64x1_t to float64_t/double. I am not sure if there is an intrinsic that does an equivalent thing (did not find a respective vreinterpret*() one). Note that float64_t is defined as a double. I'm *assuming* that adding the index [0] does the right thing, but I did not check for correctness. In GCC's arm_neon.h, float64x1_t is just the same as a double (and float64_t does not exist at all): typedef double float64x1_t; This in turn makes it clear that my above change (adding the indexed access) does break compilation under GCC! I'm not quite sure what the best solution is. Add #ifdefs for the respective compilers? J Nuevo, that's interesting. Comment #2 in the bug report you linked says: "For intrinsics code to be portable, conversion between neon intrinsics vector types and the scalar types should be done via the corresponding vcreat and vget_lane intrinsics." Seems like that might be the right way to do it. And Konstantinos might need to maintain a workaround for older(?) GCC versions. Created attachment 514 [details]
Two patches--new gcc version macro and NEON workaround
I encountered this today while building with clang. According to the gcc bug linked above, 4.9.3 and above plus clang should work with float64x1_t as a vector type, while gcc 4.9.0 through 4.9.2 are broken. I'm attaching two patches: one to add a macro to check gcc version at the patch level and one to add the actual workaround to use the vector syntax with non-gcc compilers and gcc 4.9.3+. I've confirmed that this fixes compilation with clang.
https://bitbucket.org/eigen/eigen/commits/ef4a8dc42592/ Changeset: ef4a8dc42592 User: ggael Date: 2015-01-07 08:41:56+00:00 Summary: Bug 907: fix compilation with ARM64 https://bitbucket.org/eigen/eigen/commits/b9210aebb4dd/ Changeset: b9210aebb4dd User: ggael Date: 2015-01-07 08:44:25+00:00 Summary: Big 907: workaround some missing intrinsics in current NDK's gcc version (ARM64) Commit b9210aebb4dd does not fix compilation for Xcode 6.1/Clang, which is also missing the function definitions of vreinterpretq_u64_f64 and vreinterpretq_f64_u64. (Note that they are present in Android NDK r10d/Clang.) Extending the conditional compilation to include Xcode 6.1/Clang fixes this issue. Commit ef4a8dc42592 is correct and therefore a good fix, but unfortunately it triggers an internal compiler error in the Xcode 6.1/Clang backend when compiling for arm64: fatal error: error in backend: Cannot select: 0x18f536210: v1f64 = fadd 0x179f3cb10, 0x18f53aa10 [ORD=25] [ID=34] dbg:../External/src/Eigen/Eigen/src/Core/arch/NEON/PacketMath.h:618 0x179f3cb10: v1f64 = EXTRACT_SUBREG 0x18f44ff10, 0x18f526b10 [ORD=23] [ID=29] 0x18f44ff10: v2f64,ch = CopyFromReg 0x1968c3918, 0x18f504310 [ORD=23] [ID=21] 0x18f504310: v2f64 = Register %vreg160 [ID=11] 0x18f526b10: i32 = TargetConstant<2> [ID=14] 0x18f53aa10: v1f64 = extract_subvector 0x18f44ff10, 0x179dfba10 [ORD=24] [ID=30] 0x18f44ff10: v2f64,ch = CopyFromReg 0x1968c3918, 0x18f504310 [ORD=23] [ID=21] 0x18f504310: v2f64 = Register %vreg160 [ID=11] 0x179dfba10: i64 = Constant<1> [ID=2] In function: _ZN5Eigen8internal29general_matrix_vector_productIldLi1ELb0EdLb0ELi1EE3runEllPKdlS4_lPdld clang: error: clang frontend command failed with exit code 70 (use -v to see invocation) Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn) Target: arm64-apple-darwin13.4.0 Thread model: posix Indeed, I only tested with NDK/clang and expected that xcode/clang would behave the same. That's really annoying. Do you know which predefined macro we could use to reliably detect Xcode? Would the following condition work: #if (EIGEN_COMP_GNUC_STRICT && defined(__ANDROID__)) || defined(__apple_build_version__) Regarding the ICE, the only solution I see is to use your (...)[0] suggestion for clang. I pushed the proposed changes by mistake: https://bitbucket.org/eigen/eigen/commits/c401a7c2da4e/ Changeset: c401a7c2da4e User: ggael Date: 2015-01-13 08:57:37+00:00 Summary: Bug 907, ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang https://bitbucket.org/eigen/eigen/commits/c85347b1546d/ Changeset: c85347b1546d User: ggael Date: 2015-01-13 09:03:00+00:00 Summary: Bug 907, ARM64: workaround ICE in xcode/clang Let's hope they are working for you, and if that's the case, feel-free to close the bug entry. Not sure what the best macro would be to specifically detect Xcode. We use #if defined(__APPLE__) && TARGET_OS_IPHONE to reliably detect iOS targets, and indeed, on the Mac/iOS platforms, __APPLE__ is always defined. Clang can be detected by __clang__, so the combination: #if defined(__APPLE__) && defined(__clang__) might get us most of the way. I'm not sure if the current behavior will be fixed in a later version (that's always hard to tell), but for sake of completeness, "/usr/bin/clang -dM -E -x c /dev/null" with Xcode 6.1 installed gives me: #define __clang__ 1 #define __clang_major__ 6 #define __clang_minor__ 0 #define __clang_patchlevel__ 0 #define __clang_version__ "6.0 (clang-600.0.56)" (...) Regarding the ICE, I guess my first suggestion using (...)[0] is not really portable, so we should restrict it to Clang/Xcode as much as possible. (At some point I want to try and build Clang from their SVN repo - it might be fixed after all - but I haven't figured out how to properly build it as cross-compiler yet...) Commit c401a7c2da4e solved that particular issue for me. That one's good to keep! Commit c85347b1546d unfortunately does not seem to fix the ICE - just realizing it still crashes. :( Not sure why it didn't do that before. I would suggest to back out that one until I have further insight into what's actually happening. Which I will try to dig deeper into, but it won't be immediately due to time constraints. By the way, it would be great if you could add #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wuninitialized" to the beginning (e.g. after the include guard), and #pragma GCC diagnostic pop to the end (e.g. before the include guard) of both Eigen/src/Core/arch/NEON/Complex.h and Eigen/src/Core/arch/NEON/PacketMath.h. This fixes ugly "variable 'res' is uninitialized when used here" warnings that are emitted by both(!) GCC and Clang. I just ran into the ICE too. I am told it is a known bug in Clang 3.5, fixed in Clang 3.6. For now, I'll push a cset disabling double vectorization on current values of __apple_build_version__. Also needed this: https://bitbucket.org/eigen/eigen/commits/03554639e953641077c3dc09d7c3c0439475c2bf Looks like this issue can be closed? --- can anyone still reproduce a problem after the above two changesets? From my side (bug reporter) this can be closed; cannot reproduce the problem anymore. In bug 1186 I'm proposing a change to the vreinterpretq_f64_u64 workaround to support newer Android/Aarch64/Clang toolchains also missing this intrinsic. -- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/907. |