...and then generates slow emulation code, because the instruction isn't available.
The only workaround is to disable FMA on Clang/ARM.
Affected Clang versions: at least Clang 3.8 (3.5 did not seem to be affected).
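As a rough illustration of that workaround, a compile-time guard along the following lines could switch FMA off for Clang on ARM. This is only a sketch under stated assumptions: `EXAMPLE_USE_FMA` and `example_mul_add` are hypothetical names, not taken from any real code base, and "ARM" is assumed to cover both 32-bit (`__arm__`) and 64-bit (`__aarch64__`) targets. It does no version check, since the exact range of affected Clang releases is not known.

```cpp
#include <cmath>

// Hypothetical feature switch (name is illustrative only).
// Assumption: treat both 32-bit and 64-bit ARM as affected; narrow this
// condition if only one of them shows the problem.
// No Clang version check is made because the affected range is not
// fully known (3.8 is affected, 3.5 did not seem to be).
#if defined(__clang__) && (defined(__arm__) || defined(__aarch64__))
#  define EXAMPLE_USE_FMA 0   // Clang/ARM: avoid FMA
#else
#  define EXAMPLE_USE_FMA 1   // elsewhere: keep FMA (a real build would
                              // also check that the target supports it)
#endif

// Hypothetical helper computing a * b + c through whichever path is enabled.
inline double example_mul_add(double a, double b, double c) {
#if EXAMPLE_USE_FMA
    return std::fma(a, b, c);   // fused multiply-add, single rounding
#else
    // Plain multiply then add. The compiler may still contract this into
    // an FMA depending on its floating-point contraction settings.
    return a * b + c;
#endif
}
```

Callers would use `example_mul_add(x, y, z)` in place of a direct FMA call, so the Clang/ARM special case stays in one place and can be removed once the compiler issue is resolved.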
Filed LLVM bug: