New user self-registration is currently disabled. Please email eigen-core-team @ lists.tuxfamily.org if you need an account.
Bug 1270 - forcing vfmadd231ps assembly in Clang might not be necessary
forcing vfmadd231ps assembly in Clang might not be necessary
Status: RESOLVED FIXED
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
3.3 (current stable)
x86 - AVX All
: Normal Optimization
Assigned To: Nobody
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2016-08-04 08:39 UTC by lama.sabaa
Modified: 2016-08-22 19:51 UTC (History)
5 users (show)



Attachments

Description lama.sabaa 2016-08-04 08:39:55 UTC
implementation of pmadd in AVX architecture forces generating vfmadd231ps in Clang with assmebly, this may not be profitable anymore since Clang no longer always "generates vfmadd213ps instruction plus some vmovaps on registers" like it says in the implementation comment.

commuting is done for memory and register operands and the correct fmadd permutation is chosen allowing optimizations such as Memory Folding.
so forcing assembly code might result in skipping optimization opportunities
Comment 1 Gael Guennebaud 2016-08-22 11:38:13 UTC
Do you known which clang version introduced this optimization?
Comment 2 lama.sabaa 2016-08-22 12:38:12 UTC
if I'm not mistaken this is the first patch introducing commutable fma oeprands: 
https://github.com/llvm-mirror/llvm/commit/b1b4d0819577bfa2a7538b7b2437bc3ffd52d4bb 

but there has been additional patches and changes since then, the latest being this one for AVX512: 
https://reviews.llvm.org/D23108
Comment 3 Gael Guennebaud 2016-08-22 13:39:24 UTC
After benchmarking several clang versions, the first correct one is clang 3.8:

https://bitbucket.org/eigen/eigen/commits/7018fa9192e6/
Comment 4 lama.sabaa 2016-08-22 13:42:13 UTC
nice, were there any performance improvements?
Comment 5 Gael Guennebaud 2016-08-22 19:51:03 UTC
no improvement because pmadd is currently only used in places where vfmadd231ps is really the right choice.

Note You need to log in before you can comment on or make changes to this bug.