This bugzilla service is closed. All entries have been migrated to
Bug 1270 - forcing vfmadd231ps assembly in Clang might not be necessary
Summary: forcing vfmadd231ps assembly in Clang might not be necessary
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
Version: 3.3 (current stable)
Hardware: x86 - AVX All
Importance: Normal Optimization
Assignee: Nobody
Depends on:
Reported: 2016-08-04 08:39 UTC by lama.sabaa
Modified: 2019-12-04 16:04 UTC
CC List: 5 users


Description lama.sabaa 2016-08-04 08:39:55 UTC
The AVX implementation of pmadd uses inline assembly to force Clang to emit vfmadd231ps. This may no longer be profitable, since Clang no longer always "generates vfmadd213ps instruction plus some vmovaps on registers" as the comment in the implementation says.

Clang now commutes FMA operands, for both memory and register operands, and chooses the correct fmadd permutation, which allows optimizations such as memory folding.
Forcing the instruction through inline assembly may therefore skip optimization opportunities.
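
For readers unfamiliar with the three FMA3 operand orderings mentioned here, the following is a minimal scalar sketch, not Eigen code. The `Regs` struct and the `fmadd*` / `madd_via_*` helpers are hypothetical names introduced only for illustration. They model the Intel-syntax form `op dst, src2, src3`, where the result always overwrites `dst` and only `src3` may be a memory operand:

```cpp
#include <cmath>

// Scalar model of the x86 FMA3 operand orderings (Intel syntax: op dst, src2, src3).
// The digits say which operands are multiplied and which one is added;
// in every form the result overwrites the first operand (dst).
struct Regs { float dst, src2, src3; };

// vfmadd132: dst = dst * src3 + src2
inline void fmadd132(Regs& r) { r.dst = std::fma(r.dst, r.src3, r.src2); }
// vfmadd213: dst = src2 * dst + src3
inline void fmadd213(Regs& r) { r.dst = std::fma(r.src2, r.dst, r.src3); }
// vfmadd231: dst = src2 * src3 + dst
inline void fmadd231(Regs& r) { r.dst = std::fma(r.src2, r.src3, r.dst); }

// Each helper computes a*b + c; only the operand placement differs,
// and with it which input value gets overwritten by the result.
inline float madd_via_231(float a, float b, float c) { Regs r{c, a, b}; fmadd231(r); return r.dst; }
inline float madd_via_213(float a, float b, float c) { Regs r{b, a, c}; fmadd213(r); return r.dst; }
inline float madd_via_132(float a, float b, float c) { Regs r{a, c, b}; fmadd132(r); return r.dst; }
```

All three forms produce the same value. What changes is which input is destroyed and which one can come from memory, so a compiler that is free to pick the form can avoid extra register moves or fold a load into the instruction.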
Comment 1 Gael Guennebaud 2016-08-22 11:38:13 UTC
Do you know which clang version introduced this optimization?
Comment 2 lama.sabaa 2016-08-22 12:38:12 UTC
If I'm not mistaken, this is the first patch introducing commutable FMA operands:

But there have been additional patches and changes since then, the latest being this one for AVX512:
Comment 3 Gael Guennebaud 2016-08-22 13:39:24 UTC
After benchmarking several clang versions, the first correct one is clang 3.8:
Comment 4 lama.sabaa 2016-08-22 13:42:13 UTC
Nice, were there any performance improvements?
Comment 5 Gael Guennebaud 2016-08-22 19:51:03 UTC
No improvement, because pmadd is currently only used in places where vfmadd231ps is really the right choice.
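
As an illustration of a case where the 231 form is the natural fit, here is a small scalar sketch. It is an assumption about the kind of pattern meant, not code taken from Eigen, and `dot_fma` is a hypothetical name. The accumulator is both the addend and the destination of every fused multiply-add, which matches `dst = src2 * src3 + dst`:

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical accumulation kernel (not Eigen code): acc is both an input
// and the output of each fused multiply-add, matching the 231 operand order.
inline float dot_fma(const float* a, const float* b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc = std::fma(a[i], b[i], acc);  // acc = a[i] * b[i] + acc
    }
    return acc;
}
```

In a loop like this, the compiler's choice and the hand-forced instruction would likely coincide, which is consistent with the benchmark showing no difference.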
Comment 6 Nobody 2019-12-04 16:04:48 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to the project's GitLab instance and has been closed to further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance:
