This bugzilla service is closed. All entries have been migrated to
Bug 70 - Leverage SSE4
Summary: Leverage SSE4
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
Version: unspecified
Hardware: All All
Importance: Lowest Optimization
Assignee: Gael Guennebaud
Keywords: JuniorJob
Depends on:
Reported: 2010-10-16 04:57 UTC by Benoit Jacob
Modified: 2019-12-04 09:44 UTC


Description Benoit Jacob 2010-10-16 04:57:25 UTC
Make use of more SSE4 instructions, especially for dot products.
Comment 1 Gael Guennebaud 2013-03-20 18:31:59 UTC
Some progress for min/max:
changeset:   761df4c9b3b7
user:        ggael
date:        2013-03-20 18:28:40
summary:     Add SSE4 min/max for integers
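
For readers unfamiliar with the instructions involved: SSE2 only offers packed integer min/max for 16-bit signed and 8-bit unsigned lanes, while SSE4.1 adds 32-bit variants such as `_mm_min_epi32` and `_mm_max_epi32`. The sketch below only illustrates those intrinsics; it is not the code from the changeset. The function name `minmax4_i32` is hypothetical, and the `target` attribute assumes GCC or Clang on x86.

```cpp
#include <cassert>
#include <smmintrin.h>  // SSE4.1: _mm_min_epi32, _mm_max_epi32

// Hypothetical helper: lane-wise min and max of four 32-bit signed integers.
// The target attribute (GCC/Clang only) enables SSE4.1 for this function
// without requiring -msse4.1 for the whole translation unit.
__attribute__((target("sse4.1")))
void minmax4_i32(const int* a, const int* b, int* mn, int* mx) {
    assert(a && b && mn && mx);
    // Unaligned loads, so the caller does not need 16-byte aligned arrays.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(mn), _mm_min_epi32(va, vb));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(mx), _mm_max_epi32(va, vb));
}
```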


- ceil/floor could be easily added
- The dpps instruction is not easy to exploit in Eigen because all dot products are decomposed into a coefficient-wise product followed by a sum reduction.  We should benchmark it first to see the potential.

Move to 3.3
Comment 2 Christoph Hertzberg 2014-09-07 16:38:58 UTC
I agree with Gael on dpps -- it might only be useful for small vectors. Otherwise, I guess mulps and addps in the loop, with a single reduction at the end, should be faster.
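
To make the two approaches concrete, here is a minimal sketch with raw intrinsics. It is not Eigen code: `dot_mul_add` and `dot4_dpps` are hypothetical names, and the `target` attribute assumes GCC or Clang on x86. The first function keeps a four-lane accumulator (mulps + addps) and reduces horizontally once at the end; the second computes a single 4-element dot product with dpps. Which one is faster would still need to be benchmarked.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: _mm_mul_ps, _mm_add_ps, shuffles
#include <smmintrin.h>  // SSE4.1: _mm_dp_ps

// Dot product with mulps/addps in the loop and one reduction at the end.
float dot_mul_add(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));  // four partial sums
    }
    // Horizontal reduction: add the upper pair onto the lower pair,
    // then add lane 1 onto lane 0.
    __m128 hi = _mm_movehl_ps(acc, acc);
    __m128 sum2 = _mm_add_ps(acc, hi);
    __m128 sum1 = _mm_add_ss(sum2, _mm_shuffle_ps(sum2, sum2, 1));
    float result = _mm_cvtss_f32(sum1);
    // Scalar tail for sizes that are not a multiple of 4.
    for (; i < n; ++i) result += a[i] * b[i];
    return result;
}

// Dot product of exactly four floats with a single dpps.
// Mask 0xF1: multiply all four lanes, write the sum to lane 0.
__attribute__((target("sse4.1")))
float dot4_dpps(const float* a, const float* b) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
}
```

Note that in a long loop the dpps version would perform a full horizontal sum every four elements, whereas the mulps/addps version defers that work to a single reduction after the loop.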

ceil/floor ==> JuniorJob. 
This could also be vectorizable with SSE2 using conversion to int and some handling of special cases.
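
A rough sketch of what the SSE2 route could look like, assuming the usual trick of truncating through int and correcting afterwards. The names `floor_sse2` and `floor4_sse2` are hypothetical, and this is only one possible treatment of the special cases, not what Eigen implements.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2: _mm_cvttps_epi32, _mm_cvtepi32_ps

// floor() on four floats using SSE2 only.
__m128 floor_sse2(__m128 x) {
    // Truncate toward zero via int conversion.
    __m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(x));
    // Truncation rounds negative non-integers up; subtract 1 in those lanes.
    __m128 too_big = _mm_cmpgt_ps(t, x);
    __m128 r = _mm_sub_ps(t, _mm_and_ps(too_big, _mm_set1_ps(1.0f)));
    // Special cases: for |x| >= 2^23 the value is already integral (or is
    // inf/NaN) and may not fit in an int, so pass x through unchanged.
    __m128 abs_mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    __m128 absx = _mm_and_ps(x, abs_mask);
    __m128 small = _mm_cmplt_ps(absx, _mm_set1_ps(8388608.0f));  // false for NaN
    return _mm_or_ps(_mm_and_ps(small, r), _mm_andnot_ps(small, x));
}

// Convenience wrapper over plain arrays of four floats.
void floor4_sse2(const float* in, float* out) {
    assert(in != nullptr && out != nullptr);
    _mm_storeu_ps(out, floor_sse2(_mm_loadu_ps(in)));
}
```

One known limitation of this sketch: an input of -0.0 comes back as +0.0, so the sign of zero is not preserved. ceil could be handled symmetrically by adding 1 where the truncated value is smaller than x.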
Comment 3 Gael Guennebaud 2015-11-04 17:29:15 UTC
Add round, ceil and floor for SSE4.1/AVX (Bug #70)
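
For context, SSE4.1 exposes these operations directly through roundps. The sketch below shows the corresponding intrinsics only; it is not the code from the commit, the helper name `floor_ceil_round4` is hypothetical, and the `target` attribute assumes GCC or Clang on x86.

```cpp
#include <cassert>
#include <smmintrin.h>  // SSE4.1: _mm_floor_ps, _mm_ceil_ps, _mm_round_ps

// Hypothetical helper: floor, ceil and round-to-nearest of four floats.
__attribute__((target("sse4.1")))
void floor_ceil_round4(const float* in, float* fl, float* ce, float* ro) {
    assert(in && fl && ce && ro);
    __m128 x = _mm_loadu_ps(in);
    _mm_storeu_ps(fl, _mm_floor_ps(x));
    _mm_storeu_ps(ce, _mm_ceil_ps(x));
    // Round to nearest, ties to even, without raising precision exceptions.
    _mm_storeu_ps(ro, _mm_round_ps(x, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC));
}
```

Be aware that the nearest-integer mode of roundps resolves ties to even (2.5 becomes 2), which differs from `std::round`, where ties go away from zero.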
Comment 4 Nobody 2019-12-04 09:44:30 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance:
