This bugzilla service is closed. All entries have been migrated to https://gitlab.com/libeigen/eigen
Bug 70 - Leverage SSE4
Summary: Leverage SSE4
Status: RESOLVED FIXED
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
Version: unspecified
Hardware: All All
Importance: Lowest Optimization
Assignee: Gael Guennebaud
URL:
Whiteboard:
Keywords: JuniorJob
Depends on:
Blocks:
 
Reported: 2010-10-16 04:57 UTC by Benoit Jacob
Modified: 2019-12-04 09:44 UTC
Description Benoit Jacob 2010-10-16 04:57:25 UTC
Make use of more SSE4 instructions, especially for dot products.
Comment 1 Gael Guennebaud 2013-03-20 18:31:59 UTC
Some progress for min/max:

https://bitbucket.org/eigen/eigen/commits/761df4c9b3b7/
changeset:   761df4c9b3b7
user:        ggael
date:        2013-03-20 18:28:40
summary:     Add SSE4 min/max for integers
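
For illustration only (this is not the code from the changeset): a minimal sketch of what an SSE4.1 integer min buys over plain SSE2. SSE4.1 provides `_mm_min_epi32` (pminsd) as a single instruction, whereas SSE2 needs a compare followed by a bitwise select. The function names and the `EXAMPLE_TARGET_SSE41` macro are hypothetical; the macro assumes GCC or Clang on x86-64 and enables SSE4.1 per function so that no compiler flag is needed.

```cpp
#include <immintrin.h>  // SSE2 and SSE4.1 intrinsics

// Hypothetical helper macro: enable SSE4.1 for a single function (GCC/Clang).
#if defined(__GNUC__)
#define EXAMPLE_TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define EXAMPLE_TARGET_SSE41
#endif

// SSE4.1: lane-wise signed 32-bit minimum in one instruction (pminsd).
EXAMPLE_TARGET_SSE41 inline __m128i min_epi32_sse41(__m128i a, __m128i b) {
    return _mm_min_epi32(a, b);
}

// SSE2-only emulation: build a mask where a < b, then select a or b per lane.
inline __m128i min_epi32_sse2(__m128i a, __m128i b) {
    __m128i a_lt_b = _mm_cmplt_epi32(a, b);  // all-ones in lanes where a < b
    return _mm_or_si128(_mm_and_si128(a_lt_b, a),
                        _mm_andnot_si128(a_lt_b, b));
}
```

The max variants are symmetric (`_mm_max_epi32`, or `_mm_cmpgt_epi32` for the mask).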


Then:

- ceil/floor could be easily added
- The dpps instruction is not easy to exploit in Eigen because dot products are all decomposed into a product and a sum reduction. We should benchmark it first to see the potential.

Move to 3.3
Comment 2 Christoph Hertzberg 2014-09-07 16:38:58 UTC
I agree with Gael on dpps: it might only be useful for small vectors. Otherwise, I guess mulps and addps with a final reduction at the end should be faster.
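
To make the two options concrete, here is a rough sketch (not Eigen code) of both strategies for a float dot product. `dot_dpps` issues one dpps per four elements and so performs a horizontal reduction on every iteration; `dot_mul_add` accumulates lane-wise with mulps/addps and reduces once at the end. The function names and the `EXAMPLE_TARGET_SSE41` macro are hypothetical, the length is assumed to be a multiple of 4, and the macro assumes GCC or Clang on x86-64. Which variant is faster would still need to be benchmarked.

```cpp
#include <cstddef>
#include <immintrin.h>  // SSE and SSE4.1 intrinsics

// Hypothetical helper macro: enable SSE4.1 for a single function (GCC/Clang).
#if defined(__GNUC__)
#define EXAMPLE_TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define EXAMPLE_TARGET_SSE41
#endif

// Variant 1: one dpps per 4 floats (assumes n is a multiple of 4).
EXAMPLE_TARGET_SSE41 float dot_dpps(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        // Mask 0xF1: multiply all four lanes, write their sum to lane 0.
        sum += _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
    }
    return sum;
}

// Variant 2: mulps + addps per 4 floats, one horizontal reduction at the end
// (assumes n is a multiple of 4). Uses SSE1 instructions only.
float dot_mul_add(const float* a, const float* b, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));  // lane-wise accumulate
    }
    // Final reduction: (l0+l2, l1+l3, ..), then add lane 1 into lane 0.
    __m128 pairs = _mm_add_ps(acc, _mm_movehl_ps(acc, acc));
    __m128 lane1 = _mm_shuffle_ps(pairs, pairs, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(pairs, lane1));
}
```

For a 4-element vector the first variant is a single instruction plus an extract, which is why dpps looks most attractive for small fixed sizes.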

ceil/floor ==> JuniorJob.
This could also be vectorized with SSE2 using conversion to int and some handling of special cases.
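
One possible shape of that SSE2-only approach, as a hedged sketch rather than a tested Eigen kernel: truncate through an int32 round trip, subtract 1 where truncation rounded up, and pass through inputs that are already integral or not representable as int32 (|x| >= 2^23, infinities, NaN). The function name `floor_sse2` is hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Lane-wise floor of four floats using only SSE2.
inline __m128 floor_sse2(__m128 x) {
    // Truncate toward zero via float -> int32 -> float.
    __m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(x));
    // Truncation rounds negative non-integers up; subtract 1 in those lanes.
    __m128 too_big = _mm_cmpgt_ps(t, x);  // all-ones where t > x
    __m128 r = _mm_sub_ps(t, _mm_and_ps(too_big, _mm_set1_ps(1.0f)));
    // Copy the sign bit of x so that floor(-0.0f) stays -0.0f.
    const __m128 abs_mask = _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF));
    r = _mm_or_ps(r, _mm_andnot_ps(abs_mask, x));
    // Special cases: for |x| >= 2^23 the float is already an integer (and may
    // not fit in int32); the comparison is also false for NaN. Keep x there.
    __m128 abs_x = _mm_and_ps(x, abs_mask);
    __m128 small = _mm_cmplt_ps(abs_x, _mm_set1_ps(8388608.0f));  // 2^23
    return _mm_or_ps(_mm_and_ps(small, r), _mm_andnot_ps(small, x));
}
```

ceil can be built the same way by adding 1 in lanes where the truncated value is smaller than x.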
Comment 3 Gael Guennebaud 2015-11-04 17:29:15 UTC
https://bitbucket.org/eigen/eigen/commits/bbee0451d351/
Add round, ceil and floor for SSE4.1/AVX (Bug #70)
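
For readers unfamiliar with the underlying instructions, a small sketch of the SSE4.1 intrinsics involved (this is not the code from the commit). The function name `round_ceil_floor4` and the `EXAMPLE_TARGET_SSE41` macro are hypothetical; the macro assumes GCC or Clang on x86-64. Note that `_mm_round_ps` with `_MM_FROUND_TO_NEAREST_INT` rounds ties to even, which differs from `std::round`. AVX offers 256-bit counterparts (`_mm256_round_ps`, `_mm256_ceil_ps`, `_mm256_floor_ps`).

```cpp
#include <immintrin.h>  // SSE4.1 intrinsics

// Hypothetical helper macro: enable SSE4.1 for a single function (GCC/Clang).
#if defined(__GNUC__)
#define EXAMPLE_TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define EXAMPLE_TARGET_SSE41
#endif

// Applies round (ties to even), ceil and floor to four floats at once.
EXAMPLE_TARGET_SSE41 void round_ceil_floor4(const float* in, float* rounded,
                                            float* ceiled, float* floored) {
    __m128 x = _mm_loadu_ps(in);
    _mm_storeu_ps(rounded,
                  _mm_round_ps(x, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC));
    _mm_storeu_ps(ceiled, _mm_ceil_ps(x));
    _mm_storeu_ps(floored, _mm_floor_ps(x));
}
```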
Comment 4 Nobody 2019-12-04 09:44:30 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed to further activity.

You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/70.
