This bugzilla service is closed. All entries have been migrated to
Bug 1737 - Consider vectorization with strided input
Summary: Consider vectorization with strided input
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
Version: 3.4 (development)
Hardware: All All
Importance: Low Optimization
Assignee: Nobody
Depends on:
Reported: 2019-08-07 16:10 UTC by Christoph Hertzberg
Modified: 2019-12-04 18:44 UTC


Description Christoph Hertzberg 2019-08-07 16:10:59 UTC
For sufficiently expensive operations it could be worthwhile to vectorize even if the input or output is not stored contiguously in memory.
In the simplest case, a unary operator with SSE4.1 where only the input has a dynamic stride, loading the data would cost 1 movss and 3 insertps instructions instead of a single movups, but vectorizing could still accelerate the computation by almost a factor of 4.
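
A minimal sketch of the strided load described above, not Eigen code. The names load_strided, sqrt_strided and the SSE41_TARGET macro are hypothetical, sqrt merely stands in for an expensive unary operator, and the loop assumes the element count is a multiple of 4:

```cpp
#include <cassert>
#include <cstddef>
#include <smmintrin.h>  // SSE4.1 intrinsics (_mm_insert_ps)

// Hypothetical helper macro: enables SSE4.1 per function on GCC/Clang,
// so the sketch does not depend on a global -msse4.1 flag.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// Gather src[0], src[stride], src[2*stride], src[3*stride] into one register.
SSE41_TARGET inline __m128 load_strided(const float* src, std::ptrdiff_t stride) {
    __m128 v = _mm_load_ss(src);                                // movss: lane 0, upper lanes zeroed
    v = _mm_insert_ps(v, _mm_load_ss(src + stride), 0x10);      // insertps into lane 1
    v = _mm_insert_ps(v, _mm_load_ss(src + 2 * stride), 0x20);  // insertps into lane 2
    v = _mm_insert_ps(v, _mm_load_ss(src + 3 * stride), 0x30);  // insertps into lane 3
    return v;
}

// out[i] = sqrt(in[i * stride]) for i in [0, n); the output is contiguous.
SSE41_TARGET inline void sqrt_strided(const float* in, std::ptrdiff_t stride,
                                      float* out, std::ptrdiff_t n) {
    assert(n % 4 == 0);  // this sketch has no scalar tail loop
    for (std::ptrdiff_t i = 0; i < n; i += 4) {
        const __m128 v = load_strided(in + i * stride, stride);
        _mm_storeu_ps(out + i, _mm_sqrt_ps(v));  // one packed operation for four elements
    }
}
```

Whether the compiler actually emits exactly one movss and three insertps with memory operands depends on the compiler and optimization level.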

Further optimizations are possible if the stride is known at compile time to be exactly 2 (perform two 16-byte loads and shuffle the results together).
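
The stride-2 case could look roughly like the following sketch. The name load_stride2 is hypothetical, and the function assumes that all 8 floats starting at src are readable, even though the last one (src[7]) is not used:

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (_mm_loadu_ps, _mm_shuffle_ps)

// Load src[0], src[2], src[4], src[6] using two unaligned 16-byte loads
// and a single shuffle.
inline __m128 load_stride2(const float* src) {
    assert(src != nullptr);
    const __m128 lo = _mm_loadu_ps(src);      // src[0..3]
    const __m128 hi = _mm_loadu_ps(src + 4);  // src[4..7]
    // Result lanes: lo[0], lo[2], hi[0], hi[2]
    return _mm_shuffle_ps(lo, hi, _MM_SHUFFLE(2, 0, 2, 0));
}
```

This replaces the four scalar loads of the dynamic-stride case with two loads and one shuffle, at the price of reading one element past the last one needed.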

Storing could be done analogously using extractps (or, for stride=2, two shuffles and masked stores, if they are available).
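
A sketch of the strided store for the dynamic-stride case, again not Eigen code. The names store_strided and SSE41_TARGET are hypothetical; _MM_EXTRACT_FLOAT is the SSE4.1 helper macro from smmintrin.h, which compilers typically lower to extractps with a memory destination:

```cpp
#include <cassert>
#include <cstddef>
#include <smmintrin.h>  // SSE4.1 intrinsics (_MM_EXTRACT_FLOAT)

// Hypothetical helper macro: enables SSE4.1 per function on GCC/Clang,
// so the sketch does not depend on a global -msse4.1 flag.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// Scatter the four lanes of v to dst[0], dst[stride], dst[2*stride], dst[3*stride].
SSE41_TARGET inline void store_strided(float* dst, std::ptrdiff_t stride, __m128 v) {
    assert(stride != 0);                       // a zero stride would overwrite the same element
    _mm_store_ss(dst, v);                      // movss: lane 0
    _MM_EXTRACT_FLOAT(dst[stride], v, 1);      // lane 1
    _MM_EXTRACT_FLOAT(dst[2 * stride], v, 2);  // lane 2
    _MM_EXTRACT_FLOAT(dst[3 * stride], v, 3);  // lane 3
}
```

Elements between the strided positions are left untouched, which is what distinguishes this from a full-width store.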

Comment 1 Nobody 2019-12-04 18:44:14 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to a GitLab instance and has been closed to further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance:
