This bugzilla service is closed. All entries have been migrated to
Bug 585 - Optimize any()/all()/count() reductions
Summary: Optimize any()/all()/count() reductions
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - vectorization
Version: 3.3 (current stable)
Hardware: All All
Importance: Normal Optimization
Assignee: Nobody
Depends on: 97 272
Blocks: 3.x
Reported: 2013-04-12 14:28 UTC by Christoph Hertzberg
Modified: 2019-12-04 12:15 UTC (History)


Description Christoph Hertzberg 2013-04-12 14:28:29 UTC
Assuming these reductions are applied to the result of SSE comparisons, it is most likely faster to bit_and/bit_or several consecutive results, then _mm_movemask_pX the combined result to an integer and compare that against 0x0 (for any()) or against 0x3 or 0xF (for all(), with pd or ps respectively). This should reduce latency and the number of branches.

I'm not sure if this is related to bug 65.
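
For illustration, a minimal sketch of the idea for float comparisons. This is not Eigen code: the helper names all_less/any_less are hypothetical, and n is assumed to be a multiple of 16 so that four comparison masks can be combined per movemask.

```cpp
#include <emmintrin.h>  // SSE/SSE2 intrinsics
#include <cstddef>

// Hypothetical helper (not part of Eigen): true if a[i] < b[i] for all i.
// Assumption: n is a multiple of 16.
inline bool all_less(const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 16) {
        // Each comparison yields a mask with all bits set in the true lanes.
        __m128 m0 = _mm_cmplt_ps(_mm_loadu_ps(a + i),      _mm_loadu_ps(b + i));
        __m128 m1 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 4),  _mm_loadu_ps(b + i + 4));
        __m128 m2 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 8),  _mm_loadu_ps(b + i + 8));
        __m128 m3 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 12), _mm_loadu_ps(b + i + 12));
        // AND the four masks, then do a single movemask and a single branch.
        __m128 m = _mm_and_ps(_mm_and_ps(m0, m1), _mm_and_ps(m2, m3));
        if (_mm_movemask_ps(m) != 0xF) return false;
    }
    return true;
}

// Hypothetical helper (not part of Eigen): true if a[i] < b[i] for any i.
// Assumption: n is a multiple of 16.
inline bool any_less(const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 16) {
        __m128 m0 = _mm_cmplt_ps(_mm_loadu_ps(a + i),      _mm_loadu_ps(b + i));
        __m128 m1 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 4),  _mm_loadu_ps(b + i + 4));
        __m128 m2 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 8),  _mm_loadu_ps(b + i + 8));
        __m128 m3 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 12), _mm_loadu_ps(b + i + 12));
        // OR the four masks; any set sign bit means at least one true lane.
        __m128 m = _mm_or_ps(_mm_or_ps(m0, m1), _mm_or_ps(m2, m3));
        if (_mm_movemask_ps(m) != 0x0) return true;
    }
    return false;
}
```

How many masks to combine before each movemask is a tuning choice; four is picked here only to keep the sketch short.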
Comment 1 Gael Guennebaud 2013-04-12 17:31:45 UTC
Bug 65 is on vectorizing vertical reductions on row-major matrices, so not related.

_mm_movemask_pX is pretty expensive, but its overhead is probably compensated by allowing vectorization on the input expression. However, we should first vectorize comparisons...
Comment 2 Christoph Hertzberg 2013-04-18 19:10:57 UTC
movemask can indeed be expensive (depending on the processor), so sufficiently many comparison results should be and/or-ed before using it.
With SSE4.1, PTEST would be a more efficient alternative.
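
A rough sketch of the PTEST variant, using the _mm_testz_si128 and _mm_testc_si128 intrinsics. Again this is not Eigen code: the helper names and the SKETCH_SSE41 macro are hypothetical, and n is assumed to be a multiple of 8. The macro only enables SSE4.1 per function on GCC/Clang so the sketch builds without -msse4.1; running it still requires an SSE4.1-capable CPU.

```cpp
#include <smmintrin.h>  // SSE4.1 intrinsics (PTEST)
#include <cstddef>

// Hypothetical macro: enable SSE4.1 for a single function on GCC/Clang.
#if defined(__GNUC__)
#define SKETCH_SSE41 __attribute__((target("sse4.1")))
#else
#define SKETCH_SSE41
#endif

// Hypothetical helper: true if a[i] < b[i] for any i (n a multiple of 8).
SKETCH_SSE41 inline bool any_less_ptest(const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 8) {
        __m128 m0 = _mm_cmplt_ps(_mm_loadu_ps(a + i),     _mm_loadu_ps(b + i));
        __m128 m1 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 4), _mm_loadu_ps(b + i + 4));
        __m128i m = _mm_castps_si128(_mm_or_ps(m0, m1));
        // _mm_testz_si128(m, m) returns 1 iff (m & m) == 0, i.e. no lane is true.
        if (!_mm_testz_si128(m, m)) return true;
    }
    return false;
}

// Hypothetical helper: true if a[i] < b[i] for all i (n a multiple of 8).
SKETCH_SSE41 inline bool all_less_ptest(const float* a, const float* b, std::size_t n) {
    const __m128i ones = _mm_set1_epi32(-1);
    for (std::size_t i = 0; i < n; i += 8) {
        __m128 m0 = _mm_cmplt_ps(_mm_loadu_ps(a + i),     _mm_loadu_ps(b + i));
        __m128 m1 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 4), _mm_loadu_ps(b + i + 4));
        __m128i m = _mm_castps_si128(_mm_and_ps(m0, m1));
        // _mm_testc_si128(m, ones) returns 1 iff (~m & ones) == 0, i.e. m is all ones.
        if (!_mm_testc_si128(m, ones)) return false;
    }
    return true;
}
```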
Comment 3 Christoph Hertzberg 2015-04-22 09:25:14 UTC
count() could also be improved. On SSE it would simply be a matter of subtracting the comparison masks from a counting register using integer arithmetic, because a 'true' lane, read as an integer, is -1 on SSE.
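
A minimal sketch of that counting scheme. This is not Eigen code: count_less is a hypothetical name, n is assumed to be a multiple of 4, and the 32-bit lane counters are assumed not to overflow.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Eigen): number of i with a[i] < b[i].
// Assumptions: n is a multiple of 4 and each lane counts fewer than 2^31 hits.
inline std::size_t count_less(const float* a, const float* b, std::size_t n) {
    __m128i acc = _mm_setzero_si128();  // four 32-bit counters
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 m = _mm_cmplt_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        // A true lane is 0xFFFFFFFF == -1, so subtracting the mask adds 1.
        acc = _mm_sub_epi32(acc, _mm_castps_si128(m));
    }
    // Horizontal sum of the four counters, done once after the loop.
    std::int32_t lanes[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    return static_cast<std::size_t>(lanes[0]) + static_cast<std::size_t>(lanes[1]) +
           static_cast<std::size_t>(lanes[2]) + static_cast<std::size_t>(lanes[3]);
}
```

No branch or movemask is needed inside the loop; the only scalar work is the final sum of the four lanes.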
Comment 4 Nobody 2019-12-04 12:15:19 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance:
