Bug 939 - Optimization clamping kc to 360 or 240 is not justified by a comment, and is detrimental on Nexus 4
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - matrix products
Version: unspecified
Hardware: All All
: Normal Unknown
Assignee: Nobody
Depends on:
Blocks: 937
Reported: 2015-01-25 01:09 UTC by Benoit Jacob
Modified: 2019-12-04 14:08 UTC


Description Benoit Jacob 2015-01-25 01:09:45 UTC
computeProductBlockingSizes has this code limiting the value of the kc blocking parameter to 360, or 240 for big Scalar types:

k = std::min<SizeType>(k,sizeof(LhsScalar)<=4 ? 360 : 240);

The other blocking parameters mc and nc are also clamped, though to higher values.
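
To make the effect of that line concrete, here is a minimal standalone sketch of the clamp. The helper name `clampKc` is hypothetical and is not part of Eigen; only the expression inside it is taken from the line quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not Eigen's API): applies the same cap that the
// quoted line applies to k inside computeProductBlockingSizes.
template <typename LhsScalar, typename SizeType>
SizeType clampKc(SizeType k) {
  assert(k >= 0 && "block depth must be non-negative");
  // Cap of 360 for scalars of at most 4 bytes (e.g. float),
  // 240 for anything larger (e.g. double).
  const SizeType cap = sizeof(LhsScalar) <= 4 ? 360 : 240;
  return std::min<SizeType>(k, cap);
}
```

With this sketch, a requested depth of 1024 comes back as 360 for `float` and 240 for `double`, while values already below the cap pass through unchanged.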

This optimization should be justified by a comment. Please add one?

Moreover, this optimization is *detrimental* on a Nexus 4 (ARM) device. See the attachment in bug 937 comment 3. It shows that for large enough products, the optimal power-of-two value of kc can easily be 512, and that for 1024^3 matrix products kc=1024 and kc=512 both perform optimally, while kc<=256 performs at least 10% worse (see the bottom of that file for the 1024^3 case).
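
For readers less familiar with the role of kc: it is the block size along the depth (inner) dimension of the product. The following naive sketch only illustrates that role; it is not Eigen's kernel, and the function name `gemmBlockedDepth` and the row-major `std::vector` layout are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only (not Eigen's kernel): C += A * B for row-major matrices,
// with A of size m x k, B of size k x n and C of size m x n.
// The depth dimension k is processed in panels of at most kc.
void gemmBlockedDepth(const std::vector<float>& A, const std::vector<float>& B,
                      std::vector<float>& C, std::size_t m, std::size_t n,
                      std::size_t k, std::size_t kc) {
  assert(kc > 0);
  assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
  for (std::size_t k0 = 0; k0 < k; k0 += kc) {
    const std::size_t kEnd = std::min(k, k0 + kc);  // last panel may be short
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t p = k0; p < kEnd; ++p) {
        const float a = A[i * k + p];
        for (std::size_t j = 0; j < n; ++j) {
          C[i * n + j] += a * B[p * n + j];
        }
      }
    }
  }
}
```

A smaller kc means each pass touches a thinner panel of A and B but makes more passes over C, which is the trade-off the measurements above are probing.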

On a Core i7, the data in bug 937 comment 1 does confirm that kc=256 is the highest possible optimal power-of-two size. Still, I would like to understand where the value 360 comes from.
Comment 1 Gael Guennebaud 2015-01-26 18:42:07 UTC
These numbers (240/360) are the values giving the best performance for very large matrices on i7. They were introduced when the previous heuristic based on cache sizes was no longer valid. We then forgot to update this part of the code, but clearly those numbers have to be replaced by a more general heuristic.

Also, kc does not have to be a power-of-two, a multiple of 16 will do.
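
As a small illustration of the multiple-of-16 constraint, a candidate kc can be rounded down as sketched here. The helper `roundDownToMultiple` is hypothetical and not taken from Eigen; leaving values smaller than the multiple untouched is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the largest multiple of `multiple` that is
// <= k. Values of k smaller than `multiple` are returned unchanged.
inline std::ptrdiff_t roundDownToMultiple(std::ptrdiff_t k,
                                          std::ptrdiff_t multiple = 16) {
  assert(k > 0 && multiple > 0);
  if (k < multiple) return k;  // too small to round down without hitting 0
  return (k / multiple) * multiple;
}
```

For instance, a candidate of 500 becomes 496, so a search over kc values could step through multiples of 16 rather than powers of two only.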
Comment 2 Benoit Jacob 2015-01-28 18:25:10 UTC
Thanks for this explanation. So this is a prime example of ad-hoc logic. I initially thought that ad-hoc was bad and we should be able to have nice universal logic instead, but I was wrong. See bug 937 comment 8. Let's instead embrace ad-hoc logic, keep this, and just make it Intel-only while developing a different ad-hoc logic for ARM.
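
One way such an Intel-only rule could look is sketched below. This is an assumption-laden illustration, not the change that was eventually made: the `Arch` enum, `detectArch`, and `clampKcFor` are hypothetical names, and the ARM branch deliberately applies no cap because no ARM-specific value has been settled on in this discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical architecture tag for the sketch.
enum class Arch { Intel, Arm, Other };

// Compile-time detection using common compiler-predefined macros.
constexpr Arch detectArch() {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
  return Arch::Intel;
#elif defined(__arm__) || defined(__aarch64__) || defined(_M_ARM) || defined(_M_ARM64)
  return Arch::Arm;
#else
  return Arch::Other;
#endif
}

// Hypothetical helper: keeps the 360/240 cap on Intel only.
template <typename LhsScalar>
std::ptrdiff_t clampKcFor(Arch arch, std::ptrdiff_t k) {
  assert(k >= 0);
  if (arch == Arch::Intel) {
    return std::min<std::ptrdiff_t>(k, sizeof(LhsScalar) <= 4 ? 360 : 240);
  }
  // ARM and other targets: no cap in this sketch; an ARM-specific rule
  // would be added here once one has been measured.
  return k;
}
```

A call such as `clampKcFor<float>(detectArch(), k)` would then leave kc untouched on a Nexus 4 while preserving the existing behavior on a Core i7.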
Comment 3 Gael Guennebaud 2015-02-26 17:06:59 UTC
The situation is much better now:
Changeset:   52572e60b5d3
User:        ggael
Date:        2015-02-26 15:04:35+00:00
Summary:     Implement a more generic blocking-size selection algorithm.
Comment 4 Nobody 2019-12-04 14:08:16 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to the project's GitLab instance and has been closed to further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance:
