This bugzilla service is closed. All entries have been migrated to
Bug 1731 - Column-wise Lp1 sum resulting in aliased calculation when rows==2
Summary: Column-wise Lp1 sum resulting in aliased calculation when rows==2
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - expression templates (show other bugs)
Version: 3.3 (current stable)
Hardware: All All
: Normal Wrong Result
Assignee: Nobody
Depends on:
Blocks: 3.4
  Show dependency treegraph
Reported: 2019-07-12 14:11 UTC by Alex S.
Modified: 2019-12-04 18:42 UTC (History)
3 users (show)

A counterexample. (813 bytes, text/plain)
2019-07-12 14:11 UTC, Alex S.
no flags Details

Description Alex S. 2019-07-12 14:11:09 UTC
Created attachment 946 [details]
A counterexample.

Consider the following code (minimal example produced thanks to distinguished IRC user @ChriSopht, who happened to have enough free time before I did and was kind enough to use it for this):
(in case Godbolt ever, god forbid, goes down, also attached).
The crux of it is the line:
arr.rowwise() /= arr.colwise().sum();
after which you'd expect each column to sum up to one (barring NaNs). Instead, part of the array will contain corrupted data.

The bug sources only when the number of rows is a. known at compile time b. exactly 2.
Further investigation shows that the first row of the array is normalized as expected, but the second seems to be divided by the updated sum. Therefore, it is likely an aliasing issue.
Increasing the number of rows supposedly buffs the complexity up enough to warrant a temporary, but I cannot easily test that hypothesis, as I'm not much of an assembly wizard.

It is clearly possible to encode this operation as a loop using O(1) memory, individually summing and dividing each column. Whether this is possible to account for with template expressions, I do not know. It is understandable if this is declared to be a clear aliasing problem, and decided to not be worth fixing, but I think at least this warrants a documentation warning, as the use-case is not obscure, and it also happens to work most of the time.

Best regards, Alex.
Comment 1 Christoph Hertzberg 2019-07-12 15:21:06 UTC
The behavior is very obvious for cols=1, i.e., 

void foo(Eigen::Array2f& A) {
    A.rowwise() /= A.colwise().sum();

This does the clearly not intended (
  A[0] /= A[0] + A[1];
  A[1] /= A[0] + A[1];

This would be even sub-optimal in case there was no aliasing, since the sum should not need to be calculated twice.

For N>2 or for more expensive reductions (e.g., norm() instead of sum()) it appears that there is a temporary, which is of course sub-optimal as well as it requires allocating memory (unless the number of columns is known at compile-time) -- i.e., our current arr.colwise().normalized() needlessly allocates.

For this case I see some not too complicated workarounds, but I guess a generic solution is not easy. If we can't fix this for 3.4, we should at least document it.

On the other hand, the following could rightfully be considered an aliasing issue, even though it would be trivial to evaluate correctly as well:

    A.rowwise() /= A.row(n);
Comment 2 Gael Guennebaud 2019-09-11 14:42:53 UTC
There is currently no easy way to evaluate an expression one columns at once. To this end we would have to extend the evaluation mechanism with something like:

  for each outer index j
    for each inner index i
        dst_evaluator.coeffRef(i,j) = src_evaluator.coeff(i,j)

and to make things more complicated, for:

    A.rowwise() /= A.colwise().sum();

if A is row-major, then the current implementation works row-by-row (for both vectorization and cache-friendliness):

    tmp = A.colwise().sum();
    for each row index i
      a.row(i) /= tmp;

To reduce the overhead of the tmp, you would like to work on panels:

    enum { P = some_factor * PacketSize };
    for each vertical panel of index j and width P
      for each row index i
        for j1 in j..j+P
          dst_evaluator.coeffRef(i,j1) = src_evaluator.coeff(i,j1)

In our case, start_evaluating_vertical_panel would bake the respective part of A.colwise().sum() into a statically allocated buffer. This is something I had in mind for a while, but this add a lot of complexity on top of an already extremely complicated evaluation mechanism.

So getting back to this precise aliasing issue, since it occurs for a very specific case only, I'd suggest to fix it by always baking the nested reduction within a temporary.
Comment 3 Nobody 2019-12-04 18:42:47 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance:

Note You need to log in before you can comment on or make changes to this bug.