Created attachment 946 [details] A counterexample. Consider the following code (minimal example produced thanks to distinguished IRC user @ChriSopht, who happened to have enough free time before I did and was kind enough to use it for this): https://godbolt.org/z/frbXeS (in case Godbolt ever, god forbid, goes down, also attached). The crux of it is the line: arr.rowwise() /= arr.colwise().sum(); after which you'd expect each column to sum up to one (barring NaNs). Instead, part of the array will contain corrupted data. The bug sources only when the number of rows is a. known at compile time b. exactly 2. Further investigation shows that the first row of the array is normalized as expected, but the second seems to be divided by the updated sum. Therefore, it is likely an aliasing issue. Increasing the number of rows supposedly buffs the complexity up enough to warrant a temporary, but I cannot easily test that hypothesis, as I'm not much of an assembly wizard. It is clearly possible to encode this operation as a loop using O(1) memory, individually summing and dividing each column. Whether this is possible to account for with template expressions, I do not know. It is understandable if this is declared to be a clear aliasing problem, and decided to not be worth fixing, but I think at least this warrants a documentation warning, as the use-case is not obscure, and it also happens to work most of the time. Best regards, Alex.
The behavior is very obvious for cols=1, i.e., void foo(Eigen::Array2f& A) { A.rowwise() /= A.colwise().sum(); } This does the clearly not intended (https://godbolt.org/z/LJldfC): A[0] /= A[0] + A[1]; A[1] /= A[0] + A[1]; This would be even sub-optimal in case there was no aliasing, since the sum should not need to be calculated twice. For N>2 or for more expensive reductions (e.g., norm() instead of sum()) it appears that there is a temporary, which is of course sub-optimal as well as it requires allocating memory (unless the number of columns is known at compile-time) -- i.e., our current arr.colwise().normalized() needlessly allocates. For this case I see some not too complicated workarounds, but I guess a generic solution is not easy. If we can't fix this for 3.4, we should at least document it. On the other hand, the following could rightfully be considered an aliasing issue, even though it would be trivial to evaluate correctly as well: A.rowwise() /= A.row(n);
There is currently no easy way to evaluate an expression one columns at once. To this end we would have to extend the evaluation mechanism with something like: for each outer index j src_evaluator.start_evaluating_sub_vector(j) for each inner index i dst_evaluator.coeffRef(i,j) = src_evaluator.coeff(i,j) and to make things more complicated, for: A.rowwise() /= A.colwise().sum(); if A is row-major, then the current implementation works row-by-row (for both vectorization and cache-friendliness): tmp = A.colwise().sum(); for each row index i a.row(i) /= tmp; To reduce the overhead of the tmp, you would like to work on panels: enum { P = some_factor * PacketSize }; for each vertical panel of index j and width P src_evaluator.start_evaluating_vertical_panel<P>(j); for each row index i for j1 in j..j+P dst_evaluator.coeffRef(i,j1) = src_evaluator.coeff(i,j1) In our case, start_evaluating_vertical_panel would bake the respective part of A.colwise().sum() into a statically allocated buffer. This is something I had in mind for a while, but this add a lot of complexity on top of an already extremely complicated evaluation mechanism. So getting back to this precise aliasing issue, since it occurs for a very specific case only, I'd suggest to fix it by always baking the nested reduction within a temporary.
-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1731.