This bugzilla service is closed. All entries have been migrated to
Bug 1438 - Implement numerically stable summation methods
Summary: Implement numerically stable summation methods
Alias: None
Product: Eigen
Classification: Unclassified
Component: Tensor (show other bugs)
Version: 3.3 (current stable)
Hardware: All All
: Low Accuracy Problem
Assignee: Nobody
Keywords: accuracy, wrong-result
Depends on:
Reported: 2017-06-20 04:54 UTC by Lakshay Garg
Modified: 2019-12-04 17:03 UTC (History)
6 users (show)

A proposed patch [Work in progress] (2.79 KB, patch)
2017-06-20 05:50 UTC, Lakshay Garg
no flags Details | Diff
Proof-of-concept tree reduction (2.63 KB, text/x-c++src)
2018-11-09 19:25 UTC, Christoph Hertzberg
no flags Details

Description Lakshay Garg 2017-06-20 04:54:37 UTC
Currently, when asked to sum a tensor, the library naively adds the numbers from left-to-right but this approach can lead to loss of precision in certain cases (example shown below). To prevent this loss of precision, I would like to propose the implementation of summation methods like Kahan or pairwise summation.

#include <iostream>
#include <Eigen/Dense>

int main() {
  using std::cout;

  Eigen::Matrix2f mat1, mat2;

  mat1 << 100000, 0.001, -100000, 0.001;
  cout << "sum1: " << mat1.sum() << '\n'; // expected: 0.002, output: 0

  mat2 << 0.001, 0.001, -1, 1;
  cout << "sum2: " << mat2.sum() << '\n'; // expected: 0.002, output: 0.002
Comment 1 Lakshay Garg 2017-06-20 05:50:33 UTC
Created attachment 791 [details]
A proposed patch [Work in progress]

I tried to resolve the issue by implementing a slightly modified version of the Kahan summation algorithm ( The attached patch basically removes the inbuilt SumReducer and replaces it with the new sum reduce in order to check if it makes a difference. All the unit tests which pass on the latest bitbucket version also pass after this patch but the problem still remains. Maybe I did not make changes at the appropriate place or I am not testing properly. Since this is my first contribution to eigen, any hints and suggestions are appreciated.
Comment 2 Gael Guennebaud 2017-06-25 06:45:55 UTC
Your path modifies the Tensor module only (so Eigen::Tensor objects only), whereas your test is checking Eigen::Matrix.
Comment 3 Lakshay Garg 2017-06-25 07:41:24 UTC
@gael Thanks for your reply. It would be very helpful if you could you point me to the files I need to modify in order to change the behaviour globally.
Comment 4 Christoph Hertzberg 2017-07-18 12:17:44 UTC
@Lakshay: In case you did not find it meanwhile: The sum (and other) reductions for matrices is implemented in Eigen/src/Core/Redux.h. Namely, you need to specialize/re-implement `redux_impl`, or you need to implement an alternative to `scalar_sum_op` (which is implemented in Eigen/src/Core/functors/BinaryFunctors.h).

However overall, I doubt that this is really worth the effort, except in special cases (and it is easy to construct cases, where even Kahan summation fails, e.g., some permutation of [1e20, 1, 1e-20, 1e-20, -1, -1e20]).
Comment 5 Lakshay Garg 2017-07-18 12:30:07 UTC
Thanks for the feedback Christopher. You are right in saying the Kahan summation does not work in all the cases but a better alternative is available which solves the problem. Please see for details.

Also, since this method is likely to reduce the efficiency of summation, I think it would make sense to implement it as a separate function.
Comment 6 Rasmus Munk Larsen 2018-09-12 21:20:17 UTC
Proposed solution in:
Comment 7 Christoph Hertzberg 2018-11-09 19:25:15 UTC
Created attachment 891 [details]
Proof-of-concept tree reduction

I attached a proof-of-concept tree reduction. It misses handling of the last elements (currently it only works correctly for sizes which are multiples of 64), and just implements Packet4f-based addition.

However, it is not only more accurate (at least for sufficiently large sizes), but also faster than the current implementation (as long as the input fits into the cache). Without properly handling the last elements, this comparison is of course slightly unfair.

Examples (Intel(R) Core(TM) i7 CPU 950  @ 3.07GHz):

$ ./a.out 128 5000
double: 10.0117, diff redux: 6.55651e-07, diff tree: 6.55651e-07
GFLOPS: redux_best: 6.74629, tree_best: 8.00961
double: 1.69471, diff redux: 3.57628e-07, diff tree: 0
GFLOPS: redux_best: 6.74629, tree_best: 8.00961
double: 3.73089, diff redux: -5.36442e-07, diff tree: 8.9407e-07
GFLOPS: redux_best: 6.74664, tree_best: 8.00961
$ ./a.out 1280000 500
double: -66.3165, diff redux: 0.00162375, diff tree: -8.52346e-05
GFLOPS: redux_best: 5.42728, tree_best: 5.59063
double: -350.063, diff redux: -0.0029037, diff tree: 5.65052e-05
GFLOPS: redux_best: 5.43354, tree_best: 5.59152
Comment 8 Gael Guennebaud 2018-11-13 12:54:39 UTC
Note that your reduce<> is equivalent to the already implemented redux_vec_unroller.
Comment 9 Nobody 2019-12-04 17:03:19 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance:

Note You need to log in before you can comment on or make changes to this bug.