New user self-registration is disabled due to spam. Please email eigen-core-team @ if you need an account.
Bug 613 - Bug in internal::psqrt SSE implementation
Summary: Bug in internal::psqrt SSE implementation
Alias: None
Product: Eigen
Classification: Unclassified
Component: Core - general (show other bugs)
Version: unspecified
Hardware: x86 - SSE Linux
: Normal Unknown
Assignee: Nobody
Depends on:
Reported: 2013-06-13 17:54 UTC by Jeff Dean
Modified: 2013-06-14 09:52 UTC (History)
3 users (show)

Patch to fix the problem. However, it has a lengthy comment that you probably can just remove from the real distribution. (3.40 KB, patch)
2013-06-13 17:54 UTC, Jeff Dean
no flags Details | Diff

Description Jeff Dean 2013-06-13 17:54:42 UTC
Created attachment 346 [details]
Patch to fix the problem.  However, it has a lengthy comment that you probably can just remove from the real distribution.

The internal::psqrt routine in Eigen/src/Core/arch/SSE/MathFunctions.h appears to have a bug whereby values in the range [1.17549e-038, 1.19209e-007] return a square root value of exactly 0.0, rather than a true square root.  This is because the code that computes the non_zero_mask on the Packet4f entry by comparing to see which values are greater than std::numeric_limits<float>::epsilon() (i.e. 1.19209e-007), rather than std::numeric_limits<float>::min() (i.e. 1.17549e-038).

The fix is to change this line:

  Packet4f non_zero_mask =
     _mm_cmpgt_ps(_x, pset1<Packet4f>(std::numeric_limits<float>::epsilon()));


  Packet4f non_zero_mask =
     _mm_cmpgt_ps(_x, pset1<Packet4f>(std::numeric_limits<float>::min()));

A test program that exhaustively shows where values in this range compute sqrt results that differ by more than 0.00001 from the default sqrt(float) implementation is below.

It might be worth incorporating the style of this test program into the library to test all the other specialized Packet implementations in MathFunctions.h in a similar manner (i.e. make sure that the Packet SSE implementations compute answers that are very similar to the default single-value implementations for a large number of randomly selected floating point values).

        Bug fix for Eigen3 psqrt() operation, where values that were
        between 0 and std::numeric_limits<float>::epsilon() (1.19209e-007)
        would be given a value of 0, rather than something closer to
        what the sqrt(x) function in math.h would return.
        Wrote simple test program (shown below) to illustate the problem:
        % g++ -o sqrt_issue -lm -I/google/src/cloud/jeff/eigen-sqrt/google3/third_party/eigen3/
        Without this fix, the program generates the following output:
          % ./sqrt_issue
          Sampled 873254095 values out of 873254095
          Mismatch sample: sqrt() 0 vs. SQRT 0.000345043 for 1.19055e-07
          Mismatch sample: sqrt() 0 vs. SQRT 0.000345146 for 1.19126e-07
          Mismatch sample: sqrt() 0 vs. SQRT 0.000345249 for 1.19197e-07
          Mismatches: 86251777 of 873254095
        With the fix, the program shows 0 mismatches:
          % ./sqrt_issue
          Sampled 873254095 values out of 873254095
          Mismatches: 0 of 873254095
        Where is:
        /* Program to illustrate an issue with the Eigen psqrt implementation
           for SSE.
           Author: Jeff Dean (
        #include <stdio.h>
        #include <math.h>
        #include <vector>
        #include "Eigen/Core"
        typedef unsigned int       uint32;
        namespace {
        float SQRT(float x) {
          return sqrt(x);
        int main(int argc, char** argv) {
          std::vector<float> v;
          float max_value = 1.1 * std::numeric_limits<float>::epsilon();
          int count = 0;
          for (uint32 i = 0; i < (1ull<<32)-1; i++) {
            float f = *(reinterpret_cast<float*>(&i));
            if (f >= 0.0 && f <= max_value) {
              // Randomly sample problematic values
          const int N = v.size();
          fprintf(stderr, "Sampled %d values out of %d\n", N, count);
          std::vector<float> v2(N), v3(N);
          Eigen::Map<Eigen::ArrayXf> r1(&v2[0], N);
          Eigen::Map<Eigen::ArrayXf> r2(&v3[0], N);
          Eigen::Map<Eigen::ArrayXf> in(&v[0], N);
          r1 = in.sqrt();
          r2 = in.unaryExpr(std::ptr_fun(SQRT));
          int mismatches = 0;
          for (int i = 0; i < N; i++) {
            if (fabs(r1[i] - r2[i]) > 0.00001) {
              if ((mismatches % 10000) == 0) {
                        "Mismatch sample: sqrt() %g vs. SQRT %g for %g\n",
                        r1[i], r2[i], in[i]);
          fprintf(stderr, "Mismatches: %d of %d\n", mismatches, N);
Comment 1 Benoit Jacob 2013-06-13 18:17:43 UTC
That's indeed a legitimate bug, as sqrt (like any power function with exponent strictly between 0 and 1) has the property of making small numbers less small. I agree with the proposed 1-line change, for what it's worth.
Comment 2 Gael Guennebaud 2013-06-13 18:19:03 UTC
Thanks a lot for this very detailed report. Fixed with regression unit tests:
Changeset:   99f4e5961f47
User:        ggael
Date:        2013-06-13 18:12:58
Summary:     Extend the magnitude range of tested numbers in packet math unit tests
Changeset:   34f92f66abdb
User:        Jeff Dean
Date:        2013-06-13 18:17:27
Summary:     Fix bug 613: psqrt was incorrect for small numbers
Comment 3 Christoph Hertzberg 2013-06-13 20:46:34 UTC
Wouldn't it be easier to just compare by zero? Or do I miss something here?
Comment 4 Gael Guennebaud 2013-06-13 22:04:13 UTC
We cannot because then rsqrt will overflow for a value like (0.5 * std::numeric_limits<float>::min()). Test program:

#include <Eigen/Core>
#include <iostream>
using namespace Eigen;
int main()
   float a = 0.5 * (std::numeric_limits<float>::min)();
   Array<float,1,4> A(a,a,a,a), B;
   B = A.sqrt();
   std::cout << a <<  " -> " << sqrt(a) << " == " << 1.f/std::sqrt(1.f/a)  << " == " << B << "\n";
   return 0.;

On the other hand, we could change _mm_cmpgt_ps to _mm_cmpge_ps.
Comment 5 Jeff Dean 2013-06-14 01:11:06 UTC
(In reply to comment #3)
> Wouldn't it be easier to just compare by zero? Or do I miss something here?

Thanks for fixing.

+Christoph Hertzberg: I initially tried comparing with 0.0, and got a few -Inf sqrt results when I implemented it that way.  My test program will print out a few examples of failing values if you run it when the line in the bug fix is instead set to be:

  Packet4f non_zero_mask = _mm_cmpgt_ps(_x, pset1<Packet4f>(.0f));
Comment 6 Christoph Hertzberg 2013-06-14 09:52:16 UTC
@Jeff: Yes, I discussed this with Gael on IRC yesterday. The problem is that mm_rsqrt returns +Inf for denormalized numbers. I should have posted that here, to avoid confusion ...

Note You need to log in before you can comment on or make changes to this bug.