New user self-registration is currently disabled. Please email eigen-core-team @ lists.tuxfamily.org if you need an account.
Bug 1004 - LinSpaced last value might be greater than high
LinSpaced last value might be greater than high
Status: REOPENED
Product: Eigen
Classification: Unclassified
Component: Core - general
3.2
All All
: Normal Accuracy Problem
Assigned To: Nobody
: JuniorJob
Depends on:
Blocks: 3.4
  Show dependency treegraph
 
Reported: 2015-04-26 17:43 UTC by Daniel A Paladim
Modified: 2016-11-02 10:46 UTC (History)
4 users (show)



Attachments

Description Daniel A Paladim 2015-04-26 17:43:09 UTC
Sometimes, the last value of vector generated through LinSpaced will be greater than high. I report this an issue, because evaluating a function defined in an interval [low,high] through the values generated by LinSpaced(n,low,high) may cause the function to launch an exception complaining that values are outside the domain.

One combination of values,

	VectorXd v = VectorXd::LinSpaced(51,0.0,7.0);
	if(v.tail(1)(0) > end) {
		cout << "Bigger" << '\n';
	}

Matlab and Numpy seem to copy high into the last position.
Comment 1 Christoph Hertzberg 2015-04-26 19:17:35 UTC
We could do that at the cost of adding a branch (when accessing LinSpaced element-wise). O.t.o.h., we'd avoid a branch during construction (cf Bug 224).
Bug 698 is also related.
Comment 2 Daniel A Paladim 2015-04-28 09:00:35 UTC
What if the computation was like this,
low*((n-1-i)/double(n-1))+high*(i/double(n-1))
since in this case, we don't need the branch? Though, I haven't though about the packetOp.
Comment 3 Christoph Hertzberg 2015-04-28 09:14:00 UTC
This does not handle the special case n==1.
Also we should seek to reduce the number of divisions, if possible.
Comment 4 Daniel A Paladim 2015-04-28 09:38:05 UTC
I see. Can't we handle the n==1 in the constructor, by setting high and low to high for instance?
To reduce the number of divisions, the only idea that I have is to do implement sth like,
double ratio = i/double(n-1);
return low*(1-ratio)+high*ratio;

I can't think of another option without having a branch for every entry.
Comment 5 Christoph Hertzberg 2015-04-29 09:27:00 UTC
I experimented a bit and found that the following appears to work:

  double inv=1.0/(size-1); // done once at construction
  // done when accessing:
  res(i) = low*(1.0-i*inv) + high*(i*inv);

This may result in res(size-1)<high in several cases (however not more error than 1 to 4 ULP), but res(size-1)<=high as long as high is significantly higher than low (didn't measure this exactly).
Unfortunately, for high==low (or very close to each other) this results in many out-of-bound results, but so do both your proposals. Therefore, it's not even guaranteed that res(i)>=res(i-1), in this case.
Above, I always assumed high>=low, the other case is essentially equivalent.

I'm not sure how to properly proceed here, perhaps introducing different variants of LinSpaced for different use cases (we already distinguish between random access and linear access).
Bug 699 is also related to that (linear spaced sequence where the step-size is specified, instead of either the upper bound or the size).
Comment 6 Gael Guennebaud 2016-10-24 14:00:38 UTC
While fixing bug 698, I hit this issue in the unit tests and fixed it as follow:


in ctor:

m_size1(num_steps<=1 ? 1 : num_steps-1)
m_low1 = (low/Scalar(m_size1));
m_high1 = (high/Scalar(m_size1));

then in operator():

return m_low1*Scalar(m_size1-i) + Scalar(i)*m_high1;

which is even less expensive and probably more accurate as for i==m_size1 we exactly get rid of the lower bound, and get:

Scalar(m_size1) * ( m_high1/Scalar(m_size1) )

We would also need to disable the sequential path, which is really inaccurate anyway.
Comment 7 Christoph Hertzberg 2016-10-24 14:43:28 UTC
(In reply to Gael Guennebaud from comment #6)
> return m_low1*Scalar(m_size1-i) + Scalar(i)*m_high1;
> 
> which is even less expensive and probably more accurate as for i==m_size1 we
> exactly get rid of the lower bound, and get:
> 
> Scalar(m_size1) * ( m_high1/Scalar(m_size1) )

This is surprisingly accurate, I still can produce cases where this gets inaccurate by 1 ULP, e.g. (using double precision and no optimization, storing intermediate results in variables):

   58.0 / 7 * 7 - 58.0 = 7.10543e-15
   61.0 / 7 * 7 - 61.0 = -7.10543e-15

However, I'm afraid, we won't be able to find an implementation that works perfectly (without introducing lots of if-branches). 
Also your implementation should at least guarantee monotonousness (which is sometimes just as crucial) and the border elements should not be off by more than 1ULP.
Comment 8 Christoph Hertzberg 2016-10-24 14:57:50 UTC
(In reply to Gael Guennebaud from comment #6)
> We would also need to disable the sequential path, which is really
> inaccurate anyway.

Yes, I totally agree. One of the first textbook examples in numerics is that adding up lots of 0.1 is doomed to fail e.g.:

    Eigen::VectorXd fail = Eigen::VectorXd::LinSpaced(Eigen::Sequential, 10000001, 0.0, 1000000.0);
    std::cout << fail.tail<5>().array() - 1000000.0 << "\n";

results in:
-0.400089
-0.300089
-0.200089
-0.100089
        0

expected behavior: [-0.4 -0.3 -0.2 -0.1 0]
Comment 9 Gael Guennebaud 2016-10-24 15:46:34 UTC
Here is one variant of the current implementation that exactly reproduces both the low and high bounds:

ctor:
m_step = (high-low)/Scalar(num_steps-1);
m_corr = (high - (m_low+Scalar(num_steps-1)*m_step))/(num_steps-1);

operator():
return m_low+i*m_step + i*m_corr;
Comment 10 Gael Guennebaud 2016-10-24 18:28:04 UTC
https://bitbucket.org/eigen/eigen/commits/e87556af3ee1/
Summary:     Bug 1004: remove the inaccurate "sequential" path for LinSpaced, mark respective function as deprecated, and enforce strict interpolation of the higher range using a correction term.
Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively.
Comment 11 Christoph Hertzberg 2016-10-24 21:06:42 UTC
(In reply to Gael Guennebaud from comment #9)

It is admittedly hard to find counter examples, but see this (based on the counter example in comment 7):

    VectorXd a = VectorXd::LinSpaced(8, -7L<<60, 58);
    VectorXd b = VectorXd::LinSpaced(8, -7L<<60, 61);
    std::cout << a(7)-58.0 << "\n" << b(7) - 61.0 << "\n";

I'm not advocating on supporting every pathological example, I mereley wanted to say that it is hard to guarantee absolute accuracy here. (This very example could be supported with yet another correction term, which may be enough for all finite values of high and low)

Btw: Matlab appears to have a pragmatic solution, which calculates 
    low + i*((high-low)/(size-1))
for the "middle" elements and then replaces the first element by low and the last by high (an assumption based on the following test inputs):

>> linspace(0,1/0,4)
ans = 
    0  Inf  Inf  Inf
>> linspace(1/0,0,4)
ans =
    Inf  NaN  NaN  0
Comment 12 Gael Guennebaud 2016-10-25 06:54:43 UTC
ah, to tell the truth I'm not entirely surprised that you found a counter exemple, what really surprised me is that it worked without failure on millions of runs of the unit tests, so I have too fast in my conclusions. I guess that's because the vector sizes in our unit tests are too small.

I've also though about the pragmatic solution to replace the last element with 'high', but we can only do that with the setLinSpaced() functions, not for the static MatrixXd::LinSpaced() functions (that would require a branch).
Comment 13 Gael Guennebaud 2016-10-25 07:35:54 UTC
Ok, so if we mix the approach of comment 6 with the correction term of comment 9, then we should improve robustness because the initial error magnitude is limited to 1 ULP.
Comment 14 Gael Guennebaud 2016-10-25 07:43:13 UTC
Indeed, this seems to work even with high ranges:

ctor:
m_size1 = num_steps==1 ? 1 : num_steps-1;
m_low1 = low/m_size1;
m_high1 = high/m_size1;
m_corr = (high-(num_steps-1)*m_high1)/Scalar(num_steps<=1 ? 1 : num_steps-1);

operator():
return (m_size1-i)*m_low1 + i*m_high1 + i*m_corr;

test code:
for(int k=0;k<10000;++k) {
  int n = internal::random<int>(2,10000000);
  VectorXd a = VectorXd::LinSpaced(n, -pow(4*abs(internal::random<double>()),internal::random<double>(10,300)), 58);
  VectorXd b = VectorXd::LinSpaced(n, -pow(4*abs(internal::random<double>()),internal::random<double>(10,300)), 61);
  std::cout << a(n-1) - 58.0 << "  " << b(n-1) - 61.0 << "\n";
}

I'll do a benchmark to evaluate the overhead.
Comment 15 Christoph Hertzberg 2016-10-25 07:51:01 UTC
IMO it would be sufficient to warn that for some extreme border cases (e.g., borders are orders of magnitudes apart, or borders are subnormal numbers), the last value can be off by 1 (or 2?) ULP. We must also warn that for non-finite borders the result will likely not be useful.
I don't think it is really worth the effort to overcompensate here. Especially, some use cases like   LinSpaced(size, -1, 1).array().asin()   should be fine (with the implementation as of comment 9). If the overhead is negligible, I would not oppose the extra term, though.
Comment 16 Gael Guennebaud 2016-10-25 11:11:04 UTC
It can be seriously off, e.g., with the impl. of comment 9:

VectorXd c =  VectorXd::LinSpaced(5892402,-9.855311729712581e+79,61);
cout << c.tail(5).transpose() << endl;

gives:
-6.69018e+73 -5.01764e+73 -3.34509e+73 -1.67255e+73  -1.4615e+48

Eigen 3.2 is even worse:
-6.69018e+73 -5.01764e+73 -3.34509e+73 -1.67255e+73  -1.3164e+64

The solution of comment 13 gives:

-6.69018e+73 -5.01764e+73 -3.34509e+73 -1.67255e+73           61


Regarding relative speeds with AVX, and taking Eigen3.2 as the basis:

Eigen 3.2:  "m_low + i*m_step" -> 1s
Comment 9:  "m_low + i*m_step + i*m_corr" -> 1.3s
Comment 14: "(m_size1-i)*m_low1 + i*m_high1 + i*m_corr" -> 2s

As a compromise, setLinSpaced could call a "fast" path (i.e., m_low + i*m_step) and overwrite the last entry, whereas LinSpaced() would call a 2x slower but still accurate implementation.
Comment 17 Gael Guennebaud 2016-10-25 11:25:57 UTC
btw, the solution of comment 9 overestimate high, so LinSpaced(size, -1, 1).array().asin() with n large returns NaNs.
Comment 18 Gael Guennebaud 2016-10-25 12:05:02 UTC
For the record, I reversed engineered octave and matlab, and for the intermediate values they do:

Octave: low + i * ((high-low)/(size-1))

MatLab: low + ((high-low) * i) / (size-1)

So octave is like Eigen 3.2, and MatLab does one costly division.
Comment 19 Gael Guennebaud 2016-10-25 14:55:45 UTC
According to discussions on IRC:

https://bitbucket.org/eigen/eigen/commits/153266fe2b36/
Summary:     Bug 1004: one more rewrite of LinSpaced for floating point numbers to guarantee both interpolation and monotonicity.
This version simply does low+i*step plus a branch to return high if i==size-1.
Vectorization is accomplished with a branch and the help of pinsertlast.
Some quick benchmark revealed that the overhead is really marginal, even when filling small vectors.

PS: this commit also includes a workaround for bug 1328 that I did not intend to include here.
Comment 20 Zsbán Ambrus 2016-10-28 15:38:37 UTC
I tried to find a good reference implementation of LinSpaced, whether in pseudocode or source code, but haven't found anything yet.  So instead, I've been thinking.  The disclaimer is that I'm not a numeric analysis expert, so this could be wrong.

So recapping the problem.  You have a real floating point type Scalar, and the parameters (Index n, Scalar a, Scalar b).  You want to compute the linspaced values x(k) for each integer k between 0 and n half-inclusive, where the ideal numeric value of x(k) is a+k*(b-a)/(n-1) (interpreted as exact operations on real numbers).  In my opinion, a good way to compute this would be the following.

Precompute: Scalar d = (b-a)/(n-1); bool c = abs(a)<=abs(b);
Then for each index: x(k) = c ? a+d*k : b-d*(n-k-1);

This typically gives a value close to the correct one.  There are three bad cases possible.  Two rare problems are if (b-a) overflows the exponent and if (b-a)/n underflows the exponent, in which case you're screwed.  A somewhat more common problem is when abs(a) and abs(b) are of opposite signs and one of the intermediate x(k) coefficients is close to zero, in which case you could get a large relative error on that value, althogh the absolute error of that coefficient divided by min(abs(a),abs(b)) is still small.  I don't think this is often a practical problem, and I don't know a cheap way to avoid it.

The endpoint values x(0) and x(n-1) will be very close to correct (with relative errors only at most a few units of least precision), but not necessarily exact.  If you want the correct endpoint value, you have to replace one of the endpoints explicitly, like you suggested before.  This complicates the method somewhat, especially since the docs promise that if n==1 then x(0)=b.  You probably can't do much better than 

x(k) = n-1==k ? b : c ? a+d*k : 0==k ? a : b-d*(n-k-1);

Anyway, for float and double values, if you want more than one value at the same time, it is possible to vectorize this computation.  In that case, you can do some of the index arithmetic as Scalar instead of Index.

Although you can modify the computation a lot of ways, the important part of what I'm suggesting here is that you definitely need to check abs(a)<=abs(b), and branch or mask on it.  You can't take a shortcut and use only the the c branch or only the !c branch.  The explanation for why this is needed is the following.  

Suppose someone were to call your function on float values with n=1000000, a=1.0f, b=1.0e-6f, k=n-2.  The mathematically correct value for x(k) is then very close to 2.0e-6.  In this case (a+d*k) gives the very imprecise value near 2.0266e-6, but the computation in the correct branch (b-d*(n-k-1)) gives the correct result.  Similarly, if n=1000000, a=1.0e-6f, b=1.0f, k=1, then (a+d*k) gives the correct result 2.0e-06, but (b-d*(n-k-1)) gives a very imprecise value near 2.0266e-6.

All this applies only to real floating point types.  For complex numbers, you may have to do the same computation twice, separately for the real part and the imaginary part.  For integers, I'm not saying anything here.
Comment 21 Gael Guennebaud 2016-10-29 15:03:09 UTC
(In reply to Zsbán Ambrus from comment #20)
> Precompute: Scalar d = (b-a)/(n-1); bool c = abs(a)<=abs(b);
> Then for each index: x(k) = c ? a+d*k : b-d*(n-k-1);

Indeed, this strategy reduces numerical cancellation in a similar way than one of the previous proposal:

(n-k-1)*(a/(n-1)) + k*(b/(n-1));

that avoids branching but fails if a and b are too close.
Comment 22 Gael Guennebaud 2016-11-02 10:46:39 UTC
I did some benchmark, and the overhead of the additional branch is acceptable. Looks like modern CPUs (and compilers?) are extremely good at handling such kind of branching. So here goes the patch:

https://bitbucket.org/eigen/eigen/commits/95c579161b9f/
Summary:     Bug 1004: improve accuracy of LinSpaced for abs(low) >> abs(high).

Handling std::complex is left as future work (JuniorJob).

In the future, we could also optimize setLinSpaced with custom vectorized loop to gain some speedup when upmost performance matters (not a junior-job).

For the record, I've tried to remove one branch through a more sophisticated initialization:

Intit:
  bool flip = abs(low)>abs(high);
  if(flip) swap(low,high);
  offset = flip ? (1-size) : 0;
  step = (high-low)/(size-1);

operator()
  return low + (i+offset) * step;

But this was not faster, rather the opposite...

Note You need to log in before you can comment on or make changes to this bug.