Created attachment 934 [details]
File containing program and input data to reproduce the problem
The BDCSVD function for computing singular value the decomposition of a matrix gets stuck in a loop when using mpreal for a certain input.
To reproduce this compile the attached program, on my system this is done with
g++ -Wall -I/usr/include/eigen3/ eigen_svd_bug.cpp -o eigen_svd_bug -lmpfr
and then run it with the input from the attached text file
./eigen_svd_bug < bug-matrix-mpfr
It then gets stuck in an loop and does not return. It seems to be very sensitive to the input data, for example just changing the precision to 63 (or 65) bits instead of 64 makes it return.
I have been trying to identify the problem and it seems to be the final while loop in the function computeSingVals that gets stuck
while (rightShifted - leftShifted > Literal(2) * NumTraits<RealScalar>::epsilon() * numext::maxi<RealScalar>(abs(leftShifted), abs(rightShifted)))
I think the problem could be related to the fact that mpreal (mpfr) has a much wider exponent then normal double precision floating points and the worst case scenario for this loop is thus much worse for mpreal than for double. However I'm not sure this is the case and I'll try to look into it a bit more, getting back if I find anything.
The D&C SVD algorithm is very sensitive to the right choice of numerical threshold values, and I'm not entirely surprised it exhibits some issue with something else than float and double.
Regarding this infinite loop, I guess that if you compile with EIGEN_INTERNAL_DEBUGGING, then you should get the following internal assertion:
eigen_internal_assert(fLeft * fRight < Literal(0));
or perhaps even some earlier assertions.
ok, there seems to be a problem with the input vector z of computeSingVals:
4.70966 0 0 0 0 0 0 0 0 0 0 -2.79106e-18 9.23451e-18 -2.02131e-15 5.69823e-15 -1.2789e-11 -2.34678e-09 8.1122e-10 -9.58352e-09 4.86642e-08 -7.05203e-09 -5.02109e-09 -6.01444e-11 -1.26691e-10 6.67011e-11 -3.42708e-12 1.76326e-12 -1.382e-14 1.6142e-16 -6.13475e-17 7.17659e-18 0 3.63408e-18 0 0 0 0 0 0 3.167e-18
which contains an unexpected 0 at position 32, exactly where the infinite loop occurs. The positions 32 and 33 should have been swapped...
hm, no that's a bit more complicated....
The problem is the following:
In the given [left,right] range, we first test the middle:
RealScalar mid = left + (right-left) / Literal(2);
RealScalar fMid = secularEq(mid, col0, diag, perm, diag, Literal(0));
and shift to the left or right depending on whether fMid is positive or negative:
diagShifted = diag - shift;
In our case fMid is negative, so shift==right.
But then if we evaluate the secular equation at fMid after shifting, then we get a positive value, i.e.:
secularEq(-(right - left) * RealScalar(0.5), col0, diag, perm, diagShifted, shift)
is positive. To get a negative value, I need to be as conservative as moving to 70% to the left (instead of 50%), i.e.:
secularEq(-(right - left) * RealScalar(0.7), col0, diag, perm, diagShifted, shift)
is negative. (at -60% I still get a positive value)
In theory, the shifted versions are more accurate, so we could re-compute fMid with the shift, and update the shift if the sign disagree.
Summary: Bug 1695: fix a numerical robustness issue. Computing the secular equation at the middle range without a shift might give a wrong sign.