This bugzilla service is closed. All entries have been migrated to https://gitlab.com/libeigen/eigen

Bug 899

Summary: SparseQR occasionally fails for under-determined systems
Product: Eigen
Component: Sparse
Version: 3.3 (current stable)
Hardware: All
OS: All
Status: RESOLVED FIXED
Severity: Accuracy Problem
Priority: Normal
Reporter: Christoph Hertzberg <chtz>
Assignee: Nobody <eigen.nobody>
CC: chtz, gael.guennebaud
Whiteboard:
Bug Depends on:
Bug Blocks: 814

Description Christoph Hertzberg 2014-10-21 20:15:56 UTC
Currently, e.g., this fails:
> ./test/sparseqr_1 s2

I debugged this and found that a column whose norm is only slightly above the pivotThreshold was used as a pivot.
In our test suite this happens very rarely, but it happens a lot more often if you generate matrices with more cols than rows (e.g., by increasing MaxCols in generate_sparse_rectangular_problem in test/sparseqr.cpp).

A solution could be to increase the default threshold, perhaps depending on the cols/rows ratio.

Another thing that very rarely fails is the
   VERIFY_IS_EQUAL(dqr.rank(), solver.rank());
test, probably because dqr.rank() depends non-continuously on the DenseQR's threshold.
This could be robustified by evaluating dqr.rank() for a small interval of thresholds and checking that solver.rank() lies inside the resulting interval.
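
For illustration, such an interval check could look roughly like this (a sketch only, not the actual test code; the helper name and the slack factor are made up, while ColPivHouseholderQR::setThreshold() and rank() are existing API):

  #include <Eigen/Dense>

  // Sketch: accept the sparse rank if it falls between the dense ranks obtained
  // with a slightly looser and a slightly stricter threshold.
  template <typename DenseMat, typename SparseSolver>
  bool rank_matches_within_interval(const DenseMat& dA, const SparseSolver& solver,
                                    double thresh, double slack = 2.0)
  {
    Eigen::ColPivHouseholderQR<DenseMat> dqr(dA);
    dqr.setThreshold(thresh / slack);
    Eigen::Index rank_hi = dqr.rank();   // looser threshold -> rank can only grow
    dqr.setThreshold(thresh * slack);
    Eigen::Index rank_lo = dqr.rank();   // stricter threshold -> rank can only shrink
    return rank_lo <= solver.rank() && solver.rank() <= rank_hi;
  }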
Comment 1 Gael Guennebaud 2014-10-29 10:03:12 UTC
The main problem here is that the conditional pivoting strategy we employ is not fully rank-revealing, even though it is the state of the art in sparse QR factorization (it is what is implemented in SuiteSparseQR). The threshold is a tradeoff between RR ability and performance/memory consumption. Perhaps we could try to detect that something went wrong and report it to the user so that they can retry with a larger threshold, or implement some kind of restarting strategy within SparseQR itself.
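
For illustration, the user-side retry could look roughly like this (a sketch only; it assumes a large residual is used as the failure indicator, and the function name, tolerance, and larger_threshold are made up, while SparseQR::setPivotThreshold() is existing API):

  #include <Eigen/Sparse>

  // Sketch: solve with SparseQR and, if the residual suggests that a bad pivot
  // column slipped in, retry once with a user-supplied larger pivot threshold.
  // A is assumed to be in compressed mode (e.g. built via setFromTriplets).
  Eigen::VectorXd solve_with_retry(const Eigen::SparseMatrix<double>& A,
                                   const Eigen::VectorXd& b,
                                   double larger_threshold, double tol = 1e-8)
  {
    using SpMat = Eigen::SparseMatrix<double>;
    Eigen::SparseQR<SpMat, Eigen::COLAMDOrdering<int>> qr(A);
    Eigen::VectorXd x = qr.solve(b);
    if ((A * x - b).norm() > tol * b.norm())
    {
      qr.setPivotThreshold(larger_threshold);
      qr.compute(A);
      x = qr.solve(b);
    }
    return x;
  }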
Comment 2 Gael Guennebaud 2014-10-29 14:11:52 UTC
(In reply to Gael Guennebaud from comment #1)
> The threshold is a tradeoff between RR ability and performance/memory consumption. 

Sorry, this is not true (I mixed it up with the SparseLU pivoting strategy). The choice of the threshold is very tricky here: if it is too large then we might cancel meaningful columns, because currently a column whose norm is too small is completely discarded. Ideally it would be preferable to simply defer its treatment and get back to it in case we don't find enough "good" columns among the remaining ones. Easy to say, but I'm not sure that's doable in practice without significant overhead.
Comment 3 Gael Guennebaud 2014-11-07 11:27:50 UTC
I tried to overcome this issue using a two-threshold strategy: I start with a very high and very conservative threshold, and defer the treatment of the discarded columns until all very safe ones have been processed. The choice of the first threshold is very tricky: if it is too small then we pick poor columns and miss much better candidates, and if it is too high, then we can also pick bad candidates and miss much better ones during the second pass... With matrices filled with random values in the [-1,1] range, this seems to work reasonably well using the following conservative threshold:

A.colwise().norm().maxCoeff() * sqrt(machine_epsilon) * (Scalar==double ? 1024 : 16)

This is rather huge and it's likely to fail on more tricky problems. 
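
Expressed as user code for a sparse matrix, that value could be computed and passed in roughly like this (a sketch; this is not the internal two-threshold patch, just a way to experiment with the value as a plain pivot threshold from user code; the function name is made up, the 1024 factor is the experimental value quoted above, and setPivotThreshold() is existing API):

  #include <algorithm>
  #include <cmath>
  #include <Eigen/Sparse>

  // Sketch: max column norm of a column-major sparse A, scaled by sqrt(eps) and
  // the experimental factor from above (1024 for double; 16 would be the float variant).
  double conservative_threshold(const Eigen::SparseMatrix<double>& A)
  {
    double maxColNorm = 0;
    for (int j = 0; j < A.outerSize(); ++j)          // outer index = column
    {
      double sq = 0;
      for (Eigen::SparseMatrix<double>::InnerIterator it(A, j); it; ++it)
        sq += it.value() * it.value();
      maxColNorm = std::max(maxColNorm, std::sqrt(sq));
    }
    return maxColNorm * std::sqrt(Eigen::NumTraits<double>::epsilon()) * 1024;
  }

  // usage: qr.setPivotThreshold(conservative_threshold(A)); qr.compute(A);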


Actually, I'm not sure that we really want to fix this, because if the input matrix A is fat (cols > rows) then we usually want the QR factorization of A^T, or equivalently the LQ factorization of A, and the aforementioned issue then no longer applies. If we agree on that, we can either forbid SparseQR on fat matrices, or let it perform a PA=LQ factorization.
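
For completeness, the QR-of-A^T route would look roughly like this for a full-row-rank fat A (an illustration only; the function name is made up, and rank deficiency and error checking are ignored):

  #include <Eigen/Sparse>

  // Sketch: minimum-norm solution of A x = b for a fat A (rows < cols) with
  // full row rank, via SparseQR applied to A^T (the LQ-of-A route).
  // With A^T P = Q R we have A = P R^T Q^T, so we solve R^T y = P^T b by a
  // lower-triangular solve and obtain x = Q [y; 0].
  Eigen::VectorXd solve_minimum_norm(const Eigen::SparseMatrix<double>& A,
                                     const Eigen::VectorXd& b)
  {
    using SpMat = Eigen::SparseMatrix<double>;
    const Eigen::Index m = A.rows(), n = A.cols();   // m < n assumed

    SpMat At = A.transpose();
    At.makeCompressed();
    Eigen::SparseQR<SpMat, Eigen::COLAMDOrdering<int>> qr(At);

    Eigen::VectorXd c = qr.colsPermutation().transpose() * b;    // P^T b
    SpMat Rt = SpMat(qr.matrixR().topRows(m)).transpose();       // R^T, lower triangular
    Eigen::VectorXd y = Rt.triangularView<Eigen::Lower>().solve(c);

    Eigen::VectorXd z = Eigen::VectorXd::Zero(n);
    z.head(m) = y;
    Eigen::VectorXd x = qr.matrixQ() * z;                        // minimum-norm solution
    return x;
  }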
Comment 4 Gael Guennebaud 2019-02-19 09:57:50 UTC
So, for 3.4, shall we impose cols<=rows by an assert (with the risk of breaking existing likely-broken code), or simply add warnings to the documentation?
Comment 5 Christoph Hertzberg 2019-02-19 12:05:03 UTC
Just asserting that cols<=rows would not help if a matrix contained a lot of zero rows or a lot of (nearly) linearly dependent rows.
Similarly, for a fat matrix with many zero (or linearly dependent) columns a QR decomposition can make more sense than an LQ factorization.

I'm for adding warnings to the documentation and robustifying the unit-test, e.g., the following still fails:

$ ./test/sparseqr_1 s1550293377
Comment 6 Gael Guennebaud 2019-02-19 22:07:40 UTC
Padding with empty rows does have an effect, as the algorithm will then go through all columns (and thus pick all the good ones) instead of stopping early, but yes, in practice this won't make much of a difference, as picking just one bad column is enough to make the decomposition useless for solving.

For the record, I did some comparisons with SuiteSparseQR: both perform equally badly for non-full-rank inputs, and both can sometimes be completely off, with a relative error of 10^6. That's sad for a supposedly "rank-revealing" factorization. So yes, let's add some notes to the doc:

https://bitbucket.org/eigen/eigen/commits/0e17a61475c
Bug 899: remove "rank-revealing" qualifier for SparseQR and warn that it is not always rank-revealing.

and make the unit test more robust:

https://bitbucket.org/eigen/eigen/commits/c8134e0bbdc5
Bug 899: make sparseqr unit test more stable by 1) trying with larger threshold and 2) relax rank computation for rank-deficient problems.
Comment 7 Nobody 2019-12-04 13:52:09 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/899.