Created attachment 196 [details]
A patch to add update/downdate functionality

The LDLT decomposition of a matrix A can be "updated" to yield the decomposition of A + W*W^T. Typically W is a column vector (a "rank-1 update"), but this patch also permits efficient rank-r updates. It also supports "downdates," corresponding to removing previously added columns of W.
Hi, thanks for the patch. I see you adapted an algorithm initially designed for sparse matrices, but for dense ones, do you know how it compares to more classical algorithms based on Givens rotations? For instance, see this page for a tech report and code on such an algorithm: http://lapmal.epfl.ch/software/index.shtml
(In reply to comment #1)
> hi, thanks for the patch. I see you adapted an algorithm initially designed
> for sparse matrices, but for dense ones do you know how does it compare to
> more classical algorithms based on Givens rotation? For instance see this
> page for a tech report and code on such an algorithm:
> http://lapmal.epfl.ch/software/index.shtml

I don't know; I haven't tested it myself. One advantage of this algorithm is that, for multiple-rank updates, it does the diagonals first and then L. If the diagonals were maintained in a separate array, this would presumably be better from a memory-locality perspective (see the comments in the summary section of the paper I referred to). However, since Eigen maintains the diagonals as part of the overall matrix, it may not be advantageous. Also, the main use case will presumably be rank-1 updates, for which it sounds like the BLAS-based code may have advantages. Finally, it may be worth noting that the link you sent contains code for LLT, whereas what I contributed is for LDLT.
Fair enough. For the record, this paper is very interesting too: "Methods for Modifying Matrix Factorizations", starting from page 10.

Regarding your patch, don't expect it to be integrated very quickly, because it would require some bits of work such as cleaning, unit testing, making the API consistent with the rest of Eigen, etc. But again, thanks a lot for sharing.
(In reply to comment #3)
> Fair enough.

I did some testing. I'll attach a test program---maybe this will be a starting point for a unit test?

Testing: DataType = float, W has 200 rows & 50 columns.

Speed-wise, the new update() functionality seems to be comparable to, and indeed faster than, the current decomposition code. Last line of valgrind output:

With just the conventional decomposition of A enabled (all else commented out, except for cout << ldltDirect.vectorD().coeff(0) to ensure that the computation is not optimized out):
I refs: 18,949,850

With just the updating (W-based) decomposition enabled (leaving the calculation of A intact for comparison, even though it is not used):
I refs: 15,118,056

With the (unused) calculation of A also disabled:
I refs: 6,705,744

So if one is working with matrices originally defined as W W^T, the updating code is almost 3-fold faster than the current code. This seems like an enhancement worth having.

Accuracy-wise, the new update() functionality is significantly worse, perhaps due to the lack of pivoting (the update performs none). Here was my output:

Direct err: 0.0289234
Update err: 0.548278
Direct: post/pre = 2.74359e-06
Update: post/pre = 2.42571e-05

The last comparison is probably the most relevant (see comments in the code), since we typically care far more about the inverse properties of the decomposition than about its accuracy in reconstructing the original matrix.

> For the record, this paper is very interesting too: "Methods for
> Modifying Matrix Factorizations", starting from page 10.

Thanks for the tip; I will look at it. On a related note, for updating the SVD (something I may also be interested in implementing), I have the impression that the state of the art is Brand 2002, "Incremental singular value decomposition of uncertain data with missing values." Even without missing values, my naive impression is that this algorithm is now widely used.
> Regarding your patch, don't expect it to be integrated very quickly because it
> would require some bits of work such as cleaning, unit testing, make the API
> consistent with the rest of Eigen, etc. but again thanks a lot for sharing.

OK. If there's more I can do, let me know. I'd be curious to see how it evolves in the hands of someone who really knows the API; it would surely be educational for me.
Created attachment 197 [details]
A program testing the LDLT decomposition
Further update: if one changes to 200 dimensions and 200 data vectors, one obtains:

"Classical" decomposition (including calculation of A):
I refs: 44,198,467

"Update" decomposition (skipping the calculation of A):
I refs: 51,286,285

So in the previous test case, the main advantage of update() was simply that W is smaller than A. When they are the same size, the classical code has a slight advantage, but the two are comparable.
(In reply to comment #3)
> Fair enough. For the record, this paper is very interesting too: "Methods for
> Modifying Matrix Factorizations", starting from page 10.

Finally, I just noticed that the algorithm in the paper I cited (Davis & Hager) is simply a reorganization of the "Method C1" algorithm in the paper you cited (Gill et al.). The one in the Davis paper apparently improves the memory access patterns by making only a single pass through L (which should improve cache behavior, I think). I doubt my own Eigenized implementation is ideal; I'm sure it could be improved upon.
Tim, thanks for the experiments. I played a bit with it too, and here, if dims == vectors == 200, the direct method is an order of magnitude faster. Same regarding the accuracy: the update method is much less accurate than the direct one. Nevertheless, I also compared it to the LLT update method (recently available in the default branch), which is based on Givens rotations, and it seems the algorithm you implemented is quite comparable in terms of performance and slightly better in terms of accuracy, though as you noticed one is about LDLT and the other about LLT, so they are not perfectly comparable.

For the unit tests, look at eigen/test/cholesky.cpp; there are already tests for LLT::rankUpdate.

Some remarks regarding the function:

update(MatrixBase<Derived>& w, int alpha)

- update -> rankUpdate
- alpha should be a RealScalar (if your algorithm supports any alpha)
- w should be const qualified, and its values should really not be modified

Others:

- check that it works for complexes, and for "upper" decompositions
- I'm wondering whether it makes sense to do multi-rank updates at once, because:
  1 - the speed improvement is quite small, especially for small-rank updates
  2 - for relatively large updates it is faster and more accurate to restart from scratch anyway
  3 - multi-rank updates require more temporaries
Created attachment 198 [details]
Revision of LDLT rankUpdate in response to review comments

I sent comments via email; if the email doesn't appear here soon, I'll paste the text in a second comment.
(In reply to comment #8)
> Same regarding the accuracy: the update method is much less accurate than the
> direct. Nevertheless, I also compared it to the LLT update method (recently
> available in the default branch) which is based on Givens rotations, and it
> seems the algorithm you implemented is quite comparable in term of
> performance, and slightly better in terms of accuracy, though as you noticed
> one is about LDLt and the other LLT, so they are not perfectly comparable.

The more I think about it, the stranger I find it that updating is of lower accuracy. After all, the condition number of W W^T is the square of that of W. So I am increasingly puzzled by this result. However, I don't have any ideas at the moment for what to do about it.

> For the unit tests, look at eigen/test/cholesky.cpp, there are already tests
> for LLT::rankUpdate.

Thanks---I hadn't noticed that LLT already had rankUpdate, or I would have based my work on it. I simply made copies of the LLT tests for LDLT.

> Some remarks regarding the function:
>
> update(MatrixBase<Derived>& w,int alpha)
>
> - update -> rankUpdate

Done.

> - the alpha should be a RealScalar (if your algorithm supports any alpha)

Mathematically any value is acceptable, so this seems reasonable, even though I can't imagine why anyone would want to use a value different from +1 or -1.

> - w should be const qualified, and its values should really not be modified.

Done (this was easy to achieve once I restricted it to rank-1 updates, which I did in response to your other suggestions below).

> others:
>
> - check it works for complexes, and for "upper" decompositions.

It works for complex lower, and it works for real upper, but it does not work for complex upper. Interestingly, I notice that LLT updating seems to fail its unit test for complex upper, too (cholesky_6). Not quite sure what is going on, or what to do about it---maybe the error is in Transpose?
> - I'm wondering whether it makes sense to do multi-rank updates as once,
> because:
> 1 - the speed improvement is quite small, especially for small rank update
> 2 - for relatively large updates it is faster and more accurate to restart
> from scratch anyway
> 3 - multi-rank updates requires more temporaries

Agreed, and in any case my own work requires only rank-1 updates. I changed it, and this somewhat streamlines the code.
Thanks a lot for the updated patch; it looks pretty good now, except for the upper/complex case, see below.

(In reply to comment #10)
> (In reply to comment #8)
> > Same regarding the accuracy: the update method is much less accurate than
> > the direct. Nevertheless, I also compared it to the LLT update method
> > (recently available in the default branch) which is based on Givens
> > rotations, and it seems the algorithm you implemented is quite comparable
> > in term of performance, and slightly better in terms of accuracy, though
> > as you noticed one is about LDLt and the other LLT, so they are not
> > perfectly comparable.
>
> The more I think about it, the stranger I find it that updating is of lower
> accuracy. After all, the condition number of W W^T is the square of that of
> W. So I am increasingly puzzled by this result. However, I don't have any
> ideas at the moment for what to do about it.

I don't know the reason either, but that seems to be a well-known fact!

> > - the alpha should be a RealScalar (if your algorithm supports any alpha)
>
> Mathematically any value is acceptable, so this seems reasonable, even though
> I can't imagine why anyone would want to use a value different from +1 or -1.

Think of alpha as a weight in incremental weighted least squares. This is also consistent with SelfAdjointView::rankUpdate.

> > - check it works for complexes, and for "upper" decompositions.
>
> It works for complex lower, and it works for real upper, but it does not work
> for complex upper. Interestingly, I notice that LLT updating seems to fail
> its unit test for complex upper, too (cholesky_6). Not quite sure what is
> going on, or what to do about it---maybe the error is in Transpose?

Yes, I noticed it a few days ago, and the easiest solution was to pass w.conjugate(); see the latest version of the devel branch. Perhaps the same works here.
> > - I'm wondering whether it makes sense to do multi-rank updates as once,
> > because:
> > 1 - the speed improvement is quite small, especially for small rank update
> > 2 - for relatively large updates it is faster and more accurate to restart
> > from scratch anyway
> > 3 - multi-rank updates requires more temporaries
>
> Agreed, and in any case my own work requires only rank-1 updates. I changed
> it, and this somewhat streamlines the code.

I think rankUpdate should still accept matrices, but simply process them one column at a time. I probably have to do the same change for LLT::rankUpdate() ;)
I see I've not checked this bug report for a long time, and due to my lingering confusion about how hg works I had the impression the patch had been committed. But with the impending release of 3.1 I checked out a fresh copy, and noticed (I think) that it's still not in. I'd like to finish this patch so it can be included. I updated my copy of the default branch. Unless I'm doing something wrong (which is entirely possible), I noticed LLT is still failing cholesky_6. So I am not certain which part of "latest version of the devel branch" (see comment above) I am supposed to check.
> I think rankUpdate should still accept matrices, but simply process it one
> column at a time. I probably have to do the same change for
> LLT::rankUpdate() ;)

I'm happy to implement this too, but before I do, I should ask: having it loop over columns would seem to introduce a very slight penalty for the case where the user knows there is only one column. (If the updating matrix has ColsAtCompileTime = 1, this can be optimized away, but if the user uses Dynamic, the loop would still need to run.) Are you concerned at all about this? Presumably it is a very small penalty compared to the actual matrix operations. Nevertheless, if you want the absolute most in efficiency, it would seem best to make the user run the loop when doing multi-rank updates. Just let me know what you prefer.
Hi, I've just released the alpha1, but there is still time to include this for the next beta.

I'm not worried at all about the performance of looping over the columns. If the user passes a vector, the loop will be optimized away, and a loop is mainly an "if", so compared to the rest of the computations this is nothing!

cholesky_6 should not be failing; it works here on many different systems.
Committed and updated:

https://bitbucket.org/eigen/eigen/changeset/a30fb8deb347/
changeset: a30fb8deb347
summary: feature 319: Add update and downdate functionality to LDLT

https://bitbucket.org/eigen/eigen/changeset/4c96fd9a053d/
changeset: 4c96fd9a053d
summary: feature 319: fix LDLT::rankUpdate for complex/upper, simply the algortihm, update copyrights
Thanks for committing! Does this mean you're satisfied with it?

I still never got cholesky_6 to work properly for LLT; here's my result. (Note: the Hg repository seems to be down, so I wasn't able to pull updates.)

[cmake configuration banner and "make cholesky" build output elided: Eigen 3.0.90 configured in /tmp/builddir, and all nine cholesky_N targets built and linked successfully]

tim@diva /t/b/test> ./cholesky_6
Initializing random number generator with seed 1324137457
Repeating each test 10 times
Test cholesky_cplx(MatrixXcd(s,s)) failed in /home/tim/src/eigen/test/cholesky.cpp (205)
test_isApprox(symmCpy, cholup.reconstructedMatrix())
fish: Job 1, “./cholesky_6” terminated by signal SIGABRT (Abort)

(cholesky_1 through cholesky_5 and cholesky_7 through cholesky_9 all ran to completion without failures.)

If all works for you, that's fine; I was just stymied by this. Sorry.
Re my comment on the Hg server: my ISP seems to be having DNS trouble, there are other sites it can't find, either. So ignore that. I last updated my repository Dec. 3 or so.
Hm, line 205 here is "Index cols = m.cols();", so your working copy is clearly not up to date.
Hmm, sorry, I keep getting tripped up on differences between SVN and Hg. Clearly I should spend a day reading about how Hg works. Agreed, the new code passes all tests. Thanks!
-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/319.