Bug 319 - [patch] Add update and downdate functionality to LDLT decomposition
[patch] Add update and downdate functionality to LDLT decomposition
Status: RESOLVED FIXED
Product: Eigen
Classification: Unclassified
Component: Cholesky
3.0
All All
: --- enhancement
Assigned To: Nobody
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-07-18 14:04 UTC by tim.holy
Modified: 2011-12-18 13:02 UTC (History)
1 user (show)



Attachments
A patch to add update/downdate functionality (10.20 KB, application/octet-stream)
2011-07-18 14:04 UTC, tim.holy
no flags Details
A program testing the LDLT decomposition (1.64 KB, text/plain)
2011-07-19 14:12 UTC, tim.holy
no flags Details
Revision of LDLT rankUpdate in response to review comments (13.07 KB, patch)
2011-07-20 04:34 UTC, tim.holy
no flags Details | Diff

Description tim.holy 2011-07-18 14:04:39 UTC
Created attachment 196 [details]
A patch to add update/downdate functionality

The LDLT decomposition of a matrix A can be "updated" to yield the decomposition of A + W*W^T. Typically W is a column vector (a "rank-1 update"), but this patch also permits efficient rank-r updates. Moreover, it also supports "downdates," corresponding to removing previously-added columns of W.
Comment 1 Gael Guennebaud 2011-07-18 15:13:21 UTC
hi, thanks for the patch. I see you adapted an algorithm initially designed for sparse matrices, but for dense ones do you know how does it compare to more classical algorithms based on Givens rotation? For instance see this page for a tech report and code on such an algorithm:
http://lapmal.epfl.ch/software/index.shtml
Comment 2 tim.holy 2011-07-18 16:59:15 UTC
(In reply to comment #1)
> hi, thanks for the patch. I see you adapted an algorithm initially designed for
> sparse matrices, but for dense ones do you know how does it compare to more
> classical algorithms based on Givens rotation? For instance see this page for a
> tech report and code on such an algorithm:
> http://lapmal.epfl.ch/software/index.shtml

I don't know; I haven't tested it myself.

One advantage of this algorithm is that, for multiple-rank updates, it does the diagonals first and then L. If the diagonals were maintained in a separate array, this would presumably be better from a memory-locality perspective. (See comments in the summary section of the paper I referred to.)

However, since Eigen maintains the diagonals as part of the overall matrix, it may not  be advantageous. Also, the main use case will presumably be rank-1 updates, for which it sounds like the BLAS-based code may have advantages.

Finally, it may be worth noting that the link you sent me contains code for LLT, whereas what I contributed is for LDLT.
Comment 3 Gael Guennebaud 2011-07-19 09:30:26 UTC
Fair enough. For the record, this paper is very interesting too: "Methods for Modifying Matrix Factorizations", starting from page 10.

Regarding your patch, don't expect it to be integrated very quickly because it would require some bits of work such as cleaning, unit testing, make the API consistent with the rest of Eigen, etc. but again thanks a lot for sharing.
Comment 4 tim.holy 2011-07-19 14:11:35 UTC
(In reply to comment #3)
> Fair enough.

I did some testing. I'll attach one test program---maybe this will be a starter on a unit test?

Testing: DataType = float, W has 200 rows & 50 columns
Speedwise, the new update() functionality seems to be comparable to, and indeed faster than, the current decomposition code. Last line of valgrind output:
With just the conventional decomposition of A enabled (all else commented out, except for cout << ldltDirect.vectorD().coeff(0) to ensure that the computation is not optimized out):
I   refs:      18,949,850
With just the updating (W-based) decomposition enabled (for comparison leaving the calculation of A intact, even though it is not used):
I   refs:      15,118,056
With the (useless) calculation of A also disabled:
I   refs:      6,705,744

So if one is working with matrices originally defined as W W^T, the updating code is almost 3-fold faster than the current code. This seems like an enhancement worth having.

Accuracywise, the new update() functionality is significantly worse, perhaps due to the lack of pivoting-updates (i.e., there is no pivoting). Here was my output:
Direct err: 0.0289234
Update err: 0.548278
Direct: post/pre = 2.74359e-06
Update: post/pre = 2.42571e-05
The last comparison is probably the most relevant (see comments in code), since we typically care far more about the inverse properties of the decomposition than its accuracy in reconstructing the original matrix.

> For the record, this paper is very interesting too: "Methods for
> Modifying Matrix Factorizations", starting from page 10.

Thanks for the tip, I will look at it. On a related note, for updating SVD (something I may also be interested in implementing) I have the impression that the state-of-the-art is Brand 2002, "Incremental singular value decomposition of uncertain data with missing values." Even without missing values, my naive impression is that this algorithm is now widely used.

> Regarding your patch, don't expect it to be integrated very quickly because it
> would require some bits of work such as cleaning, unit testing, make the API
> consistent with the rest of Eigen, etc. but again thanks a lot for sharing.

OK. If there's more I can do, let me know. I'd be curious to see how it evolves in the hands of someone who really knows the API, it would surely be educational for me.
Comment 5 tim.holy 2011-07-19 14:12:51 UTC
Created attachment 197 [details]
A program testing the LDLT decomposition
Comment 6 tim.holy 2011-07-19 14:30:15 UTC
Further update: if one changes to 200 dimensions and 200 data vectors, one obtains:
"Classical" decomposition (including calculation of A): I   refs:      44,198,467
"Update" decomposition (skipping the calculation of A): I   refs:      51,286,285

So in the previous test case, the main advantage of update() was simply that W is smaller than A. When they are of the same size, the classical code has a slight advantage, but the two are comparable.
Comment 7 tim.holy 2011-07-19 14:41:53 UTC
(In reply to comment #3)
> Fair enough. For the record, this paper is very interesting too: "Methods for
> Modifying Matrix Factorizations", starting from page 10.

Finally, I just noticed that the algorithm in the paper I cited (Davis & Hager) is simply a reorganization of the "Method C1" algorithm in the paper you cited (Gill et al). The one in the Davis paper apparently improves the memory access patterns by making only a single pass through L (should improve cache behavior I think). I doubt my own Eigenized implementation is ideal, I'm sure it could be improved upon.
Comment 8 Gael Guennebaud 2011-07-19 15:52:12 UTC
Tim, thanks for the experiments. I played a bit with it too, and here, if dims==vectors==200, then the direct method is an order of magnitude faster. Same regarding the accuracy: the update method is much less accurate than the direct. Nevertheless, I also compared it to the LLT update method (recently available in the default branch) which is based on Givens rotations, and it seems the algorithm you implemented is quite comparable in term of performance, and slightly better in terms of accuracy, though as you noticed one is about LDLt and the other LLT, so they are not perfectly comparable.

For the unit tests, look at eigen/test/cholesky.cpp, there are already tests for LLT::rankUpdate.

Some remarks regarding the function:

update(MatrixBase<Derived>& w,int alpha)

- update -> rankUpdate
- the alpha should be a RealScalar (if your algorithm supports any alpha)
- w should be const qualified, and its values should really not be modified.

others:

- check it works for complexes, and for "upper" decompositions.
- I'm wondering whether it makes sense to do multi-rank updates as once, because:
 1 - the speed improvement is quite small, especially for small rank update
 2 - for relatively large updates it is faster and more accurate to restart from scratch anyway
 3 - multi-rank updates requires more temporaries
Comment 9 tim.holy 2011-07-20 04:34:31 UTC
Created attachment 198 [details]
Revision of LDLT rankUpdate in response to review comments

I sent comments via email, if the email doesn't appear here soon I'll paste the text in a 2nd comment.
Comment 10 tim.holy 2011-07-20 11:50:48 UTC
(In reply to comment #8)
> Same regarding the accuracy: the update method is much less accurate than the
> direct. Nevertheless, I also compared it to the LLT update method (recently
> available in the default branch) which is based on Givens rotations, and it
> seems the algorithm you implemented is quite comparable in term of
> performance, and slightly better in terms of accuracy, though as you noticed one
> is about LDLt and the other LLT, so they are not perfectly comparable.

The more I think about it, the stranger I find it that updating is of lower accuracy. After all, the condition number of W W^T is the square of that of W. So I am increasingly puzzled by this result. However, I don't have any ideas at the moment for what to do about it.

> For the unit tests, look at eigen/test/cholesky.cpp, there are already tests
> for LLT::rankUpdate.

Thanks---I hadn't noticed that LLT already had rankUpdate, or I would have 
based my work on it.

I simply made copies of the LLT tests for LDLT.

> Some remarks regarding the function:
> 
> update(MatrixBase<Derived>& w,int alpha)
> 
> - update -> rankUpdate
Done

> - the alpha should be a RealScalar (if your algorithm supports any alpha)
Mathematically any value is acceptable, so this seems reasonable, even though 
I can't imagine why anyone would want to use a value different from +1 or -1.

> - w should be const qualified, and its values should really not be modified.
Done (this was easy to achieve once I restricted it to rank-1 updates, which I did in response to your other suggestions below).

> 
> others:
> 
> - check it works for complexes, and for "upper" decompositions.
It works for complex lower, and it works for real upper, but it does not work 
for complex upper. Interestingly, I notice that LLT updating seems to fail its 
unit test for complex upper, too (cholesky_6). Not quite sure what is going 
on, or what to do about it---maybe the error is in Transpose?

> - I'm wondering whether it makes sense to do multi-rank updates as once,
> because:
>  1 - the speed improvement is quite small, especially for small rank update
>  2 - for relatively large updates it is faster and more accurate to restart
> from scratch anyway
>  3 - multi-rank updates requires more temporaries

Agreed, and in any case my own work requires only rank-1 updates. I changed 
it, and this somewhat streamlines the code.
Comment 11 Gael Guennebaud 2011-07-21 11:17:20 UTC
thanks a lot for the updated patch, it looks pretty good now, except the upper/complex case, see below.

(In reply to comment #10)
> (In reply to comment #8)
> > Same regarding the accuracy: the update method is much less accurate than the
> > direct. Nevertheless, I also compared it to the LLT update method (recently
> > available in the default branch) which is based on Givens rotations, and it
> > seems the algorithm you implemented is quite comparable in term of
> > performance, and slightly better in terms of accuracy, though as you noticed one
> > is about LDLt and the other LLT, so they are not perfectly comparable.
> 
> The more I think about it, the stranger I find it that updating is of lower
> accuracy. After all, the condition number of W W^T is the square of that of W.
> So I am increasingly puzzled by this result. However, I don't have any ideas 
> at the moment for what to do about it.

I don't know either what's the reason, but that seems to be a well known fact!

> > - the alpha should be a RealScalar (if your algorithm supports any alpha)
> Mathematically any value is acceptable, so this seems reasonable, even though 
> I can't imagine why anyone would want to use a value different from +1 or -1.

think alpha as a weight in incremental weighted least squares. This is also to be consistent with SelfAdjointView::rankUpdate.
 
> > - check it works for complexes, and for "upper" decompositions.
> It works for complex lower, and it works for real upper, but it does not work 
> for complex upper. Interestingly, I notice that LLT updating seems to fail its 
> unit test for complex upper, too (cholesky_6). Not quite sure what is going 
> on, or what to do about it---maybe the error is in Transpose?

yes, I noticed it a few days ago and the easiest solution was to pass w.conjugate(), see the latest version of the devel branch. Perhaps this works the same here.
 
> > - I'm wondering whether it makes sense to do multi-rank updates as once,
> > because:
> >  1 - the speed improvement is quite small, especially for small rank update
> >  2 - for relatively large updates it is faster and more accurate to restart
> > from scratch anyway
> >  3 - multi-rank updates requires more temporaries
> 
> Agreed, and in any case my own work requires only rank-1 updates. I changed 
> it, and this somewhat streamlines the code.

I think rankUpdate should still accept matrices, but simply process it one column at a time. I probably have to do the same change for LLT::rankUpdate() ;)
Comment 12 tim.holy 2011-12-03 17:44:36 UTC
I see I've not checked this bug report for a long time, and due to my lingering confusion about how hg works I had the impression the patch had been committed. But with the impending release of 3.1 I checked out a fresh copy, and noticed (I think) that it's still not in. I'd like to finish this patch so it can be included.

I updated my copy of the default branch. Unless I'm doing something wrong (which is entirely possible), I noticed LLT is still failing cholesky_6. So I am not certain which part of "latest version of the devel branch" (see comment above) I am supposed to check.
Comment 13 tim.holy 2011-12-03 17:51:05 UTC
> I think rankUpdate should still accept matrices, but simply process it one
> column at a time. I probably have to do the same change for
> LLT::rankUpdate() ;)

I'm happy to implement this too, but before I do I should ask: having it loop over columns would seem to introduce a very slight penalty for the case where the user knows there is only one column. (If the updating matrix has ColsAtCompileTime = 1 this can be optimized away, but if the user uses Dynamic then it would seem to need to run the loop.) Are you concerned at all about this? Presumably it is a very small penalty compared to the actual matrix operations. Nevertheless, if you want the absolute most in efficiency it would seem to be best to make the user run the loop in cases where they want to do multi-rank updates. Just let me know what you prefer.
Comment 14 Gael Guennebaud 2011-12-06 17:13:02 UTC
hi, I've just release the alpha1, but there is still time to include it for the next beta. I'm not worried at all about the performance of looping over the rows. If the user pass a vector then the loop will be optimized away, and a loop is mainly a "if" so compared to the rest of the computations this is nothing!

cholesky_6 should be be failing, works here on many different systems.
Comment 15 Gael Guennebaud 2011-12-09 23:10:49 UTC
committed and updated:

https://bitbucket.org/eigen/eigen/changeset/a30fb8deb347/
changeset:   a30fb8deb347
summary:     feature 319: Add update and downdate functionality to LDLT

https://bitbucket.org/eigen/eigen/changeset/4c96fd9a053d/
changeset:   4c96fd9a053d
summary:     feature 319: fix LDLT::rankUpdate for complex/upper, simply the algortihm, update copyrights
Comment 16 tim.holy 2011-12-17 17:02:44 UTC
Thanks for committing! Does this mean you're satisfied with it? I still never got cholesky_6 to work properly for LLT, here's my result:
(note the Hg repository seems to be down, I wasn't able to pull updates)


-- ************************************************************
-- 
-- Configured Eigen 3.0.90
-- 
-- Some things you can do now:
-- --------------+--------------------------------------------------------------
-- Command       |   Description
-- --------------+--------------------------------------------------------------
-- make install  | Install to /usr/local. To change that:
--               |     cmake . -DCMAKE_INSTALL_PREFIX=yourpath
--               |   Eigen headers will then be installed to:
--               |     /usr/local/include/eigen3
--               |   To install Eigen headers to a separate location, do:
--               |     cmake . -DEIGEN_INCLUDE_INSTALL_DIR=yourpath
-- make doc      | Generate the API documentation, requires Doxygen & LaTeX
-- make check    | Build and run the unit-tests. Read this page:
--               |   http://eigen.tuxfamily.org/index.php?title=Tests
-- make blas     | Build BLAS library (not the same thing as Eigen)
-- --------------+--------------------------------------------------------------
-- 
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/builddir
tim@diva /t/builddir> make cholesky
Scanning dependencies of target cholesky_9
[  0%] Building CXX object test/CMakeFiles/cholesky_9.dir/cholesky.cpp.o                   
Linking CXX executable cholesky_9                                                          
[  0%] Built target cholesky_9                                                             
Scanning dependencies of target cholesky_4
[  0%] Building CXX object test/CMakeFiles/cholesky_4.dir/cholesky.cpp.o                   
Linking CXX executable cholesky_4                                                          
[  0%] Built target cholesky_4                                                             
Scanning dependencies of target cholesky_1
[  0%] Building CXX object test/CMakeFiles/cholesky_1.dir/cholesky.cpp.o                   
Linking CXX executable cholesky_1                                                          
[  0%] Built target cholesky_1                                                             
Scanning dependencies of target cholesky_3
[  0%] Building CXX object test/CMakeFiles/cholesky_3.dir/cholesky.cpp.o                   
Linking CXX executable cholesky_3                                                          
[  0%] Built target cholesky_3                                                             
Scanning dependencies of target cholesky_5
[100%] Building CXX object test/CMakeFiles/cholesky_5.dir/cholesky.cpp.o                   
Linking CXX executable cholesky_5                                                          
[100%] Built target cholesky_5                                                             
Scanning dependencies of target cholesky_2
[100%] Building CXX object test/CMakeFiles/cholesky_2.dir/cholesky.cpp.o                   
Linking CXX executable cholesky_2                                                          
[100%] Built target cholesky_2                                                             
Scanning dependencies of target cholesky_6
[100%] Building CXX object test/CMakeFiles/cholesky_6.dir/cholesky.cpp.o                   
Linking CXX executable cholesky_6                                                          
[100%] Built target cholesky_6                                                             
Scanning dependencies of target cholesky_7
[100%] Building CXX object test/CMakeFiles/cholesky_7.dir/cholesky.cpp.o                   
Linking CXX executable cholesky_7                                                          
[100%] Built target cholesky_7                                                             
Scanning dependencies of target cholesky_8
[100%] Building CXX object test/CMakeFiles/cholesky_8.dir/cholesky.cpp.o                   
Linking CXX executable cholesky_8                                                          
[100%] Built target cholesky_8                                                             
Scanning dependencies of target cholesky
[100%] Built target cholesky                                                               
tim@diva /t/builddir> ls
blas/           cmake_install.cmake    demos/     Makefile     unsupported/
buildtests.sh*  CTestCustom.cmake      doc/       release.sh*
check.sh*       CTestTestfile.cmake    Eigen/     scripts/
CMakeCache.txt  DartConfiguration.tcl  eigen3.pc  test/
CMakeFiles/     debug.sh*              lapack/    Testing/
tim@diva /t/builddir> cd test/
tim@diva /t/b/test> ls
cholesky_1*  cholesky_4*  cholesky_7*  CMakeFiles/          Makefile
cholesky_2*  cholesky_5*  cholesky_8*  cmake_install.cmake  split_test_helper.h
cholesky_3*  cholesky_6*  cholesky_9*  CTestTestfile.cmake
tim@diva /t/b/test> ./cholesky_1
Initializing random number generator with seed 1324137446
Repeating each test 10 times
tim@diva /t/b/test> ./cholesky_2                                                           
Initializing random number generator with seed 1324137448
Repeating each test 10 times
tim@diva /t/b/test> ./cholesky_3                                                           
Initializing random number generator with seed 1324137450
Repeating each test 10 times
tim@diva /t/b/test> ./cholesky_4                                                           
Initializing random number generator with seed 1324137452
Repeating each test 10 times
tim@diva /t/b/test> ./cholesky_5                                                           
Initializing random number generator with seed 1324137455
Repeating each test 10 times
tim@diva /t/b/test> ./cholesky_6                                                           
Initializing random number generator with seed 1324137457
Repeating each test 10 times
Test cholesky_cplx(MatrixXcd(s,s)) failed in /home/tim/src/eigen/test/cholesky.cpp (205)
    test_isApprox(symmCpy, cholup.reconstructedMatrix())

fish: Job 1, “./cholesky_6” terminated by signal SIGABRT (Abort)
tim@diva /t/b/test> ./cholesky_7                                                           
Initializing random number generator with seed 1324137467
Repeating each test 10 times
tim@diva /t/b/test> ./cholesky_8                                                           
Initializing random number generator with seed 1324137470
Repeating each test 10 times
tim@diva /t/b/test> ./cholesky_9                                                           
Initializing random number generator with seed 1324137471
Repeating each test 10 times

If all works for you, that's fine, I was just stymied by this. Sorry.
Comment 17 tim.holy 2011-12-17 17:11:01 UTC
Re my comment on the Hg server: my ISP seems to be having DNS trouble, there are other sites it can't find, either. So ignore that. I last updated my repository Dec. 3 or so.
Comment 18 Gael Guennebaud 2011-12-17 18:21:30 UTC
hm, line 205 is: Index cols = m.cols();, so your working copy is clearly not up-to-date.
Comment 19 tim.holy 2011-12-18 13:02:30 UTC
Hmm, sorry, I keep getting tripped up on differences between SVN and Hg. Clearly I should spend a day reading about how Hg works.

Agreed, the new code passes all tests. Thanks!

Note You need to log in before you can comment on or make changes to this bug.