Summary: Add a dense LU factorization with rook pivoting
This is a rank-revealing LU factorization that can leverage fast matrix-matrix operations. Its overhead compared to PartialPivLU is therefore small (about 30%), and it looks pretty simple to implement. A Fortran implementation under the BSD licence is available here:

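For reference, here is a minimal unblocked sketch of the rook-pivot search and the resulting factorization. It is not the Fortran code mentioned above and not a proposal for the Eigen API: it uses plain `std::vector` storage, does not use the matrix-matrix operations a real implementation would rely on, and all names (`RookLU`, `rookPivot`, `rookPivLU`, the absolute threshold `tol`) are hypothetical. A rook pivot is an entry that is largest in magnitude in both its row and its column of the trailing block; the search alternates between column and row scans until that holds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical result type. `lu` packs L (unit lower, stored below the
// diagonal) and U, so that A[rowPerm[i]][colPerm[j]] ~= (L * U)[i][j].
struct RookLU {
    Matrix lu;
    std::vector<std::size_t> rowPerm, colPerm;
    std::size_t rank = 0;
};

// Row index i >= k maximizing |A[i][c]|; `start` wins ties.
static std::size_t argmaxInCol(const Matrix& A, std::size_t c, std::size_t k, std::size_t start) {
    std::size_t best = start;
    for (std::size_t i = k; i < A.size(); ++i)
        if (std::abs(A[i][c]) > std::abs(A[best][c])) best = i;
    return best;
}

// Column index j >= k maximizing |A[r][j]|; `start` wins ties.
static std::size_t argmaxInRow(const Matrix& A, std::size_t r, std::size_t k, std::size_t start) {
    std::size_t best = start;
    for (std::size_t j = k; j < A[r].size(); ++j)
        if (std::abs(A[r][j]) > std::abs(A[r][best])) best = j;
    return best;
}

// Rook search in the trailing block A[k.., k..], starting from column c.
// Each move strictly increases the pivot magnitude, so the loop terminates.
static std::pair<std::size_t, std::size_t> rookPivot(const Matrix& A, std::size_t k, std::size_t c) {
    std::size_t r = argmaxInCol(A, c, k, k);
    for (;;) {
        const std::size_t c2 = argmaxInRow(A, r, k, c);
        if (c2 == c) break;  // already the largest entry in its row
        c = c2;
        const std::size_t r2 = argmaxInCol(A, c, k, r);
        if (r2 == r) break;  // already the largest entry in its column
        r = r2;
    }
    return {r, c};
}

// Unblocked LU with rook pivoting. `tol` is an assumed absolute threshold
// below which entries are treated as zero when estimating the rank.
RookLU rookPivLU(Matrix A, double tol = 1e-12) {
    assert(!A.empty() && !A[0].empty());
    const std::size_t n = A.size(), m = A[0].size();
    RookLU f;
    f.rowPerm.resize(n);
    f.colPerm.resize(m);
    std::iota(f.rowPerm.begin(), f.rowPerm.end(), std::size_t{0});
    std::iota(f.colPerm.begin(), f.colPerm.end(), std::size_t{0});

    for (std::size_t k = 0; k < std::min(n, m); ++k) {
        // Start from the first trailing column with a non-negligible entry.
        std::size_t c = k;
        while (c < m && std::abs(A[argmaxInCol(A, c, k, k)][c]) <= tol) ++c;
        if (c == m) break;  // trailing block is negligible: rank reached

        const std::pair<std::size_t, std::size_t> p = rookPivot(A, k, c);
        std::swap(A[k], A[p.first]);
        std::swap(f.rowPerm[k], f.rowPerm[p.first]);
        for (auto& row : A) std::swap(row[k], row[p.second]);
        std::swap(f.colPerm[k], f.colPerm[p.second]);

        // Standard elimination step on the trailing block.
        for (std::size_t i = k + 1; i < n; ++i) {
            A[i][k] /= A[k][k];
            for (std::size_t j = k + 1; j < m; ++j) A[i][j] -= A[i][k] * A[k][j];
        }
        ++f.rank;
    }
    f.lu = std::move(A);
    return f;
}
```

Calling `rookPivLU` on a matrix returns the packed factors, both permutations, and the number of pivots accepted before the trailing block became negligible, which serves as the rank estimate.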