When compiling with MKL enabled, the function householder_qr_inplace_blocked is defined twice, leading to an ambiguous call compile error. Adding #ifndef EIGEN_USE_LAPACKE around the version in Eigen/src/QR/HouseholderQR.h fixes the problem. I'm not sure if this is the correct solution though?
I just remembered that we are calling this function directly from our project, using "Eigen::internal::householder_qr_inplace_blocked(...)". If desired, I can check whether the compile problem still occurs if Householder QR is called through the public interface.
Created attachment 394 [details]
possible fix for bug 704
(In reply to comment #0)
> [...] Adding #ifndef
> EIGEN_USE_LAPACKE around the version in Eigen/src/QR/HouseholderQR.h fixes the
> problem. I'm not sure if this is the correct solution though?
That would disable the possibility to run QR on types not supported by MKL.
The problem is that the redefinition of said method is not an actual template specialization.
Could you try if the attached patch solves your issue? (I don't have MKL installed, so I can't check). The patch will make the specialization be called only for dynamic-sized Eigen::Matrix of the types supported by MKL.
I guess calling MKL on (partially) fixed sized matrices is not really necessary?
But not being able to call MKL's geqrf for Blocks or Maps might be a restriction ...
A really proper solution will require some refactoring, I assume.
Generally, it is not save to call internal methods, as they could be removed, renamed or change behavior at any time.
At the moment I'm in Linux and noticed that the duplicate definition does not occur with GCC - it only seems to occur with MSVC.
I'll try the patch tomorrow though when I'm rebooted into Windows.
I'll attempt to develop a patch that doesn't prevent use with Block and Map, and contribute that back here. But can you help me to understand how the function selection is supposed to work? Does it work because the MKL version of householder_qr_inplace_blocked explicitly specifies the type for the tempData argument whereas in the non-MKL version it is a template argument dependent type? i.e. the compiler is supposed to prefer the explicitly-typed version?
Would a solution be to create additional macros in HouseholderQR_MKL.h to specialize the template for all types supported by MKL? Should any types on top of DenseBase and MapBase be supported?
One more note - the reason we're calling the internal method is we want to do QR in-place. Our library is factorizing millions of small-ish matrices of various sizes, thus we're trying to avoid extra memory allocation. Is there a way to make HouseholderQR work in-place using the public API?
(In reply to comment #3)
> At the moment I'm in Linux and noticed that the duplicate definition does not
> occur with GCC - it only seems to occur with MSVC.
Hm, strange. I can't see at the moment, how GCC distinguishes between those two. I tried some smaller examples, and GCC indeed prefers the overloaded variant -- although actually, the parameters are exactly the same.
> I'll attempt to develop a patch that doesn't prevent use with Block and Map,
> and contribute that back here. But can you help me to understand how the
> function selection is supposed to work? Does it work because the MKL version
> of householder_qr_inplace_blocked explicitly specifies the type for the
> tempData argument whereas in the non-MKL version it is a template argument
> dependent type? i.e. the compiler is supposed to prefer the explicitly-typed
It's called partial template specialization. I would not claim to have completely understand the rules for that (and apparently, GCC and MSVC disagree on the rules). Essentially, you can define a function depending on template parameters and then write a specialized function, where the template parameters must fit into a certain pattern. That is, e.g., how std::swap(std::vector<T>&, std::vector<T>&) is implemented.
> Would a solution be to create additional macros in HouseholderQR_MKL.h to
> specialize the template for all types supported by MKL? Should any types on
> top of DenseBase and MapBase be supported?
I think a clean solution would be to make a helper-struct which extracts the Scalar type from MatrixQR (and actually, also the internal stride). This struct has a `run` function. And the struct can be specialized based on the extracted informations.
(In reply to comment #4)
> One more note - the reason we're calling the internal method is we want to do
> QR in-place. Our library is factorizing millions of small-ish matrices of
> various sizes, thus we're trying to avoid extra memory allocation. Is there a
> way to make HouseholderQR work in-place using the public API?
If they are "small-ish", I wonder if you significantly profit from MKL at all ...
AFAIK there is no clean way to call in-place decompositions at the moment. I thought I filed a bug for that once, but I could not find, so I just filed Bug 707.
Btw: How do you call the inplace function?
In gcc I need to pass the optional parameters as well, so that he does not complain about ambiguity. And actually, it's quite easy to mess up that call, by passing a MatrixXd and a float* (or vice versa).
Hi Christoph, sorry for the delay here, things got busy and we have a holiday this week. I'm going to look at this a bit today and continue next week if I don't finish today. Thanks for your help!
Created attachment 402 [details]
Patch to use a selector struct for choosing the MKL version of householder_qr_inplace_blocked
Here is a patch that uses a selector struct with a run method. I tested this on Linux and Windows. It selects for the Scalar type in MatrixQR. I didn't implement selecting on the InnerStride - why is this necessary and how would the lapack call change?
It would be good if someone can check the code and test on their machine since I don't have much experience with the Eigen code. To make checking the code easier, the next attachment is a "human readable" version of the patch created with the -b option that more clearly shows the changes but leaves out the indentation changes.
I was able to run the Eigen unit tests on Windows but on Linux am having linking errors that I can't figure out right now. So it would be good if someone can specifically test on Linux and Mac OS.
Created attachment 403 [details]
Human-readable (diff -b) version of the mkl_selector patch
This patch should be easier to inspect by hand since it uses the -b option to ignore indentation changes.
Christoph, in response to your question about how we call the in-place function, we indeed were calling it without the optional argument, so that must be why we got the ambiguous symbol error!
Also, I didn't realize that MKL was not automatically not used for small matrices (I thought I'd read something about that but maybe that was for matrix-matrix products). We're doing a multi-frontal QR factorization so there is a wide range of matrix sizes.
Do you think it would make sense to have a threshold in the Eigen householder QR code that selects MKL only if the matrix is above a certain size?
Basically, your patch looks good. I think you still need to make sure that the internal stride is 1 (MatrixQR::InnerStrideAtCompileTime==1) and the type of HCoeffs must be compatible as well (same scalar type, VectorAtCompileTime, InnerStrideAtCompileTime==1).
Even after that, calling the internal function will not be a good idea, because one could shoot oneself in the foot, e.g. by manually changing the implicit template parameters. So a public wrapper function should be implemented for that(see bug 707)
W.r.t selecting MKL vs Eigen via a threshold: I don't think that MKL will be much worse on small types (what I wanted to say in comment 5 is that MKL will unlikely be much better). But maybe you should benchmark that, before I further speculate here.
A thing that would be considerable is to call the Eigen method if the size is known at compile time (and maybe also below a threshold). For fixed sizes the decomposition could be completely or partly unrolled -- this would very likely be much faster up to a certain size. I'm not sure if that is worth the effort, though.
I'm finally preparing a final patch. One question about HCoeffs - do the restrictions of same scalar type and VectorAtCompileTime apply to the plain implementation of householder_qr_inplace_blocked, or only the MKL version? If it's the former, perhaps some kind of static assert check is needed in QR? Also if the former, it means I shouldn't make the check part of the MKL override version.
(In reply to comment #12)
> I'm finally preparing a final patch. One question about HCoeffs - do the
> restrictions of same scalar type and VectorAtCompileTime apply to the plain
> implementation of householder_qr_inplace_blocked, or only the MKL version? If
The restriction is for both methods, but since it is intended only to be an internal method that restriction is always guaranteed by the calling method. If we introduce a clean public interface for inplace decomposition we must make a static assert in that method.
We can consider adding static assertions into the internal methods, as well.
Ok, given that info, in the patch I'll attach, the only requirements to select the MKL version are that the scalar type is supported by MKL and the InnerStrideAtCompileTime of both MatrixQR and HCoeffs is 1. Then I assume the conditions of same scalar type and VectorAtCompileTime are guaranteed by the calling method. Let me know if this is incorrect.
Created attachment 409 [details]
Proposed final patch to improve compile-time selection of MKL Householder QR
I applied the patch here:
I only ran the unit tests without MKL, but I trust you that you ran them as well.
It seems the bug is back in the 3.2.4!!
I think there was a regression with the last commit:
Thanks for your work!
The patch has been applied to the devel branch and it has never been backported to the 3.2 branch. I'll do.