Currently, for large enough matrices we dynamically allocate memory for the blocks on each call. With C++11 it should be possible to allocate only once, at the first call, in a thread-safe manner. Of course, we would keep the current strategy as a fallback for non-C++11 builds. Some questions though:
- Shall we allocate for the maximal block size? This would be a waste of memory if the user only deals with, say, 50x50 matrices (blocks are much larger!).
- How do we properly release the memory?
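To make the idea concrete, here is a minimal sketch of one possible approach, assuming C++11 `thread_local` is available. It is not what Eigen does today, and the names `BlockingWorkspace`, `FreeDeleter` and `thread_workspace` are hypothetical. Each thread owns one buffer that is allocated on first use, grown only when a larger request arrives (so small problems never pay for the maximal block size), and released by its destructor at thread exit. Alignment requirements are ignored here for brevity.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical helper, not part of Eigen: deleter matching std::malloc.
struct FreeDeleter {
  void operator()(unsigned char* p) const { std::free(p); }
};

// Hypothetical per-thread scratch buffer for the blocking workspace.
class BlockingWorkspace {
 public:
  // Returns a buffer of at least `bytes` bytes. The previous buffer is reused
  // when it is already large enough; otherwise a larger one replaces it.
  // Note: std::malloc only guarantees fundamental alignment.
  void* acquire(std::size_t bytes) {
    if (bytes > capacity_) {
      unsigned char* p = static_cast<unsigned char*>(std::malloc(bytes));
      if (!p) throw std::bad_alloc();
      buffer_.reset(p);  // frees the old, smaller buffer
      capacity_ = bytes;
    }
    return buffer_.get();
  }

  std::size_t capacity() const { return capacity_; }

 private:
  std::unique_ptr<unsigned char, FreeDeleter> buffer_;
  std::size_t capacity_ = 0;
};

// One workspace per thread: no locking is needed, and the destructor runs
// when the owning thread exits.
inline BlockingWorkspace& thread_workspace() {
  thread_local BlockingWorkspace ws;
  return ws;
}
```

A product kernel would then call something like `thread_workspace().acquire(sizeA + sizeB)` instead of allocating on every call. The trade-off is that each thread keeps its largest buffer alive until the thread ends, and this does not by itself cover the parallelized case where several threads share the same blocks.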
As mentioned on the mailing list, we also don't always use stack allocation for parallelized GEMM. Not sure how easy it is to integrate this (we could certainly not encapsulate the allocation as it is now).
I think Whaley is working on a threaded version of ATLAS (http://math-atlas.sourceforge.net/), or upgrading its threading for the latest hardware, so the ATLAS mailing list might be a good place to ask or search for ideas.
This is related to bug 1568.
-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1364.