Created attachment 654: testcase

The current way to set the number of threads in Eigen is not flexible enough. Consider the following example, on a machine with 8 cores:

    void f()
    {
        Eigen::setNbThreads( 1 );
        #pragma omp parallel for num_threads( 4 )
        for( int i = 0; i < 100; ++i )
        {
            #pragma omp parallel for num_threads( 2 )
            for( int j = 0; j < 20; j++ )
            {
                // do some matrix computation
            }

            // sequential part:
            Eigen::setNbThreads( 2 );
            // do some matrix computation using Eigen's parallelism.
            Eigen::setNbThreads( 1 );

            #pragma omp parallel for num_threads( 2 )
            for( int j = 0; j < 20; j++ )
            {
                // do some matrix computation
            }
        }
    }

The problem is that the thread count acts as a global variable (with static storage duration). The 4 outer threads all write to it concurrently, so there is no guarantee that the sequential part runs with 2 threads in Eigen, or that the parallel parts run with 1 thread in Eigen.

Would it be possible to provide a way to achieve that goal?

The obvious answer would be to make the thread count thread-local. I guess that is not possible for compatibility reasons, and because some platforms may not have TLS support. Hence the idea would be to add a new function to set the thread count per thread:

    void Eigen::setNbThreadsInThisThread( int nbThreads );

What do you think? Testcase attached.
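To make the proposal concrete, here is a minimal sketch of a per-thread override layered on top of a process-wide setting. It does not use Eigen at all: the `sketch` namespace, `globalNbThreads`, `localNbThreads`, and `countSeenByWorker` are hypothetical names for illustration, and `setNbThreadsInThisThread` only mirrors the function suggested above, which is not an existing Eigen API. It also assumes the platform supports C++11 `thread_local`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

namespace sketch {

// Process-wide setting, standing in for the value Eigen::setNbThreads stores.
std::atomic<int> globalNbThreads{1};

// Per-thread override; 0 means "no override, use the process-wide value".
thread_local int localNbThreads = 0;

void setNbThreads(int n) { globalNbThreads.store(n); }

// Hypothetical counterpart of the proposed Eigen::setNbThreadsInThisThread.
void setNbThreadsInThisThread(int n) { localNbThreads = n; }

// The count a computation started on the calling thread would use.
int nbThreads() {
    return localNbThreads > 0 ? localNbThreads : globalNbThreads.load();
}

// Runs a worker that sets its own override and reports what it then sees.
int countSeenByWorker(int overrideValue) {
    int seen = 0;
    std::thread worker([&seen, overrideValue] {
        setNbThreadsInThisThread(overrideValue);
        seen = nbThreads();
    });
    worker.join();
    return seen;
}

}  // namespace sketch
```

With the process-wide value set to 1, `sketch::countSeenByWorker(2)` reports 2 for the worker while `sketch::nbThreads()` on the calling thread still reports 1, which is the isolation the example in this report needs. Keeping the global setting as a fallback is one way to stay compatible with existing callers of `setNbThreads`.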
You can probably work around the issue by never calling Eigen::setNbThreads, and instead controlling the number of threads with omp_set_num_threads(). Indeed, by default Eigen uses the value returned by omp_get_max_threads(). If you have already called Eigen::setNbThreads and want to restore the default behavior, call:

    Eigen::setNbThreads(-1);

Any negative number will do.

In the future, we should extend the API to enable per-expression/per-block tuning of the evaluation process (not only the number of threads). For instance, in the tensor module, you can do:

    SomeDevice</*compile-time-parameters*/> dev(/*runtime params*/);
    res.device(dev) = expr;

"dev" is passed to expr for its evaluation.
(In reply to Gael Guennebaud from comment #1)
> You can probably work around the issue by never calling Eigen::setNbThreads,
> and rather controlling the number of threads using omp_set_num_threads().
> Indeed, by default Eigen uses the value returned by omp_get_max_threads().

Ah, I see, thank you. Then all I'm asking for is that Eigen::setNbThreads (or another function) behaves the same as omp_set_num_threads :-)

That said, not using Eigen::setNbThreads means I have to insert a call to omp_set_num_threads inside a lot of loops. A dedicated function would have my preference, but oh well, this is acceptable.

> In the future, we should extend the API to enable per expression/per block
> tuning of the evaluation process (not only the number of threads). For
> instance, in the tensor module, you can do:
>
> SomeDevice</*compile-time-parameters*/> dev(/*runtime params*/);
>
> res.device(dev) = expr;
>
> "dev" is passed to expr for its evaluation.

Would this work for blocks as well? For example:

    SomeDevice</*compile-time-parameters*/> dev(/*runtime params*/);
    //...
    {
        ScopedParm p( dev );
        // all exprs here use "dev"
    }
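Until such a block-level API exists, the repeated omp_set_num_threads calls could be wrapped in a small scope guard. The following is a sketch only: `ScopedOmpThreads` and `threadsInsideScope` are hypothetical names, not part of Eigen or OpenMP. It relies on the statement in comment #1 that Eigen falls back to omp_get_max_threads() when Eigen::setNbThreads has never been called, and on the OpenMP rule that omp_set_num_threads changes the setting of the calling task rather than a process-wide variable. The stub branch is an assumption added so the sketch also builds without -fopenmp.

```cpp
#include <cassert>

#if defined(_OPENMP)
#include <omp.h>
#else
// Fallback stubs so the sketch also compiles when OpenMP is disabled.
static thread_local int stubNumThreads = 1;
static void omp_set_num_threads(int n) { stubNumThreads = n; }
static int omp_get_max_threads() { return stubNumThreads; }
#endif

// Hypothetical helper: sets the OpenMP thread count for the calling thread
// on construction and restores the previous value on destruction.
class ScopedOmpThreads {
public:
    explicit ScopedOmpThreads(int n) : previous_(omp_get_max_threads()) {
        omp_set_num_threads(n);
    }
    ~ScopedOmpThreads() { omp_set_num_threads(previous_); }

    ScopedOmpThreads(const ScopedOmpThreads&) = delete;
    ScopedOmpThreads& operator=(const ScopedOmpThreads&) = delete;

private:
    int previous_;  // value to restore when the scope ends
};

// Reports the thread count visible inside a guarded scope.
int threadsInsideScope(int n) {
    ScopedOmpThreads guard(n);
    // Eigen expressions evaluated here would pick up this value, provided
    // Eigen::setNbThreads was never called (see comment #1).
    return omp_get_max_threads();
}
```

In the original example, the sequential part of the outer loop body would open a block with `ScopedOmpThreads guard( 2 );` in place of the two Eigen::setNbThreads calls. The value in effect before the block is restored automatically when the guard goes out of scope.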
Created attachment 665: testcase
-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1169.