| Summary: | EigenContractionKernel causes ptx error (cost too much shared memory) | | |
|---|---|---|---|
| Product: | Eigen | Reporter: | xiah <xiah_sunny> |
| Component: | Tensor | Assignee: | Nobody <eigen.nobody> |
| Status: | DECISIONNEEDED | | |
| Severity: | Unknown | CC: | benoit.steiner.goog, chtz, gael.guennebaud, rmlarsen |
| Priority: | Normal | | |
| Version: | 3.3 (current stable) | | |
| Hardware: | GPU (CUDA) | | |
| OS: | All | | |
| Whiteboard: | | | |
Description

xiah
2016-04-27 07:56:57 UTC

I guess Benoit should have a look at this. Maybe add a compile-time define to declare the size of available shared memory?

You want to pack as many values into shared memory as possible in order to maximize performance. Since the amount of shared memory is fixed, this number depends on the size of the scalar used in the contraction. This means that we need to specialize the kernels for each possible input type. Unfortunately we haven't had time to do this so far. One good strategy would be to write a fallback kernel that does a decent job on the biggest scalar we're likely to encounter (probably complex<double>) and use it unless we have an optimized kernel for the type we care about. Another strategy would be to call cuBLAS directly whenever possible (i.e. when the input data for the two operands is directly addressable by pointer).

-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1212.