To decompose and compose transform matrices in CUDA kernels, I am using the Transform primitive. Calling Transform<>::fromPositionOrientationScale or Transform<>::computeRotationScaling inside a kernel causes my application to freeze. Running the application under cuda-memcheck prints:

========= Invalid __global__ read of size 4
=========     at 0x00000050 in kernel(unsigned long, Eigen::Matrix<float, int=4, int=4, int=2, int=4, int=4>*)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x00000000 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib64/ (cuLaunchKernel + 0x2c5) [0x204235]
=========     Host [0xd23d]
=========     Host (cudaLaunch + 0x143) [0x33783]
=========     Host Frame:EigenTest [0xf4d]
=========     Host Frame:EigenTest (_Z63__device_stub__Z6kernelmPN5Eigen6MatrixIfLi4ELi4ELi2ELi4ELi4EEEmPN5Eigen6MatrixIfLi4ELi4ELi2ELi4ELi4EEE + 0x67) [0xe53]
=========     Host Frame:EigenTest (_Z6kernelmPN5Eigen6MatrixIfLi4ELi4ELi2ELi4ELi4EEE + 0x23) [0xe7e]
=========     Host Frame:EigenTest (_Z4testv + 0x91) [0xd86]
=========     Host Frame:EigenTest (main + 0x9) [0xcd9]
=========     Host Frame:/lib64/ (__libc_start_main + 0xfd) [0x1ed1d]
=========     Host Frame:EigenTest [0xc09]
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib64/ [0x2ef503]
=========     Host (cudaDeviceSynchronize + 0x166) [0x334a6]
=========     Host Frame:EigenTest (_Z4testv + 0x96) [0xd8b]
=========     Host Frame:EigenTest (main + 0x9) [0xcd9]
=========     Host Frame:/lib64/ (__libc_start_main + 0xfd) [0x1ed1d]
=========     Host Frame:EigenTest [0xc09]

My system has CUDA 8.0.44 and a Quadro M5000. My guess is that somewhere inside Eigen a 'host' function is being called from device code, but I wasn't able to find it. It should be easy to reproduce: my application freezes every single time. I also checked the available memory before the test: 62 GB of system memory and 8 GB of device memory were free.

I have attached a project for your reference.

Many Thanks
Comment 1 Ali Nakipoglu 2017-04-21 09:34:17 UTC
I managed to find a workaround:

EigenTranslationT translation( ... );
EigenQuaternionT rotation( ... );
EigenTransformT scale;

scale.linear().diagonal()   = EigenVectorT( .... );

EigenTransformT transform   = translation * rotation * scale; 

I don't understand why this works but Transform::fromPositionOrientationScale does not. The only difference I can see is how, or which, DiagonalMatrix methods are called.
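
For reference, both the workaround and fromPositionOrientationScale should produce the same matrix: the linear part is the rotation times diag(scale), and the last column holds the translation. Below is a minimal sketch of that composition in plain C++ without Eigen, so it says nothing about where the invalid read comes from. The names Mat4, Vec3, Quat and composeTRS are hypothetical and not part of Eigen, and the quaternion is assumed to be unit length.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Row-major 4x4 matrix: m[row][col]. Hypothetical type, not Eigen's.
using Mat4 = std::array<std::array<float, 4>, 4>;

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };  // assumed to be unit length

// Hypothetical helper: builds M = T(position) * R(orientation) * S(scale).
Mat4 composeTRS(const Vec3& p, const Quat& q, const Vec3& s) {
    // The rotation formula below is only valid for a unit quaternion.
    const float n = q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
    assert(std::fabs(n - 1.0f) < 1e-4f);

    // Rotation matrix from the quaternion.
    const float r[3][3] = {
        {1 - 2 * (q.y * q.y + q.z * q.z), 2 * (q.x * q.y - q.w * q.z), 2 * (q.x * q.z + q.w * q.y)},
        {2 * (q.x * q.y + q.w * q.z), 1 - 2 * (q.x * q.x + q.z * q.z), 2 * (q.y * q.z - q.w * q.x)},
        {2 * (q.x * q.z - q.w * q.y), 2 * (q.y * q.z + q.w * q.x), 1 - 2 * (q.x * q.x + q.y * q.y)}
    };
    const float sc[3] = {s.x, s.y, s.z};
    const float t[3] = {p.x, p.y, p.z};

    Mat4 m{};  // zero-initialized, so no coefficient is left undefined
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j)
            m[i][j] = r[i][j] * sc[j];  // R * diag(s) scales column j
        m[i][3] = t[i];                 // translation in the last column
    }
    m[3][3] = 1.0f;                     // affine last row: 0 0 0 1
    return m;
}
```

Comparing the output of such a reference routine on the host against the matrix produced in the kernel might help to confirm that the workaround is numerically equivalent to the failing call.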
