This bugzilla service is closed. All entries have been migrated to https://gitlab.com/libeigen/eigen
Bug 1554 - CUDA-related Tensor unit tests fail
Status: DECISIONNEEDED
Alias: None
Product: Eigen
Classification: Unclassified
Component: Tensor
Version: 3.4 (development)
Hardware: GPU (CUDA) Linux
Importance: Low Failed Unit Test
Assignee: Nobody
URL:
Whiteboard:
Keywords: BuildSystem
Depends on:
Blocks:
 
Reported: 2018-06-07 12:46 UTC by Christoph Hertzberg
Modified: 2019-12-04 17:41 UTC

Description Christoph Hertzberg 2018-06-07 12:46:48 UTC
I'm having trouble running the CUDA test-cases.

First of all, when compiling after just enabling `EIGEN_TEST_CUDA`, I'm getting 
  "__builtin_ia32_monitorx" is undefined
and related errors.
A workaround is to add some flags to CUDA_NVCC_FLAGS:
-D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -D__STRICT_ANSI__
(Source: https://github.com/NVIDIA/nccl/issues/29)

After that, all tests compile (with numerous warnings), but all tests except `cuda_basic` fail:


Test project /home/chtz/workspace/eigen-bisect/build-nvcc
      Start 701: cuda_basic
 1/35 Test #701: cuda_basic ............................   Passed    0.37 sec
      Start 891: cxx11_tensor_complex_cuda
 2/35 Test #891: cxx11_tensor_complex_cuda .............***Exception: Illegal  0.29 sec
      Start 892: cxx11_tensor_complex_cwise_ops_cuda
 3/35 Test #892: cxx11_tensor_complex_cwise_ops_cuda ...***Exception: Illegal  0.17 sec
      Start 893: cxx11_tensor_reduction_cuda_1
 4/35 Test #893: cxx11_tensor_reduction_cuda_1 .........***Exception: Illegal  0.14 sec
      Start 894: cxx11_tensor_reduction_cuda_2
 5/35 Test #894: cxx11_tensor_reduction_cuda_2 .........***Exception: Illegal  0.14 sec
      Start 895: cxx11_tensor_reduction_cuda_3
 6/35 Test #895: cxx11_tensor_reduction_cuda_3 .........***Exception: Illegal  0.15 sec
      Start 896: cxx11_tensor_reduction_cuda_4
 7/35 Test #896: cxx11_tensor_reduction_cuda_4 .........***Exception: Illegal  0.14 sec
      Start 897: cxx11_tensor_reduction_cuda_5
 8/35 Test #897: cxx11_tensor_reduction_cuda_5 .........***Exception: Illegal  0.14 sec
      Start 898: cxx11_tensor_reduction_cuda_6
 9/35 Test #898: cxx11_tensor_reduction_cuda_6 .........***Exception: Illegal  0.14 sec
      Start 899: cxx11_tensor_argmax_cuda_1
10/35 Test #899: cxx11_tensor_argmax_cuda_1 ............***Exception: Illegal  0.14 sec
      Start 900: cxx11_tensor_argmax_cuda_2
11/35 Test #900: cxx11_tensor_argmax_cuda_2 ............***Exception: Illegal  0.13 sec
      Start 901: cxx11_tensor_argmax_cuda_3
12/35 Test #901: cxx11_tensor_argmax_cuda_3 ............***Exception: Illegal  0.14 sec
      Start 902: cxx11_tensor_cast_float16_cuda
13/35 Test #902: cxx11_tensor_cast_float16_cuda ........***Exception: Illegal  0.15 sec
      Start 903: cxx11_tensor_scan_cuda_1
14/35 Test #903: cxx11_tensor_scan_cuda_1 ..............***Exception: Illegal  0.13 sec
      Start 904: cxx11_tensor_scan_cuda_2
15/35 Test #904: cxx11_tensor_scan_cuda_2 ..............***Exception: Illegal  0.14 sec
      Start 907: cxx11_tensor_cuda_1
16/35 Test #907: cxx11_tensor_cuda_1 ...................***Exception: Illegal  0.14 sec
      Start 908: cxx11_tensor_cuda_2
17/35 Test #908: cxx11_tensor_cuda_2 ...................***Exception: Illegal  0.14 sec
      Start 909: cxx11_tensor_cuda_3
18/35 Test #909: cxx11_tensor_cuda_3 ...................***Exception: Illegal  0.14 sec
      Start 910: cxx11_tensor_cuda_4
19/35 Test #910: cxx11_tensor_cuda_4 ...................***Exception: Illegal  0.14 sec
      Start 911: cxx11_tensor_cuda_5
20/35 Test #911: cxx11_tensor_cuda_5 ...................***Exception: Illegal  0.14 sec
      Start 912: cxx11_tensor_cuda_6
21/35 Test #912: cxx11_tensor_cuda_6 ...................***Exception: Illegal  0.14 sec
      Start 913: cxx11_tensor_contract_cuda_1
22/35 Test #913: cxx11_tensor_contract_cuda_1 ..........***Exception: Illegal  0.13 sec
      Start 914: cxx11_tensor_contract_cuda_2
23/35 Test #914: cxx11_tensor_contract_cuda_2 ..........***Exception: Illegal  0.13 sec
      Start 915: cxx11_tensor_contract_cuda_3
24/35 Test #915: cxx11_tensor_contract_cuda_3 ..........***Exception: Illegal  0.13 sec
      Start 916: cxx11_tensor_contract_cuda_4
25/35 Test #916: cxx11_tensor_contract_cuda_4 ..........***Exception: Illegal  0.14 sec
      Start 917: cxx11_tensor_contract_cuda_5
26/35 Test #917: cxx11_tensor_contract_cuda_5 ..........***Exception: Illegal  0.13 sec
      Start 918: cxx11_tensor_contract_cuda_6
27/35 Test #918: cxx11_tensor_contract_cuda_6 ..........***Exception: Illegal  0.15 sec
      Start 919: cxx11_tensor_contract_cuda_7
28/35 Test #919: cxx11_tensor_contract_cuda_7 ..........***Exception: Illegal  0.14 sec
      Start 920: cxx11_tensor_contract_cuda_8
29/35 Test #920: cxx11_tensor_contract_cuda_8 ..........***Exception: Illegal  0.14 sec
      Start 921: cxx11_tensor_contract_cuda_9
30/35 Test #921: cxx11_tensor_contract_cuda_9 ..........***Exception: Illegal  0.13 sec
      Start 922: cxx11_tensor_of_float16_cuda_1
31/35 Test #922: cxx11_tensor_of_float16_cuda_1 ........***Exception: Illegal  0.17 sec
      Start 923: cxx11_tensor_of_float16_cuda_2
32/35 Test #923: cxx11_tensor_of_float16_cuda_2 ........***Exception: Other  0.17 sec
      Start 924: cxx11_tensor_of_float16_cuda_3
33/35 Test #924: cxx11_tensor_of_float16_cuda_3 ........***Exception: Illegal  0.17 sec
      Start 925: cxx11_tensor_of_float16_cuda_4
34/35 Test #925: cxx11_tensor_of_float16_cuda_4 ........***Exception: Other  0.18 sec
      Start 926: cxx11_tensor_of_float16_cuda_5
35/35 Test #926: cxx11_tensor_of_float16_cuda_5 ........***Exception: Other  0.18 sec

3% tests passed, 34 tests failed out of 35

Label Time Summary:
Official       =   0.37 sec (1 test)
Unsupported    =   5.05 sec (34 tests)

~~~~~~~~~~~~~~~~~~~~~~

Output of ./test/cuda_basic (yes, the GPU is quite old):

CUDA device info:
  name:                        GeForce GTX 460 SE
  capability:                  2.1
  multiProcessorCount:         6
  maxThreadsPerMultiProcessor: 1536
  warpSize:                    32
  regsPerBlock:                32768
  concurrentKernels:           1
  clockRate:                   1296000
  canMapHostMemory:            1
  computeMode:                 0


And nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17


Am I missing something obvious, or is my compiler or GPU too old for tensor-CUDA support?
Comment 1 Gael Guennebaud 2018-06-07 14:55:01 UTC
Works for me.

CUDA device info:
  name:                        GeForce GT 750M
  capability:                  3.0
  multiProcessorCount:         2
  maxThreadsPerMultiProcessor: 2048
  warpSize:                    32
  regsPerBlock:                65536
  concurrentKernels:           1
  clockRate:                   925500
  canMapHostMemory:            1
  computeMode:                 0

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:46_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
Comment 2 Benoit Steiner 2018-06-07 16:02:59 UTC
The default value of EIGEN_CUDA_COMPUTE_ARCH is 30, but your GPU only supports the instructions available in architecture 2.1. The most likely problem is that nvcc will generate instructions not supported by your GeForce GTX 460 SE. You should be able to run the tests by setting EIGEN_CUDA_COMPUTE_ARCH to 21 when compiling.
Comment 3 Christoph Hertzberg 2018-06-07 17:24:13 UTC
Changing EIGEN_CUDA_COMPUTE_ARCH to 21 adds `-arch compute_21`, which nvcc does not support (`-arch compute_20` is supported, as is `-arch sm_21`).

I noticed that I accidentally had `EIGEN_TEST_F16C` activated, which seems to be incompatible with sm_21, but nvcc politely compiled this without complaining :-/

Compiling and testing with `-arch sm_21` (and without half-precision tests) does succeed :)
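
To illustrate the point about `compute_21`: a minimal sketch of how a capability value could be mapped to an explicit virtual/real architecture pair. The function name `gencode_flag_for` is hypothetical and not part of Eigen's build system, and the sketch assumes 2.1 is the only capability that needs special handling (sm_21 is built from the compute_20 virtual architecture).

```cpp
#include <string>

// Hypothetical helper (not part of Eigen): turn a compute capability such as
// 21 or 30 into an nvcc -gencode option.
// Assumption: 2.1 is the only capability handled here whose virtual
// architecture differs from the real one (there is no compute_21).
std::string gencode_flag_for(int capability) {
    const int virtual_arch = (capability == 21) ? 20 : capability;
    return "-gencode arch=compute_" + std::to_string(virtual_arch) +
           ",code=sm_" + std::to_string(capability);
}
```

With this sketch, `gencode_flag_for(21)` yields `-gencode arch=compute_20,code=sm_21`, which avoids the unsupported `compute_21` name.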

It would be nice if half-precision compatibility were checked at configuration time (do you know the minimum architecture required?) and the configuration rejected if it is not available.
Furthermore, checking for the right COMPUTE_ARCH at runtime would be great (instead of silently trying to execute incompatible code).
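
A rough sketch of what such a runtime check might look like. The helper names are hypothetical; the sketch assumes the compiled architecture is known as an integer (e.g. 30 for the default EIGEN_CUDA_COMPUTE_ARCH) and that the device's major/minor capability has already been queried, as `cuda_basic` does when it prints the device info. It only tests a necessary condition (device capability at least as high as the compiled architecture), not full binary compatibility.

```cpp
#include <string>

// Hypothetical helper: true if the device capability (major.minor) is at
// least the architecture the tests were compiled for (e.g. 30).
// This is a necessary condition only, not a full compatibility check.
bool device_supports_arch(int compiled_arch, int major, int minor) {
    return major * 10 + minor >= compiled_arch;
}

// Returns an empty string if the check passes, otherwise a diagnostic that a
// test could print before bailing out instead of crashing.
std::string arch_mismatch_message(int compiled_arch, int major, int minor) {
    if (device_supports_arch(compiled_arch, major, minor)) return "";
    return "tests were built for compute architecture " +
           std::to_string(compiled_arch) +
           ", but the device reports capability " +
           std::to_string(major) + "." + std::to_string(minor);
}
```

For the GTX 460 SE above (capability 2.1) and the default architecture 30, the check fails and produces a message; for the GT 750M (capability 3.0) it passes.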

Also, I still require the `CUDA_NVCC_FLAGS` workaround mentioned at the top (this might be an nvcc-7.5 issue).
Shall we automatically add these for NVCC_VERSION<8.0?
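
As a sketch of the version test only (the real decision would presumably live in the CMake configuration): the hypothetical helpers below extract the `release X.Y` number from `nvcc --version` output and report whether the workaround flags would be added. Whether every release below 8.0 actually needs the flags is an assumption taken from the question above, not something verified here.

```cpp
#include <cstdio>
#include <string>

// Workaround flags quoted from the description of this bug.
const char* const kNvccWorkaroundFlags =
    "-D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -D__STRICT_ANSI__";

// Hypothetical helper: extract "release X.Y" from `nvcc --version` output.
// Returns false if the pattern is not found.
bool parse_nvcc_release(const std::string& version_output, int& major, int& minor) {
    const std::string key = "release ";
    const std::string::size_type pos = version_output.find(key);
    if (pos == std::string::npos) return false;
    return std::sscanf(version_output.c_str() + pos + key.size(),
                       "%d.%d", &major, &minor) == 2;
}

// Assumption: the flags are wanted for every nvcc release below 8.0.
bool needs_nvcc_workaround_flags(const std::string& version_output) {
    int major = 0, minor = 0;
    return parse_nvcc_release(version_output, major, minor) && major < 8;
}
```

Fed the two version banners from this thread, the sketch selects the flags for release 7.5 and skips them for release 8.0.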


Overall, these are mostly convenience issues.
Comment 4 Nobody 2019-12-04 17:41:50 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1554.