Closed
Possibly related to #216, except that I do have CUDA. After installation, my tests fail with timeout errors.
$ DEVICE=cuda1 make test
Running tests...
Test project /export/a16/gkumar/code/libgpuarray/Build
Start 1: test_types
1/6 Test #1: test_types ....................... Passed 0.04 sec
Start 2: test_util
2/6 Test #2: test_util ........................ Passed 0.03 sec
Start 3: test_array
3/6 Test #3: test_array .......................***Failed 8.14 sec
Start 4: test_elemwise
4/6 Test #4: test_elemwise ....................***Failed 89.57 sec
Start 5: test_error
5/6 Test #5: test_error ....................... Passed 1.98 sec
Start 6: test_buffer
6/6 Test #6: test_buffer ...................... Passed 1.88 sec
67% tests passed, 2 tests failed out of 6
Total Test time (real) = 101.71 sec
The following tests FAILED:
3 - test_array (Failed)
4 - test_elemwise (Failed)
Errors while running CTest
make: *** [test] Error 8
Here's the output from an individual run:
$ DEVICE='cuda1' tests/check_elemwise
Running suite(s): elemwise
0%: Checks: 11, Failures: 0, Errors: 11
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:43:E:contig:test_contig_simple:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:105:E:contig:test_contig_f16:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:155:E:contig:test_contig_0:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:206:E:basic:test_basic_simple:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:269:E:basic:test_basic_f16:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:333:E:basic:test_basic_offset:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:392:E:basic:test_basic_remove1:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:458:E:basic:test_basic_broadcast:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:520:E:basic:test_basic_collapse:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:589:E:basic:test_basic_neg_strides:0: (after this point) Test timeout expired
/export/a16/gkumar/code/libgpuarray/tests/check_elemwise.c:643:E:basic:test_basic_0:0: (after this point) Test timeout expired
Any thoughts on what could be wrong?
Here's my nvidia-smi output:
$ nvidia-smi
Mon Aug 8 00:39:30 2016
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          On   | 0000:05:00.0     Off |                    0 |
| N/A   54C    P0   106W / 225W |   3185MiB /  4799MiB |     51%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          On   | 0000:42:00.0     Off |                    0 |
| N/A   47C    P8    17W / 225W |     14MiB /  4799MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     33698    C   nnet3-chain-train                             3169MiB |
+-----------------------------------------------------------------------------+
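In case it helps narrow this down: "Test timeout expired" is what the Check framework prints when a test runs past its time limit, so one thing worth trying is rerunning the failing suite with a longer limit. Below is a minimal sketch of that, assuming the tests rely on Check's default timeout (which the CK_DEFAULT_TIMEOUT environment variable overrides, in seconds). The names rerun_with_timeout and TEST_BIN are hypothetical helpers for illustration, not part of libgpuarray.

```shell
#!/bin/sh
# Hypothetical helper (not part of libgpuarray): rerun one Check-based
# test binary with a longer per-test timeout.
# Assumption: the tests use Check's default timeout, which Check reads
# from the CK_DEFAULT_TIMEOUT environment variable (seconds).

TEST_BIN="${TEST_BIN:-tests/check_elemwise}"  # binary from the run above
DEVICE="${DEVICE:-cuda1}"                     # same device as the run above

rerun_with_timeout() {
    # $1: per-test timeout in seconds
    if [ ! -x "$TEST_BIN" ]; then
        # Outside the build directory there is nothing to run.
        echo "skip: $TEST_BIN not found or not executable"
        return 0
    fi
    DEVICE="$DEVICE" CK_DEFAULT_TIMEOUT="$1" "$TEST_BIN"
}

# Allow each test two minutes instead of Check's default.
rerun_with_timeout 120
```

If the tests pass with the longer limit, that would suggest the device is only slow to start up; if they still time out, something is more likely hanging.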