synchronize in user-provided stream when reading data from D->H #771
Thanks Evghenii. This looks like a good first cut. I think we'll want to collapse the copy + synchronize code at the end of There's already a Thrust primitive called
That way, all the required synchronization is located in one place and we don't need to duplicate so much code in What do you think?
Having taken a look at the implementation of
I attempted to replace I don't understand exactly what happens, and this would be a good place to insert a sync, e.g.
but first I need to deal with a segfault. Doesn't
Update: it seems there was a bug, which has since been fixed. I still need help understanding the following: replacing
I think it is because trivial_copy_n, which this code will eventually call, is hard-coded to use the legacy stream. You should investigate what happens when you relax that to use the stream inside the execution policy. If all tests pass, we should be good.
With this change, all tests pass apart from the 2 known failures.
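For readers following the stream discussion, here is a host-only C++ sketch of the two ideas above: the copy takes its stream from the execution policy instead of a hard-coded legacy stream, and the copy and the synchronize live together in one helper. Every name in it is hypothetical: `stream_t`, `execution_policy`, `async_copy`, `synchronize_stream`, `copy_n_on_policy_stream`, and the `trace` log stand in for `cudaStream_t`, Thrust's CUDA execution policy, `cudaMemcpyAsync`, and `cudaStreamSynchronize` so that it compiles without a GPU. It is not Thrust's actual code.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for cudaStream_t; 0 plays the role of the legacy default stream.
using stream_t = int;
constexpr stream_t legacy_stream = 0;

// Hypothetical execution policy that optionally carries a user-provided stream.
struct execution_policy {
  stream_t stream = legacy_stream;
};

// Records which stream each operation was issued on, so the behavior can be inspected.
struct op_record {
  std::string name;
  stream_t stream;
};
std::vector<op_record> trace;

// Host model of an async copy: real code would enqueue cudaMemcpyAsync on `s`.
void async_copy(void* dst, const void* src, std::size_t bytes, stream_t s) {
  std::memcpy(dst, src, bytes);
  trace.push_back({"copy", s});
}

// Host model of cudaStreamSynchronize(s).
void synchronize_stream(stream_t s) { trace.push_back({"sync", s}); }

// Copy + synchronize collapsed into one place, issued on the policy's stream
// rather than on a hard-coded legacy stream.
template <typename T>
void copy_n_on_policy_stream(const execution_policy& exec, const T* src,
                             std::size_t n, T* dst) {
  const stream_t s = exec.stream;
  async_copy(dst, src, n * sizeof(T), s);
  synchronize_stream(s);  // the caller may read dst only after this returns
}
```

With a policy whose `stream` is set to 7, both trace entries carry stream 7; with a default-constructed policy they fall back to stream 0, which mirrors the old legacy-stream behavior.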
…e user-provided stream if present
thrust/system/cuda/detail/reduce.inl
#include <thrust/system/cuda/detail/decomposition.h>
#include <thrust/system/cuda/detail/execution_policy.h>
#include <thrust/system/cuda/detail/execute_on_stream.h>
#include <thrust/system/cuda/detail/synchronize.h>
I think this header is superfluous because reduce does not call synchronize.
It looks pretty good modulo the cosmetic changes. I think we should avoid checking for a null reference in I assume the reason we receive null execution policy references at that point in the code is due to these shenanigans here:
https://github.com/thrust/thrust/blob/master/thrust/detail/reference.inl#L114
Rather than continue to traffic in null pointers, I think we should see what happens if we just default-construct a system inside of
I'll investigate ways to avoid checking for null references. Please keep this PR open for now; if the solution is simple, I'll commit it as part of this PR.
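The suggestion to stop trafficking in null references and default-construct a system instead can be illustrated with a small C++ sketch. `host_system`, `cuda_system`, `assign_value`, and `dispatch_assign` here are hypothetical simplifications, not Thrust's real types or signatures. The point is that a default-constructed system object selects the same tag-dispatched overload a null reference would, but stays valid if that overload reads system state, which the null-reference approach had to assume never happens.

```cpp
// Hypothetical stand-ins for Thrust system tags; the names are illustrative only.
struct host_system {};
struct cuda_system {
  int stream = 0;  // state a callee may read; 0 stands for the default stream
};

// Tag-dispatched overloads, selected by the static type of the system argument.
int assign_value(host_system&) { return -1; }              // ignores system state
int assign_value(cuda_system& sys) { return sys.stream; }  // reads system state

// Instead of forming a reference from a null pointer
// (e.g. *static_cast<System*>(nullptr), which is undefined behavior and breaks
// as soon as the callee touches system state), default-construct a real
// System object and dispatch on that.
template <typename System>
int dispatch_assign() {
  System sys;
  return assign_value(sys);
}
```

This relies on the system types being cheap to default-construct, which holds for empty tag types and for small policies like the hypothetical `cuda_system` above.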
thrust/detail/reference.inl
// XXX avoid default-constructing a system
// XXX use null references for dispatching
// XXX this assumes that the eventual invocation of
// XXX assign_value will not access system state
This comment should be eliminated because it no longer applies.
The changes look good so far. I think the only thing left is to eliminate some of that code related to
Can you verify that all the unnecessary code has been eliminated? Thanks.
// we may wish to enable async host <-> device copy in the future
trivial_copy_detail::checked_cudaMemcpyAsync(dst, src, n * sizeof(T), kind, legacy_stream());

// XXX
Need to drop this XXX since we've resolved the problem it was highlighting.
It looks like Replacing in https://github.com/thrust/thrust/blob/1.8.3/thrust/detail/reference.inl#L139 with Generates Any idea what might be going on?
I don't know what's going on, but here is one clue: When I plugged that name into It seems like it should have chosen the CUDA-specific overload of this function. |
I see: the generic overload is a better match for because the first argument is an rvalue, and the CUDA-specific overload takes an lvalue reference. It appears
Nice analysis, I agree.
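The analysis above, that the CUDA-specific overload takes a non-const lvalue reference and so cannot bind to an rvalue first argument, is a standard C++ overload-resolution effect. Here is a minimal sketch with hypothetical names (`cuda_policy`, `get_value_impl`). It is not Thrust's actual overload set, only the same mechanism in isolation.

```cpp
#include <string>

// Hypothetical stand-in for a CUDA execution policy type.
struct cuda_policy {};

// Generic overload: a const lvalue reference binds to both lvalues and rvalues.
template <typename Policy>
std::string get_value_impl(const Policy&) { return "generic"; }

// CUDA-specific overload: a non-const lvalue reference cannot bind to an
// rvalue, so it is not even viable when the caller passes a temporary.
std::string get_value_impl(cuda_policy&) { return "cuda"; }

// Passing a temporary (rvalue) silently falls back to the generic overload.
std::string call_with_rvalue() { return get_value_impl(cuda_policy{}); }

// Passing a named object (lvalue) selects the CUDA overload: binding to
// cuda_policy& is preferred over binding to the more cv-qualified
// const cuda_policy&, and a non-template also wins ties against a template.
std::string call_with_lvalue() {
  cuda_policy policy;
  return get_value_impl(policy);
}
```

In this sketch, binding the temporary to a named variable before the call is enough to make the CUDA overload viable again.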