Skip to content
This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Conversation

@3gx
Copy link
Contributor

@3gx 3gx commented Apr 6, 2016

No description provided.

@jaredhoberock
Copy link
Contributor

Thanks Evghenii. This looks like a good first cut. I think we'll want to collapse the copy + synchronize code at the end of reduce().

There's already a Thrust primitive called get_value(). This is the function that detail::reference's conversion operator calls when it needs to convert a reference into a value. Since get_value() will suffer from the same problem as reduce() exposed, I suggest the following course of action:

  • call your synchronize() function at the end of get_value() to ensure that the result is ready
  • have reduce() call something like return thrust::detail::get_value(policy, partial_sums[0])

That way, all the required synchronization is located in one place and we don't need to duplicate so much code in reduce() and other functions where this problem will inevitably show up again.

What do you think?

@jaredhoberock
Copy link
Contributor

Having taken a look at the implementation of get_value(), it looks like it is itself implemented with assign_value(). That might be the correct place to put the call to synchronize().

@3gx
Copy link
Contributor Author

3gx commented Apr 6, 2016

I attempted to replace parital_sum[0] with get_value(exec, &partial_sum[0]) but it terminates with

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  an illegal memory access was encountered
Aborted (core dumped)

I don't understand what happens, exactly; get_value calls assign_value https://github.com/thrust/thrust/blob/master/thrust/system/cuda/detail/get_value.h#L57

and this would be a good place to insert sync, e.g.

synchronize(stream(derived_cast(exec)), "kaboom")

but first I need to deal with segfault. Doesn't &partial_sum[0] return device_ptr, and assign_value will dispatch the correct copy? If not, please advice on what is going on, thanks.

@3gx
Copy link
Contributor Author

3gx commented Apr 6, 2016

Update: seems there was some bug, which got fixed. Need still help in understanding the following:

replacing return partial_sums[0] with return thrust::detail::get_value(exec, &partial_sums[0]) still copies in default stream. Why is that?

@jaredhoberock
Copy link
Contributor

I think it is because trivial_copy_n, which this code will eventually call, is hard-coded to use the legacy stream:

https://github.com/thrust/thrust/blob/master/thrust/system/cuda/detail/trivial_copy.inl#L137

You should investigate what happens when you try relaxing that to use the stream inside the execution policy. If all tests pass, we should be good

@3gx
Copy link
Contributor Author

3gx commented Apr 7, 2016

All tests pass with the 2 known failures with this change.

#include <thrust/system/cuda/detail/decomposition.h>
#include <thrust/system/cuda/detail/execution_policy.h>
#include <thrust/system/cuda/detail/execute_on_stream.h>
#include <thrust/system/cuda/detail/synchronize.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this header is superfluous because reduce does not call synchronize.

@jaredhoberock
Copy link
Contributor

It looks pretty good modulo the cosmetic changes.

I think we should avoid checking for a null reference in cuda_memcpy_stream().

I assume the reason we receive null execution policy references at that point in the code is due to these shenanigans here:

https://github.com/thrust/thrust/blob/master/thrust/detail/reference.inl#L114

Rather than continue to traffic in null pointers, I think we should see what happens if we just default-construct a system inside of reference's conversion to value operator and call get_value() directly.

@3gx
Copy link
Contributor Author

3gx commented Apr 7, 2016

I'll be investigating solutions to avoid checking for null references. Keep this PR open for now, if solution is simple I'll commit it with this PR.

// XXX avoid default-constructing a system
// XXX use null references for dispatching
// XXX this assumes that the eventual invocation of
// XXX assign_value will not access system state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be eliminated because it no longer applies

@jaredhoberock
Copy link
Contributor

The changes look good so far. I think the only thing left is to eliminate some of that code related to detail::reference which no longer has a purpose.

@3gx
Copy link
Contributor Author

3gx commented Apr 11, 2016

Can you verify that all the unnecessary code has been eliminated ? Thanks

// we may wish to enable async host <-> device copy in the future
trivial_copy_detail::checked_cudaMemcpyAsync(dst, src, n * sizeof(T), kind, legacy_stream());

// XXX
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to drop this XXX since we've resolved the problem it was highlighting

@3gx
Copy link
Contributor Author

3gx commented Apr 11, 2016

It looks like strip_const_assign_value should be removable, but something strange is going on in attempt to remove the function:

Replacing

strip_const_assign_value(select_system(system1, system2), src);

in https://github.com/thrust/thrust/blob/1.8.3/thrust/detail/reference.inl#L139 with

using thrust::system::detail::generic::assign_value;
assign_value(thrust::detail::derived_cast(select_system(system1, system2)), m_ptr, src);

Generates ptxas linking error...

ptxas fatal   : Unresolved extern function '_ZN6thrust6system6detail7generic12assign_valueINS_7pointerINS_5tupleIblNS_9null_typeES6_S6_S6_S6_S6_S6_S6_EENS0_4cuda6detail17execute_on_streamENS_11use_defaultESB_EEPKS7_EEvNS2_3tagET_T0_'

Any idea what might be going on?

@jaredhoberock
Copy link
Contributor

I don't know what's going on, but here is one clue:

When I plugged that name into c++filt it indicated that it's the system::detail::generic overload of assign_value

It seems like it should have chosen the CUDA-specific overload of this function.

@3gx
Copy link
Contributor Author

3gx commented Apr 11, 2016

I see, the generic overloads is better match for

assign_value(thrust::detail::derived_cast(select_system(system1, system2)), m_ptr, src); 

because the first argument is rvalue, and CUDA-specific overload is for lvalue ref. It appears strip_const_assign_value from the beginning served a purpose of converting rvalue to lvalue ref, and thus should be kept.

@jaredhoberock
Copy link
Contributor

Nice analysis, I agree.

@jaredhoberock jaredhoberock merged commit 1e61e91 into NVIDIA:1.8.3 Apr 11, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants