This repository was archived by the owner on Mar 21, 2024. It is now read-only.
Add libcu++ dependency; initial round of NV_IF_TARGET ports.
#448
Merged
3886111 to
9414e43
Compare
9414e43 to
b1bbe02
Compare
gevtushenko
suggested changes
Apr 11, 2022
Collaborator
gevtushenko
left a comment
A lot of code is much cleaner now, thanks! There are a few minor changes that need to be addressed.
b1bbe02 to
3efed83
Compare
robertmaynard
approved these changes
May 3, 2022
3efed83 to
b523fc5
Compare
gevtushenko
approved these changes
May 11, 2022
Collaborator
gevtushenko
left a comment
Is there a plan for how to address dynamic shared memory allocation without PTX_ARCH when we support redux? If so, this can be merged.
Collaborator
Author
We'll need to use |
b523fc5 to
f037174
Compare
nvc++ will stop defining __NVCOMPILER_CUDA_ARCH__ soon, removing the ability to determine the PTX arch at compile time. This updates agents and collective algorithms to no longer require the PTX_ARCH template parameter, and changes the CUB_WARP_SIZE(PTX_ARCH), etc. helpers to ignore their argument. These macros only differed on obsolete arches, so ignoring the argument has no effect on currently supported architectures.
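As a rough illustration of what "ignore their argument" means, the sketch below defines helper macros that accept a PTX arch argument for source compatibility but always produce the same value. The `SKETCH_` names and `SketchBlockLayout` are hypothetical and are not CUB's actual definitions; the only fact relied on is that the warp size is 32 threads on currently supported NVIDIA architectures.

```cpp
// Hypothetical helpers (not CUB's real macros): the argument is accepted so
// that existing call sites keep compiling, but it is never inspected.
#define SKETCH_LOG_WARP_THREADS(unused) (5)
#define SKETCH_WARP_THREADS(unused) (1 << SKETCH_LOG_WARP_THREADS(0))

// Any arch value gives the same answer, so callers no longer need to know
// the PTX arch at compile time.
static_assert(SKETCH_WARP_THREADS(350) == SKETCH_WARP_THREADS(800),
              "warp size must not depend on the arch argument");

// A layout helper that previously might have taken a PTX_ARCH template
// parameter can now be written without one.
template <int BlockThreads>
struct SketchBlockLayout
{
  static constexpr int warp_threads = SKETCH_WARP_THREADS(0);
  // Round up so a partially filled last warp is still counted.
  static constexpr int warps = (BlockThreads + warp_threads - 1) / warp_threads;
};
```

Existing callers that still pass an arch value keep compiling, which is presumably why the argument is ignored rather than removed outright.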
This fixes the issue reported in NVIDIA#299. There's no clear reason why this should use `RandomBits` unconditionally.
The merge sort test with pow2 > 20 fails on GTX 1650. Detect bad_alloc failures and skip those tests. Tests for smaller problem sizes will still fail if there's a bad_alloc.
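The skip policy described in this commit message can be sketched roughly as follows. This is not the code from the PR: `SketchResult`, `sketch_run_case`, and the callable-based structure are hypothetical, and only the rule itself (skip on `std::bad_alloc` when pow2 > 20, fail otherwise) comes from the message above.

```cpp
#include <cassert>
#include <functional>
#include <new>

// Hypothetical outcome type; not part of CUB's test harness.
enum class SketchResult { Passed, Failed, Skipped };

// Runs one test case for a problem size of 2^pow2 items.
// `body` returns true when the results were verified as correct.
inline SketchResult sketch_run_case(unsigned pow2,
                                    const std::function<bool()>& body)
{
  try
  {
    return body() ? SketchResult::Passed : SketchResult::Failed;
  }
  catch (const std::bad_alloc&)
  {
    // Large problems may legitimately exceed the memory of a small GPU,
    // so they are skipped. Smaller problems running out of memory still
    // count as a failure.
    return (pow2 > 20) ? SketchResult::Skipped : SketchResult::Failed;
  }
}
```

Keeping the threshold inside the handler means an out-of-memory condition on a small problem is never silently hidden.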
f037174 to
4de961a
Compare
Labels
helps: nvc++
Helps or needed by NVC++.
P0: must have
Absolutely necessary. Critical issue, major blocker, etc.
release: breaking change
Include in "Breaking Changes" section of release notes.
type: enhancement
New feature or request.
Requires NVIDIA/thrust#1605.
This PR contains an initial set of changes necessary to migrate Thrust and CUB to `NV_IF_TARGET` and remove dependence on `__CUDA_ARCH__`. It does not fully remove all usages of `__CUDA_ARCH__`, but rather focuses on the following:

- `#ifdef __CUDA_ARCH__` to use `NV_IF_TARGET`.

This also includes various bug fixes for issues exposed by the above.

Future PRs will address the remaining usages of `__CUDA_ARCH__` in the CDP macros and the kernel dispatch infrastructure.

Pre-written Release Notes

Breaking Changes

- #448 Add libcu++ dependency.
- #448: The following macros are no longer defined by default. They can be re-enabled by defining `CUB_PROVIDE_LEGACY_ARCH_MACROS`. These will be completely removed in a future release.
  - `CUB_IS_HOST_CODE`: Replace with `NV_IF_TARGET`.
  - `CUB_IS_DEVICE_CODE`: Replace with `NV_IF_TARGET`.
  - `CUB_INCLUDE_HOST_CODE`: Replace with `NV_IF_TARGET`.
  - `CUB_INCLUDE_DEVICE_CODE`: Replace with `NV_IF_TARGET`.

Other Enhancements

- #448: Removed special case code for unsupported CUDA architectures.
- #448: Replace several usages of `__CUDA_ARCH__` with `<nv/target>` to handle host/device code divergence.
- #448: Mark unused PTX arch parameters as legacy.
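For readers unfamiliar with the pattern, a minimal sketch of the kind of port described above might look like the following. `NV_IF_TARGET`, `NV_IS_HOST`, and `NV_IS_DEVICE` come from libcu++'s `<nv/target>`. Everything prefixed `SKETCH_` or `sketch_` is hypothetical, including the host-only fallback, which exists only so the example builds without the CUDA Toolkit and is not how `<nv/target>` is implemented.

```cpp
#include <cassert>

// Use the real header when it is on the include path (CUDA Toolkit / libcu++).
#if defined(__has_include)
#  if __has_include(<nv/target>)
#    include <nv/target>
#  endif
#endif

// Host-only stand-in (an assumption of this sketch): it selects the block
// that matches the condition and strips the surrounding parentheses.
#ifndef NV_IF_TARGET
#  define NV_IS_HOST   sketch_host
#  define NV_IS_DEVICE sketch_device
#  define SKETCH_STRIP(...) __VA_ARGS__
#  define SKETCH_PICK_sketch_host(t, f)   SKETCH_STRIP t
#  define SKETCH_PICK_sketch_device(t, f) SKETCH_STRIP f
#  define SKETCH_CAT_(a, b) a##b
#  define SKETCH_CAT(a, b)  SKETCH_CAT_(a, b)
#  define NV_IF_TARGET(cond, t, f) SKETCH_CAT(SKETCH_PICK_, cond)(t, f)
#endif

// Execution-space annotations only exist when compiling as CUDA.
#if defined(__CUDACC__)
#  define SKETCH_HOST_DEVICE __host__ __device__
#else
#  define SKETCH_HOST_DEVICE
#endif

// Legacy pattern being replaced:
//   #ifdef __CUDA_ARCH__
//     on_host = false;
//   #else
//     on_host = true;
//   #endif
SKETCH_HOST_DEVICE inline bool sketch_is_host_path()
{
  bool on_host = false;
  // Each branch is a parenthesized block of statements; only the branch
  // matching the current compilation target is kept.
  NV_IF_TARGET(NV_IS_HOST, (on_host = true;), (on_host = false;));
  return on_host;
}
```

Called from ordinary host code, `sketch_is_host_path()` returns `true`. The same source can still be compiled for the device without any reference to `__CUDA_ARCH__`.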