Migrate kernel specialization infra to ptx_dispatch #1735
alliepiper wants to merge 7 commits into NVIDIA:main
run tests
e04685b to a7d19dc
a7d19dc to 67b370d
67b370d to fb3dae9
fb3dae9 to 28ddfca
28ddfca to 99cdc8e
99cdc8e to 25b4b04
0002fc2 to c86a3b7
Resolved conflicts, restarting CI for new CUB tests. run tests
05c3527 to a78f492
a78f492 to cab82b7
cab82b7 to b869c2c
Builds were interrupted by a Jenkins upgrade. run tests
b869c2c to 5a87579
    if (d_temp_storage == NULL)
    {
    if (d_temp_storage == nullptr)
Suggested change:
    - if (d_temp_storage == nullptr)
    + if (!d_temp_storage)
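For context, the check being discussed guards the size-query branch of the two-call convention that CUB device algorithms follow: when `d_temp_storage` is null, the routine only reports how many bytes it needs. The host-only sketch below illustrates that shape. `fake_reduce`, `sum_with_scratch`, and the 256-byte requirement are hypothetical and are not taken from this PR. It also shows that `!d_temp_storage` selects the same branch as `d_temp_storage == nullptr`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device algorithm entry point (not a CUB API).
// Returns 0 on success and nonzero on failure.
int fake_reduce(void *d_temp_storage, std::size_t &temp_storage_bytes,
                const int *in, int *out, std::size_t num_items)
{
  const std::size_t required = 256; // made-up requirement for illustration
  if (!d_temp_storage)              // equivalent to: d_temp_storage == nullptr
  {
    temp_storage_bytes = required;  // size query only, no work is done
    return 0;
  }
  if (temp_storage_bytes < required)
  {
    return 1; // caller supplied too little scratch space
  }
  int sum = 0;
  for (std::size_t i = 0; i < num_items; ++i)
  {
    sum += in[i];
  }
  *out = sum;
  return 0;
}

// Typical caller: query the size, allocate, then run for real.
int sum_with_scratch(const std::vector<int> &in)
{
  std::size_t bytes = 0;
  int out = 0;
  fake_reduce(nullptr, bytes, in.data(), &out, in.size());        // phase 1: query
  std::vector<char> scratch(bytes);
  fake_reduce(scratch.data(), bytes, in.data(), &out, in.size()); // phase 2: run
  return out;
}
```

The two spellings of the null check behave identically, so the choice is a style question about consistency across the code base.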
    reduce_agent ra(reduce_plan, num_items, stream, vshmem_ptr, "reduce_agent: single_tile only", debug_sync);
    ra.launch(input_it, output_it, num_items, reduction_op);
    char *vshmem_ptr = vshmem_size > 0
Above you have char *const. We should consolidate all uses.
    size_t vshmem_storage = core::vshmem_size(partition_plan.shared_memory_size,
                                              num_tiles);
    if (!d_temp_storage)
No change requested:
There is a whole lot of duplication with respect to the partitioning of the work. I am wondering whether we can extract that common functionality into a common helper function.
Not in this PR though.
    for (size_t i = 0; i < n; i++)
    {
      // XXX Use proper random number generation facility.
      h_keys[i] = FixedVector<T, N>(rand());
Do we want to take the opportunity and use something from <random>?
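A minimal sketch of what the `<random>` version could look like, assuming a fixed seed is acceptable for test reproducibility. `make_random_keys` and its `seed` parameter are hypothetical names, and plain `int` keys stand in for the test's `FixedVector<T, N>` wrapper.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Fills a host vector with reproducible pseudo-random keys.
// `seed` is an assumed parameter; a fixed default keeps test runs repeatable.
std::vector<int> make_random_keys(std::size_t n, unsigned seed = 12345u)
{
  std::mt19937 engine(seed);
  // 32767 is the smallest value RAND_MAX is allowed to have, so this range
  // is one that rand() is guaranteed to cover.
  std::uniform_int_distribution<int> dist(0, 32767);

  std::vector<int> h_keys(n);
  for (std::size_t i = 0; i < n; i++)
  {
    // The test in the diff would wrap this value in FixedVector<T, N>(...).
    h_keys[i] = dist(engine);
  }
  return h_keys;
}
```

Compared with `rand()`, the engine and distribution are explicit objects, so the seed and value range are visible at the call site instead of depending on global state.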
    static const cub::CacheLoadModifier LOAD_MODIFIER = _LOAD_MODIFIER;
    static const cub::BlockScanAlgorithm SCAN_ALGORITHM = _SCAN_ALGORITHM;
    }; // struct PtxPolicy
    static constexpr int BLOCK_THREADS = _BLOCK_THREADS;
I would rather rename the template arguments than store additional static variables.
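One possible reading of this suggestion, sketched with hypothetical types (`PolicyWithMember`, `AgentWithRenamedParam`) that are not part of the PR. The first struct mirrors the pattern in the diff, with the parameter spelled differently because names that start with an underscore followed by an uppercase letter are reserved in C++. The second gives the template parameter its final name and uses it directly.

```cpp
// Pattern in the diff: the template parameter is re-exported as a static
// member so that code outside the template can read it.
template <int BlockThreadsParam>
struct PolicyWithMember
{
  static constexpr int BLOCK_THREADS = BlockThreadsParam;
};

// One reading of the suggestion: name the parameter BLOCK_THREADS and use it
// directly inside the template, with no extra static member.
template <int BLOCK_THREADS>
struct AgentWithRenamedParam
{
  static constexpr int tile_items(int items_per_thread)
  {
    return BLOCK_THREADS * items_per_thread;
  }
};

static_assert(PolicyWithMember<128>::BLOCK_THREADS == 128,
              "member is visible to callers");
static_assert(AgentWithRenamedParam<128>::tile_items(4) == 512,
              "parameter is used directly inside the template");
```

The trade-off is visibility: a template parameter cannot be read from outside as `Type::BLOCK_THREADS`, so the renamed form only works where the value is consumed inside the template itself.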
    CUB_MIN(NOMINAL_4B_ITEMS_PER_THREAD,
            CUB_MAX(1, (NOMINAL_4B_ITEMS_PER_THREAD * 4 / sizeof(T))));

    using Policy = PtxPolicy<128,
Can we give the magic 128 a name?
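A sketch of how the literal could be named, assuming it is the block size. `SketchPolicy`, `SketchTuning`, `DEFAULT_BLOCK_THREADS`, and `scaled_items_per_thread` are hypothetical, the real `PtxPolicy` takes more parameters than shown, and the nominal value of 7 is made up. The helper reproduces the `CUB_MIN`/`CUB_MAX` scaling from the diff so that both tuning inputs have names.

```cpp
// Hypothetical, simplified policy type for illustration only.
template <int BlockThreads, int ItemsPerThread>
struct SketchPolicy
{
  static constexpr int BLOCK_THREADS    = BlockThreads;
  static constexpr int ITEMS_PER_THREAD = ItemsPerThread;
};

// Same scaling as the expression in the diff:
// min(nominal, max(1, nominal * 4 / sizeof(T))).
constexpr int scaled_items_per_thread(int nominal_4b_items, int type_size)
{
  const int scaled = nominal_4b_items * 4 / type_size;
  return scaled < 1 ? 1 : (scaled > nominal_4b_items ? nominal_4b_items : scaled);
}

template <typename T>
struct SketchTuning
{
  // Named constants replace the bare literals at the point of use.
  static constexpr int DEFAULT_BLOCK_THREADS       = 128;
  static constexpr int NOMINAL_4B_ITEMS_PER_THREAD = 7; // made-up value

  static constexpr int ITEMS_PER_THREAD =
    scaled_items_per_thread(NOMINAL_4B_ITEMS_PER_THREAD,
                            static_cast<int>(sizeof(T)));

  using Policy = SketchPolicy<DEFAULT_BLOCK_THREADS, ITEMS_PER_THREAD>;
};

static_assert(SketchTuning<int>::Policy::BLOCK_THREADS == 128,
              "block size comes from the named constant");
static_assert(SketchTuning<char[64]>::Policy::ITEMS_PER_THREAD == 1,
              "large types are clamped to one item per thread");
```

With the constant named, a reader of the `Policy` alias can tell which template argument is the block size without counting parameter positions.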
@allisonvacanti and I talked and we agreed to close this PR for now as this work has been de-prioritized. We will pick it back up if/when it becomes a priority again.
No description provided.