diff --git a/README.md b/README.md
index 7ce7462..9e1e52a 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
- Document Number: N4506
- Date: 2015-05-05
+ Document Number: N4699
+ Date: 2017-10-16
  Revises:
  Project: Programming Language C++
  Project Number: TS 19570
@@ -7,24 +7,22 @@
  NVIDIA Corporation
  jhoberock@nvidia.com
 
-# Parallelism TS Editor's Report, post-Lenexa mailing
+# Parallelism TS Editor's Report, pre-Albuquerque mailing
 
-N4505 is the latest Parallelism TS Working Draft. It contains editorial and technical changes to the Parallelism TS to apply the following revisions:
+N4698 is the proposed working draft of Parallelism TS Version 2. It contains changes to the Parallelism TS as directed by the committee at the Toronto meeting, and editorial changes.
 
- * N4274 - Relaxing Packing Rules for Exceptions Thrown by Parallel Algorithms - Proposed Wording (Revision 1)
- * Feature test macro for the Parallelism TS
+N4698 updates the previous draft, N4669, published in the pre-Toronto mailing.
 
-N4505 updates the previous draft, N4407, published in the pre-Lenexa mailing.
+# Technical Changes
 
-N4507 is document N4505 reformatted as a TS document. It updates N4409, which was published in the pre-Lenexa mailing.
+* Apply P0076R4 - Vector and Wavefront Policies.
 
-## Technical Changes
+# Editorial Changes
 
-* Applied N4274, which relaxes the exception packaging rules for exceptions thrown by parallel algorithms. Additionally, changed instances of "terminates with (exception)" phrasing to "exits via (exception)", as directed by the Library Working Group.
+* Reformat Table 1 - Feature Test Macro(s), to match the style of the Library Fundamentals TS.
 
-* Introduced the feature test macro `__cpp_lib_experimental_parallel_algorithm` for the functionality of the Parallelism TS as directed by SG1.
+# Notes
 
-## Editorial Changes
-
-* Promoted subsection 1.3.1, which was incorrectly grouped under section 1.3, to section 1.4.
+* The pre-existing content of N4698 has not yet been harmonized with C++17. As a result, this content is named and namespaced inconsistently with the newly applied content of P0076R4. We anticipate that these inconsistencies will be harmonized by a future revision.
+* N4698 contains forward references to `for_loop` and `for_loop_strided`. We anticipate their introduction in a future revision.
diff --git a/algorithms.html b/algorithms.html
index 0a62818..17ec63d 100644
--- a/algorithms.html
+++ b/algorithms.html
@@ -88,6 +88,37 @@
+ The invocations of element access functions in parallel algorithms invoked with an
+ execution policy of type unsequenced_policy are permitted to execute
+ in an unordered fashion in the calling thread, unsequenced with respect to one another
+ within the calling thread.
+
+
++ +
+ The invocations of element access functions in parallel algorithms invoked with an
+ execution policy of type vector_policy are permitted to execute
+ in an unordered fashion in the calling thread, unsequenced with respect to one another
+ within the calling thread, subject to the sequencing constraints of wavefront application
+ ([parallel.alg.general.wavefront]) for the last argument to for_loop or for_loop_strided.
+
The invocations of element access functions in parallel algorithms invoked with an execution
policy of type parallel_vector_execution_policy
@@ -163,6 +194,107 @@
+ For the purposes of this section, an evaluation is a value computation or side effect of + an expression, or an execution of a statement. Initialization of a temporary object is considered a + subexpression of the expression that necessitates the temporary object. +
+ ++ An evaluation A contains an evaluation B if: + +
+ An evaluation A is ordered before an evaluation B if A is deterministically
+ sequenced before B.
+ For an evaluation A ordered before an evaluation B, both contained in the same + invocation of an element access function, A is a vertical antecedent of B if: + +
goto statement or asm declaration that jumps to a statement outside of S, orswitch statement executed within S that transfers control into a substatement of a nested selection or iteration statement, orthrow longjmp.
+
+ In the following, Xi and Xj refer to evaluations of the same expression
+ or statement contained in the application of an element access function corresponding to the ith and
+ jth elements of the input sequence.
+ Horizontally matched is an equivalence relationship between two evaluations of the same expression. An + evaluation Bi is horizontally matched with an evaluation Bj if: + +
+ Let f be a function called for each argument list in a sequence of argument lists.
+ Wavefront application of f requires that evaluation Ai be sequenced
+ before evaluation Bj if i < j and:
+
+
ExecutionPolicy algorithm overloads<experimental/algorithm> synopsis<experimental/algorithm> synopsisstd::forward>F<(f)(). When invoked within an element access function
+ in a parallel algorithm using vector_policy, if two calls to no_vec are
+ horizontally matched within a wavefront application of an element access function over input
+ sequence S, then the execution of f in the application for one element in S is
+ sequenced before the execution of f in the application for a subsequent element in
+ S; otherwise, there is no effect on sequencing.
+ f.
+ f returns a result, the result is ignored.
+ f exits via an exception, then terminate will be called, consistent
+ with all other potentially-throwing operations invoked with vector_policy execution.
+
+extern int* p;
+for_loop(vec, 0, n, [&](int i) {
+  y[i] += y[i+1];
+  if (y[i] < 0) {
+    no_vec([&]{
+      *p++ = i;
+    });
+  }
+});
+
+ The updates *p++ = i will occur in the same order as if the policy were seq.
+
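The ordering claim in the note above can be modeled with a purely sequential stand-in. This is only a sketch under stated assumptions: `no_vec_sketch` and `record_negatives` are hypothetical names that do not appear in the TS, and an ordinary loop replaces `for_loop(vec, ...)`, which this draft only forward-references. The sketch shows the result a conforming vectorized execution would have to reproduce: the recorded indices appear in element order.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for no_vec: it simply invokes f on the calling thread.
// In a sequential loop this trivially keeps matched calls in element order.
// Like the TS function it is noexcept, so a throwing f ends in std::terminate.
template <class F>
auto no_vec_sketch(F&& f) noexcept -> decltype(std::forward<F>(f)()) {
    return std::forward<F>(f)();
}

// Sequential model of the example: y[i] += y[i+1], then the index of every
// negative result is appended through no_vec_sketch.
inline std::vector<int> record_negatives(std::vector<int> y) {
    std::vector<int> out;
    for (std::size_t i = 0; i + 1 < y.size(); ++i) {
        y[i] += y[i + 1];
        if (y[i] < 0) {
            no_vec_sketch([&] { out.push_back(static_cast<int>(i)); });
        }
    }
    return out;
}
```

Calling `record_negatives({5, 1, -9, 2, 3})` yields the indices 1 and 2, in that order, which is the order a `seq` execution would produce.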
+class ordered_update_t {
+ T& ref_; // exposition only
+public:
+ ordered_update_t(T& loc) noexcept
+ : ref_(loc) {}
+ ordered_update_t(const ordered_update_t&) = delete;
+ ordered_update_t& operator=(const ordered_update_t&) = delete;
+
+ template <class U>
+ auto operator=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ = std::move(rhs); }); }
+ template <class U>
+ auto operator+=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ += std::move(rhs); }); }
+ template <class U>
+ auto operator-=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ -= std::move(rhs); }); }
+ template <class U>
+ auto operator*=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ *= std::move(rhs); }); }
+ template <class U>
+ auto operator/=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ /= std::move(rhs); }); }
+ template <class U>
+ auto operator%=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ %= std::move(rhs); }); }
+ template <class U>
+ auto operator>>=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ >>= std::move(rhs); }); }
+ template <class U>
+ auto operator<<=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ <<= std::move(rhs); }); }
+ template <class U>
+ auto operator&=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ &= std::move(rhs); }); }
+ template <class U>
+ auto operator^=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ ^= std::move(rhs); }); }
+ template <class U>
+ auto operator|=(U rhs) const noexcept
+ { return no_vec([&]{ return ref_ |= std::move(rhs); }); }
+
+ auto operator++() const noexcept
+ { return no_vec([&]{ return ++ref_; }); }
+ auto operator++(int) const noexcept
+ { return no_vec([&]{ return ref_++; }); }
+ auto operator--() const noexcept
+ { return no_vec([&]{ return --ref_; }); }
+ auto operator--(int) const noexcept
+ { return no_vec([&]{ return ref_--; }); }
+};
+
+
+
+ An object of type ordered_update_t<T> is a proxy for an object of type T
+ intended to be used within a parallel application of an element access function using a
+ policy object of type vector_policy. Simple increments, assignments, and compound
+ assignments to the object are forwarded to the proxied object, but are sequenced as though
+ executed within a no_vec invocation.
+
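A trimmed model of the proxy may make the forwarding concrete. It is a sketch, not the TS class: `ordered_update_sketch`, `no_vec_sketch`, and `sum_positive` are hypothetical names, only two of the operators are reproduced, and `no_vec_sketch` just calls its argument, which is a valid model only for sequential execution.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sequential stand-in for no_vec: invoke f and return its result.
template <class F>
auto no_vec_sketch(F&& f) noexcept -> decltype(std::forward<F>(f)()) {
    return std::forward<F>(f)();
}

// Trimmed model of ordered_update_t<T>: every update goes through no_vec_sketch.
template <class T>
class ordered_update_sketch {
    T& ref_;  // the proxied object
public:
    explicit ordered_update_sketch(T& loc) noexcept : ref_(loc) {}
    ordered_update_sketch(const ordered_update_sketch&) = delete;
    ordered_update_sketch& operator=(const ordered_update_sketch&) = delete;

    template <class U>
    auto operator+=(U rhs) const noexcept {
        return no_vec_sketch([&] { return ref_ += std::move(rhs); });
    }
    auto operator++() const noexcept {
        return no_vec_sketch([&] { return ++ref_; });
    }
};

// Sums the positive entries of v and counts them, updating both via proxies.
inline int sum_positive(const std::vector<int>& v, int& count) {
    int total = 0;
    count = 0;
    ordered_update_sketch<int> total_proxy(total);
    ordered_update_sketch<int> count_proxy(count);
    for (int x : v) {
        if (x > 0) {
            total_proxy += x;
            ++count_proxy;
        }
    }
    return total;
}
```

The operators are `const` because the proxy itself never changes; only the referenced object does, which mirrors the declarations in the class synopsis.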
+
{ loc }.
+ <experimental/numeric> synopsis
During the execution of a standard parallel algorithm, if the invocation of an element access function
- exits viaterminates with an uncaught exception, the behavior of the program is determined by the type of
+ exits via an uncaught exception, the behavior of the program is determined by the type of
execution policy used to invoke the algorithm:
class parallel_vector_execution_policy,
+ If the execution policy object is of type parallel_vector_execution_policy, unsequenced_policy, or vector_policy,
std::terminate shall be called.
sequential_execution_policy or
- parallel_execution_policy, the execution of the algorithm exits viaexception_listexception_list containing allexception_list+parallel_execution_policy, the execution of the algorithm exits via an + exception. The exception shall be anexception_listcontaining all uncaught exceptions thrown during + the invocations of element access functions, or optionally the uncaught exception if there was only one.- For example, the number of invocations of the user-provided function object in -whenfor_eachis unspecified. Wfor_eachis executed sequentially, - if an invocation of the user-provided function object throws an exception,for_eachcan exit via the uncaught exception, or throw anexception_listcontaining the original exception. -only one exception will be contained in the+ For example, whenexception_listobject.for_eachis executed sequentially, + if an invocation of the user-provided function object throws an exception,for_eachcan exit via the uncaught exception, or throw anexception_listcontaining the original exception.These guarantees imply that, unless the algorithm has failed to allocate memory and - exits via terminated withstd::bad_alloc, all exceptions thrown during the execution of + exits viastd::bad_alloc, all exceptions thrown during the execution of the algorithm are communicated to the caller. It is unspecified whether an algorithm implementation will "forge ahead" after encountering and capturing a user exception.- The algorithm may exit via terminate withthestd::bad_allocexception even if one or more - user-provided function objects have exited viaterminated withan exception. For example, this can happen when an algorithm fails to allocate memory while + The algorithm may exit via thestd::bad_allocexception even if one or more + user-provided function objects have exited via an exception. For example, this can happen when an algorithm fails to allocate memory while creating or adding elements to theexception_listobject.
<experimental/exception_list> synopsis<experimental/execution_policy> synopsis<experimental/execution_policy> synopsisThe class parallel_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized.
@@ -126,6 +133,32 @@Parallel+Vector execution policy
+class unsequenced_policy{ unspecified };
+
+
+ The class unsequenced_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized, e.g., executed on a single thread using instructions that operate on multiple data items.
+class vector_policy{ unspecified };
+
+
+ The class vector_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized. Additionally, such vectorization will result in an execution that respects the sequencing constraints of wavefront application ([parallel.alg.general.wavefront]). The implementation thus makes stronger guarantees than for unsequenced_policy, for example.
This Technical Specification describes requirements for implementations of an - interface that computer programs written in the C++ programming language may - use to invoke algorithms with parallel execution. The algorithms described by - this Technical Specification are realizable across a broad class of - computer architectures.
- -This Technical Specification is non-normative. Some of the functionality - described by this Technical Specification may be considered for standardization - in a future version of C++, but it is not currently part of any C++ standard. - Some of the functionality in this Technical Specification may never be - standardized, and other functionality may be standardized in a substantially - changed form.
- -The goal of this Technical Specification is to build widespread existing - practice for parallelism in the C++ standard algorithms library. It gives - advice on extensions to those vendors who wish to provide them.
-The following referenced document is indispensable for the - application of this document. For dated references, only the - edition cited applies. For undated references, the latest edition - of the referenced document (including any amendments) applies.
- -ISO/IEC 14882:— is herein called the C++ Standard. - The library described in ISO/IEC 14882:— clauses 17-30 is herein called - the C++ Standard Library. The C++ Standard Library components described in - ISO/IEC 14882:— clauses 25, 26.7 and 20.7.2 are herein called the C++ Standard - Algorithms Library.
- -Unless otherwise specified, the whole of the C++ Standard's Library
- introduction (
std. Unless otherwise specified, all
components described in this Technical Specification are declared in namespace
- std::experimental::parallel::v1.
+ std::experimental::parallel::v2.
std.
@@ -60,7 +15,7 @@ Unless otherwise specified, references to such entities described in this
Technical Specification are assumed to be qualified with
- std::experimental::parallel::v1, and references to entities described in the C++
+ std::experimental::parallel::v2, and references to entities described in the C++
Standard Library are assumed to be qualified with std::.
Extensions that are expected to eventually be added to an existing header @@ -72,65 +27,11 @@
For the purposes of this document, the terms and definitions given in the C++ Standard and the following apply.
- -A parallel algorithm is a function template described by this Technical Specification declared in namespace std::experimental::parallel::v1 with a formal template parameter named ExecutionPolicy.
- Parallel algorithms access objects indirectly accessible via their arguments by invoking the following functions: - -
sort function may invoke the following element access functions:
-
- RandomAccessIterator.
- swap function on the elements of the sequence (as per 25.4.1.1 [sort]/2).
- Compare function object.
- An implementation that provides support for this Technical Specification shall define the feature test macro(s) in Table 1.
+__cpp_lib_experimental_parallel_task_block |
+ 201510 | +
+ <experimental/task_block>+ |
+
| Doc. No. | Title | Primary Section | Macro Name | Value | Header |
|---|---|---|---|---|---|
| N4505 | Working Draft, Technical Specification for C++ Extensions for Parallelism | | __cpp_lib_experimental_parallel_algorithm | 201505 | <experimental/algorithm>, <experimental/exception_list>, <experimental/execution_policy>, <experimental/numeric> |
| P0155R0 | Task Block R5 | | __cpp_lib_experimental_parallel_task_block | 201510 | <experimental/task_block> |
| P0076R4 | Vector and Wavefront Policies | | __cpp_lib_experimental_execution_vector_policy | 201707 | <experimental/algorithm>, <experimental/execution> |
+
The following referenced document is indispensable for the + application of this document. For dated references, only the + edition cited applies. For undated references, the latest edition + of the referenced document (including any amendments) applies.
+ +ISO/IEC 14882:— is herein called the C++ Standard. + The library described in ISO/IEC 14882:— clauses 17-30 is herein called + the C++ Standard Library. The C++ Standard Library components described in + ISO/IEC 14882:— clauses 25, 26.7 and 20.7.2 are herein called the C++ Standard + Algorithms Library.
+ +Unless otherwise specified, the whole of the C++ Standard's Library
+ introduction (
| Document Number: | |
|---|---|
| Document Number: | |
| Date: | |
| Date: | |
| Revises: | |
| Revises: | |
| Editor: |
Note: this is an early draft. It’s known to be incomplet and incorrekt, and it has lots of bad formatting.
std. Unless otherwise specified, all
components described in this Technical Specification are declared in namespace
- std::experimental::parallel::v1.
+ std::experimental::parallel::v2.
Unless otherwise specified, references to such entities described in this
Technical Specification are assumed to be qualified with
- std::experimental::parallel::v1, and references to entities described in the C++
+ std::experimental::parallel::v2, and references to entities described in the C++
+ Standard Library are assumed to be qualified with std::.
Extensions that are expected to eventually be added to an existing header @@ -1298,7 +1347,7 @@
For the purposes of this document, the terms and definitions given in the C++ Standard and the following apply.
-A parallel algorithm is a function template described by this Technical Specification declared in namespace std::experimental::parallel::v1 with a formal template parameter named ExecutionPolicy.
A parallel algorithm is a function template described by this Technical Specification declared in namespace std::experimental::parallel::v2 with a formal template parameter named ExecutionPolicy.
Parallel algorithms access objects indirectly accessible via their arguments by invoking the following functions: @@ -1316,7 +1365,7 @@
An implementation that provides support for this Technical Specification shall define the feature test macro(s) in Table 1.
+ +An implementation that provides support for this Technical Specification shall define the feature test macro(s) in Table 1.
__cpp_lib_experimental_parallel_task_block |
+ 201510 | +
+ <experimental/task_block>+ |
+
<experimental/execution_policy> synopsis<experimental/execution_policy> synopsisnamespace std {
namespace experimental {
namespace parallel {
-inline namespace v1 {
+inline namespace v2 {
// 2.3, Execution policy type trait
template<class T> struct is_execution_policy;
template<class T> constexpr bool is_execution_policy_v = is_execution_policy<T>::value;
@@ -1514,7 +1568,10 @@ Feature-testing recommendations
template<class T> struct is_execution_policy { see below };
- is_execution_policy can be used to detect parallel execution policies for the purpose of excluding function signatures from otherwise ambiguous overload resolution participation.
+ is_execution_policy
+ can be used to detect parallel execution policies for the purpose of
+excluding function signatures from otherwise ambiguous overload
+resolution participation.
is_execution_policy<T> shall be a UnaryTypeTrait with a BaseCharacteristic of true_type if T is the type of a standard or implementation-defined execution policy, otherwise false_type.
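How such a trait removes a signature from overload resolution can be sketched as follows. Everything in namespace `sketch` is a hypothetical model: the tag types stand in for the TS policy classes, and `combine` is an invented pair of overloads with the same arity, chosen so that the policy overload would otherwise collide with the plain one.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

namespace sketch {

// Hypothetical policy tags standing in for the TS execution policy classes.
struct sequential_policy_tag {};
struct vector_policy_tag {};

// Model of the trait: true only for the policy tags.
template <class T> struct is_execution_policy : std::false_type {};
template <> struct is_execution_policy<sequential_policy_tag> : std::true_type {};
template <> struct is_execution_policy<vector_policy_tag> : std::true_type {};

// (1) No policy: combine two values with a caller-supplied operation.
// The trailing return type drops this overload when op is not callable.
template <class A, class B, class Op>
auto combine(A a, B b, Op op) -> decltype(op(a, b)) {
    return op(a, b);
}

// (2) Policy first: combine two values by addition.
// It participates only when the first argument is an execution policy;
// without the constraint, combine(2, 3, op) would match both overloads.
template <class ExecutionPolicy, class A, class B,
          class = std::enable_if_t<
              is_execution_policy<std::decay_t<ExecutionPolicy>>::value>>
int combine(ExecutionPolicy&&, A a, B b) {
    return a + b;
}

}  // namespace sketch
```

With the constraint in place, `sketch::combine(2, 3, std::multiplies<>{})` selects overload (1) and `sketch::combine(sketch::vector_policy_tag{}, 2, 3)` selects overload (2).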
@@ -1543,7 +1600,10 @@
Feature-testing recommendations
class sequential_execution_policy{ unspecified };
- The class sequential_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm's execution may not be parallelized.
+ The class sequential_execution_policy
+ is an execution policy type used as a unique type to disambiguate
+parallel algorithm overloading and require that a parallel algorithm's
+execution may not be parallelized.
class parallel_execution_policy{ unspecified };
- The class parallel_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized.
The class parallel_execution_policy
+ is an execution policy type used as a unique type to disambiguate
+parallel algorithm overloading and indicate that a parallel algorithm's
+execution may be parallelized.
class parallel_vector_execution_policy{ unspecified };
- The class class parallel_vector_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized and parallelized.
The class parallel_vector_execution_policy
+ is an execution policy type used as a unique type to disambiguate
+parallel algorithm overloading and indicate that a parallel algorithm's
+execution may be vectorized and parallelized.
During the execution of a standard parallel algorithm, if the invocation of an element access function
- exits viaterminates with an uncaught exception, the behavior of the program is determined by the type of
+ exits via an uncaught exception, the behavior of the program is determined by the type of
execution policy used to invoke the algorithm:
sequential_execution_policy or
- parallel_execution_policy, the execution of the algorithm exits viaexception_listexception_list containing allexception_listparallel_execution_policy, the execution of the algorithm exits via an
+ exception. The exception shall be an exception_list containing all uncaught exceptions thrown during
+ the invocations of element access functions, or optionally the uncaught exception if there was only one.
for_each is unspecified. Wfor_each is executed sequentially,
- if an invocation of the user-provided function object throws an exception, for_each can exit via the uncaught exception, or throw an exception_list containing the original exception.
- exception_list object.for_each is executed sequentially,
+ if an invocation of the user-provided function object throws an exception, for_each can exit via the uncaught exception, or throw an exception_list containing the original exception.
+
— end note ]
std::bad_alloc, all exceptions thrown during the execution of
- the algorithm are communicated to the caller. It is unspecified whether an algorithm implementation will "forge ahead" after
+ exits via std::bad_alloc, all exceptions
+thrown during the execution of
+ the algorithm are communicated to the caller. It is
+unspecified whether an algorithm implementation will "forge ahead" after
+
encountering and capturing a user exception.
— end note ]
std::bad_alloc exception even if one or more
- user-provided function objects have exited viastd::bad_alloc
+ exception even if one or more
+ user-provided function objects have exited via an
+exception. For example, this can happen when an algorithm fails to
+allocate memory while
creating or adding elements to the exception_list object.
— end note ]
@@ -1837,14 +1906,13 @@ <experimental/exception_list> synopsis<experimental/exception_list> synopsis
-namespace std {
+ namespace std {
namespace experimental {
namespace parallel {
-inline namespace v1 {
+inline namespace v2 {
class exception_list : public exception
{
@@ -2403,14 +2471,14 @@ Feature-testing recommendations
- 4.3.1 Header <experimental/algorithm> synopsis
[parallel.alg.ops.synopsis]
+ 4.3.1 Header <experimental/algorithm> synopsis
[parallel.alg.ops.synopsis]
namespace std {
namespace experimental {
namespace parallel {
-inline namespace v1 {
+inline namespace v2 {
template<class ExecutionPolicy,
class InputIterator, class Function>
void for_each(ExecutionPolicy&& exec,
@@ -2629,14 +2697,14 @@ Feature-testing recommendations
- 4.4.1 Header <experimental/numeric> synopsis
[parallel.alg.numeric.synopsis]
+ 4.4.1 Header <experimental/numeric> synopsis
[parallel.alg.numeric.synopsis]
namespace std {
namespace experimental {
namespace parallel {
-inline namespace v1 {
+inline namespace v2 {
template<class InputIterator>
typename iterator_traits<InputIterator>::value_type
reduce(InputIterator first, InputIterator last);
@@ -3275,6 +3343,469 @@ Feature-testing recommendations
+
+
+
+
+
+ 5 Task Block
[parallel.task_block]
+
+
+
+
+
+
+
+ 5.1 Header <experimental/task_block> synopsis
[parallel.task_block.synopsis]
+
+
+
+ namespace std {
+namespace experimental {
+namespace parallel {
+inline namespace v2 {
+ class task_cancelled_exception;
+
+ class task_block;
+
+ template<class F>
+ void define_task_block(F&& f);
+
+ template<class F>
+ void define_task_block_restore_thread(F&& f);
+}
+}
+}
+}
+
+
+
+
+
+
+
+
+
+ 5.2 Class task_cancelled_exception
[parallel.task_block.task_cancelled_exception]
+
+
+ namespace std {
+namespace experimental {
+namespace parallel {
+inline namespace v2 {
+
+ class task_cancelled_exception : public exception
+ {
+ public:
+ task_cancelled_exception() noexcept;
+ virtual const char* what() const noexcept;
+ };
+}
+}
+}
+}
+
+
+
+ The class task_cancelled_exception defines the type of objects thrown by
+ task_block::run or task_block::wait if they detect that an
+ exception is pending within the current parallel block. See 5.5, below.
+
+
+
+
+
+
+ 5.2.1 task_cancelled_exception member function what
[parallel.task_block.task_cancelled_exception.what]
+
+
+
+
+
+ virtual const char* what() const noexcept
+
+
+
+
+
+
+
+ - Returns:
-
+ An implementation-defined NTBS.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 5.3 Class task_block
[parallel.task_block.class]
+
+
+ namespace std {
+namespace experimental {
+namespace parallel {
+inline namespace v2 {
+
+ class task_block
+ {
+ private:
+ ~task_block();
+
+ public:
+ task_block(const task_block&) = delete;
+ task_block& operator=(const task_block&) = delete;
+ void operator&() const = delete;
+
+ template<class F>
+ void run(F&& f);
+
+ void wait();
+ };
+}
+}
+}
+}
+
+
+
+ The class task_block defines an interface for forking and joining parallel tasks. The define_task_block and define_task_block_restore_thread function templates create an object of type task_block and pass a reference to that object to a user-provided function object.
+
+
+
+ An object of class task_block cannot be constructed,
+ destroyed, copied, or moved except by the implementation of the task
+block library. Taking the address of a task_block object via operator& is ill-formed. Obtaining its address by any other means (including addressof) results in a pointer with an unspecified value; dereferencing such a pointer results in undefined behavior.
+
+
+
+ A task_block is active if it was created by the nearest enclosing task block, where “task block” refers to an
+ invocation of define_task_block or define_task_block_restore_thread and “nearest enclosing” means the most
+ recent invocation that has not yet completed. Code designated for execution in another thread by means other
+ than the facilities in this section (e.g., using thread or async) is not enclosed in the task block and a
+ task_block passed to (or captured by) such code is not active within that code. Performing any operation on a
+ task_block that is not active results in undefined behavior.
+
+
+
+ When the argument to task_block::run is called, no task_block is active, not even the task_block on which run was called.
+ (The function object should not, therefore, capture a task_block from the surrounding block.)
+
+
+
+
+ [ Example:
+
+ define_task_block([&](auto& tb) {
+ tb.run([&]{
+ tb.run([] { f(); }); // Error: tb is not active within run
+ define_task_block([&](auto& tb2) { // Define new task block
+ tb2.run(f);
+ ...
+ });
+ });
+ ...
+});
+
+
+ — end example ]
+
+
+ [ Note:
+
+ Implementations are encouraged to diagnose the above error at translation time.
+
+ — end note ]
+
+
+
+
+
+
+ 5.3.1 task_block member function template run
[parallel.task_block.class.run]
+
+
+
+
+
+
+ template<class F> void run(F&& f);
+
+
+
+
+
+
+
+ - Requires:
-
+
F shall be MoveConstructible. DECAY_COPY(std::forward<F>(f))() shall be a valid expression.
+
+
+
+
+
+ - Preconditions:
-
+
*this shall be the active task_block.
+
+
+
+
+
+ - Effects:
-
+ Evaluates
DECAY_COPY(std::forward<F>(f))(), where DECAY_COPY(std::forward<F>(f))
+ is evaluated synchronously within the current thread. The call to the resulting copy of the function object is
+ permitted to run on an unspecified thread created by the implementation in an unordered fashion relative to
+ the sequence of operations following the call to run(f) (the continuation), or indeterminately sequenced
+ within the same thread as the continuation. The call to run synchronizes with the call to the function
+ object. The completion of the call to the function object synchronizes with the next invocation of wait on
+ the same task_block or completion of the nearest enclosing task block (i.e., the define_task_block or
+ define_task_block_restore_thread that created this task_block).
+
+
+
+
+
+ - Throws:
-
+
+ task_cancelled_exception, as described in 5.5.
+
+
+
+
+
+ - Remarks:
-
+ The
run function may return on a thread other than the one on which it was called; in such cases,
+ completion of the call to run synchronizes with the continuation.
+
+ [ Note:
+ The return from run is ordered similarly to an ordinary function call in a single thread.
+ — end note ]
+
+
+
+
+
+
+ - Remarks:
-
+ The invocation of the user-supplied function object
f may be immediate or may be delayed until
+ compute resources are available. run might or might not return before the invocation of f completes.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 5.3.2 task_block member function wait
[parallel.task_block.class.wait]
+
+
+
+
+
+
+ void wait();
+
+
+
+
+
+
+
+ - Preconditions:
*this shall be the active task_block.
+
+
+
+
+ - Effects:
-
+ Blocks until the tasks spawned using this
task_block have completed.
+
+
+
+
+
+ - Throws:
-
+
+ task_cancelled_exception, as described in 5.5.
+
+
+
+
+
+ - Postconditions:
-
+ All tasks spawned by the nearest enclosing task block have completed.
+
+
+
+
+
+ - Remarks:
-
+ The
wait function may return on a thread other than the one on which it was called; in such cases, completion of the call to wait synchronizes with subsequent operations.
+
+ [ Note:
+ The return from wait is ordered similarly to an ordinary function call in a single thread.
+ — end note ]
+
+
+
+
+ [ Example:
+ define_task_block([&](auto& tb) {
+ tb.run([&]{ process(a, w, x); }); // Process a[w] through a[x]
+ if (y < x) tb.wait(); // Wait if overlap between [w,x) and [y,z)
+ process(a, y, z); // Process a[y] through a[z]
+});
+
+
+ — end example ]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 5.4 Function template define_task_block
[parallel.task_block.define_task_block]
+
+
+
+
+
+ template<class F>
+void define_task_block(F&& f);
+ template<class F>
+void define_task_block_restore_thread(F&& f);
+
+
+
+
+
+
+
+
+
+
+ - Requires:
-
+ Given an lvalue
tb of type task_block, the expression f(tb) shall be well-formed.
+
+
+
+
+
+ - Effects:
-
+ Constructs a
task_block tb and calls f(tb).
+
+
+
+
+
+ - Throws:
-
+
+ exception_list, as specified in 5.5.
+
+
+
+
+
+ - Postconditions:
-
+ All tasks spawned from
f have finished execution.
+
+
+
+
+
+ - Remarks:
-
+ The
define_task_block function may return on a thread other than the one on which it was called
+ unless there are no task blocks active on entry to define_task_block (see 5.3), in which
+ case the function returns on the original thread. When define_task_block returns on a different thread,
+ it synchronizes with operations following the call. [ Note:
+ The return from define_task_block is ordered
+ similarly to an ordinary function call in a single thread.
+ — end note ]
+ The define_task_block_restore_thread
+ function always returns on the same thread as the one on which it was called.
+
+
+
+
+
+ - Notes:
-
+ It is expected (but not mandated) that
f will (directly or indirectly) call tb.run(function-object).
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 5.5 Exception Handling
[parallel.task_block.exceptions]
+
+
+
+
+ Every task_block has an associated exception list. When the task block starts, its associated exception list is empty.
+
+
+
+ When an exception is thrown from the user-provided function object passed to define_task_block or
+ define_task_block_restore_thread, it is added to the exception list for that task block. Similarly, when
+ an exception is thrown from the user-provided function object passed into task_block::run, the exception
+ object is added to the exception list associated with the nearest enclosing task block. In both cases, an
+ implementation may discard any pending tasks that have not yet been invoked. Tasks that are already in
+ progress are not interrupted except at a call to task_block::run or task_block::wait as described below.
+
+
+
+ If the implementation is able to detect that an exception has been thrown by another task within
+ the same nearest enclosing task block, then task_block::run or task_block::wait may throw
+ task_cancelled_exception; these instances of task_cancelled_exception are not added to the exception
+ list of the corresponding task block.
+
+
+
+ When a task block finishes with a non-empty exception list, the exceptions are aggregated into an exception_list object, which is then thrown from the task block.
+
+
+
+ The order of the exceptions in the exception_list object is unspecified.
+
+
+
+
+
+
+
+
+
+
diff --git a/parallelism-ts.pdf b/parallelism-ts.pdf
deleted file mode 100644
index afabf17..0000000
Binary files a/parallelism-ts.pdf and /dev/null differ
diff --git a/scope.html b/scope.html
new file mode 100644
index 0000000..04db655
--- /dev/null
+++ b/scope.html
@@ -0,0 +1,20 @@
+
+ Scope
+ This Technical Specification describes requirements for implementations of an
+ interface that computer programs written in the C++ programming language may
+ use to invoke algorithms with parallel execution. The algorithms described by
+ this Technical Specification are realizable across a broad class of
+ computer architectures.
+
+ This Technical Specification is non-normative. Some of the functionality
+ described by this Technical Specification may be considered for standardization
+ in a future version of C++, but it is not currently part of any C++ standard.
+ Some of the functionality in this Technical Specification may never be
+ standardized, and other functionality may be standardized in a substantially
+ changed form.
+
+ The goal of this Technical Specification is to build widespread existing
+ practice for parallelism in the C++ standard algorithms library. It gives
+ advice on extensions to those vendors who wish to provide them.
+
+
diff --git a/task_block.html b/task_block.html
new file mode 100644
index 0000000..e006c52
--- /dev/null
+++ b/task_block.html
@@ -0,0 +1,298 @@
+
+ Task Block
+
+
+ Header <experimental/task_block> synopsis
+
+
+namespace std {
+namespace experimental {
+namespace parallel {
+inline namespace v2 {
+ class task_cancelled_exception;
+
+ class task_block;
+
+ template<class F>
+ void define_task_block(F&& f);
+
+ template<class F>
+ void define_task_block_restore_thread(F&& f);
+}
+}
+}
+}
+
+
+
+
+ Class task_cancelled_exception
+
+
+namespace std {
+namespace experimental {
+namespace parallel {
+inline namespace v2 {
+
+ class task_cancelled_exception : public exception
+ {
+ public:
+ task_cancelled_exception() noexcept;
+ virtual const char* what() const noexcept;
+ };
+}
+}
+}
+}
+
+
+
+ The class task_cancelled_exception defines the type of objects thrown by
+ task_block::run or task_block::wait if they detect that an
+ exception is pending within the current parallel block. See Exception Handling, below.
+
+
+
+ task_cancelled_exception member function what
+
+
+ virtual const char* what() const noexcept
+
+
+ An implementation-defined NTBS.
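The class shape described above can be sketched in standard C++. This is a minimal illustration only: the `sketch` namespace, the message text, and the helper `caught_as_std_exception` are assumptions made for the example and are not part of the Technical Specification.

```cpp
#include <cassert>
#include <cstring>
#include <exception>

namespace sketch {

// Hypothetical stand-in for the TS class; only the shape mirrors the synopsis.
class task_cancelled_exception : public std::exception {
public:
    task_cancelled_exception() noexcept = default;

    // Returns a null-terminated byte string. The TS leaves the text
    // implementation-defined; "task cancelled" is an arbitrary choice here.
    const char* what() const noexcept override { return "task cancelled"; }
};

// Throws the sketch exception and reports whether it can be caught through
// the std::exception base class with a non-empty message.
inline bool caught_as_std_exception() {
    try {
        throw task_cancelled_exception{};
    } catch (const std::exception& e) {
        return std::strlen(e.what()) > 0;
    }
    return false;
}

}  // namespace sketch
```

Because the class derives from `exception`, a handler written for `const std::exception&` also catches it, which is what the helper checks.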
+
+
+
+
+
+
+ Class task_block
+
+
+namespace std {
+namespace experimental {
+namespace parallel {
+inline namespace v2 {
+
+ class task_block
+ {
+ private:
+ ~task_block();
+
+ public:
+ task_block(const task_block&) = delete;
+ task_block& operator=(const task_block&) = delete;
+ void operator&() const = delete;
+
+ template<class F>
+ void run(F&& f);
+
+ void wait();
+ };
+}
+}
+}
+}
+
+
+
+ The class task_block defines an interface for forking and joining parallel tasks. The define_task_block and define_task_block_restore_thread function templates create an object of type task_block and pass a reference to that object to a user-provided function object.
+
+
+
+ An object of class task_block cannot be constructed, destroyed, copied, or moved except by the implementation of the task block library. Taking the address of a task_block object via operator& is ill-formed. Obtaining its address by any other means (including addressof) results in a pointer with an unspecified value; dereferencing such a pointer results in undefined behavior.
+
+
+
+ A task_block is active if it was created by the nearest enclosing task block, where “task block” refers to an
+ invocation of define_task_block or define_task_block_restore_thread and “nearest enclosing” means the most
+ recent invocation that has not yet completed. Code designated for execution in another thread by means other
+ than the facilities in this section (e.g., using thread or async) is not enclosed in the task block and a
+ task_block passed to (or captured by) such code is not active within that code. Performing any operation on a
+ task_block that is not active results in undefined behavior.
+
+
+
+ When the argument to task_block::run is called, no task_block is active, not even the task_block on which run was called.
+ (The function object should not, therefore, capture a task_block from the surrounding block.)
+
+
+
+ define_task_block([&](auto& tb) {
+ tb.run([&]{
+ tb.run([] { f(); }); // Error: tb is not active within run
+ define_task_block([&](auto& tb2) { // Define new task block
+ tb2.run(f);
+ ...
+ });
+ });
+ ...
+});
+
+
+
+
+
+ Implementations are encouraged to diagnose the above error at translation time.
+
+
+
+
+ task_block member function template run
+
+
+ template<class F> void run(F&& f);
+
+
+ F shall be MoveConstructible. DECAY_COPY(std::forward<F>(f))() shall be a valid expression.
+
+
+
+ *this shall be the active task_block.
+
+
+
+ Evaluates DECAY_COPY(std::forward<F>(f))(), where DECAY_COPY(std::forward<F>(f))
+ is evaluated synchronously within the current thread. The call to the resulting copy of the function object is
+ permitted to run on an unspecified thread created by the implementation in an unordered fashion relative to
+ the sequence of operations following the call to run(f) (the continuation), or indeterminately sequenced
+ within the same thread as the continuation. The call to run synchronizes with the call to the function
+ object. The completion of the call to the function object synchronizes with the next invocation of wait on
+ the same task_block or completion of the nearest enclosing task block (i.e., the define_task_block or
+ define_task_block_restore_thread that created this task_block).
+
+
+
+ task_cancelled_exception, as described in Exception Handling.
+
+
+
+ The run function may return on a thread other than the one on which it was called; in such cases,
+ completion of the call to run synchronizes with the continuation.
+
+ The return from run is ordered similarly to an ordinary function call in a single thread.
+
+
+
+ The invocation of the user-supplied function object f may be immediate or may be delayed until
+ compute resources are available. run might or might not return before the invocation of f completes.
+
+
+
+
+
+
+
+ task_block member function wait
+
+
+ void wait();
+
+ *this shall be the active task_block.
+
+
+ Blocks until the tasks spawned using this task_block have completed.
+
+
+
+ task_cancelled_exception, as described in Exception Handling.
+
+
+
+ All tasks spawned by the nearest enclosing task block have completed.
+
+
+
+ The wait function may return on a thread other than the one on which it was called; in such cases, completion of the call to wait synchronizes with subsequent operations.
+
+ The return from wait is ordered similarly to an ordinary function call in a single thread.
+
+
+define_task_block([&](auto& tb) {
+ tb.run([&]{ process(a, w, x); }); // Process a[w] through a[x]
+ if (y < x) tb.wait(); // Wait if overlap between [w,x) and [y,z)
+ process(a, y, z); // Process a[y] through a[z]
+});
+
+
+
+
+
+
+
+
+ Function template define_task_block
+
+
+ template<class F>
+void define_task_block(F&& f);
+
+
+ template<class F>
+void define_task_block_restore_thread(F&& f);
+
+
+
+ Given an lvalue tb of type task_block, the expression f(tb) shall be well-formed.
+
+
+
+ Constructs a task_block tb and calls f(tb).
+
+
+
+ exception_list, as specified in Exception Handling.
+
+
+
+ All tasks spawned from f have finished execution.
+
+
+
+ The define_task_block function may return on a thread other than the one on which it was called
+ unless there are no task blocks active on entry to define_task_block (see Class task_block), in which
+ case the function returns on the original thread. When define_task_block returns on a different thread,
+ it synchronizes with operations following the call. The return from define_task_block is ordered
+ similarly to an ordinary function call in a single thread. The define_task_block_restore_thread
+ function always returns on the same thread as the one on which it was called.
+
+
+
+ It is expected (but not mandated) that f will (directly or indirectly) call tb.run(function-object).
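The fork-join contract described above (construct a task block, call f(tb), and return only after every spawned task has finished) can be approximated with standard threads. This is a rough sketch under stated assumptions: `toy_task_block`, `toy_define_task_block`, and `count_completed_tasks` are hypothetical names, each `run` simply starts a `std::thread`, and exceptions thrown inside a task are not handled. It does not model the "active task block" rules or the thread-switching behavior of the real facility.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical toy, not the TS class: every run() starts a new std::thread.
class toy_task_block {
public:
    template <class F>
    void run(F&& f) {
        threads_.emplace_back(std::forward<F>(f));
    }

    // Blocks until all tasks spawned so far have completed.
    void wait() {
        for (auto& t : threads_) t.join();
        threads_.clear();
    }

private:
    std::vector<std::thread> threads_;
};

// Constructs a toy_task_block, calls f(tb), and joins all outstanding tasks
// before returning, so the postcondition "all tasks have finished" holds.
template <class F>
void toy_define_task_block(F&& f) {
    toy_task_block tb;
    try {
        std::forward<F>(f)(tb);
    } catch (...) {
        tb.wait();  // join before propagating, otherwise ~thread would terminate
        throw;
    }
    tb.wait();
}

// Spawns n tasks that each bump a counter and returns the final count.
inline int count_completed_tasks(int n) {
    std::atomic<int> done{0};
    toy_define_task_block([&](toy_task_block& tb) {
        for (int i = 0; i < n; ++i) tb.run([&done] { ++done; });
    });
    return done.load();
}

}  // namespace sketch
```

Since `toy_define_task_block` joins before returning, `count_completed_tasks(n)` observes all `n` increments. A production implementation would normally use a scheduler rather than one thread per task.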
+
+
+
+
+
+ Exception Handling
+
+
+ Every task_block has an associated exception list. When the task block starts, its associated exception list is empty.
+
+
+
+ When an exception is thrown from the user-provided function object passed to define_task_block or
+ define_task_block_restore_thread, it is added to the exception list for that task block. Similarly, when
+ an exception is thrown from the user-provided function object passed into task_block::run, the exception
+ object is added to the exception list associated with the nearest enclosing task block. In both cases, an
+ implementation may discard any pending tasks that have not yet been invoked. Tasks that are already in
+ progress are not interrupted except at a call to task_block::run or task_block::wait as described below.
+
+
+
+ If the implementation is able to detect that an exception has been thrown by another task within
+ the same nearest enclosing task block, then task_block::run or task_block::wait may throw
+ task_cancelled_exception; these instances of task_cancelled_exception are not added to the exception
+ list of the corresponding task block.
+
+
+
+ When a task block finishes with a non-empty exception list, the exceptions are aggregated into an exception_list object, which is then thrown from the task block.
+
+
+
+ The order of the exceptions in the exception_list object is unspecified.
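The aggregation rule above can be illustrated with a small standard C++ sketch. Everything named here is an assumption for illustration: `toy_exception_list` stands in for the TS `exception_list`, and `run_all_and_aggregate` is a hypothetical helper that runs each task on its own `std::thread`. Cancellation of pending tasks and `task_cancelled_exception` are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical aggregate type standing in for the TS exception_list.
struct toy_exception_list : std::exception {
    std::vector<std::exception_ptr> items;
    const char* what() const noexcept override { return "toy_exception_list"; }
};

// Runs each task on its own thread, records every exception that escapes a
// task, and throws a single aggregate once all tasks have finished.
template <class Task>
void run_all_and_aggregate(const std::vector<Task>& tasks) {
    std::vector<std::exception_ptr> errors;  // starts empty, like the exception list
    std::mutex m;
    std::vector<std::thread> threads;
    for (const auto& task : tasks) {
        threads.emplace_back([&errors, &m, &task] {
            try {
                task();
            } catch (...) {
                std::lock_guard<std::mutex> lock(m);
                errors.push_back(std::current_exception());
            }
        });
    }
    for (auto& t : threads) t.join();
    if (!errors.empty()) {
        toy_exception_list list;
        list.items = std::move(errors);  // order depends on thread timing
        throw list;
    }
}

// Runs three tasks, two of which throw, and returns how many exceptions
// ended up in the aggregate.
inline std::size_t count_aggregated() {
    std::vector<void (*)()> tasks = {
        [] { throw std::runtime_error("first"); },
        [] {},
        [] { throw std::logic_error("second"); },
    };
    try {
        run_all_and_aggregate(tasks);
    } catch (const toy_exception_list& e) {
        return e.items.size();
    }
    return 0;
}

}  // namespace sketch
```

The order of entries in `items` depends on which thread records its exception first, which matches the statement that the order within the aggregate is unspecified.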
+
+
+
+
diff --git a/terms_and_definitions.html b/terms_and_definitions.html
new file mode 100644
index 0000000..22aa5e4
--- /dev/null
+++ b/terms_and_definitions.html
@@ -0,0 +1,54 @@
+
+ Terms and definitions
+
+ For the purposes of this document, the terms and definitions given in the C++ Standard and the following apply.
+
+ A parallel algorithm is a function template described by this Technical Specification declared in namespace std::experimental::parallel::v2 with a formal template parameter named ExecutionPolicy.
+
+
+ Parallel algorithms access objects indirectly accessible via their arguments by invoking the following functions:
+
+
+ -
+ All operations of the categories of the iterators that the algorithm is instantiated with.
+
+
+ -
+ Functions on those sequence elements that are required by its specification.
+
+
+ -
+ User-provided function objects to be applied during the execution of the algorithm, if required by the specification.
+
+
+ -
+ Operations on those function objects required by the specification.
+
+
+ See clause 25.1 of C++ Standard Algorithms Library.
+
+
+
+
+ These functions are herein called element access functions.
+
+
+ The sort function may invoke the following element access functions:
+
+
+ -
+ Methods of the random-access iterator of the actual template argument, as per 24.2.7, as implied by the name of the
+ template parameter
RandomAccessIterator.
+
+
+ -
+ The
swap function on the elements of the sequence (as per 25.4.1.1 [sort]/2).
+
+
+ -
+ The user-provided
Compare function object.
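The last item in the list above can be made concrete by counting how often a sort invokes its comparator. This is a hedged sketch: it uses the ordinary sequential `std::sort` rather than a parallel overload, and `sort_report` and `sort_with_counting_compare` are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

struct sort_report {
    bool sorted;                // whether the sequence ended up ordered
    std::size_t compare_calls;  // how many times Compare was invoked
};

// Sorts a small sequence with a comparator that counts its own invocations.
inline sort_report sort_with_counting_compare() {
    std::vector<int> v = {5, 3, 9, 1, 7};
    std::size_t calls = 0;
    std::sort(v.begin(), v.end(), [&calls](int a, int b) {
        ++calls;  // each call is one invocation of the user-provided Compare
        return a < b;
    });
    return {std::is_sorted(v.begin(), v.end()), calls};
}

}  // namespace sketch
```

The counter is captured by reference, so copies of the comparator made inside the algorithm still update the same tally. With a parallel execution policy the counter would need to be atomic, because element access functions may then run concurrently.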
+
+
+
+
+