@@ -8,7 +8,7 @@ Some Expensive Computation
Remember that before compiling any you will need to have sourced env.sh
in your environment::

- Open the cuda/tutorial directory under your CUDA application. You will see
+ Open the ``tutorial`` directory found at the root of this CUDA repository. You will see
there a typical CUDA project structure. In particular, ``device.gpr`` will build
the code running on the device, ``host.gpr`` the code on the host. The Makefile
is responsible to build both project, note that it's using the standard Makefile.build
@@ -17,7 +17,7 @@ structure. If you open cuda/Makefile.build, you'll see both build commands:
.. code-block:: Makefile

gprbuild -Xcuda_host=$(CUDA_HOST) -P device
- gprbuild -Xcuda_host=$(CUDA_HOST) -P host -largs $(CURDIR)/lib/*.fatbin.o
+ gprbuild -Xcuda_host=$(CUDA_HOST) -P host -largs $(CURDIR)/lib/*.fatbin.o

The argument ``-Xcuda_host=$(CUDA_HOST)`` is here to allow to build for cross
platform such as ARM Linux. In this tutorial, we're only going to build for native
@@ -124,7 +124,7 @@ of the function:
This kernel will be be called in parallel, once per index in the array to
compute. Within a kernel, it's possible to index a given call using the
thread number (``Thread_IDx``) and the block number (``Block_IDx``). You
- can also retreive the number of thread in a block that have been scheduled
+ can also retrieve the number of threads in a block that have been scheduled
(``Block_Dim``) and the number of blocks in the grid (``Grid_Dim``). These
are 3 dimension values, indexed by x, y and z. In this example, we're only
going to use the x dimension.
@@ -183,7 +183,7 @@ additional cost of device computation, this allocation will be taken into accoun
in the total time reported. Indeed, data copy can be a critical limiting factor
of GPU performance enhancements.

- Indentify the portion of the body marked ``-- INSERT HERE DEVICE CALL``. Introduce
+ Identify the portion of the body marked ``-- INSERT HERE DEVICE CALL``. Introduce
here two array allocations and copies for H_A and H_B to D_A and D_B respectively.
Also allocate one array to D_C to be the size of H_C: