Parallel Execution
==================

- As of |zfp| |omprelease|, parallel compression (but not decompression) is
- supported on multicore processors via `OpenMP <http://www.openmp.org>`_
- threads.
+ As of |zfp| |omprelease|, parallel compression is supported on multicore
+ processors via `OpenMP <http://www.openmp.org>`_ threads.
|zfp| |cudarelease| adds `CUDA <https://developer.nvidia.com/about-cuda>`_
support for fixed-rate compression and decompression on the GPU.
+ |zfp| |hiprelease| further adds support for
+ `HIP <https://rocm.docs.amd.com/projects/HIP/en/latest/>`_
+ and for fixed- and variable-rate parallel compression and decompression
+ for all three back-ends (OpenMP, CUDA, and HIP).

Since |zfp| partitions arrays into small independent blocks, a
large amount of data parallelism is inherent in the compression scheme that
@@ -40,10 +43,10 @@ Execution Policies

|zfp| supports multiple *execution policies*, which dictate how (e.g.,
sequentially, in parallel) and where (e.g., on the CPU or GPU) arrays are
- compressed. Currently three execution policies are available:
- ``serial``, ``omp``, and ``cuda``. The default mode is
+ compressed. Currently four execution policies are available:
+ ``serial``, ``omp``, ``cuda``, and ``hip``. The default mode is
``serial``, which ensures sequential compression on a single thread.
- The ``omp`` and ``cuda`` execution policies allow for data-parallel
+ The ``omp``, ``cuda``, and ``hip`` execution policies allow for data-parallel
compression on multiple threads.

The execution policy is set by :c:func:`zfp_stream_set_execution` and
@@ -62,7 +65,7 @@ Execution Parameters

Each execution policy allows tailoring the execution via its associated
*execution parameters*. Examples include number of threads, chunk size,
- scheduling, etc. The ``serial`` and ``cuda`` policies have no
+ scheduling, etc. The ``serial``, ``cuda``, and ``hip`` policies have no
parameters. The subsections below discuss the ``omp`` parameters.

Whenever the execution policy is changed via
@@ -216,6 +219,18 @@ The CUDA implementation has a number of limitations:
We expect to address these limitations over time.


+ Using HIP
+ ---------
+
+ Support for HIP is available as of |zfp| |hiprelease|, allowing |zfp| to be
+ run in parallel on AMD GPUs. To enable support, the
+ :c:macro:`ZFP_WITH_HIP` macro must be set and |zfp| must be built with CMake.
+ See :c:macro:`ZFP_WITH_HIP` for further details.
+
+ The HIP implementation is based on the CUDA implementation, and therefore
+ the same :ref:`limitations <cuda-limitations>` apply.
+
+
Setting the Execution Policy
----------------------------
@@ -230,9 +245,10 @@ calling :c:func:`zfp_stream_set_execution`
}

before calling :c:func:`zfp_compress`. Replacing :code:`zfp_exec_omp`
- with :code:`zfp_exec_cuda` enables CUDA execution. If OpenMP or CUDA is
- disabled or not supported, then the return value of functions setting these
- execution policies and parameters will indicate failure. Execution
+ with :code:`zfp_exec_cuda` enables CUDA execution. Similarly,
+ :code:`zfp_exec_hip` enables HIP execution. If the corresponding execution
+ policy is disabled or not supported, then the return value of functions
+ setting these policies and parameters will indicate failure. Execution
parameters are optional and may be set using the functions discussed above.
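
Because setting an unavailable policy is reported through the return value, a caller can try policies in order of preference and fall back to ``serial``. The following is a minimal sketch of that pattern, not part of |zfp|: the ``exec_policy`` enum, the ``policy_setter`` callback type, and the ``select_policy`` helper are hypothetical names introduced for illustration, and ``mock_set_omp_only`` merely simulates a build in which only OpenMP is enabled. In real code the callback would be expected to wrap :c:func:`zfp_stream_set_execution`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the execution policies (illustration only). */
typedef enum { POLICY_SERIAL, POLICY_OMP, POLICY_CUDA, POLICY_HIP } exec_policy;

/* Callback that tries to set a policy; returns nonzero on success.
   In real code this would wrap zfp_stream_set_execution(). */
typedef int (*policy_setter)(void* stream, exec_policy policy);

/* Try each preferred policy in order; fall back to serial if none succeeds. */
static exec_policy
select_policy(void* stream, policy_setter set,
              const exec_policy* preferred, size_t count)
{
  size_t i;
  for (i = 0; i < count; i++)
    if (set(stream, preferred[i]))
      return preferred[i];
  /* serial is the default policy; reset to it explicitly */
  set(stream, POLICY_SERIAL);
  return POLICY_SERIAL;
}

/* Mock setter simulating a build with OpenMP but without CUDA or HIP. */
static int
mock_set_omp_only(void* stream, exec_policy policy)
{
  (void)stream; /* unused in the mock */
  return policy == POLICY_SERIAL || policy == POLICY_OMP;
}
```

Calling ``select_policy`` with the preference list HIP, CUDA, OpenMP and the mock setter selects OpenMP, since the two GPU policies are rejected first. Passing the setter as a callback keeps the fallback logic independent of how the library was configured.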

The source code for the |zfpcmd| command-line tool includes further examples
@@ -241,39 +257,42 @@ decompression in this tool, see the :option:`-x` command-line option.

.. note::
  As of |zfp| |cudarelease|, the execution policy refers to both
-   compression and decompression. The OpenMP implementation does not
-   yet support decompression, and hence :c:func:`zfp_decompress` will
-   fail if the execution policy is not reset to :code:`zfp_exec_serial`
-   before calling the decompressor. Similarly, the CUDA implementation
-   supports only fixed-rate mode and will fail if other compression modes
-   are specified.
+   compression and decompression.
+
+ .. note::
+   As of |zfp| |vrdecrelease|, variable-rate compression modes are supported
+   for all execution policies, both for compression and decompression.
+   However, for parallel decompression, a block index must be provided that
+   encodes where in the compressed stream each block resides. See the section
+   on :ref:`parallel decompression <parallel-decompression>` for further
+   details.

The following table summarizes which execution policies are supported
with which :ref:`compression modes <modes>`:

- +---------------------------------+---------+---------+---------+
- | (de)compression mode            | serial  | OpenMP  | CUDA    |
- +===============+=================+=========+=========+=========+
- |               | expert          | |check| | |check| |         |
- |               +-----------------+---------+---------+---------+
- |               | fixed rate      | |check| | |check| | |check| |
- |               +-----------------+---------+---------+---------+
- | compression   | fixed precision | |check| | |check| |         |
- |               +-----------------+---------+---------+---------+
- |               | fixed accuracy  | |check| | |check| |         |
- |               +-----------------+---------+---------+---------+
- |               | reversible      | |check| | |check| |         |
- +---------------+-----------------+---------+---------+---------+
- |               | expert          | |check| | |check| |         |
- |               +-----------------+---------+---------+---------+
- |               | fixed rate      | |check| | |check| | |check| |
- |               +-----------------+---------+---------+---------+
- | decompression | fixed precision | |check| | |check| | |check| |
- |               +-----------------+---------+---------+---------+
- |               | fixed accuracy  | |check| | |check| | |check| |
- |               +-----------------+---------+---------+---------+
- |               | reversible      | |check| | |check| |         |
- +---------------+-----------------+---------+---------+---------+
+ +---------------------------------+---------+---------+---------+---------+
+ | (de)compression mode            | serial  | OpenMP  | CUDA    | HIP     |
+ +===============+=================+=========+=========+=========+=========+
+ |               | expert          | |check| | |check| | |check| | |check| |
+ |               +-----------------+---------+---------+---------+---------+
+ |               | fixed rate      | |check| | |check| | |check| | |check| |
+ |               +-----------------+---------+---------+---------+---------+
+ | compression   | fixed precision | |check| | |check| | |check| | |check| |
+ |               +-----------------+---------+---------+---------+---------+
+ |               | fixed accuracy  | |check| | |check| | |check| | |check| |
+ |               +-----------------+---------+---------+---------+---------+
+ |               | reversible      | |check| | |check| |         |         |
+ +---------------+-----------------+---------+---------+---------+---------+
+ |               | expert          | |check| | |check| | |check| | |check| |
+ |               +-----------------+---------+---------+---------+---------+
+ |               | fixed rate      | |check| | |check| | |check| | |check| |
+ |               +-----------------+---------+---------+---------+---------+
+ | decompression | fixed precision | |check| | |check| | |check| | |check| |
+ |               +-----------------+---------+---------+---------+---------+
+ |               | fixed accuracy  | |check| | |check| | |check| | |check| |
+ |               +-----------------+---------+---------+---------+---------+
+ |               | reversible      | |check| | |check| |         |         |
+ +---------------+-----------------+---------+---------+---------+---------+

:c:func:`zfp_compress` and :c:func:`zfp_decompress` both return zero if the
current execution policy is not supported for the requested compression
@@ -290,6 +309,8 @@ function in turn inspects the execution policy given by the
for executing compression.


+ .. _parallel-decompression:
+
Parallel Decompression
----------------------