Commit

WIP
neon60 committed May 25, 2024
1 parent 649a200 commit cc1112e
Showing 3 changed files with 28 additions and 28 deletions.
14 changes: 7 additions & 7 deletions docs/how-to/debugging.rst
@@ -23,7 +23,7 @@ can use ltrace to visualize the runtime behavior of the entire ROCm software sta

Here's a simple command-line example that uses ltrace to trace HIP APIs and output:

.. code:: console
.. code-block:: console
$ ltrace -C -e "hip*" ./hipGetChanDesc
hipGetChanDesc->hipCreateChannelDesc(0x7ffdc4b66860, 32, 0, 0) = 0x7ffdc4b66860
@@ -36,7 +36,7 @@ Here's a simple command-line example that uses ltrace to trace HIP APIs and outp
Here's another example that uses ltrace to trace hsa APIs and output:

.. code:: console
.. code-block:: console
$ ltrace -C -e "hsa*" ./hipGetChanDesc
libamdhip64.so.4->hsa_init(0, 0x7fff325a69d0, 0x9c80e0, 0 <unfinished ...>
@@ -99,7 +99,7 @@ For details, see (https://github.com/ROCm/ROCgdb).

The following sample shows how to use ROCgdb to run and debug a HIP application. ROCgdb is installed with the ROCm package in the /opt/rocm/bin folder.

.. code:: console
.. code-block:: console
$ export PATH=$PATH:/opt/rocm/bin
$ rocgdb ./hipTexObjPitch
@@ -132,7 +132,7 @@ Debugging HIP applications
The following Linux example shows how to get useful information from the debugger while running a
simple memory copy test, which caused a segmentation fault issue.

.. code:: console
.. code-block:: console
test: simpleTest2<?> numElements=4194304 sizeElements=4194304 bytes
Segmentation fault (core dumped)
@@ -231,13 +231,13 @@ For systems with multiple devices, you can choose to make only certain device(s)
``HIP_VISIBLE_DEVICES`` (or ``CUDA_VISIBLE_DEVICES`` on an NVIDIA platform). Once enabled, HIP can
only view devices that have indices present in the sequence. For example:

.. code:: console
.. code-block:: console
$ HIP_VISIBLE_DEVICES=0,1
Or in the application:

.. code:: cpp
.. code-block:: cpp
if (totalDeviceNum > 2) {
setenv("HIP_VISIBLE_DEVICES", "0,1,2", 1);
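The in-application approach can be sketched in plain C++ as follows. This is a minimal sketch, not HIP API code: the helper names ``joinDeviceIndices`` and ``setVisibleDevices`` are hypothetical, and it assumes a POSIX system where ``setenv`` is available.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not a HIP API): joins device indices into the
// comma-separated form used by HIP_VISIBLE_DEVICES, e.g. {0, 1} -> "0,1".
inline std::string joinDeviceIndices(const std::vector<int>& indices) {
    std::string list;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        if (i != 0) {
            list += ',';
        }
        list += std::to_string(indices[i]);
    }
    return list;
}

// Hypothetical helper: sets HIP_VISIBLE_DEVICES for the current process
// using POSIX setenv, overwriting any existing value. Returns true on success.
inline bool setVisibleDevices(const std::vector<int>& indices) {
    return setenv("HIP_VISIBLE_DEVICES",
                  joinDeviceIndices(indices).c_str(), 1) == 0;
}
```

Calling ``setVisibleDevices({0, 1, 2})`` has the same effect as the ``setenv`` call shown in the hunk above.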
@@ -371,7 +371,7 @@ General debugging tips
* ``gdb --args`` can be used to pass the executable and arguments to ``gdb``.
* You can set environment variables (``set env``) from within GDB on Linux:

.. code:: bash
.. code-block:: bash
(gdb) set env AMD_SERIALIZE_KERNEL 3
40 changes: 20 additions & 20 deletions docs/reference/kernel_language.rst
@@ -102,7 +102,7 @@ When using ``hipLaunchKernelGGL``, your first five parameters must be:

You can include your kernel arguments after these parameters.

.. code:: cpp
.. code-block:: cpp
// Example hipLaunchKernelGGL pseudocode:
__global__ void MyKernel(hipLaunchParm lp, float *A, float *B, float *C, size_t N)
@@ -128,7 +128,7 @@ parameters.
Kernel launch example
==========================================================

.. code:: cpp
.. code-block:: cpp
// Example showing device function, __device__ __host__
// <- compile for both device and host
@@ -279,7 +279,7 @@ dimensions.
The dim3 constructor accepts between zero and three arguments. By default, it initializes unspecified
dimensions to 1.

.. code:: cpp
.. code-block:: cpp
typedef struct dim3 {
uint32_t x;
@@ -1450,7 +1450,7 @@ To read a high-resolution timer from the device, HIP provides the following buil

* Returning the incremental counter value for every clock cycle on a device:

.. code:: cpp
.. code-block:: cpp
clock_t clock()
long long int clock64()
@@ -1459,14 +1459,14 @@ To read a high-resolution timer from the device, HIP provides the following buil

* Returning the wall clock count at a constant frequency on the device:

.. code:: cpp
.. code-block:: cpp
long long int wall_clock64()
The wall clock frequency can be queried in HIP application code through the HIP API, using the
``hipDeviceAttributeWallClockRate`` device attribute. For example:

.. code:: cpp
.. code-block:: cpp
int wallClkRate = 0; //in kilohertz
HIPCHECK(hipDeviceGetAttribute(&wallClkRate, hipDeviceAttributeWallClockRate, deviceId));
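Because the rate is reported in kilohertz, it equals the number of wall clock ticks per millisecond. The following plain C++ sketch shows how two ``wall_clock64()`` readings could be converted to elapsed milliseconds. The helper name ``wallClockTicksToMs`` is hypothetical and not part of the HIP API.

```cpp
#include <cstdint>

// Hypothetical helper (not a HIP API): converts the difference between two
// wall_clock64() readings into milliseconds. A rate in kilohertz is the
// number of ticks per millisecond, so the conversion is a single division.
inline double wallClockTicksToMs(std::int64_t startTicks,
                                 std::int64_t endTicks,
                                 int wallClkRateKHz) {
    return static_cast<double>(endTicks - startTicks) / wallClkRateKHz;
}
```

For instance, with an assumed rate of 100000 kHz, a difference of 50000 ticks corresponds to 0.5 ms.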
@@ -1809,7 +1809,7 @@ portable code to query the warp size.

To get the default warp size of a GPU device, use ``hipGetDeviceProperties`` in your host functions.

.. code:: cpp
.. code-block:: cpp
hipDeviceProp_t props;
hipGetDeviceProperties(&props, deviceID);
@@ -1835,7 +1835,7 @@ the correct type for the mask.
Warp vote and ballot functions
-------------------------------------------------------------------------------------------------------------

.. code:: cpp
.. code-block:: cpp
int __all(int predicate)
int __any(int predicate)
@@ -1883,7 +1883,7 @@ undefined.
Warp match functions
-------------------------------------------------------------------------------------------------------------

.. code:: cpp
.. code-block:: cpp
unsigned long long __match_any(T value)
unsigned long long __match_all(T value, int *pred)
@@ -1915,7 +1915,7 @@ Warp shuffle functions

The default width is ``warpSize`` (see :ref:`warp-cross-lane`). Half-float shuffles are not supported.

.. code:: cpp
.. code-block:: cpp
int __shfl (T var, int srcLane, int width=warpSize);
int __shfl_up (T var, unsigned int delta, int width=warpSize);
@@ -2103,7 +2103,7 @@ Assert

The assert function is supported in HIP.
The assert function is used for debugging: when the input expression equals zero, execution stops.
.. code:: cpp
.. code-block:: cpp
void assert(int input)
@@ -2127,7 +2127,7 @@ Printf
The printf function is supported in HIP.
The following simple example prints information from within a kernel.

.. code:: cpp
.. code-block:: cpp
#include <hip/hip_runtime.h>
@@ -2151,7 +2151,7 @@ GPU multiprocessors have a fixed pool of resources (primarily registers and shar

__launch_bounds__ allows the application to provide usage hints that influence the resources (primarily registers) used by the generated code. It is a function attribute that must be attached to a __global__ function:

.. code:: cpp
.. code-block:: cpp
__global__ void __launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_WARPS_PER_EXECUTION_UNIT)
MyKernel(hipGridLaunch lp, ...)
@@ -2189,13 +2189,13 @@ Porting from CUDA `__launch_bounds`

CUDA defines __launch_bounds, which is also designed to control occupancy:

.. code:: cpp
.. code-block:: cpp
__launch_bounds(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MULTIPROCESSOR)
- The second __launch_bounds parameter must be converted to the format used by __hip_launch_bounds, which uses warps and execution units rather than blocks and multiprocessors (HIPIFY tools perform this conversion automatically).

.. code:: cpp
.. code-block:: cpp
MIN_WARPS_PER_EXECUTION_UNIT = (MIN_BLOCKS_PER_MULTIPROCESSOR * MAX_THREADS_PER_BLOCK) / 32
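As a worked instance of the conversion formula, the following plain C++ sketch evaluates it at compile time. The helper name ``minWarpsPerExecutionUnit`` and the sample values (2 blocks per multiprocessor, 256 threads per block) are illustrative assumptions, not part of the HIP API. The divisor 32 is taken from the formula as written.

```cpp
// Hypothetical helper (not a HIP API): applies the conversion formula
// MIN_WARPS_PER_EXECUTION_UNIT =
//     (MIN_BLOCKS_PER_MULTIPROCESSOR * MAX_THREADS_PER_BLOCK) / 32
constexpr unsigned minWarpsPerExecutionUnit(unsigned minBlocksPerMultiprocessor,
                                            unsigned maxThreadsPerBlock) {
    return (minBlocksPerMultiprocessor * maxThreadsPerBlock) / 32;
}

// Illustrative values: 2 blocks per multiprocessor, 256 threads per block.
// (2 * 256) / 32 = 512 / 32 = 16
static_assert(minWarpsPerExecutionUnit(2, 256) == 16,
              "2 blocks of 256 threads correspond to 16 warps");
```

With these sample values, a CUDA ``__launch_bounds(256, 2)`` would correspond to a second parameter of 16 in the warps-per-execution-unit form.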
@@ -2252,17 +2252,17 @@ Pragma Unroll

Unrolling with a bound that is known at compile time is supported. For example:

.. code:: cpp
.. code-block:: cpp
#pragma unroll 16 /* hint to compiler to unroll next loop by 16 */
for (int i=0; i<16; i++) ...
.. code:: cpp
.. code-block:: cpp
#pragma unroll 1 /* tell compiler to never unroll the loop */
for (int i=0; i<16; i++) ...
.. code:: cpp
.. code-block:: cpp
#pragma unroll /* hint to compiler to completely unroll next loop. */
for (int i=0; i<16; i++) ...
@@ -2272,7 +2272,7 @@ In-Line Assembly

GCN ISA in-line assembly is supported. For example:

.. code:: cpp
.. code-block:: cpp
asm volatile ("v_mac_f32_e32 %0, %2, %3" : "=v" (out[i]) : "0"(out[i]), "v" (a), "v" (in[i]));
@@ -2295,7 +2295,7 @@ Kernel Compilation
hipcc now supports compiling C++/HIP kernels to binary code objects.
The binary file format is `.co`, which stands for code object. The following command builds the code object using `hipcc`.

.. code:: bash
.. code-block:: bash
hipcc --genco --offload-arch=[TARGET GPU] [INPUT FILE] -o [OUTPUT FILE]
2 changes: 1 addition & 1 deletion docs/understand/programming_model.rst
@@ -84,7 +84,7 @@ identical instructions over the available SIMD engines.

Consider the following kernel:

.. code:: cpp
.. code-block:: cpp
__global__ void k(float4* a, const float4* b)
{