
Basic support for Custom Kernels. #28

Closed · Tracked by #25
bauom opened this issue Jan 15, 2024 · 4 comments · Fixed by #42
@bauom (Collaborator) commented Jan 15, 2024

This issue aims to add support for creating custom kernels in the Numba style.
Below is an example of a kernel definition, which can then be called with the following syntax:

increment_by_one[BN, TPB](args)

BN: the number of blocks to dispatch on the GPU.
TPB: the number of threads in each block.

This can be implemented by checking, in the semantic stage, whether an IndexedElement is a FunctionCall, and replacing it in the AST with a KernelCall node. A KernelCall can be detected when an IndexedElement contains a FunctionCall whose function is decorated with the kernel decorator.
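To make the indexed launch syntax concrete, here is a minimal pure-Python emulation of how `increment_by_one[BN, TPB](args)` can be made valid Python via `__getitem__`. This is an illustrative sketch only, not Pyccel's implementation: since no GPU runtime is involved, the block/thread indices are passed to the kernel explicitly instead of being read from `cuda.threadIdx`.

```python
class _KernelLauncher:
    """Binds a launch configuration (n_blocks, threads_per_block) to a kernel."""
    def __init__(self, func, n_blocks, threads_per_block):
        self.func = func
        self.n_blocks = n_blocks
        self.threads_per_block = threads_per_block

    def __call__(self, *args):
        # Emulate every (block, thread) pair sequentially on the CPU.
        for block_idx in range(self.n_blocks):
            for thread_idx in range(self.threads_per_block):
                self.func(block_idx, thread_idx, self.threads_per_block, *args)


class kernel:
    """Decorator that makes the f[BN, TPB](args) launch syntax valid Python."""
    def __init__(self, func):
        self.func = func

    def __getitem__(self, launch_config):
        n_blocks, threads_per_block = launch_config
        return _KernelLauncher(self.func, n_blocks, threads_per_block)


@kernel
def increment_by_one(block_idx, thread_idx, block_width, an_array):
    # Flattened position of this emulated thread inside the array.
    pos = thread_idx + block_idx * block_width
    if pos < len(an_array):   # check array boundaries
        an_array[pos] += 1


data = [0] * 10
increment_by_one[3, 4](data)   # 3 blocks of 4 threads cover all 10 slots
```

Indexing the decorated function returns a launcher bound to the `(BN, TPB)` configuration, which is exactly the shape of AST node (an IndexedElement wrapping a FunctionCall) that the semantic stage would need to recognise.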

Numba code:

from numba import cuda
@cuda.jit
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx.x
    # Block id in a 1D grid
    ty = cuda.blockIdx.x
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim.x
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1

Pyccel code:

from pyccel.internals import cuda
from pyccel.decorators import kernel
@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx(0)
    # Block id in a 1D grid
    ty = cuda.blockIdx(0)
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim(0)
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1
bauom mentioned this issue Jan 15, 2024
@EmilyBourne (Member) commented:
from pyccel.decorators import kernel
@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx(0)
    # Block id in a 1D grid
    ty = cuda.blockIdx(0)
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim(0)
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1

Your code looks a little problematic to me. cuda is not defined anywhere. Did you want to treat it as a built-in?

I would have expected code such as:

from numba import cuda
from pyccel.decorators import kernel

@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx.x
    # Block id in a 1D grid
    ty = cuda.blockIdx.x
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim.x
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1

Would the latter run in pure Python at all?

@bauom (Collaborator, Author) commented Jan 22, 2024

from pyccel import cuda
from pyccel.decorators import kernel
@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx(0)
    # Block id in a 1D grid
    ty = cuda.blockIdx(0)
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim(0)
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1

Sorry, I missed a from pyccel import cuda. The aim in this first step is just to have a cuda internal library in Pyccel that is not tied to any specific library, so that we can later use it with all the libraries.

@EmilyBourne (Member) commented:
as the aim in this first step is just to have a cuda Pyccel internal library that is not specific to a library

In that case shouldn't it be from pyccel.internals import cuda to match what is done for the other internal libraries?

@bauom (Collaborator, Author) commented Jan 22, 2024

as the aim in this first step is just to have a cuda Pyccel internal library that is not specific to a library

In that case shouldn't it be from pyccel.internals import cuda to match what is done for the other internal libraries?

Yeah, that would be better 👍 I will change it.

smazouz42 assigned and then unassigned smazouz42 on Feb 7, 2024
bauom added the blocked label on Mar 7, 2024
smazouz42 linked a pull request on May 19, 2024 that will close this issue
EmilyBourne added a commit that referenced this issue Jun 27, 2024
This pull request addresses issue #28 by implementing a new feature in
Pyccel that allows users to define custom GPU kernels. The syntax for
creating these kernels is inspired by Numba. Issue #45 also needs to be
fixed for testing purposes.

**Commit Summary**

- Introduced the KernelCall class
- Added the CUDA printer methods _print_KernelCall and _print_FunctionDef to
generate the corresponding CUDA representation for both kernel calls and
definitions
- Added IndexedFunctionCall, which represents an indexed function call
- Added the CUDA module and cuda.synchronize()
- Fixed a bug I found in the header generation: it did not import the
necessary header for the used function
---------

Co-authored-by: EmilyBourne <louise.bourne@gmail.com>
Co-authored-by: bauom <40796259+bauom@users.noreply.github.com>
Co-authored-by: Emily Bourne <emily.bourne@epfl.ch>
EmilyBourne added further commits that referenced this issue on Jul 11, Jul 15, Jul 17, Jul 19, Jul 26, Aug 29, Sep 3, and Sep 25, 2024, each carrying the same pull-request description as above.
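The KernelCall and IndexedFunctionCall classes from the commit summary could look roughly like the following. This is a hypothetical sketch: the class names come from the commit message, but every attribute, signature, and the as_kernel_call helper are illustrative guesses, not Pyccel's actual AST classes.

```python
class IndexedFunctionCall:
    """AST node for a call whose callee is indexed: f[i, j](args)."""
    def __init__(self, func_name, indexes, args):
        self.func_name = func_name
        self.indexes = tuple(indexes)   # e.g. the (BN, TPB) launch configuration
        self.args = tuple(args)


class KernelCall(IndexedFunctionCall):
    """Specialisation used when the indexed callee is decorated with @kernel."""
    @property
    def num_blocks(self):
        return self.indexes[0]

    @property
    def threads_per_block(self):
        return self.indexes[1]


def as_kernel_call(node, kernel_functions):
    """Semantic-stage promotion: if the callee of an IndexedFunctionCall is a
    known @kernel-decorated function, replace the node with a KernelCall."""
    if node.func_name in kernel_functions:
        return KernelCall(node.func_name, node.indexes, node.args)
    return node


call = IndexedFunctionCall('increment_by_one', (3, 4), ('an_array',))
call = as_kernel_call(call, kernel_functions={'increment_by_one'})
```

A dedicated KernelCall node lets the CUDA printer emit the `<<<BN, TPB>>>` launch form for kernel calls while ordinary IndexedFunctionCall nodes keep their usual treatment.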