
Basic support for Custom Kernels. #28

Closed · Tracked by #25
bauom opened this issue Jan 15, 2024 · 4 comments · Fixed by #42
@bauom (Collaborator) commented Jan 15, 2024

This issue aims to add support for creating custom kernels in the Numba style.
Below is an example of a kernel definition, which can then be called with the following syntax:

increment_by_one[BN, TPB](args)

BN: the number of blocks to dispatch on the GPU.
TPB: the number of threads in each block.

This can be implemented by checking, in the semantic stage, whether an IndexedElement is a FunctionCall, and replacing it in the AST with a KernelCall node. A KernelCall can be detected when an IndexedElement contains a FunctionCall whose function is decorated with the kernel decorator.
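To make the indexed launch syntax concrete, here is a minimal pure-Python emulation of how `increment_by_one[BN, TPB](args)` can be made valid Python via `__getitem__`. This is an illustrative sketch only, not Pyccel's implementation: since no GPU runtime is involved, the block/thread indices are passed to the kernel explicitly instead of being read from `cuda.threadIdx`.

```python
class _KernelLauncher:
    """Binds a launch configuration (n_blocks, threads_per_block) to a kernel."""
    def __init__(self, func, n_blocks, threads_per_block):
        self.func = func
        self.n_blocks = n_blocks
        self.threads_per_block = threads_per_block

    def __call__(self, *args):
        # Emulate every (block, thread) pair sequentially on the CPU.
        for block_idx in range(self.n_blocks):
            for thread_idx in range(self.threads_per_block):
                self.func(block_idx, thread_idx, self.threads_per_block, *args)


class kernel:
    """Decorator that makes the f[BN, TPB](args) launch syntax valid Python."""
    def __init__(self, func):
        self.func = func

    def __getitem__(self, launch_config):
        n_blocks, threads_per_block = launch_config
        return _KernelLauncher(self.func, n_blocks, threads_per_block)


@kernel
def increment_by_one(block_idx, thread_idx, block_width, an_array):
    # Flattened position of this emulated thread inside the array.
    pos = thread_idx + block_idx * block_width
    if pos < len(an_array):   # check array boundaries
        an_array[pos] += 1


data = [0] * 10
increment_by_one[3, 4](data)   # 3 blocks of 4 threads cover all 10 slots
```

Indexing the decorated function returns a launcher bound to the `(BN, TPB)` configuration, which is exactly the shape of AST node (an IndexedElement wrapping a FunctionCall) that the semantic stage would need to recognise.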

Numba code:

from numba import cuda
@cuda.jit
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx.x
    # Block id in a 1D grid
    ty = cuda.blockIdx.x
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim.x
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1

Pyccel code:

from pyccel.internals import cuda
from pyccel.decorators import kernel
@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx(0)
    # Block id in a 1D grid
    ty = cuda.blockIdx(0)
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim(0)
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1
bauom mentioned this issue Jan 15, 2024
@EmilyBourne (Member) commented:
from pyccel.decorators import kernel
@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx(0)
    # Block id in a 1D grid
    ty = cuda.blockIdx(0)
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim(0)
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1

Your code looks a little problematic to me. cuda is not defined anywhere. Did you want to treat it as a built-in?

I would have expected code such as:

from numba import cuda
from pyccel.decorators import kernel

@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx.x
    # Block id in a 1D grid
    ty = cuda.blockIdx.x
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim.x
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1

Would the latter run in pure Python at all?

@bauom (Collaborator, Author) commented Jan 22, 2024

from pyccel import cuda
from pyccel.decorators import kernel
@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx(0)
    # Block id in a 1D grid
    ty = cuda.blockIdx(0)
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim(0)
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1

Sorry, I missed a from pyccel import cuda. The aim in this first step is just to have a cuda internal library in Pyccel that is not tied to any specific library, so that we can later use it with all the libraries.

@EmilyBourne (Member) commented:
as the aim in this first step is just to have a cuda Pyccel internal library that is not specific to a library

In that case shouldn't it be from pyccel.internals import cuda to match what is done for the other internal libraries?

@bauom (Collaborator, Author) commented Jan 22, 2024

as the aim in this first step is just to have a cuda Pyccel internal library that is not specific to a library

In that case shouldn't it be from pyccel.internals import cuda to match what is done for the other internal libraries?

Yeah, that would be better 👍 I will change it.

smazouz42 assigned and then unassigned smazouz42 on Feb 7, 2024
bauom added the blocked label on Mar 7, 2024
smazouz42 linked a pull request on May 19, 2024 that will close this issue
EmilyBourne added a commit that referenced this issue Jun 27, 2024
This pull request addresses issue #28 by implementing a new feature in
Pyccel that allows users to define custom GPU kernels. The syntax for
creating these kernels is inspired by Numba. Issue #45 also needs to be
fixed for testing purposes.

**Commit Summary**

- Introduced the KernelCall class
- Added the CUDA printer methods _print_KernelCall and _print_FunctionDef to
generate the corresponding CUDA representation for both kernel calls and
definitions
- Added IndexedFunctionCall, which represents an indexed function call
- Added the CUDA module and cuda.synchronize()
- Fixed a bug I found in the header generation: it did not import the
necessary header for the used function
---------

Co-authored-by: EmilyBourne <louise.bourne@gmail.com>
Co-authored-by: bauom <40796259+bauom@users.noreply.github.com>
Co-authored-by: Emily Bourne <emily.bourne@epfl.ch>
EmilyBourne added further commits that referenced this issue on Jul 11, Jul 15, Jul 17, Jul 19, Jul 26, Aug 29, Sep 3, and Sep 25, 2024, each carrying the same pull-request description as above.
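The KernelCall and IndexedFunctionCall classes from the commit summary could look roughly like the following. This is a hypothetical sketch: the class names come from the commit message, but every attribute, signature, and the as_kernel_call helper are illustrative guesses, not Pyccel's actual AST classes.

```python
class IndexedFunctionCall:
    """AST node for a call whose callee is indexed: f[i, j](args)."""
    def __init__(self, func_name, indexes, args):
        self.func_name = func_name
        self.indexes = tuple(indexes)   # e.g. the (BN, TPB) launch configuration
        self.args = tuple(args)


class KernelCall(IndexedFunctionCall):
    """Specialisation used when the indexed callee is decorated with @kernel."""
    @property
    def num_blocks(self):
        return self.indexes[0]

    @property
    def threads_per_block(self):
        return self.indexes[1]


def as_kernel_call(node, kernel_functions):
    """Semantic-stage promotion: if the callee of an IndexedFunctionCall is a
    known @kernel-decorated function, replace the node with a KernelCall."""
    if node.func_name in kernel_functions:
        return KernelCall(node.func_name, node.indexes, node.args)
    return node


call = IndexedFunctionCall('increment_by_one', (3, 4), ('an_array',))
call = as_kernel_call(call, kernel_functions={'increment_by_one'})
```

A dedicated KernelCall node lets the CUDA printer emit the `<<<BN, TPB>>>` launch form for kernel calls while ordinary IndexedFunctionCall nodes keep their usual treatment.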