2024-03-20

Agenda

Priority issues and pull requests
Proposal for a Kernel Language WG
Expanding runtime API (new features)

Notes

Proposal for a Kernel Language WG

Setting aside dedicated time will make sure kernel language topics actually get discussed
Include new transpiler developers
Initial meeting to map out discussions, meeting frequency
Noel has tentatively volunteered to lead this WG

Potential Topics

Constant memory?
Collectives
Advanced instructions (e.g., FMAs, tensor instructions)

Expanding Runtime API

Q: Should backend-specific functionality be supported? How?
Copy-to-symbol? (global/constant module scope variables)
- Is this supported by most backend programming models?
Query kernel properties
Fill/memset (potentially outside of core scope, OCCA BLAS?)
Aligned alloc
- Q: What is the default alignment for each backend?
Rectangular (2D/3D) memory allocation, copy
- Potentially limited use cases?
Stream-ordered allocation
- Convenience for programmer vs. performance improvement?
- Potential use case: non-blocking streams
Suggested/auto @inner size (e.g., occupancy-based)
- Would potentially lose the benefit of launch bounds
- Maybe limited use cases: the target is simple kernels where block size is less critical
Modules/kernel-bundles
Host-side callbacks
- E.g., Coordinate with MPI calls?
Execution graphs
- Need to gather use cases
Dynamic @shared sizes
- E.g., to overcome the size limit on statically sized shared memory
- Potential pitfall for performance
Cache configuration (L1-to-shared ratio)
- Binary option
- Causes device flush?
- Potential for users to hurt performance if used unwisely
- Highly likely to be backend specific
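
To make the aligned and rectangular allocation items above concrete, here is a minimal host-side C++ sketch of a pitched 2D layout: each row is padded to an alignment boundary, and a rectangular copy moves only the payload bytes of each row. The names `alignedPitch` and `copy2D` are hypothetical and not part of the OCCA API, and the alignment value is left to the caller since the default alignment of each backend is still an open question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not an OCCA function): round a row width in bytes
// up to the next multiple of `alignment`, which must be a power of two.
std::size_t alignedPitch(std::size_t widthBytes, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (widthBytes + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper (not an OCCA function): copy `height` rows of
// `widthBytes` bytes each. Rows start every `srcPitch` bytes in the source
// and every `dstPitch` bytes in the destination; padding bytes are untouched.
void copy2D(void* dst, std::size_t dstPitch,
            const void* src, std::size_t srcPitch,
            std::size_t widthBytes, std::size_t height) {
  assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
  auto* d = static_cast<unsigned char*>(dst);
  const auto* s = static_cast<const unsigned char*>(src);
  for (std::size_t row = 0; row < height; ++row) {
    std::memcpy(d + row * dstPitch, s + row * srcPitch, widthBytes);
  }
}
```

For example, a 10-byte row with 8-byte alignment gets a 16-byte pitch. Whether a runtime-level version of this should map onto a backend's native pitched allocation and 2D copy, where one exists, is part of the open question about use cases.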

Next Meeting

2024-04-17