- Priority issues and pull requests
- Proposal for a Kernel Language WG
- Expanding runtime API (new features)
- A dedicated time to discuss these topics would ensure they actually get addressed
- Include new transpiler developers
- Initial meeting to map out discussions, meeting frequency
- Noel has tentatively volunteered to lead this WG
- Constant memory?
- Collectives
- Advanced instructions (e.g., FMAs, tensor instructions)
- Q: Should backend-specific functionality be supported? How?
- Copy-to-symbol? (global/constant module-scope variables)
- Is this supported by most backend programming models?
- Query kernel properties
- Fill/memset (potentially outside of core scope, OCCA BLAS?)
- Aligned alloc
- Q: What is the default alignment for each backend?
- Rectangular (2D/3D) memory allocation and copy
- Potentially limited use cases?
- Stream-ordered allocation
- Convenience for programmer vs. performance improvement?
- Potential use case: non-blocking streams
- Suggested/auto @inner size (e.g., occupancy-based)
- Would potentially lose the benefit of launch bounds
- Possibly limited use cases; the target is simple kernels where block size is less critical
- Modules/kernel-bundles
- Host-side callbacks
- E.g., coordinate with MPI calls?
- Execution graphs
- Need to gather use cases
- Dynamic @shared sizes
- E.g., to overcome the size limit on statically sized shared memory
- Potential pitfall for performance
- Cache configuration (L1-to-shared-memory ratio)
- Binary option
- Causes device flush?
- Potential for users to hurt performance if used unwisely
- Highly likely to be backend specific