Releases: HiPerCoRe/KTT

Version 2.1

25 Jan 11:02
c967ff5
  • Introduced KTT Python bindings making it possible to utilize KTT API in Python
  • Added onboarding guide for KTT which describes core KTT features and their usage
  • Added new methods for compute queue management
  • Added new methods for synchronization to main tuner API
  • Added non-templated versions of methods for scalar and user buffer kernel arguments addition
  • Added support for constant memory variables in CUDA
  • Updated CUPTI implementation to utilize newer API functions introduced in CUDA Toolkit 11.3
  • Updated and optimized MCMC searcher
  • Kernel run mode can now be queried through compute interface
  • Fixed linking issue under Windows caused by unexported methods
  • Improved error messages when attempting to add kernel arguments with unsupported data types
  • Added Python version of tutorials and certain examples showcasing the usage of new Python bindings

Version 2.0.1

21 Jun 09:24
a0c252e
  • Added more kernel result status categories to distinguish kernel runs which failed due to compiler error or device limits being exceeded
  • Fixed problem with tuner sometimes getting stuck on generating configurations
  • Fixed issue with tuning in Vulkan ending prematurely with an error

Version 2.0

09 Jun 10:02
0229111
  • Major release with significant changes to public API as well as internal functionality; code utilizing v1 API has
    to be updated
  • KTT now requires C++17 compiler
  • Tuning manipulator API was replaced with kernel launchers and compute interface which are more straightforward
    and convenient to use
  • Reference class API was replaced with reference function which is easier to use
  • Unified API methods for working with simple and composite kernels; there is now only one set of methods which is
    used for both types of kernels
  • Adopted new algorithm for generating and storing tuning configurations - search of very large configuration spaces
    is now possible
  • Extended and improved searcher API - new functionality includes easy retrieval of neighbouring configurations
  • Tuner now supports two formats for kernel result output - JSON and XML
  • CSV format was deprecated; a bundled Python script can be used to partially convert XML output to CSV
  • Added full support for loading of kernel results, which can be used in improved simulated tuning method
  • Kernel results now contain metadata such as version of KTT framework, compute API and timestamp
  • Kernel results can now contain additional user data as pairs of keys and values
  • Added support for name mangling and templates in CUDA kernels
  • Added support for multiple kernel thread modifiers in the same dimension
  • Added methods for removing kernels and kernel arguments from tuner
  • Added new exception type for exceptions thrown by KTT framework
  • Improved argument handling functionality, introduced option to manage all buffers manually without any framework
    interference
  • Improved logging messages, added more debug level logging
  • Significantly improved performance of result validation when only a part of argument is validated
  • When trying to profile unsupported metrics, a warning is now issued instead of an error
  • Added new tutorials and examples, most of the old examples were updated to utilize new tuner API
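
The note on retrieving neighbouring configurations can be illustrated with a minimal sketch. It assumes that a neighbour is a configuration differing from the current one in exactly one parameter value; the function `neighbours` and the parameter names are hypothetical and are not part of the KTT searcher API.

```python
def neighbours(configuration, parameter_values):
    """Return all configurations that differ in exactly one parameter value."""
    result = []
    for name, values in parameter_values.items():
        for value in values:
            if value != configuration[name]:
                # Copy the current configuration and change a single parameter.
                neighbour = dict(configuration)
                neighbour[name] = value
                result.append(neighbour)
    return result


# Hypothetical tuning space and current configuration.
space = {"BLOCK_SIZE": [64, 128, 256], "UNROLL": [1, 2]}
current = {"BLOCK_SIZE": 128, "UNROLL": 1}

found = neighbours(current, space)
for configuration in found:
    print(configuration)
```

A searcher built on such a helper could, for example, evaluate the neighbours of the best configuration found so far before moving further away in the space.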

Version 1.3 with profile-based searcher

11 Nov 15:44
  • Added profile-based searcher allowing faster tuning space search when historical tuning data are available

Version 1.3

18 Oct 13:53
  • Added public API for configuration searchers
  • Added support for user-provided compute context, queues and buffers
  • Added support for unified memory buffers
  • Added divide ceil thread modifier
  • CUDA kernel GPU architecture version is now set based on utilized device
  • Fixed incorrect handling of zero-copy kernel arguments in OpenCL backend
  • Fixed incorrectly reported kernel duration with kernel profiling enabled on newer Nvidia GPUs
  • Fixed missing kernel compilation data when kernel profiling is enabled
  • Fixed CSV printing of kernel compilation data for certain kernel compositions
  • Added new examples for user-provided structures
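
As a rough illustration of why a divide ceil modifier is useful, the sketch below compares plain integer division with division rounded up when a thread size is divided by a tuning parameter value. The helper `divide_ceil` and the numbers are hypothetical; the exact semantics of the KTT modifier are not stated in these notes.

```python
def divide_ceil(size, divisor):
    """Integer division rounded up; both arguments are positive integers."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return (size + divisor - 1) // divisor


# Hypothetical case: 1000 work items, 128 items handled per work-group.
problem_size = 1000
items_per_group = 128

groups_floor = problem_size // items_per_group             # drops the remainder
groups_ceil = divide_ceil(problem_size, items_per_group)   # covers the remainder

print(groups_floor, groups_ceil)
```

With plain division the last partial group would be dropped, so some work items would never be processed; rounding up launches one extra group to cover them.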

Version 1.2

23 Feb 13:16
cc2d3c7
  • Added support for AMD GPA profiling API; kernel profiling on AMD GPUs is now supported
  • Added support for new CUPTI profiling API; kernel profiling on newer Nvidia GPUs is now supported
  • Profiling API version can now be specified in premake
  • Added support for kernel compilation data retrieval
  • Significantly improved performance of kernel output validation for large buffers
  • Added support for scalar kernel arguments in Vulkan backend
  • Improved stop condition API
  • Fixed bug where retrieving best computation result could return invalid result
  • Duplicate results are no longer printed when kernel profiling is enabled
  • Fixed memory leak in old CUPTI profiling API
  • Fixed incorrect tuner behavior after failing to launch a kernel when kernel profiling is enabled
  • Added more examples that support kernel profiling

Version 1.1

21 Apr 10:06
  • Introduced support for kernel profiling on Nvidia GPUs (currently for generations up to and including Volta); kernel profiling allows collection of performance counters which can be utilized by searchers and stop conditions to better predict performance of kernel configurations
  • Introduced experimental Vulkan support; tuning of GLSL compute shaders is supported
  • Added support for tuning parameter packs - sets of tuning parameters which can be tuned independently and thus reduce the total number of tuning configurations
  • Stop conditions can now utilize additional information about specific kernel runs such as values of tuning parameters
  • Added an option to clear kernel tuning data (configurations, results, etc.)
  • Computation results for offline tuning methods can now be retrieved through API
  • Added API method for enabling output validation for specific workloads (offline tuning, online tuning, regular computation)
  • Improvements to MCMC searcher
  • API method for setting time unit now also affects tuner status messages
  • Improved performance of generating configurations when many constraints are utilized
  • Minor performance improvements by utilizing return by reference rather than by value in more getter methods
  • Additions and improvements to examples
  • Removed 32-bit library support
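
The effect of tuning parameter packs on the number of configurations can be shown with simple arithmetic. The sketch below assumes two independent packs with no constraints, so that the packed count is the sum of the per-pack counts instead of the product over all parameters; the parameter names and values are hypothetical.

```python
from math import prod

# Hypothetical tuning parameters: name -> list of allowed values.
pack_a = {"BLOCK_SIZE": [64, 128, 256], "UNROLL": [1, 2, 4]}
pack_b = {"TILE_X": [8, 16], "TILE_Y": [8, 16]}


def count_configurations(parameters):
    """Size of the Cartesian product of all parameter values (no constraints)."""
    return prod(len(values) for values in parameters.values())


# Without packs, every combination of all four parameters is a configuration.
without_packs = count_configurations({**pack_a, **pack_b})

# With two independent packs, each pack is explored on its own.
with_packs = count_configurations(pack_a) + count_configurations(pack_b)

print(without_packs, with_packs)
```

The gap between the two counts grows quickly as more parameters or values are added, which is where packs pay off.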

Version 1.0

20 Jul 13:00
d70330d
  • First official release
  • Significantly improved logging system - added support for multiple logging levels and enhanced configuration possibilities
  • Added new debug level logging messages
  • Separated tuning parameter and thread modifier definition, a single modifier can now utilize multiple parameters
  • Thread modifiers and local memory modifiers can now be specified with a function, similar to constraints
  • Added buffer resize method to tuning manipulator API
  • Added new examples, updated old examples to utilize recently introduced KTT features

Version 0.7 RC2

19 May 09:57
065e1cc
Pre-release
  • Introduced stop condition API for offline tuning
  • Added support for persistent kernel arguments
  • Added global kernel cache, its capacity can be controlled through API
  • Significant improvements to online tuning capabilities and performance
  • Improvements to asynchronous functionality in tuning manipulator
  • Online tuning and kernel running methods now return information about computation status and duration
  • Fixed bug in device synchronization method in tuning manipulator
  • Fixed memory leak in CUDA backend
  • Fixed incorrect handling of invalid kernel results in some situations
  • Added new examples
  • Improvements to sort and reduction examples

Version 0.6 RC1

19 Feb 14:47
Pre-release
  • Added support for multiple compute queues and asynchronous operations
  • Added support for online autotuning - kernel tuning combined with regular kernel running
  • Added support for kernel arguments with user-defined data types
  • Users now have greater control over kernel argument handling; tuner run modes were deprecated as a result
  • Validated kernel arguments can now have user-defined comparator
  • Added MCMC searcher
  • Added local memory argument modifiers which work similarly to kernel thread size modifiers
  • Added new buffer handling methods to tuning manipulator API
  • Added support for floating-point kernel parameters
  • Added method for retrieving kernel source code for specified kernel configuration
  • Implemented caching of compiled kernels when using tuning manipulator
  • Fixed several bugs in kernel composition methods
  • Fixed several rare bugs which could occur while using tuning manipulator
  • Added tutorials and several new examples
  • Fixed paths to kernel files in examples on Linux
  • Significantly improved documentation and added FAQ
  • Added macro definitions for KTT version