Releases · HiPerCoRe/KTT

25 Jan 11:02

v2.1

c967ff5

Version 2.1 Latest

Introduced KTT Python bindings making it possible to utilize KTT API in Python
Added onboarding guide for KTT which describes core KTT features and their usage
Added new methods for compute queue management
Added new methods for synchronization to main tuner API
Added non-templated versions of methods for adding scalar and user buffer kernel arguments
Added support for constant memory variables in CUDA
Updated CUPTI implementation to utilize newer API functions introduced in CUDA Toolkit 11.3
Updated and optimized MCMC searcher
Kernel run mode can now be queried through compute interface
Fixed linking issue under Windows caused by unexported methods
Improved error messages when attempting to add kernel arguments with unsupported data types
Added Python version of tutorials and certain examples showcasing the usage of new Python bindings

21 Jun 09:24

v2.0.1

a0c252e

Version 2.0.1

Added more kernel result status categories to distinguish kernel runs which failed due to compiler error or device limits being exceeded
Fixed problem with tuner sometimes getting stuck on generating configurations
Fixed issue with tuning in Vulkan ending prematurely with an error

09 Jun 10:02

v2.0

0229111

Version 2.0

Major release with significant changes to public API as well as internal functionality; code utilizing v1 API has to be updated
KTT now requires C++17 compiler
Tuning manipulator API was replaced with kernel launchers and compute interface which are more straightforward and convenient to use
Reference class API was replaced with reference function which is easier to use
Unified API methods for working with simple and composite kernels; there is now only one set of methods which is used for both types of kernels
Adopted new algorithm for generating and storing tuning configurations - search of very large configuration spaces is now possible
Extended and improved searcher API - new functionality includes easy retrieval of neighbouring configurations
Tuner now supports two formats for kernel result output - JSON and XML
CSV format was deprecated; the bundled Python script can be used to partially convert XML output to CSV
Added full support for loading of kernel results, which can be used in improved simulated tuning method
Kernel results now contain metadata such as version of KTT framework, compute API and timestamp
Kernel results can now contain additional user data as pairs of keys and values
Added support for name mangling and templates in CUDA kernels
Added support for multiple kernel thread modifiers in the same dimension
Added methods for removing kernels and kernel arguments from tuner
Added new exception type for exceptions thrown by KTT framework
Improved argument handling functionality, introduced option to manage all buffers manually without any framework interference
Improved logging messages, added more debug level logging
Significantly improved performance of result validation when only part of an argument is validated
When trying to profile unsupported metrics, a warning is now issued instead of an error
Added new tutorials and examples, most of the old examples were updated to utilize new tuner API
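
The searcher item above mentions retrieval of neighbouring configurations. A minimal sketch of the idea, assuming a neighbour is a configuration that differs from the current one in at most a given number of parameter values. The function name `neighbours`, the `max_differences` argument and the parameters `BLOCK_SIZE` and `UNROLL` are hypothetical and are not taken from the KTT API:

```python
from itertools import product

def neighbours(config, space, max_differences=1):
    """Return configurations that differ from `config` in 1..max_differences parameters."""
    result = []
    for candidate in space:
        # Count the parameters whose values differ between the two configurations.
        differences = sum(1 for name in config if candidate[name] != config[name])
        if 1 <= differences <= max_differences:
            result.append(candidate)
    return result

# Hypothetical tuning parameters and their allowed values.
parameters = {"BLOCK_SIZE": [64, 128, 256], "UNROLL": [1, 2]}
names = list(parameters)
space = [dict(zip(names, values)) for values in product(*parameters.values())]

current = {"BLOCK_SIZE": 128, "UNROLL": 1}
close = neighbours(current, space)                      # differ in exactly one parameter
wider = neighbours(current, space, max_differences=2)   # differ in one or two parameters
```

A searcher can use such a neighbourhood to move through the space in small steps instead of sampling configurations at random.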

11 Nov 15:44

jiri-filipovic

v1.3-profile-searcher

c188f30

Version 1.3 with profile-based searcher

Added profile-based searcher allowing faster tuning space search when historical tuning data are available

18 Oct 13:53

v1.3-hotfix

7f6f1a1

Version 1.3

Added public API for configuration searchers
Added support for user-provided compute context, queues and buffers
Added support for unified memory buffers
Added divide ceil thread modifier
CUDA kernel GPU architecture version is now set based on utilized device
Fixed incorrect handling of zero-copy kernel arguments in OpenCL backend
Fixed incorrectly reported kernel duration with kernel profiling enabled on newer Nvidia GPUs
Fixed missing kernel compilation data when kernel profiling is enabled
Fixed CSV printing of kernel compilation data for certain kernel compositions
Added new examples for user-provided structures
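
The divide ceil thread modifier listed above can be illustrated with plain arithmetic. This is a sketch under the assumption that the modifier divides a thread size by a tuning parameter value and rounds the result up; the helper name `divide_ceil` is chosen here for illustration and is not a KTT function:

```python
def divide_ceil(size: int, divisor: int) -> int:
    """Divide `size` by `divisor`, rounding up to the next whole number."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return (size + divisor - 1) // divisor

# Hypothetical case: 1000 work items processed by groups of 256 threads.
groups_rounded_up = divide_ceil(1000, 256)   # enough groups to cover every item
groups_truncated = 1000 // 256               # plain division would leave items uncovered
```

Rounding up matters when the problem size is not a multiple of the parameter value: truncating division would launch too few work-groups.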

23 Feb 13:16

v1.2

cc2d3c7

Version 1.2

Added support for AMD GPA profiling API; kernel profiling on AMD GPUs is now supported
Added support for new CUPTI profiling API; kernel profiling on newer Nvidia GPUs is now supported
Profiling API version can now be specified in premake
Added support for kernel compilation data retrieval
Significantly improved performance of kernel output validation for large buffers
Added support for scalar kernel arguments in Vulkan backend
Improved stop condition API
Fixed bug where retrieving best computation result could return invalid result
Duplicate results are no longer printed when kernel profiling is enabled
Fixed memory leak in old CUPTI profiling API
Fixed incorrect tuner behavior after failing to launch a kernel when kernel profiling is enabled
Added more examples that support kernel profiling

21 Apr 10:06

v1.1

6cbfc13

Version 1.1

Introduced support for kernel profiling on Nvidia GPUs (currently for generations up to and including Volta); kernel profiling allows collection of performance counters, which searchers and stop conditions can use to better predict performance of kernel configurations
Introduced experimental Vulkan support; tuning of GLSL compute shaders is supported
Added support for tuning parameter packs - sets of tuning parameters which can be tuned independently and thus reduce the total number of tuning configurations
Stop conditions can now utilize additional information about specific kernel runs such as values of tuning parameters
Added an option to clear kernel tuning data (configurations, results, etc.)
Computation results for offline tuning methods can now be retrieved through API
Added API method for enabling output validation for specific workloads (offline tuning, online tuning, regular computation)
Improvements to MCMC searcher
API method for setting time unit now also affects tuner status messages
Improved performance of generating configurations when many constraints are utilized
Minor performance improvements by utilizing return by reference rather than by value in more getter methods
Additions and improvements to examples
Removed 32-bit library support
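
The reduction promised by tuning parameter packs can be shown with a small calculation. This sketch assumes that each pack is searched on its own while the remaining parameters stay fixed, and it ignores constraints; the parameter counts and the split into packs are made up for illustration:

```python
from math import prod

def full_space_size(value_counts):
    """Configurations when all parameters are tuned together (Cartesian product)."""
    return prod(value_counts)

def packed_space_size(packs):
    """Configurations when each pack is tuned independently (sum of per-pack products)."""
    return sum(prod(pack) for pack in packs)

# Hypothetical kernel with four parameters that have 4, 8, 4 and 2 possible values.
counts = [4, 8, 4, 2]
# Assumed split into two independent packs of two parameters each.
packs = [[4, 8], [4, 2]]

full = full_space_size(counts)      # 4 * 8 * 4 * 2
packed = packed_space_size(packs)   # 4 * 8 + 4 * 2
```

The saving grows with the number of packs, because a product of pack sizes is replaced by their sum.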

20 Jul 13:00

v1.0

d70330d

Version 1.0

First official release
Significantly improved logging system - added support for multiple logging levels and enhanced configuration possibilities
Added new debug level logging messages
Separated tuning parameter and thread modifier definition; a single modifier can now utilize multiple parameters
Thread modifiers and local memory modifiers can now be specified with a function, similar to constraints
Added buffer resize method to tuning manipulator API
Added new examples, updated old examples to utilize recently introduced KTT features

19 May 09:57

v0.7

065e1cc

Version 0.7 RC2 Pre-release

Introduced stop condition API for offline tuning
Added support for persistent kernel arguments
Added global kernel cache whose capacity can be controlled through API
Significant improvements to online tuning capabilities and performance
Improvements to asynchronous functionality in tuning manipulator
Online tuning and kernel running methods now return information about computation status and duration
Fixed bug in device synchronization method in tuning manipulator
Fixed memory leak in CUDA backend
Fixed incorrect handling of invalid kernel results in some situations
Added new examples
Improvements to sort and reduction examples
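
The stop condition idea introduced in this release can be sketched as an object that the tuning loop consults before each kernel run. This is an illustration only: the class `ConfigurationCountCondition`, its methods and the `tune` loop are hypothetical and do not reproduce the KTT interface:

```python
class ConfigurationCountCondition:
    """Stops tuning once a fixed number of configurations has been explored."""

    def __init__(self, max_count):
        self.max_count = max_count
        self.explored = 0

    def update(self):
        # Called after every finished kernel run.
        self.explored += 1

    def is_fulfilled(self):
        return self.explored >= self.max_count

def tune(configurations, run_kernel, condition):
    """Run configurations in order until the stop condition is fulfilled."""
    results = []
    for configuration in configurations:
        if condition.is_fulfilled():
            break
        results.append((configuration, run_kernel(configuration)))
        condition.update()
    return results

# Stand-in for a kernel launch: returns a fake duration for each configuration.
results = tune(range(10), lambda c: c * 2, ConfigurationCountCondition(3))
```

Other conditions, such as a time limit or a target duration, would fit the same two-method shape.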

19 Feb 14:47

v0.6

4f6939c

Version 0.6 RC1 Pre-release

Added support for multiple compute queues and asynchronous operations
Added support for online autotuning - kernel tuning combined with regular kernel running
Added support for kernel arguments with user-defined data types
Users now have greater control over kernel argument handling; tuner run modes were deprecated as a result
Validated kernel arguments can now have user-defined comparator
Added MCMC searcher
Added local memory argument modifiers which work similarly to kernel thread size modifiers
Added new buffer handling methods to tuning manipulator API
Added support for floating-point kernel parameters
Added method for retrieving kernel source code for specified kernel configuration
Implemented caching of compiled kernels when using tuning manipulator
Fixed several bugs in kernel composition methods
Fixed several rare bugs which could occur while using tuning manipulator
Added tutorials and several new examples
Fixed paths to kernel files in examples on Linux
Significantly improved documentation and added FAQ
Added macro definitions for KTT version
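
A user-defined comparator for validated arguments, as mentioned in the list above, is useful when results are floating-point values that cannot be compared exactly. The following sketch assumes a comparator is a function that receives one computed element and one reference element and returns whether they match; `within_tolerance` and `validate` are hypothetical names, not KTT functions:

```python
import math
import operator

def within_tolerance(result, reference, rel_tol=1e-4):
    """Comparator that accepts small floating-point differences."""
    return math.isclose(result, reference, rel_tol=rel_tol, abs_tol=1e-6)

def validate(output, expected, comparator):
    """Check a kernel output buffer against a reference, element by element."""
    return len(output) == len(expected) and all(
        comparator(o, e) for o, e in zip(output, expected)
    )

reference = [1.0, 2.0]
computed = [1.0, 2.00001]   # made-up output with a small rounding difference

exact_ok = validate(computed, reference, operator.eq)          # strict equality
tolerant_ok = validate(computed, reference, within_tolerance)  # tolerance-based check
```

Passing the comparator as an argument keeps the validation loop the same for every data type, including user-defined structures.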
