rwkv.cpp

This is a port of BlinkDL/RWKV-LM to ggerganov/ggml.

Besides the usual FP32, it supports FP16, quantized INT4, INT5 and INT8 inference. This project is focused on CPU, but cuBLAS is also supported.

This project provides a C library rwkv.h and a convenient Python wrapper for it.

RWKV is a novel large language model architecture, with the largest model in the family having 14B parameters. In contrast to a Transformer with O(n^2) attention, RWKV requires only the state from the previous step to calculate logits. This makes RWKV very CPU-friendly at large context lengths.
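To illustrate why only the previous state is needed, here is a minimal sketch of the WKV recurrence as described for RWKV-4, reduced to a single scalar channel. It is not the code used in rwkv.cpp: the function names `wkv_direct` and `wkv_recurrent` are hypothetical, the decay `w` and bonus `u` are plain floats, and the numerical stabilization a real implementation needs is omitted. The first function sums over the whole history at every step; the second produces the same outputs while carrying only two numbers between steps.

```python
import math

def wkv_direct(ks, vs, w, u):
    """Compute the WKV output at each step by re-summing the full history."""
    outs = []
    for t in range(len(ks)):
        # The current token gets the "bonus" weight exp(u + k_t).
        num = math.exp(u + ks[t]) * vs[t]
        den = math.exp(u + ks[t])
        # Earlier tokens are weighted by exp(k_i), decayed by distance.
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + ks[i])
            num += weight * vs[i]
            den += weight
        outs.append(num / den)
    return outs

def wkv_recurrent(ks, vs, w, u):
    """Compute the same outputs while keeping only a fixed-size state (a, b)."""
    a, b = 0.0, 0.0  # running numerator and denominator over past tokens
    outs = []
    for k, v in zip(ks, vs):
        bonus = math.exp(u + k)
        outs.append((a + bonus * v) / (b + bonus))
        # Decay the old state by one step and fold in the current token.
        a = math.exp(-w) * a + math.exp(k) * v
        b = math.exp(-w) * b + math.exp(k)
    return outs

if __name__ == "__main__":
    ks = [0.1, -0.3, 0.5, 0.2]
    vs = [1.0, 2.0, -1.0, 0.5]
    print(wkv_direct(ks, vs, w=0.5, u=0.3))
    print(wkv_recurrent(ks, vs, w=0.5, u=0.3))
```

The recurrent form does a constant amount of work per token regardless of how many tokens came before, which is the property the paragraph above relies on.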

Loading LoRA checkpoints in Blealtan's format is supported through the merge_lora_into_ggml.py script.

Quality and performance

If you use rwkv.cpp for anything serious, please test all available formats for perplexity and latency on a representative dataset, and decide which trade-off is best for you.
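As a rough guide to what a perplexity test involves, the sketch below computes perplexity from per-step logits using the standard definition (the exponential of the mean negative log-likelihood of the target tokens). The helper names `log_softmax` and `perplexity` are hypothetical and are not part of rwkv.cpp or its Python wrapper; how you obtain the logits for each format is left to your own evaluation script.

```python
import math

def log_softmax(logits):
    """Return log-probabilities for one step, using the max trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def perplexity(logits_per_step, target_tokens):
    """exp(mean negative log-likelihood) of the target tokens.

    logits_per_step[i] holds the logits predicted for position i,
    target_tokens[i] is the index of the token that actually followed.
    """
    total_nll = 0.0
    for logits, target in zip(logits_per_step, target_tokens):
        total_nll += -log_softmax(logits)[target]
    return math.exp(total_nll / len(target_tokens))

if __name__ == "__main__":
    # A model that is uniform over 4 tokens has perplexity equal to 4.
    uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
    print(perplexity(uniform, [0, 1, 2]))
```

Running the same text through each format and comparing the resulting values gives the quality side of the trade-off; latency has to be measured separately on your own hardware.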

The table below is for reference only. Measurements were made on a 4C/8T x86 CPU with AVX2, using 4 threads.

Format	Perplexity (169M)	Latency, ms (1.5B)	File size, GB (1.5B)
`Q4_0`	17.507	76	1.53
`Q4_1`	17.187	72	1.68
`Q5_0`	16.194	78	1.60
`Q5_1`	15.851	81	1.68
`Q8_0`	15.652	89	2.13
`FP16`	15.623	117	2.82
`FP32`	15.623	198	5.64
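One way to read the table is to express every format relative to a baseline. The sketch below does this arithmetic for the figures above, taking FP16 as the baseline; the function name `relative_to_baseline` and the choice of baseline are illustrative assumptions. Keep in mind that the perplexity column was measured on the 169M model while latency and file size were measured on the 1.5B model, so the ratios are only a rough guide.

```python
# (format, perplexity on 169M, latency in ms on 1.5B, file size in GB on 1.5B)
# Values copied from the table above.
ROWS = [
    ("Q4_0", 17.507, 76, 1.53),
    ("Q4_1", 17.187, 72, 1.68),
    ("Q5_0", 16.194, 78, 1.60),
    ("Q5_1", 15.851, 81, 1.68),
    ("Q8_0", 15.652, 89, 2.13),
    ("FP16", 15.623, 117, 2.82),
    ("FP32", 15.623, 198, 5.64),
]

def relative_to_baseline(rows, baseline="FP16"):
    """Express each format's perplexity, latency and size relative to the baseline row."""
    base = next(r for r in rows if r[0] == baseline)
    result = {}
    for name, ppl, latency, size in rows:
        result[name] = {
            "ppl_increase_pct": 100.0 * (ppl / base[1] - 1.0),
            "speedup": base[2] / latency,
            "size_ratio": size / base[3],
        }
    return result

if __name__ == "__main__":
    for name, s in relative_to_baseline(ROWS).items():
        print(f"{name}: perplexity {s['ppl_increase_pct']:+.2f}%, "
              f"speedup x{s['speedup']:.2f}, size x{s['size_ratio']:.2f}")
```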

With cuBLAS

Measurements were made on an Intel i7 13700K and an NVIDIA 3060 Ti 8G. The table shows latency per token.

Model	Layers on GPU	Format	24 Threads	8 Threads	4 Threads	2 Threads	1 Thread
`RWKV-4-Pile-169M`	12	`Q4_0`	20.6 ms	8.6 ms	6.9 ms	6.2 ms	7.9 ms
`RWKV-4-Pile-169M`	12	`Q4_1`	21.4 ms	8.6 ms	6.9 ms	6.7 ms	7.8 ms
`RWKV-4-Pile-169M`	12	`Q5_1`	22.2 ms	9.0 ms	6.9 ms	6.7 ms	8.1 ms
`RWKV-4-Raven-7B-v11`	32	`Q4_0`	94.9 ms	54.3 ms	50.2 ms	51.6 ms	59.2 ms
`RWKV-4-Raven-7B-v11`	32	`Q4_1`	94.5 ms	54.3 ms	49.7 ms	51.8 ms	59.2 ms
`RWKV-4-Raven-7B-v11`	32	`Q5_1`	101.6 ms	72.3 ms	67.2 ms	69.3 ms	77.0 ms

Note: since cuBLAS is supported only for ggml_mul_mat(), some CPU resources are still needed to execute the remaining operations.

How to use

1. Clone the repo

Requirements: git.

git clone --recursive https://github.com/saharNooby/rwkv.cpp.git
cd rwkv.cpp
