Merge pull request #355 from janimo/export-vocab-size

Export vocab size and Code Llama usage docs
karpathy · Aug 26, 2023 · e47bacd
2 parents 49daf18 + 604d3c5
commit e47bacd
Showing 2 changed files with 18 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -95,6 +95,22 @@ Then chat with it by specifying the chat mode using the `-m` flag, e.g.:
 ./run llama2_7b_chat.bin -m chat
 ```
 
+You can also try Meta's Code Llama models even if support for them is incomplete.
+Make sure to build the tokenizer for the plain and instruct variants and pass it when doing inference.
+
+```bash
+python export.py codellama2_7b.bin --meta-llama /path/to/CodeLlama-7b
+python tokenizer.py --tokenizer-model=/path/to/CodeLlama-7b/tokenizer.model
+./run codellama2_7b.bin -z /path/to/CodeLlama-7b/tokenizer.bin
+```
+
+Chat with Code Llama Instruct:
+
+```bash
+python export.py codellama2_7b_instruct.bin --meta-llama /path/to/CodeLlama-7b-Instruct
+python tokenizer.py --tokenizer-model=/path/to/CodeLlama-7b-Instruct/tokenizer.model
+./run codellama2_7b_instruct.bin -m chat -z /path/to/CodeLlama-7b-Instruct/tokenizer.bin
+
 ## hugginface models
 
 We can load any huggingface models that use the Llama 2 architecture. See the script [export.py](export.py) and the `--hf` flag to export the model .bin file.

diff --git a/export.py b/export.py
@@ -323,9 +323,10 @@ def concat_weights(models):
     config.multiple_of = params["multiple_of"]
     config.norm_eps = params["norm_eps"]
 
-    config.vocab_size = 32000
+    config.vocab_size = state_dict['tok_embeddings.weight'].shape[0]
     config.max_seq_len = 2048
 
+
     # create a new Transformer object and set weights
     model = Transformer(config)
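
Note on the export.py change: the hard-coded `config.vocab_size = 32000` is replaced by the row count of the token-embedding matrix, so a checkpoint whose vocabulary differs from 32000 exports with a matching header. The sketch below illustrates that lookup in isolation. It is not code from this PR: `infer_vocab_size` is a hypothetical helper, the `SimpleNamespace` objects are stand-ins for real tensors (only `.shape` is used), and the sizes 32000 and 32016 are illustrative values chosen for the example.

```python
from types import SimpleNamespace


def infer_vocab_size(state_dict, key="tok_embeddings.weight"):
    """Return the number of rows in the token-embedding matrix.

    Hypothetical helper: mirrors the one-line lookup in the diff above,
    where the first dimension of the embedding weight is the vocab size.
    """
    weight = state_dict[key]
    return weight.shape[0]


# Stand-in "tensors": anything exposing a .shape tuple works for this sketch.
# The (rows, cols) values are illustrative, not read from a real checkpoint.
base_like = {"tok_embeddings.weight": SimpleNamespace(shape=(32000, 4096))}
extended_like = {"tok_embeddings.weight": SimpleNamespace(shape=(32016, 4096))}

print(infer_vocab_size(base_like))
print(infer_vocab_size(extended_like))
```

With a real checkpoint the same lookup would be applied to the loaded state dict, which is what the changed line in `concat_weights` does.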