models: new MPT model file without duplicated token_embd.weight #2006

cebtenzzre · 2024-02-22T22:20:33Z

Building on ggerganov/llama.cpp#4978 and ggerganov/llama.cpp#5650, I was finally able to implement a version of ggerganov/llama.cpp#3626 that upstream was satisfied with, in ggerganov/llama.cpp#5670.
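Background for readers: MPT ties its output (vocabulary) projection to its token embedding, so a conversion that writes the same matrix twice, once as `token_embd.weight` and once as `output.weight`, stores a redundant copy. A minimal sketch of the loading side, assuming a loader that treats `output.weight` as optional and reuses `token_embd.weight` when it is absent. The `resolve_output_weight` and `logits` helpers and the dict-of-lists "model" are hypothetical illustrations, not llama.cpp's actual loader code.

```python
from typing import Dict, List

Matrix = List[List[float]]  # one row per vocabulary entry, n_embd columns


def resolve_output_weight(tensors: Dict[str, Matrix]) -> Matrix:
    """Hypothetical helper: pick the matrix used for the final projection.

    Prefers an explicit output.weight (files that store a duplicate) and
    falls back to the tied token_embd.weight (files that omit it).
    """
    if "output.weight" in tensors:
        return tensors["output.weight"]
    if "token_embd.weight" in tensors:
        # Weight tying: the same matrix embeds tokens and produces logits.
        return tensors["token_embd.weight"]
    raise KeyError("model has neither output.weight nor token_embd.weight")


def logits(hidden: List[float], tensors: Dict[str, Matrix]) -> List[float]:
    """Project a hidden state to one score per vocabulary entry."""
    weight = resolve_output_weight(tensors)
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


# Toy model: vocabulary of 3 tokens, n_embd = 2.
embd = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
old_file = {"token_embd.weight": embd, "output.weight": [row[:] for row in embd]}
new_file = {"token_embd.weight": embd}

print(logits([2.0, 3.0], old_file))  # [2.0, 3.0, 5.0]
print(logits([2.0, 3.0], new_file))  # [2.0, 3.0, 5.0]
```

Both layouts produce identical logits; the file without the duplicate simply stores one fewer vocabulary-sized matrix.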

Now MPT Chat has gone from 3.64 GiB to 3.54 GiB on disk, without breaking upstream compatibility in either direction.
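As a rough sanity check on the size drop, the arithmetic below compares the reported saving with the size of one vocabulary-by-embedding matrix. It is a back-of-the-envelope sketch under assumptions the PR does not state: MPT-7B dimensions (a padded vocabulary of 50432 and `n_embd` of 4096) and a duplicate stored in ggml's q4_0 format (blocks of 32 weights, 18 bytes each: 16 bytes of 4-bit values plus a 2-byte fp16 scale).

```python
GIB = 1024 ** 3

# Reported on-disk sizes of the MPT Chat file, from the PR description.
old_size_gib = 3.64
new_size_gib = 3.54
reported_saving_gib = old_size_gib - new_size_gib

# Assumptions, not stated in the PR: MPT-7B dimensions, q4_0 duplicate.
n_vocab = 50432          # assumed padded vocabulary size
n_embd = 4096            # assumed model width (d_model)
Q4_0_BLOCK_WEIGHTS = 32  # weights per q4_0 block
Q4_0_BLOCK_BYTES = 18    # 16 bytes of packed 4-bit values + 2-byte fp16 scale


def q4_0_tensor_bytes(n_weights: int) -> int:
    """Size in bytes of a q4_0 tensor with n_weights elements."""
    assert n_weights % Q4_0_BLOCK_WEIGHTS == 0
    return n_weights // Q4_0_BLOCK_WEIGHTS * Q4_0_BLOCK_BYTES


dup_bytes = q4_0_tensor_bytes(n_vocab * n_embd)
print(f"reported saving:  {reported_saving_gib:.2f} GiB")
print(f"estimated saving: {dup_bytes / GIB:.3f} GiB ({dup_bytes} bytes)")
```

Under these assumptions the duplicate comes to about 0.108 GiB, which agrees with the reported 0.10 GiB once you allow for both totals being rounded to two decimals.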

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre · 2024-02-26T18:16:01Z

Due to the models3.json change, I'll hold off on merging this until we're ready to make a new release.


models: new MPT model file without duplicated token_embd.weight

fbef3a7

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre requested a review from manyoso February 22, 2024 22:20

manyoso approved these changes Feb 26, 2024


cebtenzzre added 2 commits March 8, 2024 17:11

models3.json: use removedIn to keep old MPT model

81703da

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

models3.json: bump MPT Chat requires since this PR missed v2.7.2

0b938ee

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
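As we read these two commits, they gate the MPT entries in models3.json by app version: `removedIn` keeps the old MPT file available to releases that cannot load the new one, and `requires` keeps the new file away from releases that lack the loader change. A minimal sketch of that filtering rule, assuming an entry is offered when `requires <= app version < removedIn`. The helper names, filenames, and version numbers are hypothetical, and this is not GPT4All's actual model-list code.

```python
from typing import Dict, List, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn '2.7.2' into (2, 7, 2) so versions compare numerically.

    Assumes both sides use the same number of components.
    """
    return tuple(int(part) for part in v.split("."))


def is_offered(entry: Dict[str, str], app_version: str) -> bool:
    """Assumed rule: requires <= app_version < removedIn (both bounds optional)."""
    app = parse_version(app_version)
    requires: Optional[str] = entry.get("requires")
    removed_in: Optional[str] = entry.get("removedIn")
    if requires is not None and app < parse_version(requires):
        return False  # app too old to load this file
    if removed_in is not None and app >= parse_version(removed_in):
        return False  # superseded by a newer file
    return True


def visible_models(entries: List[Dict[str, str]], app_version: str) -> List[str]:
    """Filenames of the entries offered to a given app version."""
    return [e["filename"] for e in entries if is_offered(e, app_version)]


# Toy entries: filenames and version numbers are made up for illustration.
entries = [
    {"filename": "mpt-chat-old.gguf", "removedIn": "1.1.0"},
    {"filename": "mpt-chat-new.gguf", "requires": "1.1.0"},
]
print(visible_models(entries, "1.0.5"))  # ['mpt-chat-old.gguf']
print(visible_models(entries, "1.1.0"))  # ['mpt-chat-new.gguf']
```

Using the same version as the old entry's `removedIn` and the new entry's `requires` means every release sees exactly one MPT Chat file, which is why the `requires` bump has to follow whichever release first ships the loader change.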

cebtenzzre merged commit 5c248db into main Mar 8, 2024
6 of 17 checks passed

cebtenzzre deleted the mpt-tied-output branch March 8, 2024 22:18

dlippold mentioned this pull request May 9, 2024

[Feature] Crash: Support old MPT GGUF conversions with duplicated output tensor #2329 (Open)
