
Support quantization for non-multiples of 32. #328

Closed
mzbac opened this issue Dec 31, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@mzbac
Contributor

mzbac commented Dec 31, 2023

The yayi2-30b k/v layers have [input_dims=7168, out_dims=112], so quantization fails with the error "all dimensions should be divisible by 32 for now". FYI, here is the implementation of the yayi2-30b model's k/v layers -> https://huggingface.co/wenge-research/yayi2-30b/blob/main/modeling_yayi.py#L180
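For reference, the divisibility check that rejects this layer can be reproduced in a few lines (a group size of 32 is assumed here, matching the error message; the variable names are illustrative, not MLX internals):

```python
# Group quantization packs weights in groups of 32 along each dimension,
# so both dims must be divisible by the group size.
input_dims, out_dims, group_size = 7168, 112, 32

print(input_dims % group_size)  # 0  -> fine
print(out_dims % group_size)    # 16 -> not divisible, quantization is rejected
```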

@awni awni added the enhancement New feature or request label Dec 31, 2023
@awni
Member

awni commented Dec 31, 2023

Yes, that's a known limitation of our quantization at the moment. I will mark this as an enhancement.

In the meantime, one workaround so you aren't blocked by this: concatenate the K and V projections and do them as a single matmul (112 * 2 / 32 = 7).

@mzbac
Contributor Author

mzbac commented Dec 31, 2023

> Yes, that's a known limitation of our quantization at the moment. I will mark this as an enhancement.
>
> In the meantime, one workaround so you aren't blocked by this: concatenate the K and V projections and do them as a single matmul (112 * 2 / 32 = 7).

Hi Awni, thanks for the reply. Would you be able to give a bit more detail about the workaround? By the way, happy new year! :)

@awni
Member

awni commented Dec 31, 2023

Yes, so there are two projections, one for the keys and one for the values:

```python
k = x @ Wk.T
v = x @ Wv.T
```

Instead you could do something like:

```python
k, v = mx.split(x @ mx.concatenate([Wk, Wv], axis=0).T, 2, axis=-1)
```

Now you can precompute the concatenated matrix and quantize it, since its output dimension (224 = 7 * 32) is divisible by 32.
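To see the equivalence concretely, here is a minimal NumPy sketch of the fused-projection workaround (the shapes mirror the yayi2-30b k/v layers; the random weights are stand-ins, and NumPy is used in place of MLX purely to illustrate the algebra):

```python
import numpy as np

# Shapes from the yayi2-30b k/v layers: 7168 -> 112 each.
d_model, d_kv = 7168, 112
rng = np.random.default_rng(0)
x = rng.standard_normal((4, d_model)).astype(np.float32)
Wk = rng.standard_normal((d_kv, d_model)).astype(np.float32)
Wv = rng.standard_normal((d_kv, d_model)).astype(np.float32)

# Separate projections: output dim 112 is not divisible by 32.
k_ref = x @ Wk.T
v_ref = x @ Wv.T

# Fused projection: concatenated output dim 224 = 7 * 32 is quantizable.
Wkv = np.concatenate([Wk, Wv], axis=0)
kv = x @ Wkv.T
k, v = np.split(kv, 2, axis=-1)

# The fused result matches the two separate matmuls.
assert np.allclose(k, k_ref, atol=1e-4)
assert np.allclose(v, v_ref, atol=1e-4)
```

Note the split is along the last axis, since the concatenation widened the output dimension.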

@mzbac
Contributor Author

mzbac commented Jan 1, 2024

Thanks @awni, this workaround works. I have added it to the yayi2 example. :)

@awni awni changed the title Quantization failed for the yayi2 30b model Support quantization for non-multiples of 32. Jan 1, 2024
@awni
Member

awni commented Jan 1, 2024

Nice.

I think (at least for the non-group axis) fixing this is mostly a matter of bounds checking in the matmul, but @angeloskath can say more about if/when we could support more flexible quantization.

@awni
Member

awni commented Jan 10, 2024

Also see ml-explore/mlx-examples#279

@awni
Member

awni commented Jan 19, 2024

@mzbac not sure you saw, in 0.0.10 this should be fixed. We can add yayi2 to mlx-lm now 😃

@mzbac
Contributor Author

mzbac commented Jan 19, 2024

> @mzbac not sure you saw, in 0.0.10 this should be fixed. We can add yayi2 to mlx-lm now 😃

Nice, I will take a look. Hopefully, it just needs to be mapped to the llama arch and it will work.

@awni
Member

awni commented Jan 19, 2024

Oh that would be amazing.

@mzbac
Contributor Author

mzbac commented Jan 26, 2024

Closing this now, as MLX supports quantization for dims that are not multiples of 32. I have tested it on the yayi2 model and it works as expected.

@mzbac mzbac closed this as completed Jan 26, 2024