
Help me... #745

Closed
AndreyRGW opened this issue Apr 3, 2023 · 8 comments

Comments

AndreyRGW commented Apr 3, 2023

Starting the web UI...
Warning: --gptq_bits is deprecated and will be removed. Use --wbits instead.
Warning: --gptq_pre_layer is deprecated and will be removed. Use --prelayer instead.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Loading binary C:\Users\RGWyo\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.dll...
C:\Users\RGWyo\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
The following models are available:

1. alpaca-13b
2. chatgpt4all
3. codegen-6B-multi
4. llama-13b-hf-int4
5. llama-7b-hf
6. llama-7b-hf-int4
7. rugpt3large_based_on_gpt2

Which one do you want to load? 1-7

6

Loading llama-7b-hf-int4...
CUDA extension not installed.
Loading model ...
Traceback (most recent call last):
  File "F:\WBC\text-generation-webui\text-generation-webui\server.py", line 276, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\models.py", line 102, in load_model
    model = load_quantized(model_name)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\GPTQ_loader.py", line 114, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\GPTQ_loader.py", line 45, in _load_quant
    model.load_state_dict(torch.load(checkpoint))
  File "C:\Users\RGWyo\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\RGWyo\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1172, in _load
    result = unpickler.load()
  File "C:\Users\RGWyo\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "C:\Users\RGWyo\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "C:\Users\RGWyo\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 217, in default_restore_location
    result = fn(storage, location)
  File "C:\Users\RGWyo\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 182, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "C:\Users\RGWyo\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 166, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I just installed the web UI in a clean folder.

Win11
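For reference, the remedy the error message itself suggests looks roughly like this minimal sketch (the checkpoint filename is a placeholder). It only remaps the serialized CUDA storages to CPU so the file can be deserialized; it does not restore GPU inference, which is unavailable here because the installed torch/bitsandbytes build has no GPU support:

```python
import torch

# Remap CUDA-serialized storages to CPU so the checkpoint can at least
# be deserialized on a machine where torch.cuda.is_available() is False.
# "llama-7b-hf-int4.pt" is a placeholder for the real checkpoint path.
state_dict = torch.load("llama-7b-hf-int4.pt", map_location=torch.device("cpu"))
```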

AndreyRGW (author) commented:

Why does the web UI use my system Python and not the micromamba environment?

AndreyRGW (author) commented Apr 3, 2023

I tried creating an environment in Anaconda3; the error is exactly the same, except that the Anaconda3 folder is now used instead of my Python install.

AndreyRGW (author) commented:

Loading llama-7b-hf-int4...
Traceback (most recent call last):
  File "F:\WBC\text-generation-webui\text-generation-webui\server.py", line 276, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\models.py", line 102, in load_model
    model = load_quantized(model_name)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\GPTQ_loader.py", line 114, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\GPTQ_loader.py", line 36, in _load_quant
    make_quant(model, layers, wbits, groupsize, faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold)
TypeError: make_quant() got an unexpected keyword argument 'faster'

Got a new error
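A `TypeError` like this usually means the GPTQ-for-LLaMa checkout and the web UI loader are out of sync: the loader passes `faster=` but the installed `make_quant` no longer (or not yet) accepts it. A minimal sketch for checking the locally installed signature, assuming the default `repositories/GPTQ-for-LLaMa` checkout and that `make_quant` is importable from its `quant` module, as `GPTQ_loader.py` does:

```python
import inspect
import sys
from pathlib import Path

# Assumption: the default checkout location used by the web UI.
sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa")))
from quant import make_quant  # the function GPTQ_loader.py calls

# If 'faster' is absent from the printed signature, the checkout
# predates (or postdates) what the web UI loader expects.
print(inspect.signature(make_quant))
```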

AndreyRGW (author) commented:

Now I get this error with LLaMA:

RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
        Missing key(s) in state_dict: "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", "model.layers.1.self_attn.k_proj.qzeros", "model.layers.1.self_attn.o_proj.qzeros", "model.layers.1.self_attn.q_proj.qzeros", "model.layers.1.self_attn.v_proj.qzeros", "model.layers.1.mlp.down_proj.qzeros", "model.layers.1.mlp.gate_proj.qzeros", "model.layers.1.mlp.up_proj.qzeros", "model.layers.2.self_attn.k_proj.qzeros", "model.layers.2.self_attn.o_proj.qzeros", "model.layers.2.self_attn.q_proj.qzeros", "model.layers.2.self_attn.v_proj.qzeros", "model.layers.2.mlp.down_proj.qzeros", "model.layers.2.mlp.gate_proj.qzeros", "model.layers.2.mlp.up_proj.qzeros", "model.layers.3.self_attn.k_proj.qzeros", "model.layers.3.self_attn.o_proj.qzeros", "model.layers.3.self_attn.q_proj.qzeros", "model.layers.3.self_attn.v_proj.qzeros", "model.layers.3.mlp.down_proj.qzeros", "model.layers.3.mlp.gate_proj.qzeros", "model.layers.3.mlp.up_proj.qzeros", "model.layers.4.self_attn.k_proj.qzeros", "model.layers.4.self_attn.o_proj.qzeros", "model.layers.4.self_attn.q_proj.qzeros", "model.layers.4.self_attn.v_proj.qzeros", "model.layers.4.mlp.down_proj.qzeros", "model.layers.4.mlp.gate_proj.qzeros", "model.layers.4.mlp.up_proj.qzeros", "model.layers.5.self_attn.k_proj.qzeros", "model.layers.5.self_attn.o_proj.qzeros", "model.layers.5.self_attn.q_proj.qzeros", "model.layers.5.self_attn.v_proj.qzeros", "model.layers.5.mlp.down_proj.qzeros", "model.layers.5.mlp.gate_proj.qzeros", "model.layers.5.mlp.up_proj.qzeros", "model.layers.6.self_attn.k_proj.qzeros", "model.layers.6.self_attn.o_proj.qzeros", "model.layers.6.self_attn.q_proj.qzeros", "model.layers.6.self_attn.v_proj.qzeros", "model.layers.6.mlp.down_proj.qzeros", "model.layers.6.mlp.gate_proj.qzeros", "model.layers.6.mlp.up_proj.qzeros", "model.layers.7.self_attn.k_proj.qzeros", "model.layers.7.self_attn.o_proj.qzeros", "model.layers.7.self_attn.q_proj.qzeros", "model.layers.7.self_attn.v_proj.qzeros", "model.layers.7.mlp.down_proj.qzeros", etc...
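Missing keys that all end in `qzeros` typically point to a checkpoint quantized with an older GPTQ-for-LLaMa format (which stored `zeros`) than the `QuantLinear` the loader builds (which expects `qzeros`). A minimal sketch, with a placeholder path, for checking which format a `.pt` file actually contains:

```python
import torch

# "path/to/model-4bit.pt" is a placeholder for the failing checkpoint.
sd = torch.load("path/to/model-4bit.pt", map_location="cpu")
print("old-format keys :", sum(k.endswith(".zeros") for k in sd))   # pre-qzeros GPTQ
print("new-format keys :", sum(k.endswith(".qzeros") for k in sd))  # current GPTQ
```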

AndreyRGW (author) commented:

Downloaded LLaMA from here.

No errors so far

AndreyRGW (author) commented:

Loading llama-7b-4bit...
Loading model ...
Done.
Loaded the model in 13.98 seconds.
Adding the LoRA chatgpt4all to the model...
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\gradio\routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\gradio\blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\gradio\blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "F:\WBC\text-generation-webui\text-generation-webui\server.py", line 76, in load_lora_wrapper
    add_lora_to_model(selected_lora)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\LoRA.py", line 34, in add_lora_to_model
    shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_name}"), **params)
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\peft\peft_model.py", line 143, in from_pretrained
    model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config)
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\peft\peft_model.py", line 514, in __init__
    super().__init__(model, peft_config)
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\peft\peft_model.py", line 79, in __init__
    self.base_model = LoraModel(peft_config, model)
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\peft\tuners\lora.py", line 118, in __init__
    self._find_and_replace()
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\peft\tuners\lora.py", line 179, in _find_and_replace
    self._replace_module(parent, target_name, new_module, target)
UnboundLocalError: local variable 'new_module' referenced before assignment

Got errors with the LoRA.
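The `UnboundLocalError` in peft's `_find_and_replace` typically fires when none of the LoRA's target modules match a layer type peft supports: GPTQ-quantized `QuantLinear` layers are not plain `torch.nn.Linear`, so `new_module` is never assigned before `_replace_module` is called. Since this code path changed across early peft releases, a quick hedged check of the versions in the active environment:

```python
import peft
import torch

# Versions in the active Anaconda environment; applying a LoRA on top of a
# GPTQ-quantized model generally needed a patched loader in peft releases
# of this era.
print("peft :", peft.__version__)
print("torch:", torch.__version__)
```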

AndreyRGW (author) commented Apr 3, 2023

Again, errors with Alpaca-13B-int4:

Big error
Loading alpaca-13b...
Loading model ...
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\gradio\routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\gradio\blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\gradio\blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "F:\WBC\text-generation-webui\text-generation-webui\server.py", line 71, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\models.py", line 102, in load_model
    model = load_quantized(model_name)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\GPTQ_loader.py", line 114, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "F:\WBC\text-generation-webui\text-generation-webui\modules\GPTQ_loader.py", line 45, in _load_quant
    model.load_state_dict(torch.load(checkpoint))
  File "C:\ProgramData\Anaconda3\envs\textgen2\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
        Missing key(s) in state_dict: "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", "model.layers.1.self_attn.k_proj.qzeros", "model.layers.1.self_attn.o_proj.qzeros", "model.layers.1.self_attn.q_proj.qzeros", "model.layers.1.self_attn.v_proj.qzeros", "model.layers.1.mlp.down_proj.qzeros", "model.layers.1.mlp.gate_proj.qzeros", "model.layers.1.mlp.up_proj.qzeros", "model.layers.2.self_attn.k_proj.qzeros", "model.layers.2.self_attn.o_proj.qzeros", "model.layers.2.self_attn.q_proj.qzeros", "model.layers.2.self_attn.v_proj.qzeros", "model.layers.2.mlp.down_proj.qzeros", "model.layers.2.mlp.gate_proj.qzeros", "model.layers.2.mlp.up_proj.qzeros", "model.layers.3.self_attn.k_proj.qzeros", "model.layers.3.self_attn.o_proj.qzeros", "model.layers.3.self_attn.q_proj.qzeros", "model.layers.3.self_attn.v_proj.qzeros", "model.layers.3.mlp.down_proj.qzeros", "model.layers.3.mlp.gate_proj.qzeros", "model.layers.3.mlp.up_proj.qzeros", "model.layers.4.self_attn.k_proj.qzeros", "model.layers.4.self_attn.o_proj.qzeros", "model.layers.4.self_attn.q_proj.qzeros", "model.layers.4.self_attn.v_proj.qzeros", "model.layers.4.mlp.down_proj.qzeros", "model.layers.4.mlp.gate_proj.qzeros", "model.layers.4.mlp.up_proj.qzeros", "model.layers.5.self_attn.k_proj.qzeros", "model.layers.5.self_attn.o_proj.qzeros", "model.layers.5.self_attn.q_proj.qzeros", "model.layers.5.self_attn.v_proj.qzeros", "model.layers.5.mlp.down_proj.qzeros", "model.layers.5.mlp.gate_proj.qzeros", "model.layers.5.mlp.up_proj.qzeros", "model.layers.6.self_attn.k_proj.qzeros", "model.layers.6.self_attn.o_proj.qzeros", "model.layers.6.self_attn.q_proj.qzeros", "model.layers.6.self_attn.v_proj.qzeros", "model.layers.6.mlp.down_proj.qzeros", "model.layers.6.mlp.gate_proj.qzeros", "model.layers.6.mlp.up_proj.qzeros", "model.layers.7.self_attn.k_proj.qzeros", "model.layers.7.self_attn.o_proj.qzeros", "model.layers.7.self_attn.q_proj.qzeros", "model.layers.7.self_attn.v_proj.qzeros", "model.layers.7.mlp.down_proj.qzeros", "model.layers.7.mlp.gate_proj.qzeros", "model.layers.7.mlp.up_proj.qzeros", "model.layers.8.self_attn.k_proj.qzeros", "model.layers.8.self_attn.o_proj.qzeros", "model.layers.8.self_attn.q_proj.qzeros", "model.layers.8.self_attn.v_proj.qzeros", "model.layers.8.mlp.down_proj.qzeros", "model.layers.8.mlp.gate_proj.qzeros", "model.layers.8.mlp.up_proj.qzeros", "model.layers.9.self_attn.k_proj.qzeros", "model.layers.9.self_attn.o_proj.qzeros", "model.layers.9.self_attn.q_proj.qzeros", "model.layers.9.self_attn.v_proj.qzeros", "model.layers.9.mlp.down_proj.qzeros", "model.layers.9.mlp.gate_proj.qzeros", "model.layers.9.mlp.up_proj.qzeros", "model.layers.10.self_attn.k_proj.qzeros", "model.layers.10.self_attn.o_proj.qzeros", "model.layers.10.self_attn.q_proj.qzeros", "model.layers.10.self_attn.v_proj.qzeros", "model.layers.10.mlp.down_proj.qzeros", "model.layers.10.mlp.gate_proj.qzeros", "model.layers.10.mlp.up_proj.qzeros", "model.layers.11.self_attn.k_proj.qzeros", "model.layers.11.self_attn.o_proj.qzeros", "model.layers.11.self_attn.q_proj.qzeros", "model.layers.11.self_attn.v_proj.qzeros", "model.layers.11.mlp.down_proj.qzeros", "model.layers.11.mlp.gate_proj.qzeros", "model.layers.11.mlp.up_proj.qzeros", "model.layers.12.self_attn.k_proj.qzeros", "model.layers.12.self_attn.o_proj.qzeros", 
"model.layers.12.self_attn.q_proj.qzeros", "model.layers.12.self_attn.v_proj.qzeros", "model.layers.12.mlp.down_proj.qzeros", "model.layers.12.mlp.gate_proj.qzeros", "model.layers.12.mlp.up_proj.qzeros", "model.layers.13.self_attn.k_proj.qzeros", "model.layers.13.self_attn.o_proj.qzeros", "model.layers.13.self_attn.q_proj.qzeros", "model.layers.13.self_attn.v_proj.qzeros", "model.layers.13.mlp.down_proj.qzeros", "model.layers.13.mlp.gate_proj.qzeros", "model.layers.13.mlp.up_proj.qzeros", "model.layers.14.self_attn.k_proj.qzeros", "model.layers.14.self_attn.o_proj.qzeros", "model.layers.14.self_attn.q_proj.qzeros", "model.layers.14.self_attn.v_proj.qzeros", "model.layers.14.mlp.down_proj.qzeros", "model.layers.14.mlp.gate_proj.qzeros", "model.layers.14.mlp.up_proj.qzeros", "model.layers.15.self_attn.k_proj.qzeros", "model.layers.15.self_attn.o_proj.qzeros", "model.layers.15.self_attn.q_proj.qzeros", "model.layers.15.self_attn.v_proj.qzeros", "model.layers.15.mlp.down_proj.qzeros", "model.layers.15.mlp.gate_proj.qzeros", "model.layers.15.mlp.up_proj.qzeros", "model.layers.16.self_attn.k_proj.qzeros", "model.layers.16.self_attn.o_proj.qzeros", "model.layers.16.self_attn.q_proj.qzeros", "model.layers.16.self_attn.v_proj.qzeros", "model.layers.16.mlp.down_proj.qzeros", "model.layers.16.mlp.gate_proj.qzeros", "model.layers.16.mlp.up_proj.qzeros", "model.layers.17.self_attn.k_proj.qzeros", "model.layers.17.self_attn.o_proj.qzeros", "model.layers.17.self_attn.q_proj.qzeros", "model.layers.17.self_attn.v_proj.qzeros", "model.layers.17.mlp.down_proj.qzeros", "model.layers.17.mlp.gate_proj.qzeros", "model.layers.17.mlp.up_proj.qzeros", "model.layers.18.self_attn.k_proj.qzeros", "model.layers.18.self_attn.o_proj.qzeros", "model.layers.18.self_attn.q_proj.qzeros", "model.layers.18.self_attn.v_proj.qzeros", "model.layers.18.mlp.down_proj.qzeros", "model.layers.18.mlp.gate_proj.qzeros", "model.layers.18.mlp.up_proj.qzeros", "model.layers.19.self_attn.k_proj.qzeros", "model.layers.19.self_attn.o_proj.qzeros", "model.layers.19.self_attn.q_proj.qzeros", "model.layers.19.self_attn.v_proj.qzeros", "model.layers.19.mlp.down_proj.qzeros", "model.layers.19.mlp.gate_proj.qzeros", "model.layers.19.mlp.up_proj.qzeros", "model.layers.20.self_attn.k_proj.qzeros", "model.layers.20.self_attn.o_proj.qzeros", "model.layers.20.self_attn.q_proj.qzeros", "model.layers.20.self_attn.v_proj.qzeros", "model.layers.20.mlp.down_proj.qzeros", "model.layers.20.mlp.gate_proj.qzeros", "model.layers.20.mlp.up_proj.qzeros", "model.layers.21.self_attn.k_proj.qzeros", "model.layers.21.self_attn.o_proj.qzeros", "model.layers.21.self_attn.q_proj.qzeros", "model.layers.21.self_attn.v_proj.qzeros", "model.layers.21.mlp.down_proj.qzeros", "model.layers.21.mlp.gate_proj.qzeros", "model.layers.21.mlp.up_proj.qzeros", "model.layers.22.self_attn.k_proj.qzeros", "model.layers.22.self_attn.o_proj.qzeros", "model.layers.22.self_attn.q_proj.qzeros", "model.layers.22.self_attn.v_proj.qzeros", "model.layers.22.mlp.down_proj.qzeros", "model.layers.22.mlp.gate_proj.qzeros", "model.layers.22.mlp.up_proj.qzeros", "model.layers.23.self_attn.k_proj.qzeros", "model.layers.23.self_attn.o_proj.qzeros", "model.layers.23.self_attn.q_proj.qzeros", "model.layers.23.self_attn.v_proj.qzeros", "model.layers.23.mlp.down_proj.qzeros", "model.layers.23.mlp.gate_proj.qzeros", "model.layers.23.mlp.up_proj.qzeros", "model.layers.24.self_attn.k_proj.qzeros", "model.layers.24.self_attn.o_proj.qzeros", "model.layers.24.self_attn.q_proj.qzeros", 
"model.layers.24.self_attn.v_proj.qzeros", "model.layers.24.mlp.down_proj.qzeros", "model.layers.24.mlp.gate_proj.qzeros", "model.layers.24.mlp.up_proj.qzeros", "model.layers.25.self_attn.k_proj.qzeros", "model.layers.25.self_attn.o_proj.qzeros", "model.layers.25.self_attn.q_proj.qzeros", "model.layers.25.self_attn.v_proj.qzeros", "model.layers.25.mlp.down_proj.qzeros", "model.layers.25.mlp.gate_proj.qzeros", "model.layers.25.mlp.up_proj.qzeros", "model.layers.26.self_attn.k_proj.qzeros", "model.layers.26.self_attn.o_proj.qzeros", "model.layers.26.self_attn.q_proj.qzeros", "model.layers.26.self_attn.v_proj.qzeros", "model.layers.26.mlp.down_proj.qzeros", "model.layers.26.mlp.gate_proj.qzeros", "model.layers.26.mlp.up_proj.qzeros", "model.layers.27.self_attn.k_proj.qzeros", "model.layers.27.self_attn.o_proj.qzeros", "model.layers.27.self_attn.q_proj.qzeros", "model.layers.27.self_attn.v_proj.qzeros", "model.layers.27.mlp.down_proj.qzeros", "model.layers.27.mlp.gate_proj.qzeros", "model.layers.27.mlp.up_proj.qzeros", "model.layers.28.self_attn.k_proj.qzeros", "model.layers.28.self_attn.o_proj.qzeros", "model.layers.28.self_attn.q_proj.qzeros", "model.layers.28.self_attn.v_proj.qzeros", "model.layers.28.mlp.down_proj.qzeros", "model.layers.28.mlp.gate_proj.qzeros", "model.layers.28.mlp.up_proj.qzeros", "model.layers.29.self_attn.k_proj.qzeros", "model.layers.29.self_attn.o_proj.qzeros", "model.layers.29.self_attn.q_proj.qzeros", "model.layers.29.self_attn.v_proj.qzeros", "model.layers.29.mlp.down_proj.qzeros", "model.layers.29.mlp.gate_proj.qzeros", "model.layers.29.mlp.up_proj.qzeros", "model.layers.30.self_attn.k_proj.qzeros", "model.layers.30.self_attn.o_proj.qzeros", "model.layers.30.self_attn.q_proj.qzeros", "model.layers.30.self_attn.v_proj.qzeros", "model.layers.30.mlp.down_proj.qzeros", "model.layers.30.mlp.gate_proj.qzeros", "model.layers.30.mlp.up_proj.qzeros", "model.layers.31.self_attn.k_proj.qzeros", "model.layers.31.self_attn.o_proj.qzeros", "model.layers.31.self_attn.q_proj.qzeros", "model.layers.31.self_attn.v_proj.qzeros", "model.layers.31.mlp.down_proj.qzeros", "model.layers.31.mlp.gate_proj.qzeros", "model.layers.31.mlp.up_proj.qzeros".
        Unexpected key(s) in state_dict: "model.layers.32.self_attn.q_proj.zeros", "model.layers.32.self_attn.q_proj.scales", "model.layers.32.self_attn.q_proj.bias", "model.layers.32.self_attn.q_proj.qweight", "model.layers.32.self_attn.k_proj.zeros", "model.layers.32.self_attn.k_proj.scales", "model.layers.32.self_attn.k_proj.bias", "model.layers.32.self_attn.k_proj.qweight", "model.layers.32.self_attn.v_proj.zeros", "model.layers.32.self_attn.v_proj.scales", "model.layers.32.self_attn.v_proj.bias", "model.layers.32.self_attn.v_proj.qweight", "model.layers.32.self_attn.o_proj.zeros", "model.layers.32.self_attn.o_proj.scales", "model.layers.32.self_attn.o_proj.bias", "model.layers.32.self_attn.o_proj.qweight", "model.layers.32.self_attn.rotary_emb.inv_freq", "model.layers.32.mlp.gate_proj.zeros", "model.layers.32.mlp.gate_proj.scales", "model.layers.32.mlp.gate_proj.bias", "model.layers.32.mlp.gate_proj.qweight", "model.layers.32.mlp.down_proj.zeros", "model.layers.32.mlp.down_proj.scales", "model.layers.32.mlp.down_proj.bias", "model.layers.32.mlp.down_proj.qweight", "model.layers.32.mlp.up_proj.zeros", "model.layers.32.mlp.up_proj.scales", "model.layers.32.mlp.up_proj.bias", "model.layers.32.mlp.up_proj.qweight", "model.layers.32.input_layernorm.weight", "model.layers.32.post_attention_layernorm.weight", "model.layers.33.self_attn.q_proj.zeros", "model.layers.33.self_attn.q_proj.scales", "model.layers.33.self_attn.q_proj.bias", "model.layers.33.self_attn.q_proj.qweight", "model.layers.33.self_attn.k_proj.zeros", "model.layers.33.self_attn.k_proj.scales", "model.layers.33.self_attn.k_proj.bias", "model.layers.33.self_attn.k_proj.qweight", "model.layers.33.self_attn.v_proj.zeros", "model.layers.33.self_attn.v_proj.scales", "model.layers.33.self_attn.v_proj.bias", "model.layers.33.self_attn.v_proj.qweight", "model.layers.33.self_attn.o_proj.zeros", "model.layers.33.self_attn.o_proj.scales", "model.layers.33.self_attn.o_proj.bias", "model.layers.33.self_attn.o_proj.qweight", "model.layers.33.self_attn.rotary_emb.inv_freq", "model.layers.33.mlp.gate_proj.zeros", "model.layers.33.mlp.gate_proj.scales", "model.layers.33.mlp.gate_proj.bias", "model.layers.33.mlp.gate_proj.qweight", "model.layers.33.mlp.down_proj.zeros", "model.layers.33.mlp.down_proj.scales", "model.layers.33.mlp.down_proj.bias", "model.layers.33.mlp.down_proj.qweight", "model.layers.33.mlp.up_proj.zeros", "model.layers.33.mlp.up_proj.scales", "model.layers.33.mlp.up_proj.bias", "model.layers.33.mlp.up_proj.qweight", "model.layers.33.input_layernorm.weight", "model.layers.33.post_attention_layernorm.weight", "model.layers.34.self_attn.q_proj.zeros", "model.layers.34.self_attn.q_proj.scales", "model.layers.34.self_attn.q_proj.bias", "model.layers.34.self_attn.q_proj.qweight", "model.layers.34.self_attn.k_proj.zeros", "model.layers.34.self_attn.k_proj.scales", "model.layers.34.self_attn.k_proj.bias", "model.layers.34.self_attn.k_proj.qweight", "model.layers.34.self_attn.v_proj.zeros", "model.layers.34.self_attn.v_proj.scales", "model.layers.34.self_attn.v_proj.bias", "model.layers.34.self_attn.v_proj.qweight", "model.layers.34.self_attn.o_proj.zeros", "model.layers.34.self_attn.o_proj.scales", "model.layers.34.self_attn.o_proj.bias", "model.layers.34.self_attn.o_proj.qweight", "model.layers.34.self_attn.rotary_emb.inv_freq", "model.layers.34.mlp.gate_proj.zeros", "model.layers.34.mlp.gate_proj.scales", "model.layers.34.mlp.gate_proj.bias", "model.layers.34.mlp.gate_proj.qweight", "model.layers.34.mlp.down_proj.zeros", 
"model.layers.34.mlp.down_proj.scales", "model.layers.34.mlp.down_proj.bias", "model.layers.34.mlp.down_proj.qweight", "model.layers.34.mlp.up_proj.zeros", "model.layers.34.mlp.up_proj.scales", "model.layers.34.mlp.up_proj.bias", "model.layers.34.mlp.up_proj.qweight", "model.layers.34.input_layernorm.weight", "model.layers.34.post_attention_layernorm.weight", "model.layers.35.self_attn.q_proj.zeros", "model.layers.35.self_attn.q_proj.scales", "model.layers.35.self_attn.q_proj.bias", "model.layers.35.self_attn.q_proj.qweight", "model.layers.35.self_attn.k_proj.zeros", "model.layers.35.self_attn.k_proj.scales", "model.layers.35.self_attn.k_proj.bias", "model.layers.35.self_attn.k_proj.qweight", "model.layers.35.self_attn.v_proj.zeros", "model.layers.35.self_attn.v_proj.scales", "model.layers.35.self_attn.v_proj.bias", "model.layers.35.self_attn.v_proj.qweight", "model.layers.35.self_attn.o_proj.zeros", "model.layers.35.self_attn.o_proj.scales", "model.layers.35.self_attn.o_proj.bias", "model.layers.35.self_attn.o_proj.qweight", "model.layers.35.self_attn.rotary_emb.inv_freq", "model.layers.35.mlp.gate_proj.zeros", "model.layers.35.mlp.gate_proj.scales", "model.layers.35.mlp.gate_proj.bias", "model.layers.35.mlp.gate_proj.qweight", "model.layers.35.mlp.down_proj.zeros", "model.layers.35.mlp.down_proj.scales", "model.layers.35.mlp.down_proj.bias", "model.layers.35.mlp.down_proj.qweight", "model.layers.35.mlp.up_proj.zeros", "model.layers.35.mlp.up_proj.scales", "model.layers.35.mlp.up_proj.bias", "model.layers.35.mlp.up_proj.qweight", "model.layers.35.input_layernorm.weight", "model.layers.35.post_attention_layernorm.weight", "model.layers.36.self_attn.q_proj.zeros", "model.layers.36.self_attn.q_proj.scales", "model.layers.36.self_attn.q_proj.bias", "model.layers.36.self_attn.q_proj.qweight", "model.layers.36.self_attn.k_proj.zeros", "model.layers.36.self_attn.k_proj.scales", "model.layers.36.self_attn.k_proj.bias", "model.layers.36.self_attn.k_proj.qweight", "model.layers.36.self_attn.v_proj.zeros", "model.layers.36.self_attn.v_proj.scales", "model.layers.36.self_attn.v_proj.bias", "model.layers.36.self_attn.v_proj.qweight", "model.layers.36.self_attn.o_proj.zeros", "model.layers.36.self_attn.o_proj.scales", "model.layers.36.self_attn.o_proj.bias", "model.layers.36.self_attn.o_proj.qweight", "model.layers.36.self_attn.rotary_emb.inv_freq", "model.layers.36.mlp.gate_proj.zeros", "model.layers.36.mlp.gate_proj.scales", "model.layers.36.mlp.gate_proj.bias", "model.layers.36.mlp.gate_proj.qweight", "model.layers.36.mlp.down_proj.zeros", "model.layers.36.mlp.down_proj.scales", "model.layers.36.mlp.down_proj.bias", "model.layers.36.mlp.down_proj.qweight", "model.layers.36.mlp.up_proj.zeros", "model.layers.36.mlp.up_proj.scales", "model.layers.36.mlp.up_proj.bias", "model.layers.36.mlp.up_proj.qweight", "model.layers.36.input_layernorm.weight", "model.layers.36.post_attention_layernorm.weight", "model.layers.37.self_attn.q_proj.zeros", "model.layers.37.self_attn.q_proj.scales", "model.layers.37.self_attn.q_proj.bias", "model.layers.37.self_attn.q_proj.qweight", "model.layers.37.self_attn.k_proj.zeros", "model.layers.37.self_attn.k_proj.scales", "model.layers.37.self_attn.k_proj.bias", "model.layers.37.self_attn.k_proj.qweight", "model.layers.37.self_attn.v_proj.zeros", "model.layers.37.self_attn.v_proj.scales", "model.layers.37.self_attn.v_proj.bias", "model.layers.37.self_attn.v_proj.qweight", "model.layers.37.self_attn.o_proj.zeros", "model.layers.37.self_attn.o_proj.scales", 
"model.layers.37.self_attn.o_proj.bias", "model.layers.37.self_attn.o_proj.qweight", "model.layers.37.self_attn.rotary_emb.inv_freq", "model.layers.37.mlp.gate_proj.zeros", "model.layers.37.mlp.gate_proj.scales", "model.layers.37.mlp.gate_proj.bias", "model.layers.37.mlp.gate_proj.qweight", "model.layers.37.mlp.down_proj.zeros", "model.layers.37.mlp.down_proj.scales", "model.layers.37.mlp.down_proj.bias", "model.layers.37.mlp.down_proj.qweight", "model.layers.37.mlp.up_proj.zeros", "model.layers.37.mlp.up_proj.scales", "model.layers.37.mlp.up_proj.bias", "model.layers.37.mlp.up_proj.qweight", "model.layers.37.input_layernorm.weight", "model.layers.37.post_attention_layernorm.weight", "model.layers.38.self_attn.q_proj.zeros", "model.layers.38.self_attn.q_proj.scales", "model.layers.38.self_attn.q_proj.bias", "model.layers.38.self_attn.q_proj.qweight", "model.layers.38.self_attn.k_proj.zeros", "model.layers.38.self_attn.k_proj.scales", "model.layers.38.self_attn.k_proj.bias", "model.layers.38.self_attn.k_proj.qweight", "model.layers.38.self_attn.v_proj.zeros", "model.layers.38.self_attn.v_proj.scales", "model.layers.38.self_attn.v_proj.bias", "model.layers.38.self_attn.v_proj.qweight", "model.layers.38.self_attn.o_proj.zeros", "model.layers.38.self_attn.o_proj.scales", "model.layers.38.self_attn.o_proj.bias", "model.layers.38.self_attn.o_proj.qweight", "model.layers.38.self_attn.rotary_emb.inv_freq", "model.layers.38.mlp.gate_proj.zeros", "model.layers.38.mlp.gate_proj.scales", "model.layers.38.mlp.gate_proj.bias", "model.layers.38.mlp.gate_proj.qweight", "model.layers.38.mlp.down_proj.zeros", "model.layers.38.mlp.down_proj.scales", "model.layers.38.mlp.down_proj.bias", "model.layers.38.mlp.down_proj.qweight", "model.layers.38.mlp.up_proj.zeros", "model.layers.38.mlp.up_proj.scales", "model.layers.38.mlp.up_proj.bias", "model.layers.38.mlp.up_proj.qweight", "model.layers.38.input_layernorm.weight", "model.layers.38.post_attention_layernorm.weight", "model.layers.39.self_attn.q_proj.zeros", "model.layers.39.self_attn.q_proj.scales", "model.layers.39.self_attn.q_proj.bias", "model.layers.39.self_attn.q_proj.qweight", "model.layers.39.self_attn.k_proj.zeros", "model.layers.39.self_attn.k_proj.scales", "model.layers.39.self_attn.k_proj.bias", "model.layers.39.self_attn.k_proj.qweight", "model.layers.39.self_attn.v_proj.zeros", "model.layers.39.self_attn.v_proj.scales", "model.layers.39.self_attn.v_proj.bias", "model.layers.39.self_attn.v_proj.qweight", "model.layers.39.self_attn.o_proj.zeros", "model.layers.39.self_attn.o_proj.scales", "model.layers.39.self_attn.o_proj.bias", "model.layers.39.self_attn.o_proj.qweight", "model.layers.39.self_attn.rotary_emb.inv_freq", "model.layers.39.mlp.gate_proj.zeros", "model.layers.39.mlp.gate_proj.scales", "model.layers.39.mlp.gate_proj.bias", "model.layers.39.mlp.gate_proj.qweight", "model.layers.39.mlp.down_proj.zeros", "model.layers.39.mlp.down_proj.scales", "model.layers.39.mlp.down_proj.bias", "model.layers.39.mlp.down_proj.qweight", "model.layers.39.mlp.up_proj.zeros", "model.layers.39.mlp.up_proj.scales", "model.layers.39.mlp.up_proj.bias", "model.layers.39.mlp.up_proj.qweight", "model.layers.39.input_layernorm.weight", "model.layers.39.post_attention_layernorm.weight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.o_proj.zeros", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.v_proj.zeros", "model.layers.0.mlp.down_proj.zeros", "model.layers.0.mlp.gate_proj.zeros", "model.layers.0.mlp.up_proj.zeros", 
"model.layers.1.self_attn.k_proj.zeros", "model.layers.1.self_attn.o_proj.zeros", "model.layers.1.self_attn.q_proj.zeros", "model.layers.1.self_attn.v_proj.zeros", "model.layers.1.mlp.down_proj.zeros", "model.layers.1.mlp.gate_proj.zeros", "model.layers.1.mlp.up_proj.zeros", "model.layers.2.self_attn.k_proj.zeros", "model.layers.2.self_attn.o_proj.zeros", "model.layers.2.self_attn.q_proj.zeros", "model.layers.2.self_attn.v_proj.zeros", "model.layers.2.mlp.down_proj.zeros", "model.layers.2.mlp.gate_proj.zeros", "model.layers.2.mlp.up_proj.zeros", "model.layers.3.self_attn.k_proj.zeros", "model.layers.3.self_attn.o_proj.zeros", "model.layers.3.self_attn.q_proj.zeros", "model.layers.3.self_attn.v_proj.zeros", "model.layers.3.mlp.down_proj.zeros", "model.layers.3.mlp.gate_proj.zeros", "model.layers.3.mlp.up_proj.zeros", "model.layers.4.self_attn.k_proj.zeros", "model.layers.4.self_attn.o_proj.zeros", "model.layers.4.self_attn.q_proj.zeros", "model.layers.4.self_attn.v_proj.zeros", "model.layers.4.mlp.down_proj.zeros", "model.layers.4.mlp.gate_proj.zeros", "model.layers.4.mlp.up_proj.zeros", "model.layers.5.self_attn.k_proj.zeros", "model.layers.5.self_attn.o_proj.zeros", "model.layers.5.self_attn.q_proj.zeros", "model.layers.5.self_attn.v_proj.zeros", "model.layers.5.mlp.down_proj.zeros", "model.layers.5.mlp.gate_proj.zeros", "model.layers.5.mlp.up_proj.zeros", "model.layers.6.self_attn.k_proj.zeros", "model.layers.6.self_attn.o_proj.zeros", "model.layers.6.self_attn.q_proj.zeros", "model.layers.6.self_attn.v_proj.zeros", "model.layers.6.mlp.down_proj.zeros", "model.layers.6.mlp.gate_proj.zeros", "model.layers.6.mlp.up_proj.zeros", "model.layers.7.self_attn.k_proj.zeros", "model.layers.7.self_attn.o_proj.zeros", "model.layers.7.self_attn.q_proj.zeros", "model.layers.7.self_attn.v_proj.zeros", "model.layers.7.mlp.down_proj.zeros", "model.layers.7.mlp.gate_proj.zeros", "model.layers.7.mlp.up_proj.zeros", "model.layers.8.self_attn.k_proj.zeros", "model.layers.8.self_attn.o_proj.zeros", "model.layers.8.self_attn.q_proj.zeros", "model.layers.8.self_attn.v_proj.zeros", "model.layers.8.mlp.down_proj.zeros", "model.layers.8.mlp.gate_proj.zeros", "model.layers.8.mlp.up_proj.zeros", "model.layers.9.self_attn.k_proj.zeros", "model.layers.9.self_attn.o_proj.zeros", "model.layers.9.self_attn.q_proj.zeros", "model.layers.9.self_attn.v_proj.zeros", "model.layers.9.mlp.down_proj.zeros", "model.layers.9.mlp.gate_proj.zeros", "model.layers.9.mlp.up_proj.zeros", "model.layers.10.self_attn.k_proj.zeros", "model.layers.10.self_attn.o_proj.zeros", "model.layers.10.self_attn.q_proj.zeros", "model.layers.10.self_attn.v_proj.zeros", "model.layers.10.mlp.down_proj.zeros", "model.layers.10.mlp.gate_proj.zeros", "model.layers.10.mlp.up_proj.zeros", "model.layers.11.self_attn.k_proj.zeros", "model.layers.11.self_attn.o_proj.zeros", "model.layers.11.self_attn.q_proj.zeros", "model.layers.11.self_attn.v_proj.zeros", "model.layers.11.mlp.down_proj.zeros", "model.layers.11.mlp.gate_proj.zeros", "model.layers.11.mlp.up_proj.zeros", "model.layers.12.self_attn.k_proj.zeros", "model.layers.12.self_attn.o_proj.zeros", "model.layers.12.self_attn.q_proj.zeros", "model.layers.12.self_attn.v_proj.zeros", "model.layers.12.mlp.down_proj.zeros", "model.layers.12.mlp.gate_proj.zeros", "model.layers.12.mlp.up_proj.zeros", "model.layers.13.self_attn.k_proj.zeros", "model.layers.13.self_attn.o_proj.zeros", "model.layers.13.self_attn.q_proj.zeros", "model.layers.13.self_attn.v_proj.zeros", "model.layers.13.mlp.down_proj.zeros", 
"model.layers.13.mlp.gate_proj.zeros", "model.layers.13.mlp.up_proj.zeros", "model.layers.14.self_attn.k_proj.zeros", "model.layers.14.self_attn.o_proj.zeros", "model.layers.14.self_attn.q_proj.zeros", "model.layers.14.self_attn.v_proj.zeros", "model.layers.14.mlp.down_proj.zeros", "model.layers.14.mlp.gate_proj.zeros", "model.layers.14.mlp.up_proj.zeros", "model.layers.15.self_attn.k_proj.zeros", "model.layers.15.self_attn.o_proj.zeros", "model.layers.15.self_attn.q_proj.zeros", "model.layers.15.self_attn.v_proj.zeros", "model.layers.15.mlp.down_proj.zeros", "model.layers.15.mlp.gate_proj.zeros", "model.layers.15.mlp.up_proj.zeros", "model.layers.16.self_attn.k_proj.zeros", "model.layers.16.self_attn.o_proj.zeros", "model.layers.16.self_attn.q_proj.zeros", "model.layers.16.self_attn.v_proj.zeros", "model.layers.16.mlp.down_proj.zeros", "model.layers.16.mlp.gate_proj.zeros", "model.layers.16.mlp.up_proj.zeros", "model.layers.17.self_attn.k_proj.zeros", "model.layers.17.self_attn.o_proj.zeros", "model.layers.17.self_attn.q_proj.zeros", "model.layers.17.self_attn.v_proj.zeros", "model.layers.17.mlp.down_proj.zeros", "model.layers.17.mlp.gate_proj.zeros", "model.layers.17.mlp.up_proj.zeros", "model.layers.18.self_attn.k_proj.zeros", "model.layers.18.self_attn.o_proj.zeros", "model.layers.18.self_attn.q_proj.zeros", "model.layers.18.self_attn.v_proj.zeros", "model.layers.18.mlp.down_proj.zeros", "model.layers.18.mlp.gate_proj.zeros", "model.layers.18.mlp.up_proj.zeros", "model.layers.19.self_attn.k_proj.zeros", "model.layers.19.self_attn.o_proj.zeros", "model.layers.19.self_attn.q_proj.zeros", "model.layers.19.self_attn.v_proj.zeros", "model.layers.19.mlp.down_proj.zeros", "model.layers.19.mlp.gate_proj.zeros", "model.layers.19.mlp.up_proj.zeros", "model.layers.20.self_attn.k_proj.zeros", "model.layers.20.self_attn.o_proj.zeros", "model.layers.20.self_attn.q_proj.zeros", "model.layers.20.self_attn.v_proj.zeros", "model.layers.20.mlp.down_proj.zeros", "model.layers.20.mlp.gate_proj.zeros", "model.layers.20.mlp.up_proj.zeros", "model.layers.21.self_attn.k_proj.zeros", "model.layers.21.self_attn.o_proj.zeros", "model.layers.21.self_attn.q_proj.zeros", "model.layers.21.self_attn.v_proj.zeros", "model.layers.21.mlp.down_proj.zeros", "model.layers.21.mlp.gate_proj.zeros", "model.layers.21.mlp.up_proj.zeros", "model.layers.22.self_attn.k_proj.zeros", "model.layers.22.self_attn.o_proj.zeros", "model.layers.22.self_attn.q_proj.zeros", "model.layers.22.self_attn.v_proj.zeros", "model.layers.22.mlp.down_proj.zeros", "model.layers.22.mlp.gate_proj.zeros", "model.layers.22.mlp.up_proj.zeros", "model.layers.23.self_attn.k_proj.zeros", "model.layers.23.self_attn.o_proj.zeros", "model.layers.23.self_attn.q_proj.zeros", "model.layers.23.self_attn.v_proj.zeros", "model.layers.23.mlp.down_proj.zeros", "model.layers.23.mlp.gate_proj.zeros", "model.layers.23.mlp.up_proj.zeros", "model.layers.24.self_attn.k_proj.zeros", "model.layers.24.self_attn.o_proj.zeros", "model.layers.24.self_attn.q_proj.zeros", "model.layers.24.self_attn.v_proj.zeros", "model.layers.24.mlp.down_proj.zeros", "model.layers.24.mlp.gate_proj.zeros", "model.layers.24.mlp.up_proj.zeros", "model.layers.25.self_attn.k_proj.zeros", "model.layers.25.self_attn.o_proj.zeros", "model.layers.25.self_attn.q_proj.zeros", "model.layers.25.self_attn.v_proj.zeros", "model.layers.25.mlp.down_proj.zeros", "model.layers.25.mlp.gate_proj.zeros", "model.layers.25.mlp.up_proj.zeros", "model.layers.26.self_attn.k_proj.zeros", 
"model.layers.26.self_attn.o_proj.zeros", "model.layers.26.self_attn.q_proj.zeros", "model.layers.26.self_attn.v_proj.zeros", "model.layers.26.mlp.down_proj.zeros", "model.layers.26.mlp.gate_proj.zeros", "model.layers.26.mlp.up_proj.zeros", "model.layers.27.self_attn.k_proj.zeros", "model.layers.27.self_attn.o_proj.zeros", "model.layers.27.self_attn.q_proj.zeros", "model.layers.27.self_attn.v_proj.zeros", "model.layers.27.mlp.down_proj.zeros", "model.layers.27.mlp.gate_proj.zeros", "model.layers.27.mlp.up_proj.zeros", "model.layers.28.self_attn.k_proj.zeros", "model.layers.28.self_attn.o_proj.zeros", "model.layers.28.self_attn.q_proj.zeros", "model.layers.28.self_attn.v_proj.zeros", "model.layers.28.mlp.down_proj.zeros", "model.layers.28.mlp.gate_proj.zeros", "model.layers.28.mlp.up_proj.zeros", "model.layers.29.self_attn.k_proj.zeros", "model.layers.29.self_attn.o_proj.zeros", "model.layers.29.self_attn.q_proj.zeros", "model.layers.29.self_attn.v_proj.zeros", "model.layers.29.mlp.down_proj.zeros", "model.layers.29.mlp.gate_proj.zeros", "model.layers.29.mlp.up_proj.zeros", "model.layers.30.self_attn.k_proj.zeros", "model.layers.30.self_attn.o_proj.zeros", "model.layers.30.self_attn.q_proj.zeros", "model.layers.30.self_attn.v_proj.zeros", "model.layers.30.mlp.down_proj.zeros", "model.layers.30.mlp.gate_proj.zeros", "model.layers.30.mlp.up_proj.zeros", "model.layers.31.self_attn.k_proj.zeros", "model.layers.31.self_attn.o_proj.zeros", "model.layers.31.self_attn.q_proj.zeros", "model.layers.31.self_attn.v_proj.zeros", "model.layers.31.mlp.down_proj.zeros", "model.layers.31.mlp.gate_proj.zeros", "model.layers.31.mlp.up_proj.zeros".
        size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32000, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
        size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.0.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.0.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.0.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.0.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.0.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.0.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.0.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.0.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.0.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.0.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.0.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.0.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.0.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.0.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.0.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.0.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.0.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.1.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.1.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.1.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.1.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.1.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.1.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.1.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.1.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.1.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.1.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.1.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.1.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.1.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.1.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.1.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.1.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.1.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.1.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.1.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.1.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.1.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.1.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.1.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.2.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.2.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.2.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.2.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.2.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.2.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.2.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.2.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.2.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.2.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.2.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.2.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.2.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.2.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.2.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.2.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.2.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.2.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.2.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.2.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.2.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.2.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.2.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.3.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.3.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.3.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.3.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.3.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.3.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.3.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.3.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.3.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.3.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.3.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.3.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.3.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.3.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.3.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.3.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.3.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.3.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.3.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.3.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.3.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.3.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.3.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        [... identical size mismatches repeat for model.layers.4 through model.layers.18 and every remaining layer: each checkpoint tensor arrives with the 5120 / 13824 shapes while the current model expects 4096 / 11008 ...]
        size mismatch for model.layers.18.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.18.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.18.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.18.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.18.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.18.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.18.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.18.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.18.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.18.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.18.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.18.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.18.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.18.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.18.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.18.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.18.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.18.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.18.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.18.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.19.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.19.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.19.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.19.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.19.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.19.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.19.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.19.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.19.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.19.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.19.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.19.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.19.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.19.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.19.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.19.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.19.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.19.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.19.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.19.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.19.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.19.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.19.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.20.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.20.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.20.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.20.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.20.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.20.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.20.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.20.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.20.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.20.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.20.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.20.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.20.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.20.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.20.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.20.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.20.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.20.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.20.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.20.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.20.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.20.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.20.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.21.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.21.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.21.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.21.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.21.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.21.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.21.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.21.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.21.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.21.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.21.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.21.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.21.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.21.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.21.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.21.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.21.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.21.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.21.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.21.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.21.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.21.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.21.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.22.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.22.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.22.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.22.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.22.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.22.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.22.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.22.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.22.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.22.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.22.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.22.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.22.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.22.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.22.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.22.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.22.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.22.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.22.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.22.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.22.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.22.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.22.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.23.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.23.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.23.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.23.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.23.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.23.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.23.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.23.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.23.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.23.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.23.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.23.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.23.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.23.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.23.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.23.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.23.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.23.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.23.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.23.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.23.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.23.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.23.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.24.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.24.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.24.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.24.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.24.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.24.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.24.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.24.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.24.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.24.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.24.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.24.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.24.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.24.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.24.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.24.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.24.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.24.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.24.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.24.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.24.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.24.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.24.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.25.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.25.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.25.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.25.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.25.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.25.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.25.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.25.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.25.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.25.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.25.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.25.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.25.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.25.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.25.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.25.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.25.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.25.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.25.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.25.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.25.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.25.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.25.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.26.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.26.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.26.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.26.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.26.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.26.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.26.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.26.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.26.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.26.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.26.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.26.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.26.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.26.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.26.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.26.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.26.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.26.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.26.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.26.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.26.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.26.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.26.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.27.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.27.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.27.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.27.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.27.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.27.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.27.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.27.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.27.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.27.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.27.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.27.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.27.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.27.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.27.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.27.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.27.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.27.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.27.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.27.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.27.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.27.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.27.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.28.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.28.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.28.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.28.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.28.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.28.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.28.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.28.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.28.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.28.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.28.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.28.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.28.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.28.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.28.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.28.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.28.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.28.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.28.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.28.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.28.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.28.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.28.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.29.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.29.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.29.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.29.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.29.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.29.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.29.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.29.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.29.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.29.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.29.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.29.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.29.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.29.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.29.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.29.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.29.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.29.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.29.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.29.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.29.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.29.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.29.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.30.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.30.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.30.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.30.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.30.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.30.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.30.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.30.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.30.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.30.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.30.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.30.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.30.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.30.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.30.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.30.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.30.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.30.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.30.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.30.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.30.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.30.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.30.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.31.self_attn.k_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.31.self_attn.k_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.31.self_attn.k_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.31.self_attn.o_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.31.self_attn.o_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.31.self_attn.o_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.31.self_attn.q_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.31.self_attn.q_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.31.self_attn.q_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.31.self_attn.v_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.31.self_attn.v_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.31.self_attn.v_proj.qweight: copying a param with shape torch.Size([640, 5120]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
        size mismatch for model.layers.31.mlp.down_proj.scales: copying a param with shape torch.Size([5120, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
        size mismatch for model.layers.31.mlp.down_proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.31.mlp.down_proj.qweight: copying a param with shape torch.Size([1728, 5120]) from checkpoint, the shape in current model is torch.Size([1376, 4096]).
        size mismatch for model.layers.31.mlp.gate_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.31.mlp.gate_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.31.mlp.gate_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.31.mlp.up_proj.scales: copying a param with shape torch.Size([13824, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
        size mismatch for model.layers.31.mlp.up_proj.bias: copying a param with shape torch.Size([13824]) from checkpoint, the shape in current model is torch.Size([11008]).
        size mismatch for model.layers.31.mlp.up_proj.qweight: copying a param with shape torch.Size([640, 13824]) from checkpoint, the shape in current model is torch.Size([512, 11008]).
        size mismatch for model.layers.31.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.layers.31.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for model.norm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for lm_head.weight: copying a param with shape torch.Size([32000, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
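
Every mismatched tensor above has the 13B dimensions in the checkpoint (hidden size 5120, MLP intermediate size 13824) while the instantiated model uses the 7B config (4096 and 11008), so the .pt file GPTQ_loader picked up for llama-7b-hf-int4 looks like it is actually a 13B quantization. As a minimal sketch for checking which size a checkpoint was quantized from (the path is a placeholder for whatever pt_path resolved to):

import torch

# Placeholder path: substitute the pt_path that GPTQ_loader resolved.
sd = torch.load("models/llama-7b-4bit.pt", map_location="cpu")

# The embedding table is [vocab_size, hidden_size]; hidden_size identifies
# the model: 4096 -> 7B, 5120 -> 13B, 6656 -> 30B, 8192 -> 65B.
hidden = sd["model.embed_tokens.weight"].shape[1]
sizes = {4096: "7B", 5120: "13B", 6656: "30B", 8192: "65B"}
print(f"hidden size {hidden} -> LLaMA {sizes.get(hidden, 'unknown')} checkpoint")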


bartman081523 commented Apr 5, 2023

I have this error too with all LLaMA-type models in 4-bit mode. I updated today and reinstalled GPTQ from @oobabooga and transformers from the Hugging Face GitHub. The full trace is big; the relevant parts are:

RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
Missing key(s) in state_dict: "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros",
...
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32000, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

I think this is the info that helped me: #734 (comment)
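
If I read that trace right, the missing qzeros keys mean the checkpoint was quantized before GPTQ-for-LLaMa started emitting qzeros tensors, so the current loader cannot use it and the file has to be re-quantized (or loaded with a matching older commit); the embed_tokens shape mismatch additionally shows a 13B checkpoint being loaded with the 7B config, as above. A minimal sketch to check a file for the newer format (the path is a placeholder):

import torch

# Placeholder path: point this at the 4-bit .pt checkpoint in question.
sd = torch.load("llama-7b-4bit.pt", map_location="cpu")

# Newer GPTQ checkpoints store per-group zero points as *.qzeros tensors;
# their absence means the file predates the format the loader expects.
print("qzeros present:", any(k.endswith(".qzeros") for k in sd))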
