Describe the bug
The issue is that, while using any 4-bit model (LLaMA, Alpaca, etc.) across two GPUs, one of two errors occurs when generating a message, depending on which version of GPTQ-for-LLaMa is used.
This happens with both the newest and the "older" models (older meaning quantized with group size, but not with the latest quantization method). For the older models, I used the models from #530 (comment)
For the new models, I used the models from https://huggingface.co/Neko-Institute-of-Science
If using ooba GPTQ, the error is "TypeError: vecquant4matmul(): incompatible function arguments."; it generates just one token and then stops working. GPTQ build used: https://github.com/oobabooga/GPTQ-for-LLaMa
If using qwopqwop200 GPTQ, the error is "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!". GPTQ build used: https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda
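For context: the qwopqwop200 RuntimeError is PyTorch's generic cross-device error, raised whenever one tensor (a weight, an activation, or a quantization buffer) sits on cuda:0 while another sits on cuda:1. A minimal sketch reproducing and fixing that error class (illustrative only, not GPTQ internals; it assumes a machine with two CUDA devices):

```python
import torch

# Two operands on different GPUs: any op that combines them raises
# "Expected all tensors to be on the same device, ...".
a = torch.randn(4, 4, device="cuda:0")
b = torch.randn(4, 4, device="cuda:1")

try:
    _ = a @ b
except RuntimeError as e:
    print(e)

# The generic fix: move one operand onto the other's device first.
c = a @ b.to(a.device)
print(c.device)  # cuda:0
```

The ooba-fork TypeError is pybind11's message for calling a compiled extension with arguments it was not built for, which as far as I understand usually points at a stale or mismatched quant_cuda build rather than at the model files.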
Is there an existing issue for this?
I have searched the existing issues
Reproduction
Use any 4-bit model on 2 GPUs and the issue should happen with either ooba GPTQ or qwopqwop200 GPTQ.
(For example, run python server.py --chat --extensions api --listen --wbits 4 --listen-port 7990 --gpu-memory 10 10 and then choose any 4-bit 30B model in the web UI, or use --gpu-memory 5 5 and any 4-bit 13B model.)
Then try to generate any message or impersonate, and the error should appear.
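For reference, the --gpu-memory 10 10 flag above sets a per-GPU memory cap; a minimal sketch of what such a split looks like with the standard transformers/accelerate loading path (the model path is a placeholder, and this is a plain load, not the GPTQ loader):

```python
from transformers import AutoModelForCausalLM

# Hypothetical illustration: cap each GPU at 10 GiB and let accelerate
# decide which layers go to cuda:0 and which to cuda:1.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/model",                      # placeholder path
    device_map="auto",                    # automatic layer placement
    max_memory={0: "10GiB", 1: "10GiB"},  # per-device caps, like --gpu-memory 10 10
)
```

With a split like this, every quantized matmul kernel has to receive tensors that live on that layer's own device, which is exactly the invariant the errors above show being violated.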
Screenshot
For the ooba GPTQ, this is the issue: (screenshot of the "TypeError: vecquant4matmul(): incompatible function arguments." traceback)
For the qwopqwop200 GPTQ, this is the issue: (screenshot of the "RuntimeError: Expected all tensors to be on the same device" traceback)
Logs
System Info
Hi there, how did you manage to specify the GPU you want to use? I know it is something related to "Device: cuda:0" or "Device='cuda:1'", but where do you define that value? Or do you pass it as a parameter?
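For general reference, explicit device placement in plain PyTorch looks like the sketch below; this is generic PyTorch, not a flag of this web UI:

```python
import torch

# Pick a device explicitly; "cuda:1" is the second GPU.
device = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cuda:0")

x = torch.randn(2, 2, device=device)  # create a tensor directly on that GPU
y = x.to(device)                      # or move an existing tensor onto it
print(y.device)
```

Another common approach is to launch the process with the CUDA_VISIBLE_DEVICES environment variable set (for example CUDA_VISIBLE_DEVICES=1), so that only one GPU is visible to the process and it shows up as cuda:0.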