
Multi GPU 4-bit doesn't work on either ooba GPTQ or qwopqwop200 GPTQ #1112

Closed
1 task done
Panchovix opened this issue Apr 12, 2023 · 2 comments
Labels
bug Something isn't working

Comments

Panchovix (Contributor) commented on Apr 12, 2023

Describe the bug

When using any 4-bit model (LLaMA, Alpaca, etc.) across two GPUs, one of two errors occurs while generating a message, depending on which version of GPTQ is in use.

This happens with both the newest models and the "older" ones (older meaning quantized with group size, but not with the latest quantization method). For the older models, I used the ones from #530 (comment)

For the new models, I used the ones from here: https://huggingface.co/Neko-Institute-of-Science

If using ooba GPTQ, the error is "TypeError: vecquant4matmul(): incompatible function arguments."; it generates just one token and then stops working. GPTQ build used: https://github.com/oobabooga/GPTQ-for-LLaMa

If using qwopqwop200 GPTQ, the error is "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!". GPTQ build used: https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda
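
The second error is the generic PyTorch cross-device mismatch: an op receives one tensor that lives on cuda:0 and another that lives on cuda:1. A minimal sketch of the failure and the usual workaround of moving one operand onto the other's device (illustrative only, not the webui's code, and assuming two visible GPUs):

import torch

a = torch.randn(4, device="cuda:0")
b = torch.randn(4, device="cuda:1")

# a * b  # would raise: "Expected all tensors to be on the same device ..."

b = b.to(a.device)  # move one operand onto the other's device
c = a * b           # both tensors now live on cuda:0, so the op succeeds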

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Use any 4-bit model on 2 GPUs and the issue should happen with either ooba GPTQ or qwopqwop200 GPTQ.

(For example, run python server.py --chat --extensions api --listen --wbits 4 --listen-port 7990 --gpu-memory 10 10 and then choose any 4-bit 30B model in the web UI, or use --gpu-memory 5 5 and any 4-bit 13B model.)

Then, try to generate any message or impersonate, and the issues should arise.

Screenshot

For the ooba GPTQ, this is the issue.
[screenshots of the TypeError]

For the qwopqwop200 GPTQ, this is the issue.
[screenshot of the RuntimeError]

Logs

For ooba GPTQ:

Loading llama-30b-4bit...
Found the following quantized model: models\llama-30b-4bit\llama-30b-4bit.safetensors
Loading model ...
Done.
Using the following device map for the 4-bit model: {'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 0, 'model.layers.12': 0, 'model.layers.13': 0, 'model.layers.14': 0, 'model.layers.15': 0, 'model.layers.16': 0, 'model.layers.17': 0, 'model.layers.18': 0, 'model.layers.19': 0, 'model.layers.20': 0, 'model.layers.21': 0, 'model.layers.22': 0, 'model.layers.23': 0, 'model.layers.24': 0, 'model.layers.25': 0, 'model.layers.26': 0, 'model.layers.27': 0, 'model.layers.28': 0, 'model.layers.29': 0, 'model.layers.30': 0, 'model.layers.31': 0, 'model.layers.32': 0, 'model.layers.33': 0, 'model.layers.34': 0, 'model.layers.35': 1, 'model.layers.36': 1, 'model.layers.37': 1, 'model.layers.38': 1, 'model.layers.39': 1, 'model.layers.40': 1, 'model.layers.41': 1, 'model.layers.42': 1, 'model.layers.43': 1, 'model.layers.44': 1, 'model.layers.45': 1, 'model.layers.46': 1, 'model.layers.47': 1, 'model.layers.48': 1, 'model.layers.49': 1, 'model.layers.50': 1, 'model.layers.51': 1, 'model.layers.52': 1, 'model.layers.53': 1, 'model.layers.54': 1, 'model.layers.55': 1, 'model.layers.56': 1, 'model.layers.57': 1, 'model.layers.58': 1, 'model.layers.59': 1, 'model.norm': 1, 'lm_head': 1}
Replaced attention with sdp_attention
Loaded the model in 49.97 seconds.
Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\text-generation-webui\modules\callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "F:\ChatIAs\oobabooga\text-generation-webui\modules\text_generation.py", line 245, in generate_with_callback
    shared.model.generate(**kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
    return self.sample(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
    outputs = self(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\text-generation-webui\modules\llama_attn_hijack.py", line 122, in sdp_attention_forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 426, in forward
    quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: torch.Tensor) -> None

Invoked with: tensor([[-0.0400,  0.0132, -0.0013,  ..., -0.0043, -0.0217, -0.0052]],
       device='cuda:0'), tensor([[ 1718253433,  2005432169,  1234789529,  ..., -2018924424,
         -1502128569,  2037938296],
        [ 2019911271,  1987475319,  1750504568,  ..., -1736930890,
           965175170, -1465345654],
        [-1753778313, -2005497737, -1215801432,  ...,  2022066057,
          1183230325, -2020972614],
        ...,
        [ 1987605622,  2004317831,  1790801831,  ..., -1984522089,
          1935956105, -1986422154],
        [ 1719162504,  1987470983,   897218455,  ...,  2053601446,
         -1752841096,  1284090982],
        [ 1736931431,  2022209399,  2017958294,  ...,  2037999496,
         -2023323559, -1231513656]], device='cuda:0', dtype=torch.int32), tensor([[0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0'), tensor([[0.0111, 0.0150, 0.0077,  ..., 0.0194, 0.0119, 0.0131]],
       device='cuda:0'), tensor([[ 2004252518,  2022139750,  1735874423,  1986491768,  1987470950,
          2003199623,  2003269238, -2039060634,  1987536487,  1734760022,
          1719105128,  1986418310,  1717986936,  1717986934,  1986422903,
          1734768487,  2003199863,  1719039607,  1717991031,  1734833783,
          1986488183,  2004317815,  1986483815,  1717003893,  2003265127,
          1734833766,  1717991030,  2003199590,  1718052455,  1484224118,
          1986356598,  1987536503,  1734825590, -2055837833,  1986488422,
          1720153959,  1717991270,  1734829925,  1986487925,  1717987191,
          1734825575,  1734768214,  1734838134,  1735878263,  1735882616,
          2004252263,  1986492278,  1717991031,  2004248438,  1718122887,
          1718122343,  1734825862,  1718060647,  1701209958,  1734829926,
          1734768486,  1988585063, -2021161097,  1986483560,  2004186487,
          1969710711,  1703310966,  1719035766,  1734829942, -2039978395,
          1986488438,  2004318071,  1701217910,  2004318086,  1718056807,
          2003265126,  1718056551,  1987470471,  1720088165,  1718052710,
          1751603063,  1719036008,  1719035750,  1735747175,  2004248423,
          2020046695,  2021025654,  2003203958,  1986492279, -2022341017,
         -1770555545,  1986496103,  1986426471,  2003129719,  2019981190,
          1985378151,  1987606151, -2040043930,  2021095013,  1986426742,
          1734829670,  1734829926,  2021029736,  1986487927,  1734833766,
          1718056567,  1717991287,  1987475302,  1717925494,  2004317574,
          1718052471,  1734698870, -2005437577,  1718052710,  1987471190,
          1987540838, -2040105098,  2004252278,  1719035750,  2003199863,
          2003265398,  1735812709,  1986487910,  1717986919,  2004317814,
          1734764406,  1735882343,  1736930950,  1717007990,  1467512678,
          1734768487,  1986426470,  1734768246,  1734764134,  1734833766,
          1987475302,  2003203943,  1701340550,  1987540598,  1717991030,
          1719039591,  2021095286,  1699182182,  1735816822, -2054723977,
          1719035510,  1719044199,  1986488182,  1734764406,  1735882598,
          1987405416,  1717921638,  1734829942,  1752590455, -2039060889,
          1987471207,  1719039847,  1720088167,  1987475047,  1734768742,
          2003265382,  1718052695,  1719035767,  1719100790,  2004248422,
          1986488166,  2003265142,  1735812982,  1719039863,  2004252534,
          1719035767,  1987536486,  1986488182,  1986430823, -2022217866,
          1986426759,  1986426471,  2004252518,  1735816808,  1719101285,
          1735816823,  1987536759,  1987536486,  2005296743,  1735812727,
          1734834023,  1970693495,  1734768743,  2003269494,  1735878502,
          1719039591,  2003269478,  1717991014,  1719109494,  1735878519,
          1987541095,  2021029734,  2004313703,  1702323575,  2004317799,
          1986492006,  2003203942,  1986426487,  1718056550,  1735878502,
          1717986662,  1988654967,  1717003894,  2004313702,  1986487927,
          1717991286,  1717991271,  1719039846,  1751611238,  1734698854,
          2004317798,  1717986937,  1449617254,  1716938598,  1717008246,
          1484154726, -2024376713,  2004182904,  1733785462,  1735817575,
          1466267495,  1986426486, -2040044168,  2019980901,  1719039590,
          1734764647,  1719101014,  1985443431, -2021234841,  1751610981,
          2004313447,  2003269495, -2039056537,  1717003880,  1986365063,
          1701209720,  1753708661,  1987536247,  1734764391,  1734834038,
          1733785190,  2004252535,  2003269223,  1987409526,  2003268965,
          1987409768,  1717991015,  1734698870,  1717987174,  1987475302,
          1987536487,  1734768261,  1717991030,  1734829686,  1986422375,
         -2040039833,  1717991031,  1734772583, -2024310666,  1717991271,
          2020046199,  1988589191,  2004186999,  1716942438,  1720083832,
         -2023328155,  1734694504,  1718978390, -2038995338,  2003203670,
          1987475062,  2003199847,  1734895222,  1735878247,  2003195750,
         -2039978122,  2004317814,  1735947879,  1719039847,  1717987191,
         -2055833736,  1719101063,  1719105127,  1968662118,  1986422391,
          1987475064,  1986422630,  1986422391,  2003199590, -2040039786,
         -2037950635,  1752721270,  1718056583,  1702328165,  1735878519,
          1986487910, -2038925720,  1986426741,  1969650023,  2005292885,
          1986488166,  2003269239,  1986492262,  1718052727,  1986492006,
          1969710694,  1735817063, -2024245418,  2003330663,  2004248167,
          1718056823,  1751541606,  1734768503,  2003334791,  1450665846,
          1985513592,  1718118504,  1734768470,  2004248167,  1717986935,
          1718052486,  2004252279,  1733781095,  1735882598,  1735812470,
          2002216549,  2003203687,  1987475046,  2004239734,  1451648631,
          1987540840,  1735817335,  1735812950,  1718970231,  1719105144,
          1735817079, -2040109465,  1466263430, -2023266698,  2003269479,
         -2022349192,  1719105143,  1718052727,  1986422391,  1987471223,
          1988515670,  1735882358,  1734834054,  1969579895,  1751545462,
          1987536742,  2021095015,  2005366646,  2002151013,  1717921399,
          1986553447,  1987475047,  2004248167,  1986422390,  1719039862,
          1735882615,  1987540854,  1969772408,  1467451254,  1719105126,
          1987471206,  2003203702,  1719101062,  2004248167, -2023266698,
          2004186999,  1718970230,  1719105127,  1717991287,  2004313958,
         -2022214281,  1987532407,  1734834007,  1988519528,  1733719671,
          2020046167,  1718056789,  1717986935, -2023332233,  1735878246,
          2003269752,  1752659318,  1736996711,  2004317799,  1717982838,
          2005297015,  1717986934,  2003203958,  1719039846,  1986426213,
          1717004134,  1719031654,  1987536247,  1719039606,  1717991271,
          2004248166,  2003269495,  1719101287,  1719043959,  1483044439,
          1717995366,  1719097222,  1752659317,  1987602279,  1735882615,
          1969715303,  1734833798,  1985369719,  1734764408,  2003269478,
          1735878247,  1717987175,  1987475318,  1735878263,  2002155110,
          1735948375,  1717991270,  2004317798,  1734829655,  1987536487,
          1987536742,  1986492023,  1970763366,  1750562679,  2004248167,
          1735882583,  1735882342,  1720149606,  1717987174,  1717987175,
          1719035767,  1986422630,  1987471206,  2005366645,  1718052470,
          2004313959,  1719035510,  1719035494,  1969710712, -2040043640,
          1735944040,  2004313960,  1986426727,  2005231223,  1719166823,
          1735817063,  1734833800,  2003199878,  1988589430,  1720088182,
          2003273573,  1483106166,  1717986918,  1734768503,  2003203959,
          1735878263,  1752655735,  1751610999,  1735882359,  2003134311,
         -2023327882,  1735816791,  1986422134,  2004318054,  1733781350,
          1987470950,  1718052455,  1734698631,  1482053206,  1987536519,
         -2040043658,  2003265399,  1734698598,  1719031399,  1736865654,
          2019976823,  1718052454,  1734833783,  2019980919,  1988589413,
          1751611255,  1734834279,  1751545719,  1969641335,  1718122344,
          1468495975,  1987606103,  1701279590,  1986492278,  1986492022,
          1735817079,  1717987190,  1719043686,  1987409766,  1985443704,
          1986492023,  2003203703,  1719039863,  1987471207,  1987540599,
          1719035765,  1986422375,  2004383334,  2004318311,  1735817061,
          1986492007,  1719105382,  1733847142,  2001172087,  2005362279,
          1734833783,  1717986662,  1719105398,  1987536503,  1719039847,
          1718052486,  2004252278,  1970697832,  1717990758,  1986422647,
          1734764392,  2021025654,  1734768230,  1467447414,  1734768519,
          2004314214,  1986356839,  1987536758,  1734825573,  1735878246,
          1734768230,  1718052455,  1986431094,  1719035510,  2004313703,
         -2039056778,  2004248391,  1987470966,  2019981174,  1719039606,
          1734833782,  1986487911,  1734760039,  1734764406,  2004248423,
          1735816805,  2004318071,  1987470967,  1736861303,  1717986935,
          1716938342,  1752590215,  1702258534,  1717986919,  2003269479,
          1986487911,  1449424503,  1701209942,  1717991030,  1719035495,
          1985439319,  1968662374,  2018997862,  1735878519,  1987536503,
          1988646757,  1987471479,  2004321911, -2055838105,  1987536500,
          1735882598,  1987610744,  1719101287,  1986487910, -2023262345,
          1734768230,  1717073512,  1734833784,  1735812983,  2003269479,
          2004256390,  1988589431,  1751475831,  1986492007,  1986492007,
          1735878247,  2003269494,  1736791926, -2007595385,  1436047495,
          1733785431,  1717912952,  1986492261,  2003330935,  1734829687,
          1467446902,  2004190821,  1702192520,  2017748358, -2023454602,
          1719105127,  1735878246,  2004248167,  1449686631,  1733719670,
          2003269462,  1734825832,  1733781110,  1717007974,  1986422647,
          1719035510,  2003203702,  2002220903,  1450665830,  1988580983,
          1735812966,  1719105144,  1735878503,  1735882359,  1987471223,
          2019977062,  2019977062,  1970759781,  1719040103,  2004313975,
          2003269222,  1986487926,  1717003878,  1719101303,  1968662373,
          1986492007,  1986426486,  2004252504,  1734899574,  1986426742,
         -2022344858,  1735874150,  1701209703,  1717069416,  1718060630,
          1735816806,  1719101030,  1734833767,  1719039591,  1987536486,
          1719105143,  2004317782,  1433826917,  1734767990, -2040105081,
          1733785445,  1987537014,  1467508600,  1987475303,  1719039591,
          1752659574,  1717991287,  1717986935,  2004248423,  1986492262,
          2004186727,  1734829670,  1985439607,  1734764152,  2004248150,
          1988519542,  1987471206,  1719039608,  1719035767,  1719039846,
          1718056823,  1987540582,  1987475062,  2004318087,  1734764407,
         -2022340743,  1717986950,  1719101287,  1717986919,  2003203703,
          1987540582,  1986422647,  1734764150,  1717991287,  1735812966,
          1752589671,  1987536742,  2004313959,  1987471192,  1717921637,
          1717991271,  1718052455,  1986430567,  2004313719,  1700230774,
          2002220391,  1735882599,  2019976808,  1735812983,  1986553462,
          1752659831,  2004187015,  1719105398,  2002216822,  2003265383,
          1467373175,  1735817062,  1718056821,  1734838119,  1735882615,
          1718056806,  1987536487,  1735882870,  1987536502,  2004248182,
          1988523607, -2022274985,  1734698854,  1467442805,  1985312631,
          1718052454,  2004248679,  1718056823,  1718052743,  1733658486,
          1986487912,  1735817063,  1449485942,  1987540327,  2003203943,
          1987475046,  1719101046,  2004244327,  2003269493,  1719166309,
          1718052439,  1733715575,  1735878246,  1717987191,  1720084101,
          1753708150,  1719035750,  1719039624,  2003265127,  1988589175,
          1986492277,  2003269254,  2004318327,  1987606390,  1987536758,
          2020038775,  1449551496,  1986553478,  1719105414,  1734834023,
          1986422647,  1987540598,  1735935592,  1970763637, -2022274985,
          1735878774,  1987536759,  1734764135,  1987602294,  1734764151,
          1734764135,  2005366903,  1719109735,  2020042359,  1735812983,
          2004252247,  2003339126,  1987536486,  1718052455,  1987536758,
          2003265143,  1986492279,  1969714790,  2003203943,  1987532406,
          2004252536,  2004252278,  1449555830,  2003273334,  2003199590,
          1987540599,  1987540342,  1483040887,  1987409527,  1733781078,
          1719101287,  1734829686,  2003265126,  1717991286,  1719101287,
          2004318055,  1719035766,  1987536743,  1717991286,  1735882343,
          1735878264,  2004314231,  1702323830,  1987475575,  1734768231,
          1985377911,  1734768247,  1717987191,  2004318070,  2003199862,
          1467446886,  1701275480,  1986422375,  1735882343,  2004313959,
          1701283461,  1986426743,  1987475063,  1717991303,  1752594295,
          2004317814,  1734838119,  1735817335,  1986492263,  1969706853,
          1970767462,  1987471223]], device='cuda:0', dtype=torch.int32), 6656
Output generated in 2.68 seconds (0.37 tokens/s, 1 tokens, context 1172, seed 1812379680)
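
Reading this traceback: the compiled quant_cuda extension only accepts six torch.Tensor arguments, but quant.py line 426 calls it with five tensors plus the plain integer self.groupsize (6656), which suggests the Python wrapper and the compiled CUDA kernel come from mismatched GPTQ-for-LLaMa revisions. A quick, hypothetical way to confirm which signature is actually installed, run from the webui's venv:

python -c "import quant_cuda; help(quant_cuda.vecquant4matmul)"

If the printed signature differs from what repositories/GPTQ-for-LLaMa/quant.py passes, rebuilding the extension from the same checkout (for example with python setup_cuda.py install) should bring the two back in sync.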

For qwopqwop200 GPTQ:

Loading alpaca-13b-lora-int4-128g...
Found the following quantized model: models\alpaca-13b-lora-int4-128g\4bit-128g.safetensors
Loading model ...
F:\ChatIAs\oobabooga\venv\lib\site-packages\safetensors\torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
Done.
Using the following device map for the 4-bit model: {'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 0, 'model.layers.12': 0, 'model.layers.13': 0, 'model.layers.14': 0, 'model.layers.15': 0, 'model.layers.16': 0, 'model.layers.17': 0, 'model.layers.18': 0, 'model.layers.19': 0, 'model.layers.20': 0, 'model.layers.21': 0, 'model.layers.22': 0, 'model.layers.23': 0, 'model.layers.24': 0, 'model.layers.25': 0, 'model.layers.26': 0, 'model.layers.27': 1, 'model.layers.28': 1, 'model.layers.29': 1, 'model.layers.30': 1, 'model.layers.31': 1, 'model.layers.32': 1, 'model.layers.33': 1, 'model.layers.34': 1, 'model.layers.35': 1, 'model.layers.36': 1, 'model.layers.37': 1, 'model.layers.38': 1, 'model.layers.39': 1, 'model.norm': 1, 'lm_head': 1}
Replaced attention with sdp_attention
Loaded the model in 4.13 seconds.
Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\text-generation-webui\modules\callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "F:\ChatIAs\oobabooga\text-generation-webui\modules\text_generation.py", line 245, in generate_with_callback
    shared.model.generate(**kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
    return self.sample(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
    outputs = self(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 204, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 138, in apply_rotary_pos_emb
    q_embed = (q * cos) + (rotate_half(q) * sin)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Exception in thread Thread-4 (gentask):
Traceback (most recent call last):
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\ChatIAs\oobabooga\text-generation-webui\modules\callbacks.py", line 73, in gentask
    clear_torch_cache()
  File "F:\ChatIAs\oobabooga\text-generation-webui\modules\callbacks.py", line 105, in clear_torch_cache
    torch.cuda.empty_cache()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\cuda\memory.py", line 137, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

System Info

OS: Windows 11 and Ubuntu 22.04
CPU: Ryzen 7 5800X
RAM: 64 GB DDR4
Swap: 170GB swap
GPU: RTX 4090x2
Panchovix added the bug (Something isn't working) label on Apr 12, 2023
Panchovix (Contributor, Author) commented:

Fixed the issue by downgrading from torch+cu118 to torch+cu117
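
For anyone hitting the same thing, the downgrade can be done roughly like this inside the webui's venv (a sketch; the exact package set and index URL may differ for your setup):

pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

After swapping the torch build, the GPTQ CUDA kernel may need to be rebuilt against it as well.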

aaaleeexTLC commented:

Hi there, how did you manage to specify which GPU you want to use? I know it is something related to "Device: cuda:0" or "Device='cuda:1'", but where do you define that value? Or do you pass it as a parameter?

Thanks
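
In plain PyTorch the device is just a string you pass when creating or moving a tensor, and at the process level you can also hide GPUs entirely with CUDA_VISIBLE_DEVICES. A minimal sketch (illustrative, not webui-specific):

import torch

dev = torch.device("cuda:1")       # pick the second GPU explicitly
x = torch.randn(3, device=dev)     # create a tensor directly on it
y = torch.zeros(3).to(dev)         # or move an existing tensor over

# Alternatively, restrict the whole process to a single GPU before launching:
#   CUDA_VISIBLE_DEVICES=1 python server.py ...
# Inside that process the selected GPU then shows up as cuda:0.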
