Issue when Multi-GPU is set #209

Open
therealjjj77 opened this issue Jan 26, 2021 · 2 comments

Comments

@therealjjj77

I'm having this issue using Torch 1.7.1+cu110. Please see below:

(venv) C:\Users\Jerr\PycharmProjects\pythonProject1>stylegan2_pytorch --data C:/Transfer/Downloads/Processed/Compressed/Compressed --network-capacity 256 --trunc-psi 0.5 --aug-prob 0.25 --attn-layers 1 --top-k-training --generate-top-k-frac 0.5 --generate-top-k-gamma 0.99 --no-pl-reg --calculate-fid-every 5000 --multi-gpus --num_workers 32
Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Jerr\PycharmProjects\pythonProject1\venv\Scripts\stylegan2_pytorch.exe\__main__.py", line 7, in <module>
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\stylegan2_pytorch\cli.py", line 172, in main
    fire.Fire(train_from_folder)
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\fire\core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\fire\core.py", line 468, in _Fire
    target=component.__name__)
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\stylegan2_pytorch\cli.py", line 169, in train_from_folder
    join=True)
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 157, in start_processes
    while not context.join():
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 19, in _wrap
    fn(i, *args)
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\stylegan2_pytorch\cli.py", line 39, in run_training
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "c:\users\jerr\pycharmprojects\pythonproject1\venv\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
@therealjjj77

I'm running Windows 10 Home with a Tesla K80 (which is really two 12 GB GPUs) and a GeForce RTX 2070 Super. I'm trying to run this on the Tesla K80. I have confirmed that both GPUs of the K80 work with PyTorch via the DataParallel method, so I'm not sure why --multi-gpus isn't working here.
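
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the traceback shows `dist.init_process_group('nccl', ...)` falling back to the default `env://` rendezvous, and Windows builds of PyTorch 1.7 appear to support distributed training only through the Gloo backend with a `file://` init method. The sketch below shows how the arguments could be chosen per platform. The helper name `pick_process_group_kwargs` and the sync-file name `stylegan2_ddp_init` are hypothetical and are not part of stylegan2_pytorch or PyTorch.

```python
import sys
import tempfile
from pathlib import Path


def pick_process_group_kwargs(rank, world_size, platform=sys.platform):
    """Hypothetical helper: build keyword arguments for
    torch.distributed.init_process_group depending on the OS."""
    if platform.startswith("win"):
        # Assumption: on Windows with PyTorch 1.7, only the Gloo backend
        # with a shared-file rendezvous (file://) is available.
        sync_file = Path(tempfile.gettempdir()) / "stylegan2_ddp_init"
        return dict(
            backend="gloo",
            init_method=sync_file.as_uri(),  # e.g. file:///C:/.../stylegan2_ddp_init
            rank=rank,
            world_size=world_size,
        )
    # Elsewhere, keep the settings seen in the traceback: NCCL over env://.
    return dict(backend="nccl", init_method="env://", rank=rank, world_size=world_size)


# Intended use inside each spawned worker (requires torch, not run here):
#   import torch.distributed as dist
#   dist.init_process_group(**pick_process_group_kwargs(rank, world_size))
```

If this assumption holds, the sync file would likely need to be removed between runs, since a stale file from an earlier run can interfere with the file-based rendezvous.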

@metaphorz

Let me know if you have found a solution since your post. I recently posted about this on the NVIDIA GitHub. I am trying gpus=2 on a node with two V100s. gpus=1 works fine, but gpus=2 with train.py fails with traceback errors similar to the ones you describe.
