
Container build fails for 3D-UNet-99 #78

Closed
WarrenSchultz opened this issue Jun 19, 2024 · 2 comments
Assignee: arjunsuresh
Labels: bug (Something isn't working)

WarrenSchultz commented Jun 19, 2024

Running the command for ResNet50 works correctly:

```
cm run script --tags=run-mlperf,inference,_performance-only,_full --division=open --category=edge --device=cuda --model=resnet50 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=valid --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time --docker --docker_cache=no
```

But the same command fails for 3d-unet-99:

```
cm run script --tags=run-mlperf,inference,_performance-only,_full --division=open --category=edge --device=cuda --model=3d-unet-99 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=valid --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time --docker --docker_cache=no
```

Error log:
```
Loading TensorRT plugin from build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 176, in handle
    total_engine_build_time += self.build_engine(job)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 159, in build_engine
    builder = get_benchmark(job.config)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 83, in get_benchmark
    cls = get_cls(G_BENCHMARK_CLASS_MAP[benchmark])
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 66, in get_cls
    return getattr(import_module(module_loc.module_path), module_loc.cls_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/3d-unet/tensorrt/3d-unet.py", line 25, in <module>
    import onnx
ModuleNotFoundError: No module named 'onnx'
[2024-06-19 10:30:07,499 generate_engines.py:173 INFO] Building engines for 3d-unet benchmark in Offline scenario...
Loading TensorRT plugin from build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so
Loading TensorRT plugin from build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so
Loading TensorRT plugin from build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 176, in handle
    total_engine_build_time += self.build_engine(job)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 159, in build_engine
    builder = get_benchmark(job.config)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 83, in get_benchmark
    cls = get_cls(G_BENCHMARK_CLASS_MAP[benchmark])
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 66, in get_cls
    return getattr(import_module(module_loc.module_path), module_loc.cls_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/3d-unet/tensorrt/3d-unet.py", line 25, in <module>
    import onnx
ModuleNotFoundError: No module named 'onnx'
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/main.py", line 231, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/main.py", line 144, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/main.py", line 202, in dispatch_action
    handler.run()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 186, in handle_failure
    self.action_handler.handle_failure()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 184, in handle_failure
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make: *** [Makefile:37: generate_engines] Error 1

CM error: Portable CM script failed (name = app-mlperf-inference-nvidia, return code = 256)
```
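
The decisive line is `ModuleNotFoundError: No module named 'onnx'`: the 3d-unet engine builder imports onnx at module load time, and the package is missing from the freshly built container. A quick way to confirm this (a sketch; `<container_id>` is a placeholder for whatever container CM started, listed by `docker ps`):

```
docker exec <container_id> python3 -c "import onnx; print(onnx.__version__)"
```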

However, running 3d-unet-99 within the container built for ResNet50 works correctly.
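
Until the image is fixed, a minimal workaround sketch is to install the missing package by hand inside the container and retry the Makefile target named in the log (the `RUN_ARGS` values below follow NVIDIA's usual MLPerf convention and may need adjusting for your setup):

```
# Inside the container, e.g. via `docker exec -it <container_id> bash`:
python3 -m pip install onnx

# Retry engine generation for the failing benchmark/scenario.
make generate_engines RUN_ARGS="--benchmarks=3d-unet --scenarios=Offline"
```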

@arjunsuresh arjunsuresh added the bug Something isn't working label Jun 20, 2024
@arjunsuresh arjunsuresh self-assigned this Jun 20, 2024
arjunsuresh (Contributor) commented

Thanks for reporting this. The problem should be fixed now. We typically launch one Docker image for the NVIDIA implementation and run all of the benchmarks inside it, so we missed this issue for 3d-unet.
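
For anyone on an older checkout, pulling the updated CM scripts and then rebuilding should pick up the fix (the original command already passes `--docker_cache=no`, so the container is rebuilt from scratch):

```
cm pull repo
```

Then re-run the 3d-unet-99 command from the issue description.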

WarrenSchultz (Author) commented

Seems to be working now, thanks!

arjunsuresh added a commit that referenced this issue Jul 1, 2024