AOMP failure when compiling programs for NVIDIA GPUs #118

Closed
cdaley opened this issue Jul 19, 2020 · 5 comments

cdaley commented Jul 19, 2020

Hi all,

I'm trying to build AOMP for the first time so that we can install it on NERSC machines. I'm using the tarball aomp-11.7-0.tar.gz and am targeting an x86 machine with NVIDIA Volta GPUs (https://docs-dev.nersc.gov/cgpu/). NVIDIA Volta GPUs seem to be supported by AOMP through the script variables AOMP_BUILD_CUDA=1 and NVPTXGPUS=70. The clang compiler builds and installs successfully, and it compiles and runs test programs correctly provided I do not add the OpenMP target offload compiler options (-fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70). With those options, the compiler fails with a segmentation fault, even for a serial hello world program. The appropriate bitcode file was installed at lib/libomptarget-nvptx-sm_70.bc. Can you help? The failure is shown below.

+ clang hello.c -o hello
+ srun -n 1 ./hello
hello
+ clang -fopenmp hello.c -o hello
+ srun -n 1 ./hello
hello
+ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70 hello.c -o hello
PLEASE submit a bug report to https://github.com/ROCm-Developer-Tools/aomp and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70 hello.c -o hello 
1.	Compilation construction
2.	Building compilation jobs
3.	Building compilation jobs
4.	Building compilation jobs
5.	Building compilation jobs
6.	Building compilation jobs
7.	Building compilation jobs
8.	Building compilation jobs
9.	Building compilation jobs
 #0 0x00000000020c854a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x20c854a)
 #1 0x00000000020c6444 llvm::sys::RunSignalHandlers() (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x20c6444)
 #2 0x00000000020c6583 SignalHandler(int) (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x20c6583)
 #3 0x00002aaaaace5360 __restore_rt (/lib64/libpthread.so.0+0x12360)
 #4 0x00000000027a3681 clang::driver::tools::AddStaticDeviceLibs(clang::driver::Compilation*, clang::driver::Tool const*, clang::driver::JobAction const*, llvm::SmallVector<clang::driver::InputInfo, 4u> const*, clang::driver::Driver const&, llvm::opt::ArgList const&, llvm::SmallVector<char const*, 16u>&, llvm::StringRef, llvm::StringRef, bool, bool) (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x27a3681)
 #5 0x00000000027a3fc6 clang::driver::tools::AddStaticDeviceLibs(clang::driver::Driver const&, llvm::opt::ArgList const&, llvm::SmallVector<char const*, 16u>&, llvm::StringRef, llvm::StringRef, bool, bool) (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x27a3fc6)
 #6 0x00000000027acc09 clang::driver::toolchains::CudaToolChain::addClangTargetOptions(llvm::opt::ArgList const&, llvm::SmallVector<char const*, 16u>&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x27acc09)
 #7 0x0000000002789042 clang::driver::tools::Clang::ConstructJob(clang::driver::Compilation&, clang::driver::JobAction const&, clang::driver::InputInfo const&, llvm::SmallVector<clang::driver::InputInfo, 4u> const&, llvm::opt::ArgList const&, char const*) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2789042)
 #8 0x0000000002731eca clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2731eca)
 #9 0x0000000002732e92 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2732e92)
#10 0x00000000027310e1 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x27310e1)
#11 0x0000000002732e92 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2732e92)
#12 0x00000000027310e1 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x27310e1)
#13 0x0000000002732e92 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2732e92)
#14 0x0000000002734be6 void llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)>::callback_fn<clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const::'lambda'(clang::driver::Action*, clang::driver::ToolChain const*, char const*)>(long, clang::driver::Action*, clang::driver::ToolChain const*, char const*) (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2734be6)
#15 0x000000000284602c clang::driver::OffloadAction::doOnEachDeviceDependence(llvm::function_ref<void (clang::driver::Action*, clang::driver::ToolChain const*, char const*)> const&) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x284602c)
#16 0x000000000273049a clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x273049a)
#17 0x0000000002732e92 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2732e92)
#18 0x00000000027310e1 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x27310e1)
#19 0x0000000002732e92 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2732e92)
#20 0x00000000027310e1 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x27310e1)
#21 0x0000000002732e92 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2732e92)
#22 0x00000000027310e1 clang::driver::Driver::BuildJobsForActionNoCache(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x27310e1)
#23 0x0000000002732e92 clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, llvm::StringRef, bool, bool, char const*, std::map<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, clang::driver::InputInfo, std::less<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<clang::driver::Action const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, clang::driver::InputInfo> > >&, clang::driver::Action::OffloadKind) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2732e92)
#24 0x000000000273317d clang::driver::Driver::BuildJobs(clang::driver::Compilation&) const (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x273317d)
#25 0x0000000002734861 clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0x2734861)
#26 0x0000000000b7dbba main (/global/project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0/bin/clang-11+0xb7dbba)
#27 0x00002aaaac03cf8a __libc_start_main (/lib64/libc.so.6+0x20f8a)
#28 0x0000000000bff4da _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:122:0
./test-aomp.sh: line 13: 41481 Segmentation fault      clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70 hello.c -o hello
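
For anyone trying to reproduce this, the three compile steps in the trace above can be driven from a small script. What follows is only a sketch and not the actual test-aomp.sh: the helper names `offload_flags` and `build_hello` are invented here, and it assumes the AOMP `clang` is first in `PATH` and that `hello.c` is in the current directory.

```shell
#!/bin/bash
# Sketch of a reproducer; NOT the original test-aomp.sh.
# offload_flags and build_hello are hypothetical helper names.

# Print the OpenMP offload flags quoted in the report for a GPU arch (default sm_70).
offload_flags() {
    local arch="${1:-sm_70}"
    echo "-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=${arch}"
}

# Compile hello.c with any extra flags passed as arguments and
# report the compiler's exit status instead of stopping the script.
build_hello() {
    clang "$@" hello.c -o hello
    local status=$?
    echo "exit status: ${status}"
    return "${status}"
}
```

A possible way to use it is `build_hello`, then `build_hello -fopenmp`, then `build_hello $(offload_flags sm_70)` (left unquoted so the flag string splits into separate arguments). In bash, a compiler killed by SIGSEGV is reported as exit status 139.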

I needed to fix up the AOMP build scripts to build the compiler; the modifications are shown below.

# Use the CUDA environment variable (do not evaluate here) to avoid the error:
# "head: cannot open '/usr/local/cuda/version.txt' for reading: No such file or directory
#  /tmp/aomp-build.ix16IS/aomp11/aomp/bin/aomp_common_vars: line 90: [: -ge: unary operator expected"
sed -i 's|/usr/local/cuda|${CUDA}|g' \
    aomp/bin/aomp_common_vars

# Comment out test directory to avoid the error:
# "CMake Error at CMakeLists.txt:548 (add_subdirectory):
#   add_subdirectory given source "test" which is not an existing directory."
# "CMake Error at CMakeLists.txt:1043 (add_subdirectory):
#   add_subdirectory given source "test" which is not an existing directory."
# "CMake Error at CMakeLists.txt:422 (add_subdirectory):
#   add_subdirectory given source "test" which is not an existing directory."
sed -i 's|add_subdirectory(test)|# add_subdirectory(test)|' \
    amd-llvm-project/compiler-rt/CMakeLists.txt \
    amd-llvm-project/llvm/CMakeLists.txt \
    flang/CMakeLists.txt

# Add --rocm-path to the compiler options to avoid the error:
# "clang-11: error: cannot find ROCm installation.
#  Provide its path via --rocm-path, or pass -nogpulib and -nogpuinc to build without ROCm device library and HIP includes."
sed -i "s|--cuda-device-only|--cuda-device-only --rocm-path=${AOMP}_${VERSION}|" \
    aomp-extras/aomp-device-libs/libm/CMakeLists.txt \
    aomp-extras/aomp-device-libs/aompextras/CMakeLists.txt
sed -i "s|-fcuda-rdc|-fcuda-rdc --rocm-path=${AOMP}_${VERSION}|" \
    amd-llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt \
    amd-llvm-project/openmp/libomptarget/deviceRTLs/hostcall/CMakeLists.txt
# There is a space character after -fopenmp in the following file
sed -i "s|-fopenmp[ ]*$|-fopenmp --rocm-path=${AOMP}_${VERSION}|" \
    amd-llvm-project/openmp/libomptarget/hostrpc/CMakeLists.txt
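
One caveat with the `sed -i` edits above is that `sed` exits successfully even when the pattern matches nothing, so a workaround can silently fail to apply if a file's contents differ from what was expected. A minimal sketch of a wrapper that flags this case, assuming `sed`, `mktemp`, and `cmp` are available; `apply_patch_checked` is a name invented here and is not part of AOMP:

```shell
#!/bin/bash
# Hypothetical helper (not part of AOMP): apply a sed expression to a file
# and return non-zero if the expression did not change anything.
apply_patch_checked() {
    local expr="$1" file="$2" tmp
    tmp=$(mktemp) || return 2
    # Run sed into a temporary file so the original can be compared afterwards.
    if ! sed "$expr" "$file" > "$tmp"; then
        rm -f "$tmp"
        return 2
    fi
    if cmp -s "$file" "$tmp"; then
        # Identical output means the pattern matched nothing in this file.
        rm -f "$tmp"
        echo "warning: no change made to $file" >&2
        return 1
    fi
    # Write back through the existing file so its permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, `apply_patch_checked 's|add_subdirectory(test)|# add_subdirectory(test)|' flang/CMakeLists.txt` would return 1 and print a warning if that line were not present.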

Thanks,
Chris

Contributor

ronlieb commented Jul 20, 2020 via email

Author

cdaley commented Jul 20, 2020

It is here.

> ls -trl /project/projectdirs/m1759/csdaley/software/cgpu/aomp/11.7-0_11.7-0
total 73
drwxrwx---  4 csdaley csdaley  4096 Jul 19 11:44 hsa
drwxrwx---  2 csdaley csdaley  4096 Jul 19 12:03 libexec
drwxrwx---  3 csdaley csdaley  4096 Jul 19 12:06 amdgcn
drwxrwx---  3 csdaley csdaley  4096 Jul 19 12:08 lib-debug
drwxrwx--- 10 csdaley csdaley  4096 Jul 19 12:10 share
drwxr-xr-x  2 csdaley csdaley  4096 Jul 19 12:10 rocclr
drwxr-xr-x  3 csdaley csdaley  4096 Jul 19 12:10 cmake
drwxrwx---  2 csdaley csdaley 16384 Jul 19 12:11 bin
drwxr-xr-x 17 csdaley csdaley  4096 Jul 19 12:11 include
drwxrwx---  5 csdaley csdaley 16384 Jul 19 12:11 lib
drwxrwx---  3 csdaley csdaley  4096 Jul 19 12:11 lib64

I've attached the full output and my script.
aomp-build.txt
build-aomp-script.txt

Thanks,
Chris

@gregrodgers
Contributor

@cdaley Hi Chris, we are doing some cleanup on issues. Can you verify whether this still fails with the latest release, AOMP 13.0-2?

@gregrodgers gregrodgers added the question Further information is requested label Apr 20, 2021
@gregrodgers gregrodgers self-assigned this Apr 26, 2021
@saipoorna

Hi @cdaley: did you get a chance to verify this issue with the latest release, AOMP 13.0-2?

@gregrodgers
Contributor

Update: building for nvptx fails on release aomp_14.0-1. This is fixed in aomp_14.0-2.
