Memory access fault executing on multiple devices #310

Closed
sardonicpresence opened this issue Feb 11, 2021 · 2 comments

@sardonicpresence

What is the expected behavior

  • I can execute FFTs on multiple devices, with a separate plan per device, without error.

What actually happens

  • If I execute FFTs on more than one device, I get a memory access fault:
rocFFT: 1.0.8.966-23-2d35fd6  HIP driver: 321200  HIP runtime: 3212
Planning size 32 FFT on 2 devices...
Executing FFTs...
Memory access fault by GPU node-2 (Agent handle: 0x1c4c990) on address 0x2ac52d60d000. Reason: Page not present or supervisor privilege.
Aborted
  • If I restrict visibility to any single device using HIP_VISIBLE_DEVICES, I get no such error:
rocFFT: 1.0.8.966-23-2d35fd6  HIP driver: 321200  HIP runtime: 3212
Planning size 32 FFT on 1 devices...
Executing FFTs...
Done.

How to reproduce

#include <iostream>
#include <vector>
#include <hip/hip_runtime.h>
#include <hip/hip_complex.h>
#include <rocfft.h>

using namespace std;

int main() {
    rocfft_setup();

    int devices = 0;
    hipGetDeviceCount(&devices);

    // Print library and driver versions for the report.
    char rocfft_version[80];
    rocfft_get_version_string(rocfft_version, sizeof(rocfft_version));
    int driver_version = 0, runtime_version = 0;
    hipDriverGetVersion(&driver_version);
    hipRuntimeGetVersion(&runtime_version);
    cout << "rocFFT: " << rocfft_version
         << "  HIP driver: " << driver_version
         << "  HIP runtime: " << runtime_version << endl;

    // 1D real-to-complex transform: n real inputs give n/2 + 1 complex outputs.
    size_t n = 32, nf = n / 2 + 1;
    size_t in_bytes = sizeof(float) * n;
    size_t out_bytes = sizeof(hipComplex) * nf;

    cout << "Planning size " << n << " FFT on " << devices << " devices..." << endl;
    // One set of buffers, one plan and one execution info per device
    // (std::vector instead of non-standard variable-length arrays).
    vector<void*> d_in(devices, nullptr), d_out(devices, nullptr), d_work(devices, nullptr);
    vector<rocfft_plan> plan(devices, nullptr);
    vector<rocfft_execution_info> info(devices, nullptr);
    for (int i = 0; i < devices; ++i) {
        hipSetDevice(i);
        hipMalloc(&d_in[i], in_bytes);
        hipMalloc(&d_out[i], out_bytes);
        rocfft_plan_create(&plan[i], rocfft_placement_notinplace,
            rocfft_transform_type_real_forward, rocfft_precision_single,
            1, &n, 1, nullptr);

        // Allocate a work buffer on this device only if the plan needs one.
        size_t work_bytes = 0;
        rocfft_execution_info_create(&info[i]);
        rocfft_plan_get_work_buffer_size(plan[i], &work_bytes);
        if (work_bytes > 0) {
            hipMalloc(&d_work[i], work_bytes);
            rocfft_execution_info_set_work_buffer(info[i], d_work[i], work_bytes);
        }
    }

    // Launch one FFT per device (this is where the multi-device fault appears).
    cout << "Executing FFTs..." << endl;
    for (int i = 0; i < devices; ++i) {
        hipSetDevice(i);
        rocfft_execute(plan[i], &d_in[i], &d_out[i], info[i]);
    }

    for (int i = 0; i < devices; ++i) {
        hipSetDevice(i);
        hipDeviceSynchronize();
    }

    cout << "Done." << endl;

    // Release per-device resources (hipFree(nullptr) is a no-op).
    for (int i = 0; i < devices; ++i) {
        hipSetDevice(i);
        rocfft_execution_info_destroy(info[i]);
        rocfft_plan_destroy(plan[i]);
        hipFree(d_work[i]);
        hipFree(d_out[i]);
        hipFree(d_in[i]);
    }
    rocfft_cleanup();

    return 0;
}
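
For reference, the buffer sizes in the reproducer follow from the real-to-complex layout: a length-n real input yields only n/2 + 1 distinct complex outputs, because the remaining bins are complex conjugates of these. A minimal sketch of that arithmetic, using std::complex<float> as a stand-in for hipComplex (an assumption; both are a pair of floats, 8 bytes); `R2CSizes` and `r2c_sizes` are hypothetical names:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Element counts and byte sizes for a 1D real-to-complex FFT of length n.
struct R2CSizes {
    std::size_t complex_outputs;  // n/2 + 1 non-redundant bins
    std::size_t in_bytes;         // n real floats
    std::size_t out_bytes;        // n/2 + 1 complex floats
};

R2CSizes r2c_sizes(std::size_t n) {
    std::size_t nf = n / 2 + 1;  // integer division: odd n rounds down first
    // std::complex<float> is assumed here to match hipComplex's 8-byte layout.
    return {nf, sizeof(float) * n, sizeof(std::complex<float>) * nf};
}
```

For the reproducer's n = 32 this gives 17 complex outputs, a 128-byte input buffer and a 136-byte output buffer.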

Environment

Hardware description
  • GPU: Vega 20
  • CPU: AMD EPYC 7352
Software version
  • ROCm: v4.0.0
  • rocFFT: v1.0.8
evetsso self-assigned this Feb 11, 2021
evetsso (Contributor) commented Feb 11, 2021

Hi @sardonicpresence, thanks for this bug report and the handy reproducer. I've been able to reproduce this issue. Indeed, the plan system is not correctly handling multiple devices.
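
The workaround suggested below (avoid identical plans on different devices in one process) hints that some plan state is shared by transform parameters alone. The following is a hypothetical illustration of that failure mode, not rocFFT's actual code: a cache keyed only on plan parameters hands device 1 the entry created for device 0, while adding the device index to the key keeps the entries apart. All names (`PlanParams`, `CachedPlanData`, `get_or_create`) are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical plan parameters, loosely mirroring rocfft_plan_create arguments.
struct PlanParams {
    std::size_t length;
    int transform_type;
    int precision;
};

// Hypothetical cached per-plan data (e.g. device memory or compiled kernels).
struct CachedPlanData {
    int device;  // device on which the data was created
};

using ParamsKey = std::tuple<std::size_t, int, int>;       // ignores the device
using DeviceKey = std::tuple<int, std::size_t, int, int>;  // includes the device

ParamsKey params_key(const PlanParams& p) {
    return std::make_tuple(p.length, p.transform_type, p.precision);
}

DeviceKey device_key(int device, const PlanParams& p) {
    return std::make_tuple(device, p.length, p.transform_type, p.precision);
}

// Returns the cached entry for `key`; creates it on `device` only if the key is new.
template <typename Key>
const CachedPlanData& get_or_create(std::map<Key, CachedPlanData>& cache,
                                    const Key& key, int device) {
    return cache.emplace(key, CachedPlanData{device}).first->second;
}
```

With `ParamsKey`, a second lookup from device 1 returns data that lives on device 0, which on real hardware would be the kind of cross-device access that faults; with `DeviceKey`, each device gets its own entry.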

We'll try to get a fix into the next release. In the meantime, I don't think there's any easy workaround. You could run one process per device instead of trying to execute plans on multiple devices in one process. Alternatively, if you could somehow ensure that no two devices in the same process run identical plans (e.g. give each device different dimensions, directions, or transform types), that would also work around the problem.
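
The one-process-per-device workaround can be sketched on a POSIX system as below. This is a minimal sketch under stated assumptions: `run_one_process_per_device` and its `work` callback are hypothetical names, the callback stands in for the single-device part of the reproducer, and the parent avoids calling any HIP or rocFFT function before forking (so the device count would come from elsewhere, e.g. the command line). Each child restricts itself with HIP_VISIBLE_DEVICES, mirroring the single-device run in the report.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks one child per GPU; child i sees only GPU i (as its device 0) and runs work(i).
// Assumes the parent is single-threaded and has not initialized HIP yet.
// Returns the number of children that could not be started or exited non-zero.
int run_one_process_per_device(int devices, const std::function<int(int)>& work) {
    int failures = 0;
    std::vector<pid_t> children;
    for (int i = 0; i < devices; ++i) {
        pid_t pid = fork();
        if (pid < 0) {  // fork failed: count it and try the next device
            ++failures;
            continue;
        }
        if (pid == 0) {  // child process
            setenv("HIP_VISIBLE_DEVICES", std::to_string(i).c_str(), 1);
            _exit(work(i));  // skip the parent's atexit handlers and buffers
        }
        children.push_back(pid);
    }
    for (pid_t pid : children) {
        int status = 0;
        if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            ++failures;
    }
    return failures;
}
```

In a real program, `work` would call hipSetDevice(0) and then plan and execute the FFT exactly as the single-device reproducer does.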

evetsso closed this as completed Feb 26, 2021
evetsso (Contributor) commented Feb 26, 2021

9374754 fixes this in the develop branch. It should be included in the next release.
