Need to brake on allocating fab requests when using inject #1338

hppritcha · 2017-04-25T20:55:37Z

We need to have a heuristic for throttling allocation of fab requests.

The following simple OpenSHMEM code will show the problem:

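```c
#include <shmem.h>

long long val = 0;  /* symmetric variable targeted by the atomic adds */

int main(int argc, char **argv) {
    shmem_init();

    const int pe = shmem_my_pe();
    const int npes = shmem_n_pes();

    /* Unbounded stream of atomic adds to the next PE (with one PE, the
       target is the calling PE itself). The loop never calls shmem_quiet(). */
    while (1) {
        shmem_longlong_add(&val, 1, (pe + 1) % npes);
    }

    shmem_finalize();  /* never reached; kept for completeness */
    return 0;
}
```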

The process will be killed by the OOM killer. The while(1) loop makes the problem artificial, but with any loop that runs for a sufficiently large number of iterations you will eventually get zapped by the OOM killer. The test uses the inject path through the provider. For the inject path, we should definitely apply the brake on the number of allocated requests, since the app is never going to turn around and read CQEs to recover them.

@bcernohous you may want to check this with your OpenSHMEM implementation. We observed this using Sandia OpenSHMEM (SOS).

hppritcha · 2017-04-25T20:57:18Z

The OpenSHMEM developer says the problem can be reproduced with 1 PE and takes about 2-3 minutes to hit OOM, depending on how slow your OFI libfabric provider is and on the process memory limits.

bcernohous · 2017-04-25T21:36:26Z

I expect -EAGAIN when you run out of resources. I then complete all pending requests and retry or continue.

I also have an environment variable that lets me set an NBI block size, and I quiet after that many pending operations.

I brought this up in issues #1285 and #1199.

hppritcha · 2017-04-25T22:17:37Z

I think for this type of scenario we can just have the GNI provider internally step on the brake and harvest GNI TX CQEs and free up requests.

jswaro · 2017-08-01T20:41:04Z

@hppritcha Can you verify that we can close this?

hppritcha · 2017-08-31T15:30:22Z

Yes, this can be closed.

hppritcha added the bug label Apr 25, 2017

hppritcha closed this as completed Aug 31, 2017
