need to brake on allocating fab requests when using inject #1338
Comments
The OpenSHMEM developer says the problem can be reproduced with 1 PE and takes about 2-3 minutes to hit OOM, depending on how slow your current OFI libfabric provider is and on the process memory limits.
I think for this type of scenario we can just have the GNI provider internally step on the brake, harvest GNI TX CQEs, and free up requests.
@hppritcha Can you verify that we can close this?
Yes, this can be closed.
We need to have a heuristic for throttling allocation of fab requests.
The following simple OpenSHMEM code will show the problem:
Running it, you will get killed by OOM. Note the problem is artificial with the while(1) loop, but a loop with a sufficiently large iteration count will eventually get zapped by OOM as well. The test uses the inject path through the provider. For the inject path, we should definitely brake on the number of requests allocated, since the app is never going to turn around and read off CQEs to recover them.
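To make the braking idea concrete, here is a minimal, self-contained sketch in C. It is a simulation only, not the GNI provider's code: `fab_req`, `tx_ctx`, `harvest_tx_completions`, `inject`, `run_injects`, and the cap `MAX_OUTSTANDING_REQS` are all hypothetical names and values chosen for illustration, and "harvesting" here just retires the oldest requests instead of polling a real TX CQ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical; this only simulates the idea. */
#define MAX_OUTSTANDING_REQS 64    /* assumed cap on in-flight requests */

struct fab_req {
    struct fab_req *next;          /* links in-flight requests, oldest first */
};

struct tx_ctx {
    struct fab_req *head;          /* oldest in-flight request */
    struct fab_req *tail;          /* newest in-flight request */
    size_t outstanding;            /* requests allocated and not yet freed */
    size_t peak;                   /* high-water mark of outstanding */
};

/*
 * Stand-in for polling the TX CQ: retire up to max_events of the oldest
 * in-flight requests and free them. Returns the number freed.
 */
static size_t harvest_tx_completions(struct tx_ctx *tx, size_t max_events)
{
    size_t freed = 0;

    while (tx->head != NULL && freed < max_events) {
        struct fab_req *req = tx->head;

        tx->head = req->next;
        if (tx->head == NULL)
            tx->tail = NULL;
        free(req);
        assert(tx->outstanding > 0);
        tx->outstanding--;
        freed++;
    }
    return freed;
}

/*
 * Inject path: brake before allocating. If the cap is reached, harvest
 * completions internally, because the application will not read CQEs.
 */
static int inject(struct tx_ctx *tx)
{
    struct fab_req *req;

    /* The simulation always makes progress here; a real provider would
     * have to wait for the hardware to report completions. */
    while (tx->outstanding >= MAX_OUTSTANDING_REQS)
        harvest_tx_completions(tx, MAX_OUTSTANDING_REQS / 2);

    req = malloc(sizeof(*req));
    if (req == NULL)
        return -1;
    req->next = NULL;
    if (tx->tail != NULL)
        tx->tail->next = req;
    else
        tx->head = req;
    tx->tail = req;
    tx->outstanding++;
    if (tx->outstanding > tx->peak)
        tx->peak = tx->outstanding;
    return 0;
}

/* Issue `count` injects back to back and return the peak number of
 * requests that were outstanding at any one time. */
static size_t run_injects(size_t count)
{
    struct tx_ctx tx = {0};
    size_t i, peak;

    for (i = 0; i < count; i++) {
        if (inject(&tx) != 0)
            break;
    }
    peak = tx.peak;
    while (tx.outstanding > 0)     /* drain before returning */
        harvest_tx_completions(&tx, MAX_OUTSTANDING_REQS);
    return peak;
}
```

With a cap like this, the number of live requests stays bounded no matter how many injects the application issues in a row. Harvesting half the cap at a time is an arbitrary choice in this sketch; the actual heuristic (cap size, how much to harvest, whether to scale with memory limits) is the open question in this issue.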
@bcernohous you may want to check this with your OpenSHMEM implementation. We observed this using Sandia OpenSHMEM (SOS).