error -9 when training caffe-alexnet model #17
I'm not entirely sure, but my guess is that this is either your system running out of memory or a problem with it picking incorrectly between CPU vs. GPU. Could be NVIDIA/DIGITS#1402?
Reproduced the error on my Docker box. By default, Docker allocates 2 GB of memory for the VM on my MacBook, which is insufficient in this case. As seen from the DIGITS dashboard, the training uses ~3 GB of memory. In my case, increasing the memory in the Docker preferences panel works: navigate through the Docker whale icon -> Preferences -> Advanced -> Memory, then increase accordingly.
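As a quick way to confirm whether the Docker VM's memory allocation is the bottleneck before retraining, you can query the daemon from the command line. This is a minimal sketch, assuming the `docker` CLI is on your PATH; the `{{.MemTotal}}` field of `docker info` reports the total memory (in bytes) available to the daemon, and `docker stats` shows live per-container usage:

```shell
#!/bin/sh
# Report the total memory available to the Docker daemon/VM.
# Falls back to a message if docker is not installed or not running.
docker info --format 'Docker VM memory: {{.MemTotal}} bytes' 2>/dev/null \
  || echo "docker unavailable"

# Optional: one-shot snapshot of per-container memory usage while training runs
docker stats --no-stream 2>/dev/null || true
```

If the reported total is near 2 GB (~2147483648 bytes) while the DIGITS dashboard shows the training job approaching 3 GB, raising the allocation in Docker's preferences as described above should resolve the -9 exit.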
@ln3333 thanks for this, I've added a note and pushed it. Closing. I'm in the process of rewriting this for TensorFlow and TensorFlow.js right now in #14, so I think further debugging of DIGITS issues isn't necessary.
The job ran for about 2 minutes, but when it reached process #60 it crashed with the following error. See the attached image.