Segmentation fault #3

Closed
mhalber opened this issue Feb 18, 2019 · 2 comments

@mhalber

mhalber commented Feb 18, 2019

Hi,
Thank you for providing this code!
I have successfully run the script to prepare the data for ScanNet; however, when attempting to run the training, I am sadly running into a segfault.

The console output before crash:

keyname=instance_normal_augment_2 task=train started
the number of images val 20
the number of images train 1201
the number of images 1201

Through some print statement abuse, I've managed to see that the code seems to be breaking in the function forward(self, coords, faces, colors, instances) in models/instance.py, at line 199.
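As an alternative to print statements, Python's standard faulthandler module can dump the Python-level traceback when the interpreter receives a segfault. The snippet below is a minimal sketch, not part of the repository; the helper name enable_crash_traceback is made up for illustration, and the call would go at the very top of the training script.

```python
import faulthandler


def enable_crash_traceback():
    """Hypothetical helper: ask Python to print a traceback on SIGSEGV.

    Returns True if the fault handler is active afterwards, False if it
    could not be installed (for example, when stderr has no file descriptor).
    """
    try:
        # all_threads=True dumps the stack of every Python thread on a crash.
        faulthandler.enable(all_threads=True)
    except (AttributeError, RuntimeError, ValueError, OSError):
        return False
    return faulthandler.is_enabled()


if __name__ == "__main__":
    print("fault handler active:", enable_crash_traceback())
```

The same effect is available without editing any code by starting the interpreter with the -X faulthandler option.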

Python, gcc, torch, cuda versions:
Python - 3.7.2
torch - 1.0.0
cuda - 9.0.176
I am attempting to run the code on a system with a Tesla K40c with 12 GB of memory.

I'd greatly appreciate help in trying to figure out what is going wrong.

Thanks!

@chenliu-wustl
Collaborator

Could you please check the value range of all_coords (all_coords.min(0) and all_coords.max(0))? all_coords should have a shape of Nx4. all_coords.min(0)[:3] should be greater than 0, all_coords.max(0)[:3] should be smaller than 4096, and all_coords.min(0)[3] = all_coords.max(0)[3] = 0.
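The range check requested above can be sketched in plain Python. This is only an illustration, not code from the repository: the helper name check_all_coords and its limit parameter are made up here, and it assumes all_coords has first been converted to nested lists (for example with .tolist()). The strict inequalities follow the wording of the comment literally.

```python
def check_all_coords(rows, limit=4096):
    """Hypothetical helper: list the problems found in an N x 4 coordinate table.

    rows is a sequence of 4-element rows (x, y, z, batch index).
    An empty return value means every check passed.
    """
    rows = [list(r) for r in rows]
    if not rows:
        return ["all_coords is empty"]
    if any(len(r) != 4 for r in rows):
        return ["all_coords is not N x 4"]

    # Column-wise minimum and maximum, like all_coords.min(0) / all_coords.max(0).
    col_min = [min(col) for col in zip(*rows)]
    col_max = [max(col) for col in zip(*rows)]

    problems = []
    for axis in range(3):
        if not col_min[axis] > 0:
            problems.append("column %d: min %s is not > 0" % (axis, col_min[axis]))
        if not col_max[axis] < limit:
            problems.append("column %d: max %s is not < %d" % (axis, col_max[axis], limit))
    # The fourth column is expected to be 0 for every row.
    if col_min[3] != 0 or col_max[3] != 0:
        problems.append("column 3: expected all zeros, got range [%s, %s]"
                        % (col_min[3], col_max[3]))
    return problems
```

Calling check_all_coords(all_coords.tolist()) just before the failing line in forward and printing the result would show which of the conditions, if any, is violated.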

@mhalber
Author

mhalber commented Feb 19, 2019

Hi - thank you for your reply.

Turns out the fault was partly on my side. I think the issue was caused by a Python version mismatch: the SparseConvNet GitHub page mentions Python 3.6.8, so I've switched to that version. Additionally, I noticed a mismatch between the nvcc version and the CUDA version used by torch on my computer.

After these two changes, the network seems to be training without issues.
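For anyone hitting the same problem, the two mismatches can be checked with a short script along these lines. It is a rough sketch under stated assumptions: the function names are made up for illustration, nvcc is assumed to be on the PATH, and only the major.minor parts of the CUDA versions are compared. It relies on torch.version.cuda and on the "release X.Y" text printed by nvcc --version.

```python
import re
import shutil
import subprocess
import sys


def parse_nvcc_release(text):
    """Extract the 'X.Y' release number from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def nvcc_release():
    """Return the nvcc release as a string, or None if nvcc is not on the PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """Return the CUDA version torch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


def same_cuda(torch_cuda, nvcc):
    """Compare major.minor only; return None when either version is unknown."""
    if not torch_cuda or not nvcc:
        return None
    return torch_cuda.split(".")[:2] == nvcc.split(".")[:2]


if __name__ == "__main__":
    t, n = torch_cuda_version(), nvcc_release()
    print("python    :", "%d.%d.%d" % sys.version_info[:3])
    print("torch cuda:", t)
    print("nvcc      :", n)
    print("cuda match:", same_cuda(t, n))
```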

I think it would be nice if README.md mentioned the required CUDA/Python versions, as without the SparseConvNet page I would have been lost.

Anyway, thanks again for the help and I will close the issue.

mhalber closed this as completed Feb 19, 2019