train.py throws multiprocessing errors #15

Open
agilebean opened this issue Jan 2, 2021 · 1 comment

@agilebean

Executing train.py in Google Colab throws two kinds of multiprocessing errors.
It would be helpful to know whether these errors invalidate the results or are merely informative.

Command executed:

!python train.py /content/sync/data \
--dataset miniimagenet \
--num-ways 5 \
--num-shots 1 \
--step-size 0.1 \
--batch-size 4 \
--num-batches 16 \
--num-epochs 50 \
--num-workers 8 \
--output-folder /content/sync/output \
--use-cuda \
--verbose

ERROR 1:

DEBUG:root:Creating folder `/content/sync/output/2021-01-02_113055`
INFO:root:Saving configuration file in `/content/sync/output/2021-01-02_113055/config.json`
Epoch 1 : 100% 16/16 [00:02<00:00,  5.71it/s, accuracy=0.2315, loss=5.4706]
Epoch 2 : 100% 16/16 [00:02<00:00,  5.67it/s, accuracy=0.2563, loss=3.1795]
Epoch 3 : 100% 16/16 [00:02<00:00,  5.60it/s, accuracy=0.2383, loss=2.8871]
Epoch 4 : 100% 16/16 [00:02<00:00,  5.72it/s, accuracy=0.2448, loss=2.7525]
Training: 100% 16/16 [00:03<00:00,  5.94it/s, accuracy=0.2433, loss=2.1583]Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

ERROR 2:

Epoch 26: 100% 8/8 [00:01<00:00,  5.01it/s, accuracy=0.2888, loss=1.7093]
Epoch 27: 100% 8/8 [00:01<00:00,  5.07it/s, accuracy=0.2454, loss=1.7756]
Training: 100% 8/8 [00:01<00:00,  5.28it/s, accuracy=0.3267, loss=1.6709]Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)

@tristandeleu (Owner)

Torchmeta uses PyTorch's DataLoader under the hood for data loading, so this multiprocessing error must come from PyTorch's DataLoader.
Unfortunately, I don't know what could cause this issue, but it might be due to Google Colab and how it handles multiprocessing. It might also be related to using the synced folder, as in #14.
One way to prevent this error would be to use --num-workers 0, but that will slow down data loading.
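
For illustration, here is a minimal sketch of where num_workers enters Torchmeta's data-loading path, based on Torchmeta's documented miniimagenet helper and BatchMetaDataLoader; the exact arguments used inside train.py may differ:

from torchmeta.datasets.helpers import miniimagenet
from torchmeta.utils.data import BatchMetaDataLoader

# Build the 5-way 1-shot miniImageNet meta-dataset from the synced folder.
dataset = miniimagenet("/content/sync/data", ways=5, shots=1,
                       meta_train=True, download=True)

# BatchMetaDataLoader wraps PyTorch's DataLoader, so num_workers is passed
# through unchanged; num_workers=0 keeps data loading in the main process
# and avoids the multiprocessing pipes where the BrokenPipeError is raised.
dataloader = BatchMetaDataLoader(dataset, batch_size=4, shuffle=True,
                                 num_workers=0)

In the command above, this corresponds to passing --num-workers 0 instead of --num-workers 8.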
