FIX: api.get - more robust fetcher algorithm (allows S3 download on DL repos) and better error messages #10

Merged: 3 commits, Mar 13, 2019
18 changes: 16 additions & 2 deletions .circleci/config.yml
@@ -48,7 +48,7 @@ jobs:
             git config --global user.email "email@domain.com"

       - run:
-          name: Run tests (w/o datalad)
+          name: Run tests (w/o DataLad)
           environment:
             TEMPLATEFLOW_USE_DATALAD: 0
             TEMPLATEFLOW_HOME: "/tmp/data/templateflow"
@@ -69,7 +69,7 @@ jobs:
             pytest -vsx --doctest-modules /tmp/src/templateflow/templateflow

       - run:
-          name: Run tests (w/ datalad)
+          name: Run tests (w/ DataLad)
           environment:
             TEMPLATEFLOW_USE_DATALAD: 1
           command: |
@@ -78,6 +78,20 @@ jobs:
             virtualenv venv
             pytest -vsx --doctest-modules /tmp/src/templateflow/templateflow

+      - run:
+          name: Run tests (w/ DataLad, bypassed via S3)
+          environment:
+            TEMPLATEFLOW_USE_DATALAD: 1
+            TEMPLATEFLOW_HOME: /home/circleci/.cache/templateflow-init
+          command: |
+            pyenv global 3.5.2
+            virtualenv venv
+            cd /tmp/src/templateflow
+            pip install -e .
+            python -c "from templateflow import api"
+            export TEMPLATEFLOW_USE_DATALAD=0
+            pytest -vsx --doctest-modules /tmp/src/templateflow/templateflow
+
       - run:
           name: Test packaging
           command: |
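
The test steps above differ mainly in the value of TEMPLATEFLOW_USE_DATALAD: the new step imports the API with the flag set to 1 and then runs the tests with it set to 0. As a rough illustration of how such a flag might be read, the sketch below uses the "on" values named in this PR's error messages (true, on, 1). The helper name `use_datalad` and the decision to treat an unset or unrecognized value as off are assumptions made for this example, not necessarily how templateflow parses the variable.

```python
import os

# Values that the error messages in this PR list as switching DataLad on.
TRUE_VALUES = ("true", "on", "1")


def use_datalad(environ=None):
    """Hypothetical helper: interpret $TEMPLATEFLOW_USE_DATALAD as a boolean."""
    environ = os.environ if environ is None else environ
    raw = str(environ.get("TEMPLATEFLOW_USE_DATALAD", "")).strip().lower()
    # Assumption: anything not explicitly "on" (including unset) counts as off.
    return raw in TRUE_VALUES


if __name__ == "__main__":
    print(use_datalad({"TEMPLATEFLOW_USE_DATALAD": "1"}))
    print(use_datalad({"TEMPLATEFLOW_USE_DATALAD": "off"}))
    print(use_datalad({}))
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test without touching the real environment.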
43 changes: 32 additions & 11 deletions templateflow/api.py
@@ -25,21 +25,39 @@ def get(template, **kwargs):
     out_file = [Path(p) for p in TF_LAYOUT.get(
         template=template, return_type='file', **kwargs)]

-    # Try plain URL fetch first
-    for filepath in [p for p in out_file
-                     if p.is_file() and p.stat().st_size == 0]:
-        _s3_get(filepath)
-
-    if TF_USE_DATALAD:
-        for filepath in [p for p in out_file if not p.is_file()]:
+    # Try DataLad first
+    dl_missing = [p for p in out_file if not p.is_file()]
+    if TF_USE_DATALAD and dl_missing:
+        for filepath in list(dl_missing):  # copy: the loop removes items
             _datalad_get(filepath)
+            dl_missing.remove(filepath)
+
+    # Fall back to S3 if some files are still missing
+    s3_missing = [p for p in out_file
+                  if p.is_file() and p.stat().st_size == 0]
+    for filepath in s3_missing + dl_missing:
+        _s3_get(filepath)

-    not_fetched = [p for p in out_file
+    not_fetched = [str(p) for p in out_file
                    if not p.is_file() or p.stat().st_size == 0]

-    if any(not_fetched):
-        raise RuntimeError(
-            "Could not fetch template files: %s" % ', '.join(not_fetched))
+    if not_fetched:
+        msg = "Could not fetch template files: %s." % ', '.join(not_fetched)
+        if dl_missing and not TF_USE_DATALAD:
+            msg += """ \
+The $TEMPLATEFLOW_HOME folder %s seems to contain an initialized DataLad \
+dataset, but the environment variable $TEMPLATEFLOW_USE_DATALAD is not \
+set or set to one of (false, off, 0). Please set $TEMPLATEFLOW_USE_DATALAD \
+on (possible values: true, on, 1).""" % TF_LAYOUT.root
+
+        if s3_missing and TF_USE_DATALAD:
+            msg += """ \
+The $TEMPLATEFLOW_HOME folder %s seems to contain a plain \
+dataset, but the environment variable $TEMPLATEFLOW_USE_DATALAD is \
+set to one of (true, on, 1). Please set $TEMPLATEFLOW_USE_DATALAD \
+off (possible values: false, off, 0).""" % TF_LAYOUT.root
+
+        raise RuntimeError(msg)

     if len(out_file) == 1:
         return out_file[0]
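
The fetch order introduced in this hunk (DataLad first, then S3 for whatever is left) can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function name `fetch_missing` is hypothetical, and `datalad_get` and `s3_get` are caller-supplied placeholders standing in for the real fetchers.

```python
from pathlib import Path


def fetch_missing(paths, use_datalad, datalad_get, s3_get):
    """Sketch of a two-stage fetch: DataLad first, then S3 for the rest.

    ``datalad_get`` and ``s3_get`` are placeholder callables that take a Path.
    Returns the paths (as strings) that are still absent or empty afterwards.
    """
    paths = [Path(p) for p in paths]

    # Paths that are not regular files (for example, dangling symlinks).
    dl_missing = [p for p in paths if not p.is_file()]
    if use_datalad:
        for p in list(dl_missing):  # iterate over a copy while removing
            datalad_get(p)
            dl_missing.remove(p)

    # Zero-byte placeholders, plus anything DataLad was not asked to fetch.
    s3_missing = [p for p in paths if p.is_file() and p.stat().st_size == 0]
    for p in s3_missing + dl_missing:
        s3_get(p)

    # Report whatever is still absent or empty after both stages.
    return [str(p) for p in paths if not p.is_file() or p.stat().st_size == 0]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        placeholder = Path(tmp) / "placeholder.nii.gz"
        placeholder.touch()                    # zero-byte file
        absent = Path(tmp) / "absent.nii.gz"   # never created

        def fake_get(path):
            path.write_bytes(b"data")

        print(fetch_missing([placeholder, absent], False, fake_get, fake_get))
```

Iterating over `list(dl_missing)` rather than `dl_missing` itself matters here: removing items from a list while looping over that same list makes Python skip elements.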
@@ -115,6 +133,9 @@ def _s3_get(filepath):
     total_size = int(r.headers.get('content-length', 0))
     block_size = 1024
     wrote = 0
+    if not filepath.is_file():
+        filepath.unlink()
+
     with filepath.open('wb') as f:
         for data in tqdm(r.iter_content(block_size),
                          total=ceil(total_size // block_size),
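
The lines added to `_s3_get` remove a path that is not a regular file before opening it for writing, which appears to target dangling symlinks left by an unfetched DataLad dataset. The sketch below shows that idea without any network access, assuming a POSIX filesystem where symlinks can be created. The helpers `write_stream` and `n_blocks` are hypothetical names for this example, and `chunks` stands in for something like the response's `iter_content(block_size)`. Note also that `ceil(total_size // block_size)` in the diff applies `ceil` to an already floored integer, so it has no effect; the sketch uses true division on the assumption that a partial final block should be counted.

```python
from math import ceil
from pathlib import Path


def n_blocks(total_size, block_size=1024):
    """Blocks needed for total_size bytes, counting a partial last block."""
    return ceil(total_size / block_size)


def write_stream(filepath, chunks):
    """Write byte chunks to filepath, replacing a dangling symlink if present.

    Hypothetical helper; returns the number of bytes written.
    """
    filepath = Path(filepath)
    # A dangling symlink is a link whose target does not exist.
    if filepath.is_symlink() and not filepath.exists():
        filepath.unlink()
    wrote = 0
    with filepath.open("wb") as f:
        for data in chunks:
            wrote += f.write(data)
    return wrote
```

Checking `is_symlink()` explicitly, instead of only `not is_file()`, avoids calling `unlink()` on a path that does not exist at all, which would raise `FileNotFoundError`.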