Set rm_files to be a synchronous method #503

Open · wants to merge 1 commit into main
2 changes: 1 addition & 1 deletion adlfs/spec.py
@@ -1248,7 +1248,7 @@ async def _rm_files(
for file in file_paths:
self.invalidate_cache(self._parent(file))

-    sync_wrapper(_rm_files)
+    rm_files = sync_wrapper(_rm_files)
Collaborator

Ah, this is something I missed as well when we went over the original GitHub feature request... Based on the fsspec specification, there is no shared rm_files() interface; there is only rm_file(). The original GitHub issue, #497, also only requests rm_file().

So it is not appropriate to set a sync wrapper like this, because rm_files does not appear to be a shared interface across file systems. Instead, it would probably make more sense to implement the feature request by adding an async _rm_file that is a simplified wrapper over the _rm implementation.

Even more interesting, it seems there used to be a _rm_file() implementation prior to PR #383, and because the underlying AsyncFileSystem mirrors async methods onto their sync counterparts, I suspect adlfs may at one point have actually supported rm_file(), which would make this a regression. It would be great to confirm whether adlfs ever supported rm_file() in a version prior to that PR.

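For illustration, a minimal sketch of the suggested approach, assuming the existing `_rm` coroutine accepts a single path plus keyword arguments (the exact signature may differ):

```python
async def _rm_file(self, path, **kwargs):
    # Sketch only: delegate single-path deletion to the existing bulk _rm
    # coroutine. Because fsspec's AsyncFileSystem mirrors _rm_file onto a
    # synchronous rm_file, no explicit sync_wrapper assignment is needed.
    await self._rm(path, **kwargs)
```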

async def _separate_directory_markers_for_non_empty_directories(
self, file_paths: typing.Iterable[str]
69 changes: 69 additions & 0 deletions adlfs/tests/test_spec.py
@@ -9,6 +9,8 @@
import numpy as np
import pandas as pd
import pytest
from azure.core.exceptions import ResourceNotFoundError
from azure.storage.blob.aio import BlobServiceClient as AIOBlobServiceClient
from packaging.version import parse as parse_version
from pandas.testing import assert_frame_equal

@@ -2045,3 +2047,70 @@ def test_open_file_x(storage: azure.storage.blob.BlobServiceClient, tmpdir):
with fs.open("data/afile", "xb") as f:
pass
assert fs.cat_file("data/afile") == b"data"


def test_rm_files(storage):
fs = AzureBlobFileSystem(
account_name=storage.account_name,
connection_string=CONN_STR,
)
file_list = [
"top_file.txt",
"root/a/file.txt",
"root/a1/file1.txt",
]

fs.rm_files("data", file_list)
for file in file_list:
with pytest.raises(FileNotFoundError):
fs.ls(f"data/{file}")


def test_rm_files_nonempty_directory_marker(storage):
fs = AzureBlobFileSystem(
account_name=storage.account_name,
connection_string=CONN_STR,
)

with pytest.raises(ResourceNotFoundError):
fs.rm_files("data", ["root/a/"])

assert fs.ls("data/root/a/") == ["data/root/a/file.txt"]


def test_rm_files_delete_directory_markers(storage, mocker):
mock_container = mocker.AsyncMock()
mock_container.delete_blob = mocker.AsyncMock(return_value=None)
mock_get_container_client = mocker.AsyncMock()
mock_get_container_client.__aenter__.return_value = mock_container
mock_get_container_client.__aexit__.return_value = None
mocker.patch.object(
AIOBlobServiceClient,
"get_container_client",
return_value=mock_get_container_client,
)
fs = AzureBlobFileSystem(
account_name=storage.account_name,
connection_string=CONN_STR,
)

files = [blob.name for blob in storage.get_container_client("data").list_blobs()]
directory_markers = [
"root/a/",
"root/a1/",
"root/b/",
"root/c/",
"root/d/",
"root/e+f/",
]

mocker.patch.object(
fs,
"_separate_directory_markers_for_non_empty_directories",
return_value=(files, directory_markers),
)

fs.rm_files("data", files)
expected_calls = [mocker.call(dir) for dir in reversed(directory_markers)]
actual_calls = mock_container.delete_blob.call_args_list[-len(directory_markers) :]
assert actual_calls == expected_calls