
[Prototyping] Using rclone lsjson for all searches #530


Open: wants to merge 57 commits into base: main

@JoeZiminski (Member) commented Jun 21, 2025

superseded by #551

This PR is for prototyping the new way of searching files / folders introduced in #407. The main commit here is this one; all the others are from PR #208, which this was branched from for extended testing.

Generally this way is better because we can have one function for all use cases. I tried for a long time to play with rclone's `--include` or `--filter` arguments, but I could not get reliable behaviour across folders and files.

For example, if at a search path we have:

sub-001/
  some_files.txt
sub-002/
other_folder/
sub-001.txt

then `--include` with any search string (e.g. `sub-*`) would include all folders no matter what. The only way to avoid this was to suffix the search with a `/`, i.e. `sub-*/`. So the search string would be different between files and folders. I may have missed something here, but I think it reflects that rclone is built more for handling files directly, and the search functions etc. behave more naturally when transferring files than when transferring folders.

The solution here is to just grab everything from `lsjson` and then parse it in Python. A benefit is that it is more flexible and interpretable. A downside is that it might be slower. However, as we are only searching a single folder level (i.e. non-recursively), it should never be too bad, as there are unlikely to be tens of thousands of files / folders in a single directory.
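As a rough illustration of the idea above (not the code in this PR), the sketch below lists one directory level with `rclone lsjson` and filters the result in Python. The function names `split_lsjson_entries` and `search_with_lsjson` are hypothetical, and it assumes `lsjson` entries carry the `Name` and `IsDir` keys and that the remote is addressed as `config_name:path`. The same wildcard pattern is applied to files and folders, so no trailing `/` is needed.

```python
import fnmatch
import json
import subprocess


def split_lsjson_entries(lsjson_output, search_prefix, search_path="", return_full_path=False):
    """Split `rclone lsjson` output into matching folder names and filenames.

    Hypothetical helper: `search_prefix` is a wildcard pattern such as "sub-*".
    """
    folder_names, filenames = [], []
    for entry in json.loads(lsjson_output):
        name = entry["Name"]
        # One pattern for both files and folders; IsDir decides where it goes.
        if not fnmatch.fnmatchcase(name, search_prefix):
            continue
        value = f"{search_path.rstrip('/')}/{name}" if return_full_path else name
        (folder_names if entry["IsDir"] else filenames).append(value)
    return sorted(folder_names), sorted(filenames)


def search_with_lsjson(search_path, search_prefix, config_name=None, return_full_path=False):
    """List a single directory level with rclone, then filter in Python.

    Assumes `rclone` is on PATH; `config_name=None` means a local path.
    """
    target = f"{config_name}:{search_path}" if config_name else search_path
    result = subprocess.run(
        ["rclone", "lsjson", target],  # non-recursive unless -R is passed
        capture_output=True,
        text=True,
        check=True,
    )
    return split_lsjson_entries(result.stdout, search_prefix, search_path, return_full_path)


# Trimmed-down listing mirroring the example folder layout above
# (real lsjson entries contain more keys, e.g. Path and Size).
example_output = json.dumps([
    {"Name": "sub-001", "IsDir": True},
    {"Name": "sub-002", "IsDir": True},
    {"Name": "other_folder", "IsDir": True},
    {"Name": "sub-001.txt", "IsDir": False},
])
print(split_lsjson_entries(example_output, "sub-*"))
# (['sub-001', 'sub-002'], ['sub-001.txt'])
```

Keeping the parsing separate from the `subprocess` call means the filtering logic can be tested without rclone installed or a remote configured.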

Currently `search_for_folders()` is set up for testing, but essentially it could be something as simple as:

    config_name = (
        cfg.get_rclone_config_name(cfg["connection_method"])
        if local_or_central == "central"
        else None
    )

    # This function would be renamed.
    all_folder_names, all_filenames = search_gdrive_or_aws_for_folders(
        search_path,
        search_prefix,
        config_name,
        return_full_path,
    )

@cs7-shrey I think I can make a PR to switch SSH and the local filesystem to this method (after a bit more work on it). Then you can use it directly from your AWS / Google Drive PR?

@JoeZiminski JoeZiminski changed the title Prototyping using rclone lsjson for all searches [Prototyping] Using rclone lsjson for all searches Jun 21, 2025