umccrise: support S3 outputs #95

Merged
pdiakumis merged 10 commits into main from hrd_sig_report on Sep 13, 2023

Conversation

pdiakumis
Member

Adding a bit of a walkthrough for parsing S3 results via presigned URLs, focusing on umccrise here.

  • s3_files_list_filter_relevant: equivalent to gds_files_list_filter_relevant, but takes an S3 object directory as input instead of a GDS one. Surprisingly, aws s3 ls --output json cannot generate JSON output (bad form from AWS here; see aws/aws-cli#709, "Unable to get JSON output from aws s3 ls command"), so I went with aws --output json s3api list-objects-v2 instead.

  • s3_file_presignedurl: generates a presigned URL for the given S3 object via aws s3 presign.

  • s3_search: uses the portal API to search for the given file pattern under s3://umccr-primary-data-prod, e.g. s3_search("multiqc_data.json"), and returns the results in a tidy tibble.

    • There are alternatives to using the portal API; the boto3 Athena client is pretty cool but the outputs are a bit too messy for my liking, so I prefer RAthena which uses boto3 under the hood.
  • the umccrise multi-sample reporter template now generates interactive plots for signature contributions and for HRD across CHORD and HRDetect, and aggregates the cancer report summary table across all samples (449 successful umccrise workflows with the required results on S3).
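
For readers who want to see the list, filter, and presign flow in one place, here is a minimal Python sketch. It is not the code in this PR: the function names (list_objects_cmd, parse_listing, filter_relevant, presign_cmd), the example regex, and the output column names are all hypothetical, chosen only for illustration. The only things taken from the description above are the two AWS CLI calls, aws --output json s3api list-objects-v2 and aws s3 presign.

```python
import json
import re
import subprocess


def list_objects_cmd(bucket, prefix):
    """Build the AWS CLI call that lists objects under a prefix as JSON."""
    return ["aws", "--output", "json", "s3api", "list-objects-v2",
            "--bucket", bucket, "--prefix", prefix]


def parse_listing(raw_json, bucket):
    """Turn list-objects-v2 JSON into one dict per object (hypothetical columns)."""
    # An empty listing may come back as empty output or as JSON null.
    data = (json.loads(raw_json) if raw_json.strip() else None) or {}
    rows = []
    for obj in data.get("Contents", []):
        rows.append({
            "path": f"s3://{bucket}/{obj['Key']}",
            "basename": obj["Key"].rsplit("/", 1)[-1],
            "size": obj["Size"],
            "last_modified": obj["LastModified"],
        })
    return rows


def filter_relevant(rows, patterns):
    """Keep rows whose basename matches one of the named regexes."""
    hits = []
    for row in rows:
        for file_type, regex in patterns.items():
            if re.search(regex, row["basename"]):
                hits.append({**row, "type": file_type})
                break  # first matching pattern wins
    return hits


def presign_cmd(s3_path, expires_in=3600):
    """Build the AWS CLI call that presigns a single S3 object."""
    return ["aws", "s3", "presign", s3_path, "--expires-in", str(expires_in)]


def run(cmd):
    """Run an AWS CLI command and return its stdout (needs valid credentials)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

With credentials configured, usage would look roughly like rows = parse_listing(run(list_objects_cmd(bucket, prefix)), bucket), followed by filter_relevant(rows, {"multiqc": r"multiqc_data\.json$"}) and run(presign_cmd(hit["path"])) for each hit. Building the commands separately from running them keeps the parsing and filtering steps easy to check without touching S3.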

@pdiakumis pdiakumis merged commit b90cd73 into main Sep 13, 2023
1 check passed
@pdiakumis pdiakumis deleted the hrd_sig_report branch September 13, 2023 23:37