Generate seqr project data #593
)
samples.append(sample)

for stype in generate_sequencing_type():
The main function seems a bit too long.
I would suggest wrapping the two innermost loops (for stype: for _) in a function. It will be easier to follow.
Maybe the whole samples list creation could be refactored into a function as well.
Otherwise, looks good to me.
Thanks for the feedback, Milo. Agreed that the main function was too long. I've moved the samples list creation into its own function and added clearer comments, which hopefully make it easier to follow.
I'd clean it up a bit more. Break down the main function so that it only makes function calls; each commented block should be its own function. That makes it much easier to interpret.
I suspect this script will be used (and potentially modified) a lot, so it's worth setting it up well in case people come back later wanting to make significant changes.
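A minimal sketch of the layout suggested in this thread, assuming plain dicts instead of the metamist API calls the real script makes: the two innermost loops move into their own function, sample creation gets another, and main only makes function calls. Every name here other than generate_sequencing_type is hypothetical, and generate_sequencing_type itself is a local stand-in that picks a random subset of two assumed sequencing types.

```python
import random

# Assumed sequencing types for illustration; the real script's values may differ.
SEQUENCING_TYPES = ['genome', 'exome']


def generate_sequencing_type() -> list[str]:
    """Stand-in: return a random, non-empty subset of SEQUENCING_TYPES."""
    k = random.randint(1, len(SEQUENCING_TYPES))
    return random.sample(SEQUENCING_TYPES, k=k)


def create_sequencing_groups(sample_id: str) -> list[dict]:
    """The two innermost loops, wrapped in their own (hypothetical) function."""
    groups = []
    for stype in generate_sequencing_type():
        # Hypothetical: 1-3 sequencing groups per sequencing type
        for _ in range(random.randint(1, 3)):
            groups.append({'sample': sample_id, 'type': stype})
    return groups


def create_samples(participant_ids: list[str]) -> list[dict]:
    """Samples list creation, pulled out of main."""
    samples = []
    for participant_id in participant_ids:
        sample_id = f'{participant_id}-S1'
        samples.append(
            {'id': sample_id, 'sequencing_groups': create_sequencing_groups(sample_id)}
        )
    return samples


def main(n_participants: int = 5) -> list[dict]:
    """Only orchestrates: each former commented block is now a single call."""
    participant_ids = [f'P{i:03d}' for i in range(n_participants)]
    return create_samples(participant_ids)
```

Because each stage is a named function, a later change (for example, a different number of sequencing groups per type) touches one small function rather than the body of main.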
Codecov Report
All modified and coverable lines are covered by tests ✅
@@           Coverage Diff           @@
##              dev     #593   +/-   ##
=======================================
  Coverage   71.39%   71.39%
=======================================
  Files         116      116
  Lines        9283     9283
=======================================
  Hits         6628     6628
  Misses       2655     2655
Main function refactor
Thanks for the feedback, @violetbrina. It now spans only 4 lines in
I also changed the analyses to be initialised as an empty list before iterating over the projects, and then inserted in chunks at the end.
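A minimal sketch of the accumulate-then-insert-in-chunks pattern described above, assuming plain dicts for analyses and a caller-supplied insert_batch callable standing in for the real API call. The names chunked, collect_analyses, insert_in_chunks and the default chunk size of 50 are all hypothetical, not taken from the script.

```python
from typing import Callable, Iterator, TypeVar

T = TypeVar('T')


def chunked(items: list[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive slices of items, each holding at most size entries."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(items), size):
        yield items[start:start + size]


def collect_analyses(projects: dict[str, list[str]]) -> list[dict]:
    """Accumulate analyses for every project in a single list."""
    analyses: list[dict] = []  # initialised once, before the project loop
    for project, sg_ids in projects.items():
        for sg_id in sg_ids:
            analyses.append({'project': project, 'sequencing_group': sg_id, 'type': 'cram'})
    return analyses


def insert_in_chunks(
    analyses: list[dict],
    insert_batch: Callable[[list[dict]], None],
    chunk_size: int = 50,
) -> int:
    """Send the accumulated analyses in batches; return the number of batches sent."""
    n_batches = 0
    for batch in chunked(analyses, chunk_size):
        insert_batch(batch)  # hypothetical stand-in for the real insert call
        n_batches += 1
    return n_batches
```

For example, five analyses across two projects with chunk_size=2 are sent as three batches of sizes 2, 2 and 1. Batching bounds the size of each request while still avoiding one call per analysis.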
Looks great to me!
Developed this script to help test some endpoints, currently in progress, that compile stats from seqr projects.
It largely adapts test/data/generate_data.py, but instead of generating a single project from the pedigree file in the same folder, it generates a dozen projects and populates each one with a randomly generated pedigree containing random family IDs, participant IDs, and relationships. Each project is then filled out with random numbers of samples, sequencing groups, assays, and analyses.
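A minimal sketch of random pedigree generation, assuming each family is two founders plus one to three children, and using the standard PED codes (sex 1 = male, 2 = female; affected 1 = unaffected, 2 = affected; '0' for a missing parent). The ID prefixes, family shape, and the helpers unique_id and generate_pedigree are hypothetical; the script's actual pedigree structure may differ.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits


def unique_id(prefix: str, rng: random.Random, used: set[str], length: int = 6) -> str:
    """Draw random IDs until one is found that has not been used yet."""
    while True:
        candidate = prefix + ''.join(rng.choices(ALPHABET, k=length))
        if candidate not in used:
            used.add(candidate)
            return candidate


def generate_pedigree(n_families: int, rng: random.Random) -> list[dict]:
    """Build PED-style rows: two founders plus 1-3 children per family."""
    used: set[str] = set()
    rows: list[dict] = []
    for _ in range(n_families):
        family_id = unique_id('FAM', rng, used)
        father = unique_id('PT', rng, used)
        mother = unique_id('PT', rng, used)
        # Founders have no parents in the pedigree ('0' = missing, as in PED files)
        for individual, sex in ((father, 1), (mother, 2)):
            rows.append({'family_id': family_id, 'individual_id': individual,
                         'paternal_id': '0', 'maternal_id': '0',
                         'sex': sex, 'affected': rng.choice([1, 2])})
        for _ in range(rng.randint(1, 3)):
            rows.append({'family_id': family_id,
                         'individual_id': unique_id('PT', rng, used),
                         'paternal_id': father, 'maternal_id': mother,
                         'sex': rng.choice([1, 2]), 'affected': rng.choice([1, 2])})
    return rows
```

Passing a seeded random.Random makes a generated dataset reproducible, which helps when an endpoint's stats need to be checked against the same data twice.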
It also randomly designates a subset of the sequencing groups as "aligned" and creates completed CRAM analyses for them. A subset of the aligned sequencing groups is then designated as the "joint-called" sequencing groups, and an AnnotateDataset custom analysis and an es-index analysis are created containing these sequencing groups.
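A minimal sketch of the nested random allocation described above, assuming plain dicts rather than the metamist analysis model: a random non-empty subset of sequencing groups gets one completed CRAM analysis each, and a random non-empty subset of those shares a single AnnotateDataset custom analysis and a single es-index analysis. The dict keys, the 'name' field, and the function allocate_analyses are hypothetical.

```python
import random


def allocate_analyses(sg_ids: list[str], rng: random.Random) -> list[dict]:
    """Pick aligned and joint-called subsets and build analyses for them."""
    if not sg_ids:
        return []
    # Aligned: random non-empty subset of all sequencing groups
    aligned = rng.sample(sg_ids, k=rng.randint(1, len(sg_ids)))
    # Joint-called: random non-empty subset of the aligned groups only
    joint_called = rng.sample(aligned, k=rng.randint(1, len(aligned)))
    # One completed CRAM analysis per aligned sequencing group
    analyses = [
        {'type': 'cram', 'status': 'completed', 'sequencing_group_ids': [sg_id]}
        for sg_id in aligned
    ]
    # One analysis of each kind covers every joint-called sequencing group
    analyses.append({'type': 'custom', 'name': 'AnnotateDataset',
                     'sequencing_group_ids': list(joint_called)})
    analyses.append({'type': 'es-index', 'sequencing_group_ids': list(joint_called)})
    return analyses
```

Sampling the joint-called set from the aligned list, rather than from all sequencing groups, is what guarantees every joint-called group also has a CRAM.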