Rc vs 778 spike import limit #8197
1. We do NOT want to assume that the sample IDs we want are in the name field, so that column is now passed through as a parameter.
2. We want to explicitly pause after every 500 samples, since that is our page size. This slows our requests down enough that we don't spam the backend server and hit 503 errors. It does slow down how fast we can write the files when the dataset is very large, but that shouldn't be a concern: as long as it doesn't cause errors, it is still a hands-off process.
3. We want to account for heterogeneous data. In AoU Delta, for instance, the control samples keep their vcf and vcf_index data in a different field. Without explicit handling this would cause the whole run to fail; now we generate an errors.txt file that holds each row for which we couldn't find the correct columns, so those rows can be examined later.
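A minimal Python sketch of the three changes above, assuming the data table is read one page at a time through a caller-supplied function. The names write_fofn, fetch_page and first_present, the candidate-column lists, the output name fofn.tsv and the one-second pause are hypothetical and not taken from the actual script; only the 500-row page size and the errors.txt file come from the description.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

PAGE_SIZE = 500  # page size stated in the PR description


def first_present(row: dict, candidates: Sequence[str]) -> Optional[str]:
    """Return the value of the first candidate column that is set in this row."""
    for column in candidates:
        value = row.get(column)
        if value:
            return value
    return None


def write_fofn(
    fetch_page: Callable[[int, int], List[dict]],  # hypothetical: (page, size) -> rows
    sample_id_column: str,                  # passed in instead of assuming "name"
    vcf_columns: Sequence[str],             # e.g. regular and control-sample fields
    vcf_index_columns: Sequence[str],
    out_path: str = "fofn.tsv",             # hypothetical output name
    errors_path: str = "errors.txt",
    pause_seconds: float = 1.0,             # hypothetical pause length
) -> Tuple[int, int]:
    """Write sample_id, vcf, vcf_index lines; log rows missing any of them."""
    written = errors = 0
    page = 1
    with open(out_path, "w") as out, open(errors_path, "w") as err:
        while True:
            rows = fetch_page(page, PAGE_SIZE)
            if not rows:
                break
            for row in rows:
                sample_id = row.get(sample_id_column)
                vcf = first_present(row, vcf_columns)
                vcf_index = first_present(row, vcf_index_columns)
                if sample_id and vcf and vcf_index:
                    out.write(f"{sample_id}\t{vcf}\t{vcf_index}\n")
                    written += 1
                else:
                    # Keep the whole row so it can be examined later.
                    err.write(f"{row}\n")
                    errors += 1
            if len(rows) < PAGE_SIZE:
                break  # short page means this was the last one
            page += 1
            time.sleep(pause_seconds)  # throttle between pages to avoid 503s
    return written, errors
```

Looking up each column from an ordered list of candidates, rather than failing on the first missing key, is what lets a single pass handle tables where control samples store their files in different fields.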
… data table and being slightly more informative in the output of the python script
…ficiency (and handling larger callsets)
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## ah_var_store #8197 +/- ##
================================================
Coverage ? 83.979%
Complexity ? 34803
================================================
Files ? 2194
Lines ? 167039
Branches ? 18005
================================================
Hits ? 140278
Misses ? 20534
Partials ? 6227
Github actions tests reported job failures from actions build 4358183003
first pass of comments
first pass
>>>
runtime {
  docker: "us.gcr.io/broad-dsde-methods/variantstore:2023-1-20-FOFN"
ISO 8601 nit: names should be like YYYY-MM-DD, i.e. always two digits for month and day. Nice for sorting things chronologically and lexically at the same time. 🙂
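A small Python illustration of why the zero-padded form matters for image tags like the one above. The helper date_tag and the three build dates are hypothetical; only the "-FOFN" suffix and the unpadded "2023-1-20" style come from the snippet under review.

```python
from datetime import date


def date_tag(d: date, suffix: str, padded: bool = True) -> str:
    """Build an image tag such as 2023-01-20-FOFN from a date and a suffix."""
    if padded:
        stamp = d.strftime("%Y-%m-%d")  # ISO 8601: always two-digit month and day
    else:
        stamp = f"{d.year}-{d.month}-{d.day}"  # unpadded style, e.g. 2023-1-20
    return f"{stamp}-{suffix}"


builds = [date(2023, 1, 20), date(2023, 2, 3), date(2023, 10, 2)]  # chronological order

padded = [date_tag(d, "FOFN") for d in builds]
unpadded = [date_tag(d, "FOFN", padded=False) for d in builds]

print(sorted(padded) == padded)      # True: lexical order matches chronological order
print(sorted(unpadded) == unpadded)  # False: "2023-10-2-FOFN" sorts before "2023-2-3-FOFN"
```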
The goal of this PR is to adjust the ingest in two ways:
There is still work to do around making the bulk ingest process significantly more user-friendly.