augur merge #1563

Merged
merged 2 commits into from
Aug 15, 2024
Commits on Aug 15, 2024

  1. augur merge

    Support generalized merging of two or more metadata tables.  A
    long-desired command.  Behaviour is based on much discussion with the team
    and bespoke implementations like ncov's combine_metadata.py.
    Implementation requirements include handling inputs of arbitrary size
    (i.e. without needing to read any dataset fully into memory) and
    handling more than two inputs.  SQLite is used in the implementation but
    could be replaced by another implementation in the future.
    
    One notable thing about this implementation is that it's stupidly slow
    for tiny datasets, e.g. a couple of seconds.  That's due to Augur's own
    slow startup time and having to wait for it 2+n times: once for the
    initial startup of `augur merge`, once for each of the n metadata
    tables being joined, and once more for writing the output.  On large
    datasets, this fixed startup time shouldn't matter, but on small
    datasets it feels really dumb.  Avoiding the additional startups by
    dropping the use of `augur read-file` and `augur write-file` makes it
    quite quick, as it should be.  I think we can live with this slowness
    for now, but if it turns out we can't, we can improve startup times or
    take a different approach to handling inputs.
    tsibley committed Aug 15, 2024
    4e2cf40
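
To make the SQLite-based approach in the commit message above concrete, here is a minimal sketch of merging several metadata tables by streaming each one into SQLite and joining on an id column. It is not Augur's implementation. The function names `merge_tables` and `quote`, the `strain` id column, the rule that later tables take precedence for shared columns, and the treatment of empty strings as missing values are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def quote(name):
    """Quote an SQL identifier (table or column name)."""
    return '"' + name.replace('"', '""') + '"'


def merge_tables(inputs, id_column="strain", db_path=":memory:"):
    """Merge TSV tables given as (name, file-like) pairs.

    Hypothetical helper.  Assumes ids are unique within each table and
    that every table has the id column.  Returns (columns, rows).
    """
    db = sqlite3.connect(db_path)
    tables = []
    for name, handle in inputs:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = list(reader.fieldnames)
        column_defs = ", ".join(quote(c) + " text" for c in columns)
        db.execute("create table " + quote(name) + " (" + column_defs + ")")
        placeholders = ", ".join("?" for _ in columns)
        # The generator streams rows, so no input is held in Python memory.
        db.executemany(
            "insert into " + quote(name) + " values (" + placeholders + ")",
            ([row[c] for c in columns] for row in reader),
        )
        tables.append((name, columns))

    # Output columns in first-seen order, id column first.
    out_columns = []
    for _, columns in tables:
        for c in columns:
            if c != id_column and c not in out_columns:
                out_columns.append(c)

    # Every id seen in any table appears once in the output.
    ids = " union ".join(
        "select " + quote(id_column) + " as id from " + quote(name)
        for name, _ in tables
    )
    select = ["ids.id"]
    for c in out_columns:
        # Later tables are listed first so their non-empty values win.
        sources = [
            "nullif(" + quote(name) + "." + quote(c) + ", '')"
            for name, columns in reversed(tables)
            if c in columns
        ]
        select.append("coalesce(" + ", ".join(sources) + ", '')")
    joins = " ".join(
        "left join " + quote(name)
        + " on " + quote(name) + "." + quote(id_column) + " = ids.id"
        for name, _ in tables
    )
    query = (
        "select " + ", ".join(select)
        + " from (" + ids + ") as ids " + joins + " order by ids.id"
    )
    # A real tool would iterate the cursor instead of calling fetchall().
    rows = db.execute(query).fetchall()
    db.close()
    return [id_column] + out_columns, rows


first = io.StringIO("strain\tcountry\tdate\nA\tUSA\t2020\nB\tUK\t2021\n")
second = io.StringIO("strain\tcountry\tclade\nB\tFrance\t20A\nC\t\t20B\n")
columns, rows = merge_tables([("first", first), ("second", second)])
print(columns)
for row in rows:
    print(row)
```

Passing a file path as `db_path` instead of `:memory:` would keep the working set on disk, which is closer to the stated requirement of handling inputs of arbitrary size.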
  2. Always line-buffer stderr for consistency across Python versions

    The change in default from block-buffering in ≤3.8 to line-buffering in
    ≥3.9¹ made a Cram test output vary between Python versions (and thus
    fail).  I could fix the Cram test various ways, but always
    line-buffering stderr makes sense because we're exclusively using it for
    messaging/logging.
    
    ¹ <https://docs.python.org/3/whatsnew/3.9.html#sys>
    tsibley committed Aug 15, 2024
    b649f11
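
One way to force line-buffered stderr regardless of Python version is `reconfigure()` on the text stream, available since Python 3.7. The sketch below illustrates that technique only. The commit's actual change is not shown on this page, and the helper name `line_buffer` is hypothetical.

```python
import sys


def line_buffer(stream):
    """Switch a text stream to line buffering when it supports reconfigure().

    Hypothetical helper.  Streams without reconfigure() (for example a
    replaced sys.stderr) are returned unchanged.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(line_buffering=True)
    return stream


# Apply once at program startup, before any messages are written.
line_buffer(sys.stderr)
```

With this in place, each complete line written to stderr is flushed immediately, so log output interleaves with stdout the same way whether or not stderr is redirected to a file.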