Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EXP: work with RankLineageInfo and sourmash tax #2603

Open
wants to merge 8 commits into
base: latest
Choose a base branch
from
Open

Conversation

ctb
Copy link
Contributor

@ctb ctb commented May 2, 2023

I got interested in the refactoring of tuples of LineagePair that happened in #2437 and #2443 and thought I'd write some docs!

See taxonomic-lineages-examples.md on this branch for the potential new docs page.

In the process, I found a few things that didn't make sense to me, so I tried filling them in. Specifically,

  • the str and repr for BaseLineageInfo etc now show lineage_str;
  • lineage_str is now filled in by the init code for all initializers;
  • BaseLineageInfo, RankLineageInfo, and LINLineageInfo no longer take positional arguments. This prevents certain kinds of bad things happening, like lin = RankLineageInfo(tuple_of_lineage_info) from working and assigning tuples to ranks, for one not-so-hypothetical situation!

TODO/to test;

  • lineage_str being empty <=> RankLineageInfo(lineage_str='') works.
  • discuss/describe NCBI taxids, zip_taxid, display_taxid
  • describe zip_lineage
  • document/test is_compatible, is_lineage_match

@codecov
Copy link

codecov bot commented May 2, 2023

Codecov Report

Merging #2603 (7680540) into latest (7311eb5) will increase coverage by 7.38%.
The diff coverage is 84.74%.

@@            Coverage Diff             @@
##           latest    #2603      +/-   ##
==========================================
+ Coverage   85.27%   92.65%   +7.38%     
==========================================
  Files         133      104      -29     
  Lines       15174    12435    -2739     
  Branches     2612     2398     -214     
==========================================
- Hits        12939    11522    -1417     
+ Misses       1934      609    -1325     
- Partials      301      304       +3     
Flag Coverage Δ
hypothesis-py 27.18% <64.40%> (+1.37%) ⬆️
python 92.67% <84.74%> (-0.17%) ⬇️
rust ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/sourmash/utils.py 76.54% <70.83%> (-2.41%) ⬇️
src/sourmash/tax/tax_utils.py 97.90% <94.28%> (-0.19%) ⬇️

... and 34 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@bluegenes
Copy link
Contributor

bluegenes commented May 4, 2023

for easy reference:

Q from me as you're looking through the code - is there a recommended strategy for further abstracting BaseLineageInfo that would help with building an automated testing framework for it and all other LineageInfo classes, as you did with the Index classes?

@ctb
Copy link
Contributor Author

ctb commented May 4, 2023

Items for discussion:

  1. Why does taxdb[accession] not return a RankLineageInfo object?
    (And/or should it?)

  2. what's the difference in use case for lineage_at_rank and pop_to_rank? How does this differ from pop_to_rank? It appears that it just gives you the tuple of LineagePairs rather than giving you a new RankLineageInfo object.

>>> lineage = RankLineageInfo(lineage_str='d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli')
>>> lineage.pop_to_rank('class')
RankLineageInfo(lineage_str='d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria')
>>> lineage.lineage_at_rank('class')
(LineagePair(rank='superkingdom', name='d__Bacteria'), LineagePair(rank='phylum', name='p__Proteobacteria'), LineagePair(rank='class', name='c__Gammaproteobacteria'))

pop_to_rank is used twice in the code base, lineage_at_rank is
used once. Suggest picking one, or making a new method and using
that - maybe lineage_to_rank, which returns a RankLineageInfo
object?

  1. suggest changing check_rank_availability

check_rank_availability is all about raising an exception, it seems -
no caller checks its return value. Is it OK to remove the return value?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants