Not identifying CDS and removed or suppressed by NCBI #49

Open
tharis opened this issue Jun 10, 2024 · 0 comments
tharis commented Jun 10, 2024

Summary:

ncfp finds a matching GenBank entry and tries to extract the CDS by locus tag. When no feature has that locus tag, it falls back to the GN field, and then fails to identify a CDS feature.

For some sequences, ncfp also reports that NCBI may have removed or suppressed the record.
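To make the lookup order in the summary concrete, here is a minimal sketch of a "locus tag first, then gene name" search over CDS features. This is not ncfp's actual code: the `find_cds` function, the dict-based feature layout, and the toy values (`TAG_0001`, `geneA`) are all assumptions made for illustration. Only the qualifier names `locus_tag` and `gene` are standard GenBank qualifiers.

```python
from typing import Optional

# Hypothetical stand-in for GenBank features: each feature is a dict with a
# "type" string and a "qualifiers" mapping (qualifier name -> list of values).


def find_cds(features: list, locus_tag: Optional[str], gene_name: Optional[str]) -> Optional[dict]:
    """Return the first CDS matching the locus tag, else the gene name, else None."""
    cds_features = [f for f in features if f["type"] == "CDS"]
    # Try the locus tag first, then fall back to the gene name (GN field).
    for qualifier, wanted in (("locus_tag", locus_tag), ("gene", gene_name)):
        if wanted is None:
            continue
        for feature in cds_features:
            if wanted in feature["qualifiers"].get(qualifier, []):
                return feature
    # Neither identifier matched any CDS feature.
    return None


# Toy data, not taken from any real GenBank record.
toy_features = [
    {"type": "gene", "qualifiers": {"gene": ["geneA"]}},
    {"type": "CDS", "qualifiers": {"locus_tag": ["TAG_0001"], "gene": ["geneA"]}},
]

by_fallback = find_cds(toy_features, "TAG_9999", "geneA")  # locus tag misses, gene hits
no_match = find_cds(toy_features, "TAG_9999", "geneZ")     # both miss
```

If both identifiers are spelled differently in the record than in the query (for example, a prefixed locus tag), a search like this returns `None`, which would be consistent with the "Could not identify CDS feature" message, though that is only a guess at the cause.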

Description:

Please describe the issue as clearly as possible, taking as much space as you need.

Reproducible Steps:

Run ncfp with this command:

ncfp --unify_seqid -v -s -l out.log in.fasta out.out email -d caches/new_ncfp -c ncfp_cache

The attached files contain example sequences that produce each error:

CDS_not_identified.txt

removed_or_suppressed.txt

Current Output:

[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: No record found for sequence input tr|F2YIC4|F2YIC4_METMG/66-85 - please check this sequence manually
[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: This record may have been removed from the NCBI database, or suppressed by NCBI

[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Sequence sp|P54145|AMT1_CAEEL/31-51 matches GenBank entry FO080371.2
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Extracting CDS by locus tag with AA query ID: ('C05E11.4',)
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Did not find feature with locus tag ('C05E11.4',), trying GN field
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Searching for CDS: amt-1
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Could not identify CDS feature for sp|P54145|AMT1_CAEEL/31-51
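The two failure modes above can be separated automatically, which may help when checking a large log. The sketch below groups sequence IDs by failure type using only the message wording shown in this output; the `triage` function and the `no_record` / `no_cds` labels are names I made up, and the patterns assume the messages keep exactly this wording.

```python
import re
from collections import defaultdict

# Patterns copied from the wording of the log messages quoted in this issue.
PATTERNS = {
    "no_record": re.compile(r"No record found for sequence input (\S+)"),
    "no_cds": re.compile(r"Could not identify CDS feature for (\S+)"),
}


def triage(log_lines):
    """Group sequence IDs by the kind of failure reported in the log."""
    failures = defaultdict(list)
    for line in log_lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                failures[label].append(match.group(1))
    return dict(failures)


sample = [
    "[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: No record found for sequence input "
    "tr|F2YIC4|F2YIC4_METMG/66-85 - please check this sequence manually",
    "[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Could not identify CDS feature for "
    "sp|P54145|AMT1_CAEEL/31-51",
]
result = triage(sample)
```

To run it on the real log written by `-l out.log`, pass the open file instead of `sample`, for example `with open("out.log") as fh: result = triage(fh)`.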

Expected Output:

ncfp should find the CDS feature, at least for the sequences that match a GenBank entry. Nothing is expected for the suppressed sequences.

ncfp Version:

merged my version on 07-06-24

Python Version:

3.9.18

Operating System:

macOS
