No flaA gene in blast results in version 1.18.0 #17

Closed
Kincekara opened this issue Jul 29, 2024 · 2 comments

Comments

@Kincekara

Hi,
I ran into a problem while testing version 1.18.0: el_gato cannot determine the ST because BLAST does not find the flaA gene. The tested genome is GCF_900119765.1.

[07/26/2024 02:04:19 PM | test/ ]  Running command: blastn -query GCF_900119765.1_2532STDY5467631_genomic.fna -db /usr/local/bin/db/all_loci.fasta -outfmt '6 std qlen slen'
[07/26/2024 02:04:19 PM | test/ ]  Running blast
[07/26/2024 02:04:20 PM | test/ ]  Finished running blast
[07/26/2024 02:04:20 PM | test/ ]  Command log for blast:
qseqid         sseqid        pident   length  mismatch  gapopen  qstart   qend     sstart  send  evalue     bitscore  qlen     slen  
NZ_LT632614.1  pilE_10       100.000  333     0         0        721006   721338   333     1     2.48e-174  616       3530817  333   
NZ_LT632614.1  asd_3         100.000  473     0         0        2650534  2651006  473     1     0.0        874       3530817  473   
NZ_LT632614.1  mip_15        100.000  402     0         0        941537   941938   1       402   0.0        743       3530817  402   
NZ_LT632614.1  proA_1        100.000  405     0         0        568377   568781   1       405   0.0        749       3530817  405   
NZ_LT632614.1  neuA_neuAH_6  100.000  354     0         0        898051   898404   1       354   0.0        654       3530817  354   
               

[07/26/2024 02:04:20 PM | test/ ]  Finished running blast
[07/26/2024 02:04:20 PM | test/ ]  The following loci were not found in your assembly: flaA

[07/26/2024 02:04:20 PM | test/ ]  Finished analysis
[07/26/2024 02:04:20 PM | test/ ]  Output = 
GCF_900119765.1_2532STDY5467631_genomic MD-     -       10      3       15      18      1       6
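The log shows no hit whose subject is a flaA allele, and el_gato then reports flaA as not found. A minimal POSIX shell sketch of that kind of presence check is below. It is not el_gato's code: the helper name `missing_loci`, the list of six expected loci (taken from the subjects that appear in these logs), and the `<locus>_<allele>` parsing of the sseqid column are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper, not el_gato's implementation: print each expected
# locus that has no hit in BLAST tabular output (-outfmt 6) read from stdin.
# Assumes sseqid (column 2) looks like "<locus>_<allele>", e.g. flaA_8 or neuA_neuAH_6.
missing_loci() {
    # Assumption: the six loci that appear as BLAST subjects in the logs above.
    expected="flaA pilE asd mip proA neuA"
    # Locus names seen in the hits: text of column 2 before the first underscore.
    found=$(awk 'BEGIN { FS = "\t" } { split($2, parts, "_"); print parts[1] }' | sort -u)
    for locus in $expected; do
        # -x: whole-line match, so "mip" only matches the locus "mip" exactly
        if ! printf '%s\n' "$found" | grep -qx "$locus"; then
            printf '%s\n' "$locus"
        fi
    done
}
```

For example, piping plain `-outfmt 6` output from `blastn` into `missing_loci` prints each listed locus that has no hit, one per line.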
@Alan-Collins
Contributor

Thanks for raising this! It looks like this was a BLAST issue related to the size of the SBT database after we updated it with new sequences. We're now right at the threshold where blastn's max_target_seqs setting starts to drop hits, and I didn't catch it in pre-release testing.

You can see the issue by BLASTing the assembly you are using against the SBT database:

$ blastn -subject db/all_loci.fasta -query GCF_900119765.1_2532STDY5467631_genomic.fna -outfmt 6 | grep -c fla
0
$ blastn -subject db/all_loci.fasta -query GCF_900119765.1_2532STDY5467631_genomic.fna -outfmt 6 -max_target_seqs 50000 | grep -c fla
44
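`grep -c fla` counts every output line that contains "fla". A per-locus tally makes the same comparison for all loci at once. The sketch below is a hypothetical helper (`hits_per_locus` is not an el_gato or BLAST+ function) and assumes the subject ID in column 2 has the form `<locus>_<allele>`.

```shell
#!/bin/sh
# Hypothetical helper (not part of el_gato): count BLAST hits per locus from
# tabular output (-outfmt 6) on stdin; prints "<locus><TAB><count>" sorted by locus.
# Assumes sseqid (column 2) is "<locus>_<allele>".
hits_per_locus() {
    awk 'BEGIN { FS = "\t" }
         { split($2, parts, "_"); count[parts[1]]++ }
         END { for (locus in count) print locus "\t" count[locus] }' | sort
}
```

Running it on the output of both commands above (the default and `-max_target_seqs 50000`) and comparing the two tables is one way to see which loci lose hits when results are truncated.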

The default for max_target_seqs is 500; it looks like we get all the hits if we raise it to 545. I've increased it to 50,000 to guard against future database updates and custom databases.
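max_target_seqs caps how many subject sequences BLAST keeps per query, so setting it to at least the number of sequences in the subject FASTA means the cap can never be the reason a subject is dropped. As an alternative to a fixed 50,000, a wrapper could size the value from the database. The sketch below is hypothetical (it is not the fix that was pushed) and assumes a plain FASTA file in which every record starts with a `>` header line.

```shell
#!/bin/sh
# Hypothetical helper (not el_gato's fix, which uses a fixed 50,000):
# count the records in a FASTA file, an upper bound on the number of
# subject sequences BLAST could report for any one query.
fasta_record_count() {
    # Each record starts with a '>' header line; grep -c counts matching lines.
    grep -c '^>' "$1"
}

# Usage (requires BLAST+; file names as in the commands above):
# n=$(fasta_record_count db/all_loci.fasta)
# blastn -query GCF_900119765.1_2532STDY5467631_genomic.fna -subject db/all_loci.fasta -outfmt 6 -max_target_seqs "$n"
```

A fixed 50,000 is simpler and avoids reading the database first; sizing from the file adapts automatically to larger custom databases.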

I pushed a fix to the main branch, and the Bioconda package should be updated soon.

@Kincekara
Author

Thank you very much. It looks like version 1.18.2 works as expected.

[07/30/2024 03:14:17 PM | test/ ]  Running command: blastn -query GCF_900119765.1_2532STDY5467631_genomic.fna -db /usr/local/bin/db/all_loci.fasta -outfmt '6 std qlen slen' -max_target_seqs 50000
[07/30/2024 03:14:17 PM | test/ ]  Running blast
[07/30/2024 03:14:18 PM | test/ ]  Finished running blast
[07/30/2024 03:14:18 PM | test/ ]  Command log for blast:
qseqid         sseqid        pident   length  mismatch  gapopen  qstart   qend     sstart  send  evalue     bitscore  qlen     slen  
NZ_LT632614.1  flaA_8        100.000  182     0         0        1498635  1498816  182     1     2.16e-90   337       3530817  182   
NZ_LT632614.1  pilE_10       100.000  333     0         0        721006   721338   333     1     2.48e-174  616       3530817  333   
NZ_LT632614.1  asd_3         100.000  473     0         0        2650534  2651006  473     1     0.0        874       3530817  473   
NZ_LT632614.1  mip_15        100.000  402     0         0        941537   941938   1       402   0.0        743       3530817  402   
NZ_LT632614.1  proA_1        100.000  405     0         0        568377   568781   1       405   0.0        749       3530817  405   
NZ_LT632614.1  neuA_neuAH_6  100.000  354     0         0        898051   898404   1       354   0.0        654       3530817  354   
               

[07/30/2024 03:14:18 PM | test/ ]  Finished running blast
[07/30/2024 03:14:18 PM | test/ ]  Finished analysis
[07/30/2024 03:14:18 PM | test/ ]  Output = 
GCF_900119765.1_2532STDY5467631_genomic 62      8       10      3       15      18      1       6
