
Performance and Error Issues with VEP v111 Docker Container #1681

Open
Ananya-swi opened this issue May 23, 2024 · 4 comments
@Ananya-swi

Hi,

We are currently using the VEP v111 Docker container to annotate VCF files. However, we are facing the following issues:

  1. Performance issue: annotation of a very small sample VCF file takes an unexpectedly long time, even after adjusting the buffer size (50–10000) and the number of forks (12 and 8). A sketch of such a parameter sweep follows the screenshot below.
  2. Compute size used: 32 vCPUs and 64 GB RAM.
  3. Error with the --fork option: using the --fork option alone results in an error. A screenshot of the error is attached for reference.

[screenshot of the error message]
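For illustration, the parameter sweep mentioned in point 1 amounts to re-running the same command while varying only the two tuning flags; the paths and values below are placeholders, not the exact runs:

# Illustrative only: vary --buffer_size (e.g. 50 to 10000) and --fork (e.g. 8 or 12)
# while keeping all other options fixed; input/output paths are placeholders.
vep --cache --offline --species homo_sapiens --assembly GRCh37 \
    -i sample.vcf.gz -o sample_annotated.txt --tab --no_stats --force_overwrite \
    --buffer_size 50 --fork 8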

Previously we were using v106.1, which takes only 7 minutes to complete the same file. We also tried v109.3, which leads to the same error.

We would appreciate any guidance or suggestions to resolve these issues. Looking forward to your reply. Thank you in advance for your assistance.

Regards,
Ananya Saji
Data Engineer (Bioinformatics)
Semantic Web Tech Pvt. Ltd.

@nuno-agostinho (Contributor)

Hi Ananya,

Hope you are having a nice day. Sorry for the inconvenience.

Could you show the command that you are using to run VEP? Thank you.

Cheers,
Nuno

@Ananya-swi (Author) commented May 24, 2024

Hi Nuno,

Thank you for your response.

We conducted testing using ARGO.

  • Workflow Engine: ARGO
  • Workflow Environment: Azure Kubernetes Service
  • Configuration:
    • 8 vCPUs and 32 GB RAM (assigned to VCF files <25 MB in size).
    • 16 vCPUs and 64 GB RAM (assigned to VCF files >25 MB in size).

VEP Version Used: 111 & 109.3

Given our dependency on AKS and ARGO, we can effectively utilize only 5 vCPUs and 25 GB of RAM on the 8 vCPU / 32 GB configuration, and 13 vCPUs and 55 GB of RAM on the 16 vCPU / 64 GB configuration.

Here is the command we are using to run VEP:

vep \
--cache --refseq  \
--CACHE_VERSION 109 \
--dir_plugins /opt/vep/.vep/Plugins \
--no_stats \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test.txt" \
--symbol --hgvs --hgvsg --variant_class --gene_phenotype \
--flag_pick_allele_gene --canonical --appris --ccds --numbers --total_length --mane \
--sift p --polyphen p \
--fasta  /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
--species homo_sapiens --assembly GRCh37 \
--af --af_gnomad \
--no_escape \
--plugin SpliceAI,snv=/opt/vep/.vep/Grch37/spliceai_scores.raw.snv.hg19.vcf.gz,indel=/opt/vep/.vep/Grch37/spliceai_scores.raw.indel.hg19.vcf.gz \
--plugin NMD \
--dir_plugins /opt/vep/.vep/Plugins \
--plugin dbNSFP,/opt/vep/.vep/Grch37/dbNSFP4.5a_grch37.gz,PROVEAN_pred,LRT_pred,MutationTaster_pred,\
MutationAssessor_pred,FATHMM_pred,fathmm-MKL_coding_pred,M-CAP_pred,fathmm-XF_coding_pred,\
DANN_score,MutPred_score,PrimateAI_pred,Aloft_pred,BayesDel_addAF_pred,LIST-S2_pred,\
MVP_score,Eigen-phred_coding,SiPhy_29way_logOdds,bStatistic,Interpro_domain,MetaLR_pred,\
GTEx_V8_gene,GTEx_V8_tissue,VEST4_score,REVEL_score,AlphaMissense_score \
--offline --tab --fork 5 --force_overwrite ;

In addition, we utilized nine custom files and the pLI, CADD, and dbscSNV plugins for our annotation.
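For completeness, custom annotation files and those plugins are typically appended with options of the following form; the file names below are placeholders rather than our actual files, and exact plugin arguments depend on the data files used:

# placeholder file names; appended to the command above
--custom clinvar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG \
--plugin CADD,/opt/vep/.vep/Grch37/whole_genome_SNVs.tsv.gz,/opt/vep/.vep/Grch37/InDels.tsv.gz \
--plugin dbscSNV,/opt/vep/.vep/Grch37/dbscSNV1.1_GRCh37.txt.gz \
--plugin pLI,/opt/vep/.vep/Plugins/pLI_values.txt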

We encountered the same error with VEP versions 109.3 and 111, whereas VEP version 106.1 completed the annotation in 7 minutes for the same file.

Looking forward to your assistance. Thank you.

Best regards,
Ananya

@nuno-agostinho (Contributor)

Hi @Ananya-swi,

Thanks for sending more information. You seem to be using VEP as expected, so I am not really sure why it is taking so much time.

Some ideas/questions about the performance issues:

  • This may be related to a slow plugin. You could try disabling all plugins just to check how long it takes to run (see the sketch after this list).
  • Do you have many and/or long SVs in the input? Maybe there are variants that were not parsed in previous VEP versions but are now; that could affect VEP's runtime.
  • What do you mean by "using the --fork option alone"? Can you send me a command illustrating this?
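For example, a plugin-free timing run could be a stripped-down version of your command that keeps the cache and FASTA paths and drops every --plugin line (the output file name here is just a placeholder):

vep \
--cache --refseq --offline --cache_version 109 \
--species homo_sapiens --assembly GRCh37 \
--fasta /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test_noplugins.txt" \
--no_stats --tab --fork 5 --force_overwrite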

Looking forward to your reply.

Best,
Nuno

@Ananya-swi (Author)

Hi @nuno-agostinho ,

Thanks for your response.

I've tried the method you suggested, removing the plugins, but I am still encountering the same issue. When I use --buffer_size 50 --fork 8, the script runs but takes a long time to complete. The VCF file used as input contains only SNVs.

Explanation of Using --fork Alone:

I used the VEP command without the --buffer_size flag. Here is the command:

vep \
--cache --refseq  \
--CACHE_VERSION 109 \
--dir_plugins /opt/vep/.vep/Plugins \
--no_stats \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test.txt" \
--symbol --hgvs --hgvsg --variant_class --gene_phenotype \
--flag_pick_allele_gene --canonical --appris --ccds --numbers --total_length --mane \
--sift p --polyphen p \
--fasta  /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
--species homo_sapiens --assembly GRCh37 \
--af --af_gnomad \
--no_escape \
--plugin SpliceAI,snv=/opt/vep/.vep/Grch37/spliceai_scores.raw.snv.hg19.vcf.gz,indel=/opt/vep/.vep/Grch37/spliceai_scores.raw.indel.hg19.vcf.gz \
--plugin NMD \
--dir_plugins /opt/vep/.vep/Plugins \
--plugin dbNSFP,/opt/vep/.vep/Grch37/dbNSFP4.5a_grch37.gz,PROVEAN_pred,LRT_pred,MutationTaster_pred,\
MutationAssessor_pred,FATHMM_pred,fathmm-MKL_coding_pred,M-CAP_pred,fathmm-XF_coding_pred,\
DANN_score,MutPred_score,PrimateAI_pred,Aloft_pred,BayesDel_addAF_pred,LIST-S2_pred,\
MVP_score,Eigen-phred_coding,SiPhy_29way_logOdds,bStatistic,Interpro_domain,MetaLR_pred,\
GTEx_V8_gene,GTEx_V8_tissue,VEST4_score,REVEL_score,AlphaMissense_score \
--offline --tab --fork 5 --force_overwrite ;
Using this command, I received the following error:

[screenshot of the error message]

Using --buffer_size
When I added the --buffer_size 50 flag, the script ran but took a long time to execute. Here is the command I used:

vep \
--cache --refseq  \
--CACHE_VERSION 109 \
--dir_plugins /opt/vep/.vep/Plugins \
--no_stats \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test.txt" \
--symbol --hgvs --hgvsg --variant_class --gene_phenotype \
--flag_pick_allele_gene --canonical --appris --ccds --numbers --total_length --mane \
--sift p --polyphen p \
--fasta  /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
--species homo_sapiens --assembly GRCh37 \
--af --af_gnomad \
--no_escape \
--plugin SpliceAI,snv=/opt/vep/.vep/Grch37/spliceai_scores.raw.snv.hg19.vcf.gz,indel=/opt/vep/.vep/Grch37/spliceai_scores.raw.indel.hg19.vcf.gz \
--plugin NMD \
--dir_plugins /opt/vep/.vep/Plugins \
--plugin dbNSFP,/opt/vep/.vep/Grch37/dbNSFP4.5a_grch37.gz,PROVEAN_pred,LRT_pred,MutationTaster_pred,\
MutationAssessor_pred,FATHMM_pred,fathmm-MKL_coding_pred,M-CAP_pred,fathmm-XF_coding_pred,\
DANN_score,MutPred_score,PrimateAI_pred,Aloft_pred,BayesDel_addAF_pred,LIST-S2_pred,\
MVP_score,Eigen-phred_coding,SiPhy_29way_logOdds,bStatistic,Interpro_domain,MetaLR_pred,\
GTEx_V8_gene,GTEx_V8_tissue,VEST4_score,REVEL_score,AlphaMissense_score \
--offline --tab --buffer_size 50 --fork 5 --force_overwrite ;

Despite following the suggestions, the issue persists. The script runs with a smaller buffer size but takes a significantly longer time to complete. It appears that higher buffer sizes and fork counts lead to process communication issues.
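For reference, the effect of these two flags on runtime could be quantified by timing the same input over a small grid of --buffer_size and --fork values, along these lines (output names and the chosen values are illustrative only):

# illustrative timing sweep; plugins omitted, output names are placeholders
for buf in 50 500 5000; do
  for forks in 1 5 8; do
    echo "buffer_size=$buf fork=$forks"
    time vep --cache --refseq --offline --cache_version 109 \
      --species homo_sapiens --assembly GRCh37 \
      -i "/home/admin/test/Test.vcf.gz" -o "/home/admin/test/Test_${buf}_${forks}.txt" \
      --no_stats --tab --force_overwrite \
      --buffer_size "$buf" --fork "$forks"
  done
done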

I tried these commands with VEP versions 111 and 109.3, and the same error occurs. However, when using versions 106.0 or 106.1, it works without any issues.

Could you provide further insights or additional configurations that might help resolve this problem?

I look forward to your guidance on this issue.

Thanks,
Ananya
