
Performance and Error Issues with VEP v111 Docker Container #1681

Open
Ananya-swi opened this issue May 23, 2024 · 4 comments
@Ananya-swi

Hi,

We are currently using the VEP v111 Docker container to annotate VCF files. However, we are facing the following issues:

  1. Performance issue: annotation of a very small sample VCF file takes an unexpectedly long time, even after adjusting the buffer size (50–10000) and the number of forks (12 and 8). A sketch of such a parameter sweep follows the screenshot below.
  2. Compute size used: 32 vCPUs and 64 GB RAM.
  3. Error with the --fork option: using the --fork option alone results in an error. A screenshot of the error is attached for reference.

[screenshot of the error message]
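For illustration, the parameter sweep mentioned in point 1 amounts to re-running the same command while varying only the two tuning flags; the paths and values below are placeholders, not the exact runs:

# Illustrative only: vary --buffer_size (e.g. 50 to 10000) and --fork (e.g. 8 or 12)
# while keeping all other options fixed; input/output paths are placeholders.
vep --cache --offline --species homo_sapiens --assembly GRCh37 \
    -i sample.vcf.gz -o sample_annotated.txt --tab --no_stats --force_overwrite \
    --buffer_size 50 --fork 8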

Previously we were using v106.1, which takes only 7 minutes to complete the same file. We also tried v109.3, which leads to the same error.

We would appreciate any guidance or suggestions to resolve these issues. Looking forward to your reply. Thank you in advance for your assistance.

Regards,
Ananya Saji
Data Engineer (Bioinformatics)
Semantic Web Tech Pvt. Ltd.

@nuno-agostinho (Contributor)

Hi Ananya,

Hope you are having a nice day. Sorry for the inconvenience.

Could you show the command that you are using to run VEP? Thank you.

Cheers,
Nuno

@Ananya-swi (Author) commented May 24, 2024

Hi Nuno,

Thank you for your response.

We conducted testing using ARGO.

  • Workflow Engine: ARGO
  • Workflow Environment: Azure Kubernetes Service
  • Configuration:
    • 8 vCPUs and 32 GB RAM (assigned to VCF files <25 MB in size).
    • 16 vCPUs and 64 GB RAM (assigned to VCF files >25 MB in size).

VEP Version Used: 111 & 109.3

Given our dependency on AKS and ARGO, we can effectively utilize only 5 vCPUs and 25 GB of RAM on the 8 vCPU / 32 GB configuration, and 13 vCPUs and 55 GB of RAM on the 16 vCPU / 64 GB configuration.

Here is the command we are using to run VEP:

vep \
--cache --refseq  \
--CACHE_VERSION 109 \
--dir_plugins /opt/vep/.vep/Plugins \
--no_stats \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test.txt" \
--symbol --hgvs --hgvsg --variant_class --gene_phenotype \
--flag_pick_allele_gene --canonical --appris --ccds --numbers --total_length --mane \
--sift p --polyphen p \
--fasta  /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
--species homo_sapiens --assembly GRCh37 \
--af --af_gnomad \
--no_escape \
--plugin SpliceAI,snv=/opt/vep/.vep/Grch37/spliceai_scores.raw.snv.hg19.vcf.gz,indel=/opt/vep/.vep/Grch37/spliceai_scores.raw.indel.hg19.vcf.gz \
--plugin NMD \
--dir_plugins /opt/vep/.vep/Plugins \
--plugin dbNSFP,/opt/vep/.vep/Grch37/dbNSFP4.5a_grch37.gz,PROVEAN_pred,LRT_pred,MutationTaster_pred,\
MutationAssessor_pred,FATHMM_pred,fathmm-MKL_coding_pred,M-CAP_pred,fathmm-XF_coding_pred,\
DANN_score,MutPred_score,PrimateAI_pred,Aloft_pred,BayesDel_addAF_pred,LIST-S2_pred,\
MVP_score,Eigen-phred_coding,SiPhy_29way_logOdds,bStatistic,Interpro_domain,MetaLR_pred,\
GTEx_V8_gene,GTEx_V8_tissue,VEST4_score,REVEL_score,AlphaMissense_score \
--offline --tab --fork 5 --force_overwrite ;

In addition, we utilized nine custom files and the pLI, CADD, and dbscSNV plugins for our annotation.
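For completeness, custom annotation files and those plugins are typically appended with options of the following form; the file names below are placeholders rather than our actual files, and exact plugin arguments depend on the data files used:

# placeholder file names; appended to the command above
--custom clinvar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG \
--plugin CADD,/opt/vep/.vep/Grch37/whole_genome_SNVs.tsv.gz,/opt/vep/.vep/Grch37/InDels.tsv.gz \
--plugin dbscSNV,/opt/vep/.vep/Grch37/dbscSNV1.1_GRCh37.txt.gz \
--plugin pLI,/opt/vep/.vep/Plugins/pLI_values.txt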

We encountered the same error with VEP versions 109.3 and 111, whereas VEP version 106.1 completed the annotation in 7 minutes for the same file.

Looking forward to your assistance. Thank you.

Best regards,
Ananya

@nuno-agostinho (Contributor)

Hi @Ananya-swi,

Thanks for sending more information. You seem to be using VEP as expected, so I am not really sure why it is taking so much time.

Some ideas/questions about the performance issues:

  • This may be related to a slow plugin. You could try disabling all plugins just to check how long it takes to run (see the sketch after this list).
  • Do you have many and/or long SVs in the input? Maybe there are variants that were not parsed in previous VEP versions but are now; that could affect VEP's runtime.
  • What do you mean by "using the --fork option alone"? Can you send me a command illustrating this?
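For example, a plugin-free timing run could be a stripped-down version of your command that keeps the cache and FASTA paths and drops every --plugin line (the output file name here is just a placeholder):

vep \
--cache --refseq --offline --cache_version 109 \
--species homo_sapiens --assembly GRCh37 \
--fasta /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test_noplugins.txt" \
--no_stats --tab --fork 5 --force_overwrite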

Looking forward to your reply.

Best,
Nuno

@Ananya-swi (Author)

Hi @nuno-agostinho ,

Thanks for your response.

I've tried the method you suggested, removing the plugins, but I am still encountering the same issue. When I use --buffer_size 50 --fork 8, the script runs but takes a long time to complete. The VCF file used as input contains only SNVs.

Explanation of Using --fork Alone:

I used the VEP command without the --buffer_size flag. Here is the command:

vep \
--cache --refseq  \
--CACHE_VERSION 109 \
--dir_plugins /opt/vep/.vep/Plugins \
--no_stats \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test.txt" \
--symbol --hgvs --hgvsg --variant_class --gene_phenotype \
--flag_pick_allele_gene --canonical --appris --ccds --numbers --total_length --mane \
--sift p --polyphen p \
--fasta  /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
--species homo_sapiens --assembly GRCh37 \
--af --af_gnomad \
--no_escape \
--plugin SpliceAI,snv=/opt/vep/.vep/Grch37/spliceai_scores.raw.snv.hg19.vcf.gz,indel=/opt/vep/.vep/Grch37/spliceai_scores.raw.indel.hg19.vcf.gz \
--plugin NMD \
--dir_plugins /opt/vep/.vep/Plugins \
--plugin dbNSFP,/opt/vep/.vep/Grch37/dbNSFP4.5a_grch37.gz,PROVEAN_pred,LRT_pred,MutationTaster_pred,\
MutationAssessor_pred,FATHMM_pred,fathmm-MKL_coding_pred,M-CAP_pred,fathmm-XF_coding_pred,\
DANN_score,MutPred_score,PrimateAI_pred,Aloft_pred,BayesDel_addAF_pred,LIST-S2_pred,\
MVP_score,Eigen-phred_coding,SiPhy_29way_logOdds,bStatistic,Interpro_domain,MetaLR_pred,\
GTEx_V8_gene,GTEx_V8_tissue,VEST4_score,REVEL_score,AlphaMissense_score \
--offline --tab --fork 5 --force_overwrite ;
Using this command, I received the following error:

[screenshot of the error message]

Using --buffer_size
When I added the --buffer_size 50 flag, the script ran but took a long time to execute. Here is the command I used:

vep \
--cache --refseq  \
--CACHE_VERSION 109 \
--dir_plugins /opt/vep/.vep/Plugins \
--no_stats \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test.txt" \
--symbol --hgvs --hgvsg --variant_class --gene_phenotype \
--flag_pick_allele_gene --canonical --appris --ccds --numbers --total_length --mane \
--sift p --polyphen p \
--fasta  /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
--species homo_sapiens --assembly GRCh37 \
--af --af_gnomad \
--no_escape \
--plugin SpliceAI,snv=/opt/vep/.vep/Grch37/spliceai_scores.raw.snv.hg19.vcf.gz,indel=/opt/vep/.vep/Grch37/spliceai_scores.raw.indel.hg19.vcf.gz \
--plugin NMD \
--dir_plugins /opt/vep/.vep/Plugins \
--plugin dbNSFP,/opt/vep/.vep/Grch37/dbNSFP4.5a_grch37.gz,PROVEAN_pred,LRT_pred,MutationTaster_pred,\
MutationAssessor_pred,FATHMM_pred,fathmm-MKL_coding_pred,M-CAP_pred,fathmm-XF_coding_pred,\
DANN_score,MutPred_score,PrimateAI_pred,Aloft_pred,BayesDel_addAF_pred,LIST-S2_pred,\
MVP_score,Eigen-phred_coding,SiPhy_29way_logOdds,bStatistic,Interpro_domain,MetaLR_pred,\
GTEx_V8_gene,GTEx_V8_tissue,VEST4_score,REVEL_score,AlphaMissense_score \
--offline --tab --buffer_size 50 --fork 5 --force_overwrite ;

Despite following the suggestions, the issue persists. The script runs with a smaller buffer size but takes a significantly longer time to complete. It appears that higher buffer sizes and fork counts lead to process communication issues.
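For reference, the effect of these two flags on runtime could be quantified by timing the same input over a small grid of --buffer_size and --fork values, along these lines (output names and the chosen values are illustrative only):

# illustrative timing sweep; plugins omitted, output names are placeholders
for buf in 50 500 5000; do
  for forks in 1 5 8; do
    echo "buffer_size=$buf fork=$forks"
    time vep --cache --refseq --offline --cache_version 109 \
      --species homo_sapiens --assembly GRCh37 \
      -i "/home/admin/test/Test.vcf.gz" -o "/home/admin/test/Test_${buf}_${forks}.txt" \
      --no_stats --tab --force_overwrite \
      --buffer_size "$buf" --fork "$forks"
  done
done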

I tried these commands with VEP versions 111 and 109.3, and the same error occurs. However, when using versions 106.0 or 106.1, it works without any issues.

Could you provide further insights or additional configurations that might help resolve this problem?

I look forward to your guidance on this issue.

Thanks,
Ananya
