
Parallel execution of MSA tools #399

Open · fuji8 wants to merge 5 commits into main

Conversation

@fuji8 commented Mar 14, 2022

Based on the data dependencies, the MSA tool execution is split into the following three parts, which are launched asynchronously so that they run in parallel:

  • jackhmmer(uniref90) + template_searcher(pdb)
  • jackhmmer(mgnify)
  • hhblits(bfd) or jackhmmer(small_bfd)

Execution time may be reduced if sufficient CPU, memory, and I/O performance are available.

The implementation uses concurrent.futures.ThreadPoolExecutor, and max_workers can be set with the --n_parallel_msa flag.
If --n_parallel_msa is 1, execution is not parallelized.
A value of 3 is the maximum and potentially the fastest.
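
As a rough illustration, here is a minimal sketch of the dispatch pattern (not the PR's actual code; the task callables are placeholders you would supply):

    from concurrent.futures import ThreadPoolExecutor
    from typing import Callable, List, Sequence

    def run_msa_searches(tasks: Sequence[Callable[[], object]],
                         n_parallel_msa: int = 3) -> List[object]:
        """Run independent MSA search callables in up to n_parallel_msa threads."""
        if n_parallel_msa <= 1:
            # Sequential fallback, matching --n_parallel_msa=1.
            return [task() for task in tasks]
        with ThreadPoolExecutor(max_workers=n_parallel_msa) as executor:
            futures = [executor.submit(task) for task in tasks]
            # result() blocks until each search finishes and re-raises any
            # error from the worker thread.
            return [future.result() for future in futures]

Here tasks would be the three groups listed above: one callable running jackhmmer(uniref90) plus the template search, one running jackhmmer(mgnify), and one running hhblits(bfd) or jackhmmer(small_bfd).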

Example

The following is a partial log of a T1041 run with 28 CPU cores and 235 GB of RAM.

I0222 13:49:16.924237 46912496428352 run_alphafold.py:165] Predicting T1041
I0222 13:49:16.927476 46924398855936 jackhmmer.py:133] Launching subprocess "/apps/t3/sles12sp4/free/alphafold/2.1.1/miniconda/py39_4.10.3/envs/alphafold/bin/jackhmmer -o /dev/null -A /scr/10806581.1.all.q/tmpcxmfcp42/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /home/4/18B13254/workspace/research/alphafold2-profiling/fasta/T1041.fasta /gs/hs0/GSIC/alphafold/2.1.1/data/uniref90/uniref90.fasta"
I0222 13:49:16.927617 46924403058432 hhblits.py:128] Launching subprocess "/home/4/18B13254/workspace/research/hh-suite/mirror/hh-suite/build-stable/bin/hhblits -i /home/4/18B13254/workspace/research/alphafold2-profiling/fasta/T1041.fasta -cpu 28 -oa3m /scr/10806581.1.all.q/tmpmd_hpnsn/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /gs/hs0/GSIC/alphafold/2.1.1/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /gs/hs0/GSIC/alphafold/2.1.1/data/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0222 13:49:16.927811 46924400957184 jackhmmer.py:133] Launching subprocess "/apps/t3/sles12sp4/free/alphafold/2.1.1/miniconda/py39_4.10.3/envs/alphafold/bin/jackhmmer -o /dev/null -A /scr/10806581.1.all.q/tmp36tcl19r/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /home/4/18B13254/workspace/research/alphafold2-profiling/fasta/T1041.fasta /gs/hs0/GSIC/alphafold/2.1.1/data/mgnify/mgy_clusters_2018_12.fa"
I0222 13:49:16.984938 46924398855936 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0222 13:49:17.004662 46924400957184 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0222 13:49:17.071144 46924403058432 utils.py:36] Started HHblits query
I0222 13:55:33.821986 46924398855936 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 376.837 seconds
I0222 13:55:33.826219 46924398855936 hhsearch.py:85] Launching subprocess "/apps/t3/sles12sp4/free/alphafold/2.1.1/miniconda/py39_4.10.3/envs/alphafold/bin/hhsearch -i /scr/10806581.1.all.q/tmp_wlgfvtm/query.a3m -o /scr/10806581.1.all.q/tmp_wlgfvtm/output.hhr -maxseq 1000000 -d /gs/hs0/GSIC/alphafold/2.1.1/data/pdb70/pdb70"
I0222 13:55:33.920660 46924398855936 utils.py:36] Started HHsearch query
I0222 13:55:55.552874 46924403058432 utils.py:40] Finished HHblits query in 398.481 seconds
I0222 13:56:22.036781 46924400957184 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 425.032 seconds
I0222 14:04:08.089480 46924398855936 utils.py:40] Finished HHsearch query in 514.168 seconds

@google-cla bot commented Mar 14, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

@charmichaeld

How is the number of cores for the hhblits step determined? The example shows the three parallel steps using 44 cores in total on a 28-core machine: 8 cores for each of the two jackhmmer steps and 28 cores for the hhblits step. Or are there 44+ cores available on the machine in the example?

@fuji8 (Author) commented May 13, 2022

The number of available physical cores on this machine is 28, and the number of logical cores (the number of cores visible to the OS) is 56 due to Hyper-Threading. Since the 44 cores requested in total stay below the 56 logical cores, this has no significant effect on execution or performance. (For example, it is possible to run hhblits with 128 cores, and there will rarely be any difference in performance.)

What matters most in determining the optimal number of hhblits cores is the I/O performance of the storage where the BFD is kept. If the storage is an HDD, the bottleneck will probably be I/O, and increasing the number of cores will not improve the execution time. If the storage is an SSD, it may be worthwhile to increase the number of cores up to, but not beyond, the number of physical cores.

In this example, the reason for increasing the number of cores is that the storage uses the Lustre file system, whose I/O performance improves as the number of parallel readers increases. With this file system it makes sense to increase the parallelism even if the underlying physical storage is HDD. The 28 cores used here correspond to the number of physical cores, but there was no noticeable difference in performance beyond about 16.

If this PR improves the execution time beyond what simply changing the number of cores would achieve, the following factors may explain it:

  • Speedup of the serial parts that are not affected by the -cpu value. Not all of each tool's code is parallelized, so some sections run on a single core regardless of -cpu.
  • The tools need I/O and CPU resources at different times. When their I/O accesses are staggered, the available I/O bandwidth is used more fully over time, although simultaneous I/O accesses may also degrade performance. The same holds for CPUs: the cores specified by -cpu are not fully dedicated (for example, while waiting on I/O), so other tools can use them.

If you are using HDDs and hhblits is extremely slow, you may want to move the cs219 file (about 20 GB) in the bfd directory to an SSD and leave a symbolic link in its place. This file is called the prefiltering database in the hhblits user guide and should ideally reside in memory, but keeping it on an SSD is still better than on an HDD.
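
A minimal sketch of that move-and-symlink step (the paths are hypothetical; adjust them to your own installation), which keeps hhblits pointing at the original -d database prefix while the prefilter data lives on the SSD:

    import os
    import shutil
    from pathlib import Path

    # Hypothetical locations -- adjust to your own layout.
    bfd_dir = Path("/data/alphafold/bfd")       # BFD database on slow HDD storage
    ssd_dir = Path("/ssd/alphafold/bfd_cs219")  # faster SSD destination
    ssd_dir.mkdir(parents=True, exist_ok=True)

    # Move the cs219 prefiltering files to the SSD and leave symlinks behind,
    # so hhblits still finds them under the original database path.
    for src in bfd_dir.glob("*cs219*"):
        dst = ssd_dir / src.name
        shutil.move(str(src), str(dst))
        os.symlink(dst, src)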

@xlminfei

When I set n_parallel_msa=3 and run the multimer pipeline, Jackhmmer (uniref90), Jackhmmer (mgy_clusters_2018_12), and HHblits start at the same time, and only after all of those have finished does Jackhmmer (uniprot) start.
Is there any way to start Jackhmmer (uniprot) together with Jackhmmer (uniref90), Jackhmmer (mgy_clusters_2018_12), and HHblits?
Thanks.

@fuji8 (Author) commented Jun 3, 2022

@xlminfei I am sorry, but the current PR cannot run Jackhmmer (uniprot) simultaneously with the other searches.
However, it is possible in principle, so I will consider implementing it.

@Phage-structure-geek commented Jun 20, 2022

Hi,

When your mods are run on T1050, the size of the uniref90_hits.sto file increases from 75 MB to 1.16 GB. The size of the mgnify_hits.sto increases from 3.6 MB to 1.9 GB. The other two MSA files are very similar in size to the original AF implementation (whatever the latest version). The final models are similar in both cases.

What is going on? Why does concurrent execution of the MSA tools result in such a massive increase in the size of the MSA files?

Thanks,

Petr

@Phage-structure-geek

Hi,

After some additional testing, this parallel implementation works very well. On short to medium-sized sequences the speedup of the complete run is up to 25%. Very impressive!

Thank you @fuji8 for this implementation! I hope this will be incorporated into the main branch soon.

Petr

@YaoYinYing

@xlminfei I am sorry, but the current PR cannot run Jackhmmer (uniprot) simultaneously with the other searches. However, it is possible in principle, so I will consider implementing it.

Hi, I am working on this MSA parallelization issue and found your PR. You may want to consider my fork for this implementation; the modification has successfully passed both monomer and heterodimer modeling.

I1010 16:32:12.457884 140302699362112 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I1010 16:32:12.460422 140302699362112 utils.py:36] Started Jackhmmer (mgy_clusters.fa) query
I1010 16:32:12.460629 140302699362112 utils.py:36] Started Jackhmmer (uniprot.fasta) query
I1010 16:32:12.460555 140302699362112 utils.py:36] Started HHblits query
I1010 16:38:13.022515 140302699362112 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 360.564 seconds
I1010 16:38:39.663420 140302699362112 utils.py:40] Finished HHblits query in 387.202 seconds
I1010 16:39:04.995680 140302699362112 utils.py:40] Finished Jackhmmer (mgy_clusters.fa) query in 412.535 seconds
I1010 16:43:12.018283 140302699362112 utils.py:40] Finished Jackhmmer (uniprot.fasta) query in 659.557 seconds

However, I am sure the parallel execution requires the extremely high read speed provided by an SSD.

@Phage-structure-geek

Hi @fuji8,

Would you have time to modify the latest release of AF (2.3) to make the tools run concurrently? Your v2.2 implementation was much faster than the official distribution and saved all of us a lot of time... Disappointingly, your contribution wasn't incorporated into the new release. I hope it won't take too much of your time and effort to port the v2.2 implementation to v2.3.

Thanks so much!

Petr

@@ -124,7 +126,8 @@ def __init__(self,
                use_small_bfd: bool,
                mgnify_max_hits: int = 501,
                uniref_max_hits: int = 10000,
-               use_precomputed_msas: bool = False):
+               use_precomputed_msas: bool = False,
+               n_parallel_msa: int = 1):

It would be helpful if you added a comment here about the number of logical threads involved with a non-default value of n_parallel_msa. Alternatively, if the number of available cores could be supplied, a function could be devised to adjust both this and the n_cpu variables in the tools folder to optimize the run.
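
As a rough sketch of that second suggestion (the helper name and the even-split policy are assumptions, not part of this PR), the detected cores could be divided among the parallel workers and the result used to seed each tool's n_cpu setting:

    import os
    from typing import Optional

    def cpus_per_msa_worker(n_parallel_msa: int,
                            total_cpus: Optional[int] = None) -> int:
        """Hypothetical helper: split the available cores evenly among the
        parallel MSA tools."""
        total_cpus = total_cpus or os.cpu_count() or 1
        return max(1, total_cpus // max(1, n_parallel_msa))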

@fuji8 reopened this Feb 11, 2023

@fuji8 (Author) commented Feb 12, 2023

Sorry, I made a mistake and temporarily closed the PR.

@Phage-structure-geek
Fixed to work with v2.3 as well.
I have confirmed that it works, but please let me know if it doesn't.

When your mods are run on T1050, the size of the uniref90_hits.sto file increases from 75 MB to 1.16 GB. The size of the mgnify_hits.sto increases from 3.6 MB to 1.9 GB. The other two MSA files are very similar in size to the original AF implementation (whatever the latest version). The final models are similar in both cases.
What is going on? Why does concurrent execution of the MSA tools result in such a massive increase in the size of the MSA files?
#399 (comment)

As for this one, I think it is because I forgot to apply the change at the line below:
https://github.com/deepmind/alphafold/blob/18e12d61314214c51ca266d192aad3cc6619018a/alphafold/data/pipeline.py#L169

It has already been fixed, so the file-size difference should disappear.
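
For context, here is a generic illustration of that kind of cap (not the repository's actual helper, whose name and behaviour may differ): limiting the number of sequences kept from a Stockholm alignment before it is written to disk is what keeps the .sto files small.

    def truncate_stockholm(sto_text: str, max_sequences: int) -> str:
        """Generic illustration only: keep the first max_sequences sequences of a
        Stockholm-format MSA; '#' annotation lines and the '//' terminator are kept."""
        kept_names = set()
        out = []
        for line in sto_text.splitlines():
            if line and not line.startswith(('#', '//')):
                name = line.split(maxsplit=1)[0]
                if name not in kept_names:
                    if len(kept_names) >= max_sequences:
                        continue  # drop alignment lines beyond the cap
                    kept_names.add(name)
            out.append(line)
        return '\n'.join(out) + '\n'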

@Phage-structure-geek
Copy link

Hi @fuji8
All is working here as well!
Here are the runtimes for comparison:
  • hexamer (two homotrimers = two consecutive runs of the tools): features 2435 vs. 3520
  • small monomer: features 625 vs. 1243
Very nice!
Thanks again for your work!
Petr

@groponp commented Feb 15, 2023

Hi, sorry for the general question, but how can I install this PR on my PC? I have a workstation with 32 cores and 4 P100 GPUs, and I would like to use all of those resources to model 5000 proteins. Is it better to use this PR or standard AlphaFold?

@Phage-structure-geek commented Feb 15, 2023

Hi, sorry for the general question, but how can I install this PR on my PC? I have a workstation with 32 cores and 4 P100 GPUs, and I would like to use all of those resources to model 5000 proteins. Is it better to use this PR or standard AlphaFold?

Replace the three files containing the changes (see the "Files changed" tab at the top) in your AF distribution and rebuild the Docker image.
Do not forget to add the --n_parallel_msa=3 flag to the other flags in your python3 docker/run_docker.py command (or script) when you run the new container.

Note that the speedup is substantial, but jackhmmer still often uses only two threads, no matter how many threads you give it. It is nevertheless much more satisfying to see jackhmmer and hhblits run in parallel, with hhblits at least using a good amount of CPU.

Petr
