Parallel execution of MSA tools #399
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). For more information, open the CLA check for this pull request.
How is the number of cores for the hhblits step determined? The example shows the three parallel steps using 44 cores in total on a 28-core machine: 8 cores for each of the two jackhmmer steps and 28 cores for the hhblits step. Or are there 44+ cores available on the machine in the example?
The number of available physical cores on this machine is 28, and the number of logical cores (the number of cores visible to the OS) is 56 due to Hyper-Threading. Since the total of 44 cores is below the 56 logical cores, there is no significant effect on execution or performance. (Even oversubscribing, for example running hhblits with 128 cores, will rarely make any difference in performance.)

What matters in determining the optimal number of hhblits cores is the I/O performance of the storage where the BFD is stored. If the storage is an HDD, the bottleneck will probably be I/O performance, and increasing the number of cores will not improve the execution time. If the storage is an SSD, it may be worthwhile to increase the number of cores up to, but not beyond, the number of physical cores. In this example, the reason for increasing the number of cores is that the storage uses the Lustre file system, whose I/O performance improves as the number of parallel cores increases; with this file system it makes sense to increase the parallelization even if the physical storage is HDD. The number of cores used here is 28, derived from the number of physical cores, but there was no difference in performance from about 16 upward.

If this PR improves the execution time beyond what simply changing the number of cores would give, the following factors can be considered. If you are using HDDs and the execution time of hhblits is extremely slow, you may want to move the file containing
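The tuning rule described above (on SSD-backed storage, raise the hhblits core count only up to the number of physical cores) can be sketched as a small helper. This is an illustration of the heuristic, not code from this PR; the function name and arguments are assumptions.

```python
def choose_hhblits_cpus(physical_cores: int, requested: int) -> int:
    # Heuristic from the discussion above: going beyond the physical
    # core count rarely helps, so cap the request there.
    return min(requested, physical_cores)

# With 28 physical cores, a request for 44 cores is capped at 28.
print(choose_hhblits_cpus(28, 44))  # → 28
```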
When I set n_parallel_msa=3 and run multimer, I found it starts Jackhmmer (uniref90), Jackhmmer (mgy_clusters_2018_12), and HHblits at the same time. After all of those are done, it starts Jackhmmer (uniprot).
@xlminfei I am sorry, but the current PR cannot run Jackhmmer (uniprot) simultaneously with the others.
Hi, When your mods are run on T1050, the size of the uniref90_hits.sto file increases from 75 MB to 1.16 GB. The size of the mgnify_hits.sto increases from 3.6 MB to 1.9 GB. The other two MSA files are very similar in size to the original AF implementation (whatever the latest version). The final models are similar in both cases. What is going on? Why does concurrent execution of the MSA tools result in such a massive increase in the size of the MSA files? Thanks, Petr
Hi, After some additional testing, this parallel implementation works very well. On short to medium-sized sequences the speedup of the complete run is up to 25%. Very impressive! Thank you @fuji8 for this implementation! I hope this will be incorporated into the main branch soon. Petr
Hi, I am working on this MSA parallel issue and found your PR. You may consider my fork for this implementation; this modification has successfully passed both monomer and heterodimer modeling.
However, I'm sure the parallel execution requires the extremely high read speeds provided by an SSD.
Hi @fuji8, Would you have time to modify the latest release of AF (2.3) to make the tools run concurrently? Your v2.2 implementation was much faster than the official distro. It saved all of us a lot of time... Disappointingly, your contribution wasn't implemented in the new release. I hope it won't take too much of your time and effort to convert the v2.2 implementation to v2.3. Thanks so much! Petr
```diff
@@ -124,7 +126,8 @@ def __init__(self,
                use_small_bfd: bool,
                mgnify_max_hits: int = 501,
                uniref_max_hits: int = 10000,
-               use_precomputed_msas: bool = False):
+               use_precomputed_msas: bool = False,
+               n_parallel_msa: int = 1):
```
It would be helpful to add a comment here about the number of logical threads involved with a non-default value of n_parallel_msa. Alternatively, if the number of available cores could be supplied, a function could be devised to adjust both this and the n_cpu variables in the tools folder to optimize the run.
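One simple form of the function suggested here would split the available cores evenly among the concurrently running MSA tools. This is only a sketch of the reviewer's idea; the function name is hypothetical and it ignores the per-tool I/O considerations discussed earlier in the thread.

```python
def split_cpus(total_cpus: int, n_tools: int) -> list:
    # Even split of cores across tools, with the remainder
    # spread over the first few tools so the counts sum exactly.
    base, rem = divmod(total_cpus, n_tools)
    return [base + 1 if i < rem else base for i in range(n_tools)]

# The machine in this thread has 28 physical cores and runs 3 tools.
print(split_cpus(28, 3))  # → [10, 9, 9]
```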
Sorry, I made a mistake and temporarily closed the PR. @Phage-structure-geek
As for this one, I think it's because I forgot to include the change below. It has already been fixed, so the file size difference should disappear.
Hi @fuji8
Hi, I'm sorry for the general question, but how can I install this PR on my PC? I have a workstation with 32 cores and 4 P100 GPUs, and I would like to use all resources to model 5000 proteins. Is it better to use this PR or the standard AlphaFold?
Replace the three files containing the changes (see the "Files changed" tab at the top) in your AF distro and rebuild the docker image. Note that the speedup is substantial, but jackhmmer still often uses only two threads, no matter how many threads you configured it with. It is nevertheless so much more satisfying to see jackhmmer and hhblits run in parallel, with at least hhblits using a good amount of CPU. Petr
Based on the dependencies among the data, the MSA tool execution is divided into three parts (Jackhmmer on uniref90, Jackhmmer on mgnify, and HHblits on BFD), and the tools are called asynchronously to execute them in parallel. Execution time may be reduced if sufficient CPU, memory, and I/O performance are available.

The implementation uses `concurrent.futures.ThreadPoolExecutor`, and `max_workers` can be specified with the `--n_parallel_msa` flag. If `--n_parallel_msa` is 1, the execution is not parallelized. A value of 3 is the maximum and potentially the fastest.

Example
The following is a partial log of a T1041 run with 28 CPU cores and 235 GB of RAM.
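The asynchronous dispatch pattern described in this PR can be sketched as follows. This is a minimal illustration of submitting three independent MSA tool runs to a `ThreadPoolExecutor`, not the actual AlphaFold pipeline code; the runner functions are placeholders standing in for the real jackhmmer and hhblits calls.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder runners; in the real pipeline each would invoke an
# external MSA tool and return its hits.
def run_jackhmmer_uniref90():
    return "uniref90_hits"

def run_jackhmmer_mgnify():
    return "mgnify_hits"

def run_hhblits_bfd():
    return "bfd_hits"

def run_msa_tools(n_parallel_msa: int = 1):
    tasks = [run_jackhmmer_uniref90, run_jackhmmer_mgnify, run_hhblits_bfd]
    # n_parallel_msa == 1 degenerates to sequential execution;
    # 3 lets all three independent tools run concurrently.
    with ThreadPoolExecutor(max_workers=n_parallel_msa) as executor:
        futures = [executor.submit(task) for task in tasks]
        # Results are collected in submission order.
        return [f.result() for f in futures]

print(run_msa_tools(n_parallel_msa=3))
```

With `n_parallel_msa=3` the three tool processes start at the same time, which matches the behavior reported earlier in this thread.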