wfmash step to speed up #205

OZTaekOppa · 2024-08-01T05:36:30Z

Description of feature

Dear nf-core & pangenome team,

I have a few questions about your great program.

Based on the link (https://github.com/nf-core/pangenome/blob/1.0.0/modules/nf-core/wfmash/main.nf), it appears that wfmash performs all-vs-all alignment on a single node.

wfmash \\
    ${fasta_gz} \\
    $query \\
    $query_list \\
    --threads $task.cpus \\
    $paf_mappings \\
    $args > ${prefix}.paf

From my trials, this is indeed the case.

I am trying to speed up the wfmash step by running parallel jobs across multiple nodes (PBS Pro). My idea is to run a one-vs-all alignment on each node, using the full input genome dataset (120 human pangenomes), and then merge the results into a single PAF file for further analysis.
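The merge step described above amounts to concatenating the per-node outputs. Below is a minimal Python sketch. It assumes each node writes its one-vs-all mappings to its own PAF file; the `merge_paf_files` helper and the `node_*.paf` naming are hypothetical. The sketch reads the files in sorted path order so the merged file is reproducible. It also rejects lines with fewer than the 12 mandatory PAF columns, which can catch a partially written last line from a job that hit its walltime.

```python
from typing import Iterable

PAF_MANDATORY_COLUMNS = 12  # PAF records have 12 mandatory tab-separated fields


def merge_paf_files(paths: Iterable[str], out_path: str) -> int:
    """Concatenate per-node PAF files (hypothetical helper); return records written.

    Files are read in sorted path order so repeated merges give identical output.
    Blank lines are skipped; a record with too few fields raises ValueError.
    """
    written = 0
    with open(out_path, "w") as out:
        for path in sorted(paths):
            with open(path) as handle:
                for lineno, line in enumerate(handle, start=1):
                    if not line.strip():
                        continue  # ignore empty lines between or after records
                    fields = line.rstrip("\n").split("\t")
                    if len(fields) < PAF_MANDATORY_COLUMNS:
                        raise ValueError(f"{path}:{lineno}: truncated PAF record")
                    # Normalize a missing final newline so records never fuse.
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written
```

For example, `merge_paf_files(glob.glob("node_*.paf"), "merged.paf")`. A plain `cat node_*.paf > merged.paf` performs the same concatenation without the sanity check.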

Do you have any recommendations for tweaking the wfmash code to achieve this?
If I run one-vs-all alignments on each node, will the merged PAF file be equivalent to an all-vs-all alignment? In theory, I would expect the final outcome to be the same.
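One way to test whether a merged one-vs-all PAF matches an all-vs-all run is to compare the two files as multisets of records, because line order will differ between runs. Below is a minimal Python sketch; the `compare_paf` helper is hypothetical. By default it compares only the 12 mandatory PAF columns and ignores optional tags. Pass `include_tags=True` to also compare tags such as a CIGAR string in `cg:Z`.

```python
from collections import Counter
from typing import Iterable, Tuple


def paf_records(lines: Iterable[str], include_tags: bool = False) -> Counter:
    """Count PAF records, keyed on the 12 mandatory columns (or on all fields)."""
    records: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        key = tuple(fields) if include_tags else tuple(fields[:12])
        records[key] += 1
    return records


def compare_paf(a_lines: Iterable[str], b_lines: Iterable[str],
                include_tags: bool = False) -> Tuple[Counter, Counter]:
    """Return (only_in_a, only_in_b); two empty Counters mean the files agree."""
    a = paf_records(a_lines, include_tags)
    b = paf_records(b_lines, include_tags)
    # Counter subtraction keeps only positive counts, so duplicates are respected.
    return a - b, b - a
```

Passing two open file handles works directly, since files iterate line by line. Any records left over point at mappings that only one of the two approaches produced.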

Looking forward to your insights.

Kind regards,

Taek


subwaystation · 2024-08-01T06:37:16Z

Dear @OZTaekOppa,

By default, wfmash indeed only makes use of one node. However, the parameter --wfmash_chunks (https://nf-co.re/pangenome/1.1.2/parameters/#wfmash_chunks) allows nf-core/pangenome to scale the all-vs-all base-pair-level alignments across the nodes of a cluster. This was also extensively evaluated in https://www.biorxiv.org/content/10.1101/2024.05.13.593871v1.

To be clear about what wfmash does when --wfmash_chunks > 1:

1. wfmash is run in approximate mapping mode, which finds sequence homologies determined by the given wfmash parameters (pangenome/subworkflows/local/pggb.nf, line 47 in af6d1dd):

   WFMASH_MAP(ch_wfmash_map,

2. The resulting PAF is split into chunks of equal alignment problem size; the number of chunks is given by --wfmash_chunks (pangenome/subworkflows/local/pggb.nf, line 51 in af6d1dd):

   SPLIT_APPROX_MAPPINGS_IN_CHUNKS(WFMASH_MAP.out.paf)

3. Each chunked PAF can then be passed to wfmash in base-pair-level alignment mode, with the chunks running in parallel on the nodes of a cluster (pangenome/subworkflows/local/pggb.nf, line 54 in af6d1dd):

   WFMASH_ALIGN(ch_wfmash_align,

I hope this answers your question!
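The chunking step can be viewed as a load-balancing problem: distribute the approximate mappings over `--wfmash_chunks` bins so that each bin carries about the same alignment work. Below is a minimal Python sketch of that idea. It assumes the work of a mapping is proportional to its query span (end minus start, PAF columns 3 and 4) and uses a greedy longest-processing-time (LPT) heuristic. The pipeline's own SPLIT_APPROX_MAPPINGS_IN_CHUNKS step may weight and assign mappings differently, and the function names here are hypothetical.

```python
import heapq
from typing import List, Sequence


def mapping_weight(paf_line: str) -> int:
    """Assumed cost proxy: query span (end - start) of one approximate mapping."""
    fields = paf_line.split("\t")
    return int(fields[3]) - int(fields[2])


def split_mappings_into_chunks(lines: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Greedy LPT split of PAF mappings into n_chunks bins of similar total weight."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    # Min-heap of (total weight so far, chunk index): the lightest chunk pops first.
    heap = [(0, i) for i in range(n_chunks)]
    heapq.heapify(heap)
    records = [line for line in lines if line.strip()]
    # Place the heaviest mappings first so smaller ones can even out the totals.
    for line in sorted(records, key=mapping_weight, reverse=True):
        total, idx = heapq.heappop(heap)
        chunks[idx].append(line)
        heapq.heappush(heap, (total + mapping_weight(line), idx))
    return chunks
```

Each returned chunk would then be written to its own PAF and handed to a separate base-pair-level alignment job, as in the WFMASH_ALIGN step. LPT is a simple choice here: it is not optimal, but its largest bin is guaranteed to be within a factor of 4/3 of the best possible.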

subwaystation · 2024-08-01T06:37:47Z

I didn't test it for one vs. all, but it should work out the same way.

subwaystation · 2024-08-01T06:39:03Z

This question is also discussed at pangenome/pggb#403.

OZTaekOppa · 2024-08-01T06:52:34Z

Hi @subwaystation,

Thank you for your prompt reply.
I will get back to you after testing your suggestion.

Cheers,

Taek

OZTaekOppa · 2024-08-16T06:58:55Z

Hi @subwaystation,

The current single-node approach requires significant RAM, CPUs, and extended walltime. The HPC team is exploring alternative solutions to run parallel jobs across multiple nodes.

From testing a small dataset, both the all-vs-all and one-vs-all approaches produced the same outcome. Currently, I am working with the team to optimize the partition and PGGB steps for Nextflow.

Cheers,

Taek

subwaystation · 2024-08-16T07:56:25Z

I am a little bit confused. As stated above, there is an option to run wfmash directly across several nodes.
Did you try it?

Otherwise, I am curious how your plans turn out :)

OZTaekOppa added the enhancement Improvement for existing functionality label Aug 1, 2024
