Merge pull request #318 from LilyAnderssonLee/metaphlan4_profiler
Add Metaphlan4 profiler
LilyAnderssonLee committed Jul 17, 2023
2 parents 71c9275 + 1be2aaa commit 36d3156
Showing 24 changed files with 109 additions and 106 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -16,6 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#315](https://github.com/nf-core/taxprofiler/pull/315) Updated to nf-core pipeline template v2.9 (added by @sofstam & @jfy133)
- [#319](https://github.com/nf-core/taxprofiler/pull/319) Added support for virus hit expansion in Kaiju (❤️ to @dnlrxn for requesting, added by @jfy133)
- [#323](https://github.com/nf-core/taxprofiler/pull/323) Add ability to skip sequencing quality control tools (❤️ to @vinisalazar for requesting, added by @jfy133)
- [#318](https://github.com/nf-core/taxprofiler/pull/318) Added the profiler MetaPhlAn4 and removed MetaPhlAn3 (added by @LilyAnderssonLee)

### `Fixed`

4 changes: 2 additions & 2 deletions CITATIONS.md
@@ -64,9 +64,9 @@

> Breitwieser, Florian P., Daniel N. Baker, and Steven L. Salzberg. 2018. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biology 19 (1): 198. doi: 10.1186/s13059-018-1568-0
- [MetaPhlAn3](https://doi.org/10.7554/eLife.65088)
- [MetaPhlAn](https://doi.org/10.1038/s41587-023-01688-w)

> Beghini, Francesco, Lauren J McIver, Aitor Blanco-Míguez, Leonard Dubois, Francesco Asnicar, Sagun Maharjan, Ana Mailyan, et al. 2021. “Integrating Taxonomic, Functional, and Strain-Level Profiling of Diverse Microbial Communities with BioBakery 3.” Edited by Peter Turnbaugh, Eduardo Franco, and C Titus Brown. ELife 10 (May): e65088. doi: 10.7554/eLife.65088
> Blanco-Míguez, A., Beghini, F., Cumbo, F. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol (2023). doi: 10.1038/s41587-023-01688-w
- [MALT](https://doi.org/10.1038/s41559-017-0446-6)

6 changes: 3 additions & 3 deletions README.md
@@ -27,7 +27,7 @@
3. Supports statistics for host-read removal ([Samtools](http://www.htslib.org/))
4. Performs taxonomic classification and/or profiling using one or more of:
- [Kraken2](https://ccb.jhu.edu/software/kraken2/)
- [MetaPhlAn3](https://huttenhower.sph.harvard.edu/metaphlan/)
- [MetaPhlAn](https://huttenhower.sph.harvard.edu/metaphlan/)
- [MALT](https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/malt/)
- [DIAMOND](https://github.com/bbuchfink/diamond)
- [Centrifuge](https://ccb.jhu.edu/software/centrifuge/)
@@ -68,7 +68,7 @@ Additionally, you will need a database sheet that looks as follows:
```
tool,db_name,db_params,db_path
kraken2,db2,--quick,/<path>/<to>/kraken2/testdb-kraken2.tar.gz
metaphlan3,db1,,/<path>/<to>/metaphlan3/metaphlan_database/
metaphlan,db1,,/<path>/<to>/metaphlan/metaphlan_database/
```

That includes directories or `.tar.gz` archives containing databases for the tools you wish to run the pipeline against.
@@ -81,7 +81,7 @@ nextflow run nf-core/taxprofiler \
--input samplesheet.csv \
--databases databases.csv \
--outdir <OUTDIR> \
--run_kraken2 --run_metaphlan3
--run_kraken2 --run_metaphlan
```

> **Warning:**
12 changes: 6 additions & 6 deletions conf/modules.config
@@ -523,20 +523,20 @@ process {
]
}

withName: METAPHLAN3_METAPHLAN3 {
withName: METAPHLAN_METAPHLAN {
ext.args = { "${meta.db_params}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.metaphlan3" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.metaphlan3" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.metaphlan" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.metaphlan" }
publishDir = [
path: { "${params.outdir}/metaphlan3/${meta.db_name}/" },
path: { "${params.outdir}/metaphlan/${meta.db_name}/" },
mode: params.publish_dir_mode,
pattern: '*.{biom,txt}'
]
}

withName: METAPHLAN3_MERGEMETAPHLANTABLES {
ext.prefix = { "metaphlan3_${meta.id}_combined_reports" }
withName: METAPHLAN_MERGEMETAPHLANTABLES {
ext.prefix = { "metaphlan_${meta.id}_combined_reports" }
publishDir = [
path: { "${params.outdir}/metaphlan3/" },
path: { "${params.outdir}/metaphlan/" },
mode: params.publish_dir_mode,
pattern: '*.{txt}'
]
2 changes: 1 addition & 1 deletion conf/test.config
@@ -34,7 +34,7 @@ params {
run_kraken2 = true
run_bracken = true
run_malt = false
run_metaphlan3 = true
run_metaphlan = true
run_centrifuge = true
run_diamond = true
run_krakenuniq = true
2 changes: 1 addition & 1 deletion conf/test_full.config
@@ -57,7 +57,7 @@ params {
malt_save_reads = false
malt_generate_megansummary = true

run_metaphlan3 = true
run_metaphlan = true

run_motus = true
motus_save_mgc_read_counts = true
2 changes: 1 addition & 1 deletion conf/test_krakenuniq.config
@@ -38,7 +38,7 @@ params {
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan3 = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = true
2 changes: 1 addition & 1 deletion conf/test_motus.config
@@ -37,7 +37,7 @@ params {
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan3 = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = false
2 changes: 1 addition & 1 deletion conf/test_nopreprocessing.config
@@ -33,7 +33,7 @@ params {
run_kraken2 = true
run_bracken = true
run_malt = true
run_metaphlan3 = true
run_metaphlan = true
run_centrifuge = true
run_diamond = true
run_krakenuniq = true
2 changes: 1 addition & 1 deletion conf/test_noprofiling.config
@@ -34,7 +34,7 @@ params {
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan3 = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = false
2 changes: 1 addition & 1 deletion conf/test_nothing.config
@@ -33,7 +33,7 @@ params {
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan3 = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = false
2 changes: 1 addition & 1 deletion docs/images/taxprofiler_tube.svg
18 changes: 9 additions & 9 deletions docs/output.md
@@ -30,7 +30,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Kaiju](#kaiju) - Taxonomic classifier that finds maximum (in-)exact matches on the protein-level.
- [Diamond](#diamond) - Sequence aligner for protein and translated DNA searches.
- [MALT](#malt) - Sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics
- [MetaPhlAn3](#metaphlan3) - Genome-level marker gene based taxonomic classifier
- [MetaPhlAn](#metaphlan) - Genome-level marker gene based taxonomic classifier
- [mOTUs](#motus) - Tool for marker gene-based OTU (mOTU) profiling.
- [ganon](#ganon) - Taxonomic classifier and profile that uses Interleaved Bloom Filters as indices based on k-mers/minimizers.
- [TAXPASTA](#taxpasta) - Tool to standardise taxonomic profiles as well as merge profiles across samples from the same database and classifier/profiler.
@@ -429,23 +429,23 @@ The main output of MALT is the `.rma6` file format, which can be only loaded int

You will only receive the `.sam` and `.megan` files if you supply `--malt_save_reads` and/or `--malt_generate_megansummary` parameters to the pipeline.

### MetaPhlAn3
### MetaPhlAn

[MetaPhlAn3](https://github.com/biobakery/metaphlan) is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution via marker genes.
[MetaPhlAn](https://github.com/biobakery/metaphlan) is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution via marker genes.

<details markdown="1">
<summary>Output files</summary>

- `metaphlan3/`
- `metaphlan3_<db_name>_combined_reports.txt`: A combined profile of all samples aligned to a given database (as generated by `metaphlan_merge_tables`)
- `metaphlan/`
- `metaphlan_<db_name>_combined_reports.txt`: A combined profile of all samples aligned to a given database (as generated by `metaphlan_merge_tables`)
- `<db_name>/`
- `<sample_id>.biom`: taxonomic profile in BIOM format
- `<sample_id>.bowtie2out.txt`: BowTie2 alignment information (can be re-used for skipping alignment when re-running MetaPhlAn3 with different parameters)
- `<sample_id>_profile.txt`: MetaPhlAn3 taxonomic profile including abundance estimates
- `<sample_id>.bowtie2out.txt`: BowTie2 alignment information (can be re-used for skipping alignment when re-running MetaPhlAn with different parameters)
- `<sample_id>_profile.txt`: MetaPhlAn taxonomic profile including abundance estimates

</details>

The main taxonomic profiling file from MetaPhlAn3 is the `*_profile.txt` file. This provides the abundance estimates from MetaPhlAn3 however does not include raw counts by default.
The main taxonomic profiling file from MetaPhlAn is the `*_profile.txt` file. This provides the abundance estimates from MetaPhlAn but does not include raw counts by default.

### mOTUs

@@ -535,7 +535,7 @@ The following report files are used for the taxpasta step:
- KrakenUniq: `<sample_id>_<db_name>.report.txt` Taxpasta uses the `reads` column for the standardised profile.
- Kraken2: `<sample_id>_<db_name>.report.txt` Taxpasta uses the `direct_assigned_reads` column for the standardised profile.
- MALT: `<sample_id>.txt.gz` Taxpasta uses the `count` (second) column from the output of MEGAN6's rma2info for the standardised profile.
- MetaPhlAn3: `<sample_id>_profile.txt` Taxpasta uses the `relative_abundance` column multiplied with a fixed number to yield an integer for the standardised profile.
- MetaPhlAn: `<sample_id>_profile.txt` Taxpasta uses the `relative_abundance` column multiplied with a fixed number to yield an integer for the standardised profile.
- mOTUs: `<sample_id>.out` Taxpasta uses the `read_count` column for the standardised profile.

> ⚠️ Please be aware that the outputs of each tool's standardised profile _may not_ be directly comparable between tools. Some may report raw read counts, whereas others may report abundance information. Always refer to the list above to see which information is used for each tool.
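
To illustrate the MetaPhlAn entry in the list above, the sketch below shows how a relative abundance could be turned into an integer by multiplying it with a fixed number. It is illustrative only: the factor `SCALE`, the rounding behaviour, the function name and the taxon names are all assumptions, not values taken from Taxpasta.

```python
# Hypothetical scaling factor; the number Taxpasta actually uses is not stated here.
SCALE = 1_000_000


def abundance_to_count(relative_abundance: float, scale: int = SCALE) -> int:
    """Convert a relative abundance (in percent) to an integer pseudo-count."""
    if relative_abundance < 0:
        raise ValueError("relative abundance must be non-negative")
    # Rounding (rather than truncating) is an assumption made for this sketch.
    return round(relative_abundance * scale)


# Illustrative profile: taxon name -> relative abundance in percent.
profile = {"s__Escherichia_coli": 62.5, "s__Bacteroides_fragilis": 37.5}
counts = {taxon: abundance_to_count(value) for taxon, value in profile.items()}
print(counts)
```

Because the result is a scaled abundance rather than a read count, such values should not be compared directly with the read-count columns reported for other tools.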
37 changes: 18 additions & 19 deletions docs/usage.md
@@ -101,7 +101,7 @@ bracken,db1,;-r 150,/<path>/<to>/bracken/testdb-bracken.tar.gz
kraken2,db2,--quick,/<path>/<to>/kraken2/testdb-kraken2.tar.gz
krakenuniq,db3,,/<path>/<to>/krakenuniq/testdb-krakenuniq.tar.gz
centrifuge,db1,,/<path>/<to>/centrifuge/minigut_cf.tar.gz
metaphlan3,db1,,/<path>/<to>/metaphlan3/metaphlan_database/
metaphlan,db1,,/<path>/<to>/metaphlan/metaphlan_database/
motus,db_mOTU,,/<path>/<to>/motus/motus_database/
ganon,db1,,/<path>/<to>/ganon/test-db-ganon.tar.gz
```
@@ -130,7 +130,7 @@ The (uncompressed) database paths (`db_path`) for each tool are expected to cont
- [**Kraken2**:](#kraken2-custom-database) output of `kraken2-build` command(s).
- [**KrakenUniq**:](#krakenuniq-custom-database) output of `krakenuniq-build` command(s).
- [**MALT**](#malt-custom-database) output of `malt-build`.
- [**MetaPhlAn3**:](#metaphlan3-custom-database) output of with `metaphlan --install` or downloaded from links on the [MetaPhlAn3 wiki](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.0#customizing-the-database).
- [**MetaPhlAn**:](#metaphlan-custom-database) output of `metaphlan --install` or downloaded from links on the [MetaPhlAn wiki](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-4#customizing-the-database).
- [**mOTUs**:](#motus-custom-database) the directory `db_mOTU/` that is downloaded via `motus downloadDB`.
- [**ganon**:](#ganon-custom-database) output of `ganon build` or `ganon build-custom`.

@@ -298,9 +298,9 @@ MALT does not support paired-end reads alignment (unlike other tools), therefore

Krona can only be run on MALT output if the path to a Krona taxonomy database is supplied to `--krona_taxonomy_directory`. Therefore, if you do not supply a Krona directory, Krona plots will not be produced for MALT.

##### MetaPhlAn3
##### MetaPhlAn

MetaPhlAn3 currently does not accept FASTA files as input, therefore no output will be produced for these input files.
MetaPhlAn4 is compatible with the MetaPhlAn3 database when the `--mpa3` parameter is added to the MetaPhlAn process in the config file `conf/modules.config`.
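
Since `conf/modules.config` sets `ext.args` for the MetaPhlAn process from `meta.db_params`, one possible route is to pass `--mpa3` through the `db_params` column of the database sheet instead of editing the config. The sketch below only generates such a sheet. It assumes that the `db_params` column is what populates `meta.db_params`; the database name `db_mpa3`, the path and the helper `database_sheet` are hypothetical placeholders.

```python
import csv
import io

# Column names follow the database sheet shown in the README.
FIELDS = ["tool", "db_name", "db_params", "db_path"]


def database_sheet(rows: list) -> str:
    """Render database sheet rows (dicts keyed by FIELDS) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sheet = database_sheet([
    {
        "tool": "metaphlan",
        "db_name": "db_mpa3",  # hypothetical database name
        "db_params": "--mpa3",  # flag named in the text above
        "db_path": "/<path>/<to>/metaphlan/metaphlan3_database/",  # placeholder path
    }
])
print(sheet)
```

The printed text can be saved as the file given to `--databases`.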

##### mOTUs

@@ -339,7 +339,7 @@ The following tools will produce multi-sample taxon tables:
- **Centrifuge** (via KrakenTools' `combine_kreports.py` script)
- **Kaiju** (via Kaiju's `kaiju2table` tool)
- **Kraken2** (via KrakenTools' `combine_kreports.py` script)
- **MetaPhlAn3** (via MetaPhlAn's `merge_metaphlan_tables.py` script)
- **MetaPhlAn** (via MetaPhlAn's `merge_metaphlan_tables.py` script)
- **mOTUs** (via the `motus merge` command)
- **ganon** (via the `ganon table` command)

@@ -712,11 +712,11 @@ You can then add the `<YOUR_DB_NAME>/` path to your nf-core/taxprofiler database

See the [MALT manual](https://software-ab.informatik.uni-tuebingen.de/download/malt/manual.pdf) for more information.

#### MetaPhlAn3 custom database
#### MetaPhlAn custom database

MetaPhlAn3 does not allow (easy) construction of custom databases. Therefore we recommend to use the prebuilt database of marker genes that is provided by the developers.
MetaPhlAn does not allow (easy) construction of custom databases. Therefore we recommend to use the prebuilt database of marker genes that is provided by the developers.

To do this you need to have `MetaPhlAn3` installed on your machine.
To do this you need to have `MetaPhlAn` installed on your machine.

```bash
metaphlan --install --bowtie2db <YOUR_DB_NAME>/
@@ -731,21 +731,20 @@ You can then add the `<YOUR_DB_NAME>/` path to your nf-core/taxprofiler database
<details markdown="1">
<summary>Expected files in database directory</summary>

- `metaphlan3`
- `mpa_v30_CHOCOPhlAn_201901.pkl`
- `mpa_v30_CHOCOPhlAn_201901.pkl`
- `mpa_v30_CHOCOPhlAn_201901.fasta`
- `mpa_v30_CHOCOPhlAn_201901.3.bt2`
- `mpa_v30_CHOCOPhlAn_201901.4.bt2`
- `mpa_v30_CHOCOPhlAn_201901.1.bt2`
- `mpa_v30_CHOCOPhlAn_201901.2.bt2`
- `mpa_v30_CHOCOPhlAn_201901.rev.1.bt2`
- `mpa_v30_CHOCOPhlAn_201901.rev.2.bt2`
- `metaphlan`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.pkl`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.fna.bz2`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.1.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.2.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.3.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.4.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.rev.1.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.rev.2.bt2l`
- `mpa_latest`

</details>
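
As a quick sanity check before adding a directory to the database sheet, the file list above can be compared against what is on disk. The following is a minimal sketch: the helper `missing_db_files` is hypothetical, and it assumes that the `.pkl` file and the six Bowtie2 large-index files (`.bt2l`) are the minimum to check; it does not look for the `.fna.bz2` file or `mpa_latest`.

```python
from pathlib import Path

# Suffixes taken from the new MetaPhlAn file list above (Bowtie2 large-index files).
EXPECTED_SUFFIXES = [
    ".pkl",
    ".1.bt2l",
    ".2.bt2l",
    ".3.bt2l",
    ".4.bt2l",
    ".rev.1.bt2l",
    ".rev.2.bt2l",
]


def missing_db_files(db_dir: str, index_name: str) -> list:
    """Return the expected file names that are absent from db_dir."""
    directory = Path(db_dir)
    expected = [f"{index_name}{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    # Index name as it appears in the file list above; the directory is a placeholder.
    missing = missing_db_files("metaphlan", "mpa_vJan21_TOY_CHOCOPhlAnSGB_202103")
    print("missing files:", missing)
```

An empty list means every checked file is present. It does not show that the index itself is valid.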

More information on the MetaPhlAn3 database can be found [here](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.1#installation).
More information on the MetaPhlAn database can be found [here](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-4#Pre-requisites).

#### mOTUs custom database

8 changes: 4 additions & 4 deletions modules.json
@@ -156,14 +156,14 @@
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"metaphlan3/mergemetaphlantables": {
"metaphlan/mergemetaphlantables": {
"branch": "master",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"git_sha": "9aa59197c0fb35c29e315bcd10c0fc9e1afc70a8",
"installed_by": ["modules"]
},
"metaphlan3/metaphlan3": {
"metaphlan/metaphlan": {
"branch": "master",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"git_sha": "31ec4470b455fe88c072151a5ea7821bfb2add38",
"installed_by": ["modules"]
},
"minimap2/align": {