Add Metaphlan4 profiler #318

Merged on Jul 17, 2023 (33 commits)

Commits
66800aa
Add MetaPhlAn4 citation to CITATIONS.md
LilyAnderssonLee Jul 7, 2023
e10027b
Add MetaPhlAn4 to README.md
LilyAnderssonLee Jul 7, 2023
71dc963
Add profiler MetaPhlAn4 to modules.config
LilyAnderssonLee Jul 7, 2023
44df24f
Add profiler MetaPhlAn4 to conf/test.config
LilyAnderssonLee Jul 7, 2023
9ce1282
Add profiler MetaPhlAn4 to conf/test_full.config
LilyAnderssonLee Jul 7, 2023
7537c0b
Add profiler MetaPhlAn4 to conf/test_krakenuniq.config
LilyAnderssonLee Jul 7, 2023
b212e94
Add profiler MetaPhlAn4 to conf/test_motus.config
LilyAnderssonLee Jul 7, 2023
b6b49b3
Add profiler MetaPhlAn4 to conf/test_nopreprocessing.config
LilyAnderssonLee Jul 7, 2023
15a4e1b
Add profiler MetaPhlAn4 to conf/test_noprofiling.config
LilyAnderssonLee Jul 7, 2023
708e81d
Add profiler MetaPhlAn4 to conf/test_nothing.config
LilyAnderssonLee Jul 7, 2023
4fd9d55
Add profiler MetaPhlAn4 to docs/images/taxprofiler_tube.svg
LilyAnderssonLee Jul 7, 2023
fcf5166
Add MetaPhlAn4 to docs/output.md, docs/usage.md
LilyAnderssonLee Jul 7, 2023
de4baeb
Added parameters to nextflow.config
LilyAnderssonLee Jul 7, 2023
1706cc3
remove the module metaphlan3/metaphlan3
LilyAnderssonLee Jul 7, 2023
55d89f9
Remove the module metaphlan3/mergemetaphlantables
LilyAnderssonLee Jul 7, 2023
b21f113
update the version of module taxpasta/merge, taxpasta/standardise
LilyAnderssonLee Jul 7, 2023
d10e79d
Added profiler to expected tools in subworkflows/local/db_check.nf
LilyAnderssonLee Jul 7, 2023
0b279c0
Added profiler to subworkflows/local/profiling.nf
LilyAnderssonLee Jul 7, 2023
3b3a527
Added profiler to subworkflows/local/standardisation_profiles.nf
LilyAnderssonLee Jul 7, 2023
61e4eca
update the pipeline schema
LilyAnderssonLee Jul 7, 2023
4531549
install the module metaphlan
LilyAnderssonLee Jul 7, 2023
4f945e3
nf-core lint fix format .github/CONTRIBUTING.md
LilyAnderssonLee Jul 7, 2023
985464e
nf-core lint fix format
LilyAnderssonLee Jul 7, 2023
2f8e370
Update into the most recent version on dev
LilyAnderssonLee Jul 7, 2023
5cb7d97
Modify nextflow_schema.json
LilyAnderssonLee Jul 10, 2023
e03f317
comment krakenuniq step ci.yml
LilyAnderssonLee Jul 13, 2023
d5d2bc3
Merge branch 'dev' into metaphlan4_profiler
jfy133 Jul 13, 2023
9b7ffb8
Add MetaPhlAn4 to docs/usage.md
LilyAnderssonLee Jul 13, 2023
9061278
Add contribution to the CHANGELOG.md
LilyAnderssonLee Jul 13, 2023
75581b0
uncomment KrakenUniq in ci.yml
LilyAnderssonLee Jul 14, 2023
bc27e8a
pull the ci.yml from dev branch
LilyAnderssonLee Jul 14, 2023
14bed9d
Merge branch 'dev' into metaphlan4_profiler
LilyAnderssonLee Jul 17, 2023
1be2aaa
Update CHANGELOG.md
LilyAnderssonLee Jul 17, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -16,6 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#315](https://github.com/nf-core/taxprofiler/pull/315) Updated to nf-core pipeline template v2.9 (added by @sofstam & @jfy133)
- [#319](https://github.com/nf-core/taxprofiler/pull/319) Added support for virus hit expansion in Kaiju (❤️ to @dnlrxn for requesting, added by @jfy133)
- [#323](https://github.com/nf-core/taxprofiler/pull/323) Add ability to skip sequencing quality control tools (❤️ to @vinisalazar for requesting, added by @jfy133)
- [#318](https://github.com/nf-core/taxprofiler/pull/318) Added the profiler MetaPhlAn4 and removed MetaPhlAn3 (added by @LilyAnderssonLee)

### `Fixed`

4 changes: 2 additions & 2 deletions CITATIONS.md
@@ -64,9 +64,9 @@

> Breitwieser, Florian P., Daniel N. Baker, and Steven L. Salzberg. 2018. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biology 19 (1): 198. doi: 10.1186/s13059-018-1568-0

- [MetaPhlAn3](https://doi.org/10.7554/eLife.65088)
- [MetaPhlAn](https://doi.org/10.1038/s41587-023-01688-w)

> Beghini, Francesco, Lauren J McIver, Aitor Blanco-Míguez, Leonard Dubois, Francesco Asnicar, Sagun Maharjan, Ana Mailyan, et al. 2021. “Integrating Taxonomic, Functional, and Strain-Level Profiling of Diverse Microbial Communities with BioBakery 3.” Edited by Peter Turnbaugh, Eduardo Franco, and C Titus Brown. ELife 10 (May): e65088. doi: 10.7554/eLife.65088
> Blanco-Míguez, A., Beghini, F., Cumbo, F. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol (2023). doi: 10.1038/s41587-023-01688-w

- [MALT](https://doi.org/10.1038/s41559-017-0446-6)

6 changes: 3 additions & 3 deletions README.md
@@ -27,7 +27,7 @@
3. Supports statistics for host-read removal ([Samtools](http://www.htslib.org/))
4. Performs taxonomic classification and/or profiling using one or more of:
- [Kraken2](https://ccb.jhu.edu/software/kraken2/)
- [MetaPhlAn3](https://huttenhower.sph.harvard.edu/metaphlan/)
- [MetaPhlAn](https://huttenhower.sph.harvard.edu/metaphlan/)
- [MALT](https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/malt/)
- [DIAMOND](https://github.com/bbuchfink/diamond)
- [Centrifuge](https://ccb.jhu.edu/software/centrifuge/)
@@ -68,7 +68,7 @@ Additionally, you will need a database sheet that looks as follows:
```
tool,db_name,db_params,db_path
kraken2,db2,--quick,/<path>/<to>/kraken2/testdb-kraken2.tar.gz
metaphlan3,db1,,/<path>/<to>/metaphlan3/metaphlan_database/
metaphlan,db1,,/<path>/<to>/metaphlan/metaphlan_database/
```

That includes directories or `.tar.gz` archives containing databases for the tools you wish to run the pipeline against.
@@ -81,7 +81,7 @@ nextflow run nf-core/taxprofiler \
--input samplesheet.csv \
--databases databases.csv \
--outdir <OUTDIR> \
--run_kraken2 --run_metaphlan3
--run_kraken2 --run_metaphlan
```

> **Warning:**
12 changes: 6 additions & 6 deletions conf/modules.config
@@ -523,20 +523,20 @@ process {
]
}

withName: METAPHLAN3_METAPHLAN3 {
withName: METAPHLAN_METAPHLAN {
ext.args = { "${meta.db_params}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.metaphlan3" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.metaphlan3" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.metaphlan" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.metaphlan" }
publishDir = [
path: { "${params.outdir}/metaphlan3/${meta.db_name}/" },
path: { "${params.outdir}/metaphlan/${meta.db_name}/" },
mode: params.publish_dir_mode,
pattern: '*.{biom,txt}'
]
}

withName: METAPHLAN3_MERGEMETAPHLANTABLES {
ext.prefix = { "metaphlan3_${meta.id}_combined_reports" }
withName: METAPHLAN_MERGEMETAPHLANTABLES {
ext.prefix = { "metaphlan_${meta.id}_combined_reports" }
publishDir = [
path: { "${params.outdir}/metaphlan3/" },
path: { "${params.outdir}/metaphlan/" },
mode: params.publish_dir_mode,
pattern: '*.{txt}'
]
2 changes: 1 addition & 1 deletion conf/test.config
@@ -34,7 +34,7 @@ params {
run_kraken2 = true
run_bracken = true
run_malt = false
run_metaphlan3 = true
run_metaphlan = true
run_centrifuge = true
run_diamond = true
run_krakenuniq = true
2 changes: 1 addition & 1 deletion conf/test_full.config
@@ -57,7 +57,7 @@ params {
malt_save_reads = false
malt_generate_megansummary = true

run_metaphlan3 = true
run_metaphlan = true

run_motus = true
motus_save_mgc_read_counts = true
2 changes: 1 addition & 1 deletion conf/test_krakenuniq.config
@@ -38,7 +38,7 @@ params {
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan3 = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = true
2 changes: 1 addition & 1 deletion conf/test_motus.config
@@ -37,7 +37,7 @@ params {
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan3 = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = false
2 changes: 1 addition & 1 deletion conf/test_nopreprocessing.config
@@ -33,7 +33,7 @@ params {
run_kraken2 = true
run_bracken = true
run_malt = true
run_metaphlan3 = true
run_metaphlan = true
run_centrifuge = true
run_diamond = true
run_krakenuniq = true
2 changes: 1 addition & 1 deletion conf/test_noprofiling.config
@@ -34,7 +34,7 @@ params {
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan3 = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = false
2 changes: 1 addition & 1 deletion conf/test_nothing.config
@@ -33,7 +33,7 @@ params {
run_kraken2 = false
run_bracken = false
run_malt = false
run_metaphlan3 = false
run_metaphlan = false
run_centrifuge = false
run_diamond = false
run_krakenuniq = false
2 changes: 1 addition & 1 deletion docs/images/taxprofiler_tube.svg
(diff not displayed)
18 changes: 9 additions & 9 deletions docs/output.md
@@ -30,7 +30,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Kaiju](#kaiju) - Taxonomic classifier that finds maximum (in-)exact matches on the protein-level.
- [Diamond](#diamond) - Sequence aligner for protein and translated DNA searches.
- [MALT](#malt) - Sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics
- [MetaPhlAn3](#metaphlan3) - Genome-level marker gene based taxonomic classifier
- [MetaPhlAn](#metaphlan) - Genome-level marker gene based taxonomic classifier
- [mOTUs](#motus) - Tool for marker gene-based OTU (mOTU) profiling.
- [ganon](#ganon) - Taxonomic classifier and profile that uses Interleaved Bloom Filters as indices based on k-mers/minimizers.
- [TAXPASTA](#taxpasta) - Tool to standardise taxonomic profiles as well as merge profiles across samples from the same database and classifier/profiler.
@@ -429,23 +429,23 @@ The main output of MALT is the `.rma6` file format, which can be only loaded int

You will only receive the `.sam` and `.megan` files if you supply `--malt_save_reads` and/or `--malt_generate_megansummary` parameters to the pipeline.

### MetaPhlAn3
### MetaPhlAn

[MetaPhlAn3](https://github.com/biobakery/metaphlan) is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution via marker genes.
[MetaPhlAn](https://github.com/biobakery/metaphlan) is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution via marker genes.

<details markdown="1">
<summary>Output files</summary>

- `metaphlan3/`
- `metaphlan3_<db_name>_combined_reports.txt`: A combined profile of all samples aligned to a given database (as generated by `metaphlan_merge_tables`)
- `metaphlan/`
- `metaphlan_<db_name>_combined_reports.txt`: A combined profile of all samples aligned to a given database (as generated by `metaphlan_merge_tables`)
- `<db_name>/`
- `<sample_id>.biom`: taxonomic profile in BIOM format
- `<sample_id>.bowtie2out.txt`: BowTie2 alignment information (can be re-used for skipping alignment when re-running MetaPhlAn3 with different parameters)
- `<sample_id>_profile.txt`: MetaPhlAn3 taxonomic profile including abundance estimates
- `<sample_id>.bowtie2out.txt`: BowTie2 alignment information (can be re-used for skipping alignment when re-running MetaPhlAn with different parameters)
- `<sample_id>_profile.txt`: MetaPhlAn taxonomic profile including abundance estimates

</details>

The main taxonomic profiling file from MetaPhlAn3 is the `*_profile.txt` file. This provides the abundance estimates from MetaPhlAn3 however does not include raw counts by default.
The main taxonomic profiling file from MetaPhlAn is the `*_profile.txt` file. This provides the abundance estimates from MetaPhlAn; however, it does not include raw counts by default.

### mOTUs

@@ -535,7 +535,7 @@ The following report files are used for the taxpasta step:
- KrakenUniq: `<sample_id>_<db_name>.report.txt` Taxpasta uses the `reads` column for the standardised profile.
- Kraken2: `<sample_id>_<db_name>.report.txt` Taxpasta uses the `direct_assigned_reads` column for the standardised profile.
- MALT: `<sample_id>.txt.gz` Taxpasta uses the `count` (second) column from the output of MEGAN6's rma2info for the standardised profile.
- MetaPhlAn3: `<sample_id>_profile.txt` Taxpasta uses the `relative_abundance` column multiplied with a fixed number to yield an integer for the standardised profile.
- MetaPhlAn: `<sample_id>_profile.txt` Taxpasta uses the `relative_abundance` column multiplied with a fixed number to yield an integer for the standardised profile.
- mOTUs: `<sample_id>.out` Taxpasta uses the `read_count` column for the standardised profile.

> ⚠️ Please be aware that the outputs of each tool's standardised profile _may not_ be directly comparable between tools. Some may report raw read counts, whereas others may report abundance information. Always refer to the list above for which information is used for each tool.
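
The MetaPhlAn entry in the list above describes scaling a relative abundance into an integer. A rough sketch of that idea follows. The scale factor `SCALE`, the helper name `scale_profile`, and the assumed column layout (clade name in column 1, `relative_abundance` in column 3, tab-separated, `#`-prefixed header lines) are illustrative assumptions, not taxpasta's actual implementation or constant.

```bash
# Hypothetical scale factor; the constant taxpasta really uses may differ.
SCALE=10000

# Hypothetical helper: turn the relative_abundance column (assumed to be the
# 3rd tab-separated column) of a MetaPhlAn-style profile into rounded integers.
scale_profile() {
    awk -F '\t' -v scale="$SCALE" '
        /^#/ { next }    # skip header and comment lines
        { printf "%s\t%d\n", $1, int($3 * scale + 0.5) }
    ' "$1"
}
```

Calling `scale_profile` on a `*_profile.txt` file would print one clade name and one integer per line, which illustrates why such values are not raw read counts.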
37 changes: 18 additions & 19 deletions docs/usage.md
@@ -101,7 +101,7 @@ bracken,db1,;-r 150,/<path>/<to>/bracken/testdb-bracken.tar.gz
kraken2,db2,--quick,/<path>/<to>/kraken2/testdb-kraken2.tar.gz
krakenuniq,db3,,/<path>/<to>/krakenuniq/testdb-krakenuniq.tar.gz
centrifuge,db1,,/<path>/<to>/centrifuge/minigut_cf.tar.gz
metaphlan3,db1,,/<path>/<to>/metaphlan3/metaphlan_database/
metaphlan,db1,,/<path>/<to>/metaphlan/metaphlan_database/
motus,db_mOTU,,/<path>/<to>/motus/motus_database/
ganon,db1,,/<path>/<to>/ganon/test-db-ganon.tar.gz
```
@@ -130,7 +130,7 @@ The (uncompressed) database paths (`db_path`) for each tool are expected to cont
- [**Kraken2**:](#kraken2-custom-database) output of `kraken2-build` command(s).
- [**KrakenUniq**:](#krakenuniq-custom-database) output of `krakenuniq-build` command(s).
- [**MALT**](#malt-custom-database) output of `malt-build`.
- [**MetaPhlAn3**:](#metaphlan3-custom-database) output of with `metaphlan --install` or downloaded from links on the [MetaPhlAn3 wiki](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.0#customizing-the-database).
- [**MetaPhlAn**:](#metaphlan-custom-database) output of `metaphlan --install` or downloaded from links on the [MetaPhlAn wiki](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-4#customizing-the-database).
- [**mOTUs**:](#motus-custom-database) the directory `db_mOTU/` that is downloaded via `motus downloadDB`.
- [**ganon**:](#ganon-custom-database) output of `ganon build` or `ganon build-custom`.

@@ -298,9 +298,9 @@ MALT does not support paired-end reads alignment (unlike other tools), therefore

Krona can only be run on MALT output if the path to a Krona taxonomy database is supplied to `--krona_taxonomy_directory`. Therefore, if you do not supply a Krona directory, Krona plots will not be produced for MALT.

##### MetaPhlAn3
##### MetaPhlAn

MetaPhlAn3 currently does not accept FASTA files as input, therefore no output will be produced for these input files.
MetaPhlAn4 can be used with a MetaPhlAn3 database by adding the `--mpa3` parameter to the MetaPhlAn process in the config file `conf/modules.config`.
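
One possible way to supply `--mpa3` without editing the pipeline's own files is a small custom config passed with Nextflow's `-c` option. The sketch below only writes such a file. The file name `metaphlan_mpa3.config` is an assumption, and the `ext.args` line mirrors the `METAPHLAN_METAPHLAN` block shown in `conf/modules.config` in this diff with `--mpa3` appended; whether this override suits a given setup is untested here.

```bash
# Write a hypothetical custom config that appends --mpa3 to the arguments
# already taken from the database sheet (meta.db_params).
# The quoted 'EOF' stops the shell from expanding ${meta.db_params}.
cat > metaphlan_mpa3.config <<'EOF'
process {
    withName: METAPHLAN_METAPHLAN {
        ext.args = { "${meta.db_params} --mpa3" }
    }
}
EOF
```

The file could then be added to the usual `nextflow run nf-core/taxprofiler` command with `-c metaphlan_mpa3.config`.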

##### mOTUs

@@ -339,7 +339,7 @@ The following tools will produce multi-sample taxon tables:
- **Centrifuge** (via KrakenTools' `combine_kreports.py` script)
- **Kaiju** (via Kaiju's `kaiju2table` tool)
- **Kraken2** (via KrakenTools' `combine_kreports.py` script)
- **MetaPhlAn3** (via MetaPhlAn's `merge_metaphlan_tables.py` script)
- **MetaPhlAn** (via MetaPhlAn's `merge_metaphlan_tables.py` script)
- **mOTUs** (via the `motus merge` command)
- **ganon** (via the `ganon table` command)

@@ -712,11 +712,11 @@ You can then add the `<YOUR_DB_NAME>/` path to your nf-core/taxprofiler database

See the [MALT manual](https://software-ab.informatik.uni-tuebingen.de/download/malt/manual.pdf) for more information.

#### MetaPhlAn3 custom database
#### MetaPhlAn custom database

MetaPhlAn3 does not allow (easy) construction of custom databases. Therefore we recommend to use the prebuilt database of marker genes that is provided by the developers.
MetaPhlAn does not allow (easy) construction of custom databases. Therefore we recommend to use the prebuilt database of marker genes that is provided by the developers.

To do this you need to have `MetaPhlAn3` installed on your machine.
To do this you need to have `MetaPhlAn` installed on your machine.

```bash
metaphlan --install --bowtie2db <YOUR_DB_NAME>/
@@ -731,21 +731,20 @@ You can then add the `<YOUR_DB_NAME>/` path to your nf-core/taxprofiler database
<details markdown="1">
<summary>Expected files in database directory</summary>
Review comment (Member): For MetaPhlAn4


- `metaphlan3`
- `mpa_v30_CHOCOPhlAn_201901.pkl`
- `mpa_v30_CHOCOPhlAn_201901.pkl`
- `mpa_v30_CHOCOPhlAn_201901.fasta`
- `mpa_v30_CHOCOPhlAn_201901.3.bt2`
- `mpa_v30_CHOCOPhlAn_201901.4.bt2`
- `mpa_v30_CHOCOPhlAn_201901.1.bt2`
- `mpa_v30_CHOCOPhlAn_201901.2.bt2`
- `mpa_v30_CHOCOPhlAn_201901.rev.1.bt2`
- `mpa_v30_CHOCOPhlAn_201901.rev.2.bt2`
- `metaphlan`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.pkl`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.fna.bz2`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.1.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.2.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.3.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.4.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.rev.1.bt2l`
- `mpa_vJan21_TOY_CHOCOPhlAnSGB_202103.rev.2.bt2l`
- `mpa_latest`

</details>

More information on the MetaPhlAn3 database can be found [here](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.1#installation).
More information on the MetaPhlAn database can be found [here](https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-4#Pre-requisites).
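
As a quick sanity check after `metaphlan --install`, the expected-files list above can be turned into a small shell test. This is a minimal sketch: the helper name `check_metaphlan_db` is hypothetical, and the rule it applies (at least one `.pkl` file and exactly six `.bt2l` Bowtie2 index files) is inferred only from the MetaPhlAn4 file listing in this diff, so other database releases may differ.

```bash
# Hypothetical helper (not part of nf-core/taxprofiler): report whether a
# directory looks like the MetaPhlAn4 database layout listed above.
check_metaphlan_db() {
    db_dir=$1
    # Count marker-metadata and large-index Bowtie2 files (searched recursively).
    pkl_count=$(( $(find "$db_dir" -type f -name '*.pkl' | wc -l) ))
    idx_count=$(( $(find "$db_dir" -type f -name '*.bt2l' | wc -l) ))
    if [ "$pkl_count" -lt 1 ]; then
        echo "no .pkl file found in $db_dir" >&2
        return 1
    fi
    if [ "$idx_count" -ne 6 ]; then
        echo "expected 6 .bt2l index files in $db_dir, found $idx_count" >&2
        return 1
    fi
    echo "ok: $db_dir"
}
```

Running the helper on the database directory before adding its path to the database sheet would catch an incomplete download early.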

#### mOTUs custom database

8 changes: 4 additions & 4 deletions modules.json
@@ -156,14 +156,14 @@
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"metaphlan3/mergemetaphlantables": {
"metaphlan/mergemetaphlantables": {
"branch": "master",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"git_sha": "9aa59197c0fb35c29e315bcd10c0fc9e1afc70a8",
"installed_by": ["modules"]
},
"metaphlan3/metaphlan3": {
"metaphlan/metaphlan": {
"branch": "master",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"git_sha": "31ec4470b455fe88c072151a5ea7821bfb2add38",
"installed_by": ["modules"]
},
"minimap2/align": {