Pierre Chaumeil edited this page Oct 2, 2024 · 2 revisions

Add CheckM v2 information

To update CheckM2 information for genomes in GTDB, we only run genomes that have been processed in the CheckM v1 step. We then use the ncbi_assembly_metadata.tsv file generated previously to select the translation table of interest.

gtdb_migration_tk prepare_checkm2 --checkm_summary_genbank /srv/db/gtdb/metadata/release<#>/checkm/genbank/checkm.profiles.tsv --checkm_summary_refseq /srv/db/gtdb/metadata/release<#>/checkm/refseq/checkm.profiles.tsv -g /srv/db/gtdb/genomes/ncbi/release<#>/genome_dirs.tsv -o /srv/db/gtdb/metadata/release<#>/checkm2/ -l logs/prepare_checkm2.log -m /srv/db/gtdb/metadata/release<#>/metadata_tables/ncbi_assembly_metadata.tsv

This creates a checkm_cmds.lst file in /srv/db/gtdb/metadata/release<#>/checkm2/

Because CheckM v1 is already installed in the gtdb-migration-tk env, I haven't installed CheckM2 in that env yet,
so for now the pipeline is:

conda activate checkm2_1.0.2
sh checkm_cmds.lst

We want to add these results to the db. The information can be found under /srv/whitlam/projects1/gtdb/studies/checkm2/r214

For now, I have just copied checkm2-gtdb_r214.tsv and kept its first 4 columns:

cut -f1,2,3,4 checkm2-gtdb_r214.tsv > checkm2-metadata.tsv

Replace the 4 column names in checkm2-metadata.tsv with:

genome_id,checkm2_completeness,checkm2_contamination,checkm2_model

I have removed the .fna extension from genome_id and changed the model_used information to only Specific or General.
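The manual cleanup described above (renaming the four columns, dropping the .fna extension, and reducing the model field to Specific or General) could be scripted. The following is only a minimal sketch, not part of gtdb_migration_tk. It assumes the file stays tab-separated, that the first row is a header, and that the original model field contains the word "Specific" or "General" somewhere in its text. The function names `normalise_row` and `convert` are hypothetical.

```python
import csv

# Column names taken from the notes above.
NEW_HEADER = [
    "genome_id",
    "checkm2_completeness",
    "checkm2_contamination",
    "checkm2_model",
]


def normalise_row(row):
    """Clean one data row made of the first four CheckM2 columns."""
    genome_id, completeness, contamination, model = row[:4]

    # Drop the FASTA extension from the genome identifier.
    if genome_id.endswith(".fna"):
        genome_id = genome_id[: -len(".fna")]

    # Assumption: the model description mentions "Specific" or "General".
    # Anything else is left untouched so it can be inspected by hand.
    if "Specific" in model:
        model = "Specific"
    elif "General" in model:
        model = "General"

    return [genome_id, completeness, contamination, model]


def convert(in_handle, out_handle):
    """Read a tab-separated table and write the cleaned metadata table."""
    reader = csv.reader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")

    next(reader, None)  # discard the original header row
    writer.writerow(NEW_HEADER)

    for row in reader:
        if row:  # skip blank lines
            writer.writerow(normalise_row(row))
```

To use it, open checkm2-gtdb_r214.tsv for reading and checkm2-metadata.tsv for writing, then pass both handles to `convert`. Because only the first four fields of each row are used, the separate `cut` step becomes optional.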