Checkm2 metadata
To update CheckM2 information for genomes in GTDB, we only run genomes that have already been processed in the CheckM v1 step. We then use the previously generated ncbi_assembly_metadata.tsv file to select the translation table of interest.
gtdb_migration_tk prepare_checkm2 --checkm_summary_genbank /srv/db/gtdb/metadata/release<#>/checkm/genbank/checkm.profiles.tsv --checkm_summary_refseq /srv/db/gtdb/metadata/release<#>/checkm/refseq/checkm.profiles.tsv -g /srv/db/gtdb/genomes/ncbi/release<#>/genome_dirs.tsv -o /srv/db/gtdb/metadata/release<#>/checkm2/ -l logs/prepare_checkm2.log -m /srv/db/gtdb/metadata/release<#>/metadata_tables/ncbi_assembly_metadata.tsv
This creates a checkm_cmds.lst file in /srv/db/gtdb/metadata/release<#>/checkm2/.
Because CheckM v1 is already installed in the gtdb-migration-tk environment, I haven't installed CheckM2 there yet; it runs from its own conda environment instead.
So, for now, the pipeline is:
conda activate checkm2_1.0.2
sh checkm_cmds.lst
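`sh checkm_cmds.lst` runs the commands one after another and keeps going past failures without recording them. If the lines are independent, a variant like the one below could run several at once and collect the failing lines for a rerun. This is a minimal sketch under assumptions not stated in the original: each line of checkm_cmds.lst is a self-contained command, GNU xargs is available (for `-d`), and `run_cmd_list` and `failed_cmds.lst` are hypothetical names. Keep the job count low, since each CheckM2 command may already use several threads.

```shell
#!/usr/bin/env bash
# Sketch only: run each line of a command list in parallel and log failures.
# Assumptions (hypothetical, not part of gtdb_migration_tk): every line is an
# independent command, and GNU xargs is installed (-d is a GNU extension).
set -euo pipefail

run_cmd_list() {
    local cmd_file="$1" jobs="${2:-2}" failed_log="${3:-failed_cmds.lst}"
    : > "$failed_log"    # start with an empty failure log
    # awk 'NF' drops blank lines; xargs passes one whole line per invocation.
    # Inner sh: $1 is the failure log, $2 is the command line appended by xargs.
    # A failing command is appended to the log instead of aborting the run.
    awk 'NF' "$cmd_file" |
        xargs -d '\n' -n 1 -P "$jobs" \
            sh -c 'sh -c "$2" || printf "%s\n" "$2" >> "$1"' run_one "$failed_log"
}
```

Usage, inside the activated environment: `run_cmd_list checkm_cmds.lst 2 failed_cmds.lst`; afterwards, `sh failed_cmds.lst` reruns only the lines that failed.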
We want to add these results to the database. The information can be found under /srv/whitlam/projects1/gtdb/studies/checkm2/r214. For now, I have just copied checkm2-gtdb_r214.tsv and kept its first four columns:
cut -f1,2,3,4 checkm2-gtdb_r214.tsv > checkm2-metadata.tsv
I then replaced the four column names in checkm2-metadata.tsv with genome_id, checkm2_completeness, checkm2_contamination and checkm2_model, removed the .fna extension from genome_id, and changed the model_used information to only Specific or General.
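The manual steps above (cut, rename the header, strip .fna, shorten the model name) could be folded into a single awk pass, which makes the conversion repeatable for later releases. This is a sketch under stated assumptions: the input is tab-separated with one header line, column 1 holds the genome file name, columns 2 and 3 hold completeness and contamination, and column 4 holds a model description containing "Specific" or "General" (assumed to look like CheckM2's "Neural Network (Specific Model)" / "Gradient Boost (General Model)"). `make_checkm2_metadata` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: build checkm2-metadata.tsv from a CheckM2 report in one pass.
# Assumptions: tab-separated input, one header line, and the column order
# name / completeness / contamination / model description.
set -euo pipefail

make_checkm2_metadata() {
    local in="$1" out="$2"
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 1 {
             # Replace the original header with the GTDB column names.
             print "genome_id", "checkm2_completeness", "checkm2_contamination", "checkm2_model"
             next
         }
         {
             sub(/\.fna$/, "", $1)   # drop the .fna extension from the genome ID
             if ($4 ~ /Specific/) model = "Specific"
             else if ($4 ~ /General/) model = "General"
             else model = $4         # leave unexpected values untouched
             print $1, $2, $3, model
         }' "$in" > "$out"
}
```

Usage: `make_checkm2_metadata checkm2-gtdb_r214.tsv checkm2-metadata.tsv`. Unrecognised model values are passed through rather than dropped, so they stay visible for manual review.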