What about MLST not hosted at pubmlst.org? #185

MostafaYA · 2017-07-19T07:09:20Z

Hi,
The MLST scheme I work on is not hosted at PubMLST. How can I format it so that I can run it with ariba?

martinghunt · 2017-07-20T09:13:12Z

You'll need 3 files:

A FASTA file of your sequences.
A file that defines the MLST scheme.
A file that defines the clustering of the sequences.

If you run ariba pubmlstget "Mycobacterium abscessus" out (or another organism of your choice) then you can see example files.

The MLST scheme is a tab-delimited file that looks like this:

$ head -n 3 out/ref_db/pubmlst.profile.txt
ST	Mab_argH	Mab_cya	Mab_gnd	Mab_murC	Mab_pta	Mab_purH	Mab_rpoB	clonal_complex
1	2	1	1	2	2	3	1	3
2	4	1	1	2	7	4	4	8

(You don't need the final clonal_complex column. It's only there in some schemes, and ariba ignores it.)
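A hand-made profile file can be checked before use. The following is a minimal Python sketch, not part of ariba: the function names read_profile and missing_alleles are hypothetical, and it assumes that the first column holds the ST and that sequences in the FASTA file are named <locus>.<allele number> (for example Mab_argH.2).

```python
import csv


def read_profile(lines):
    """Parse a tab-delimited MLST profile.

    `lines` is any iterable of text lines (an open file works).
    Returns (loci, profiles), where profiles maps ST -> {locus: allele}.
    The optional clonal_complex column is dropped.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Column 0 is assumed to be the ST; every other column except
    # clonal_complex is treated as a locus.
    keep = [i for i, col in enumerate(header) if i > 0 and col != "clonal_complex"]
    loci = [header[i] for i in keep]
    profiles = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        profiles[row[0]] = {header[i]: row[i] for i in keep}
    return loci, profiles


def missing_alleles(profiles, fasta_names):
    """Return sorted <locus>.<allele> names used in the profile but absent
    from fasta_names (assumes that naming convention for the sequences)."""
    needed = {
        f"{locus}.{allele}"
        for alleles in profiles.values()
        for locus, allele in alleles.items()
    }
    return sorted(needed - set(fasta_names))
```

Calling missing_alleles with the sequence names taken from your FASTA headers returns an empty list when every allele in the profile has a matching sequence.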

The clustering file will be the fun one to make, and needs to be like this:

$ cat out/clusters.tsv
Mab_argH.1	Mab_argH.2	Mab_argH.3	Mab_argH.4	Mab_argH.5	Mab_argH.6
Mab_cya.1	Mab_cya.2	Mab_cya.3	Mab_cya.4	Mab_cya.5
Mab_gnd.1	Mab_gnd.2	Mab_gnd.3	Mab_gnd.4	Mab_gnd.5	Mab_gnd.6	Mab_gnd.7
Mab_murC.1	Mab_murC.2	Mab_murC.3	Mab_murC.4	Mab_murC.5	Mab_murC.6	Mab_murC.7	Mab_murC.8
Mab_pta.1	Mab_pta.10	Mab_pta.11	Mab_pta.2	Mab_pta.3	Mab_pta.4	Mab_pta.5	Mab_pta.6	Mab_pta.7	Mab_pta.8	Mab_pta.9
Mab_purH.1	Mab_purH.2	Mab_purH.3	Mab_purH.4	Mab_purH.5	Mab_purH.6	Mab_purH.7

i.e. one line per cluster: all the allele names for a given gene go on one line, tab-delimited.
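The layout above can be generated from the FASTA headers. The following is a minimal Python sketch, not part of ariba: the function names are hypothetical, and it assumes every sequence is named <gene>.<allele number> (as in Mab_argH.1), with the name being the first whitespace-delimited token of the header. Alleles are written in the order they appear in the FASTA file.

```python
def clusters_from_fasta_lines(lines):
    """Group allele names by gene.

    `lines` is any iterable of FASTA text lines. Returns a dict mapping
    gene -> list of allele names, in order of first appearance.
    """
    clusters = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # empty header
        name = fields[0]
        # Split on the LAST dot: "Mab_argH.1" -> gene "Mab_argH".
        gene, sep, _allele = name.rpartition(".")
        if not sep or not gene:
            raise ValueError(f"Header not of the form <gene>.<allele>: {name}")
        clusters.setdefault(gene, []).append(name)
    return clusters


def format_clusters(clusters):
    """One line per gene, allele names separated by tabs."""
    return "".join("\t".join(names) + "\n" for names in clusters.values())


def fasta_to_clusters_file(fasta_path, out_path):
    """Read a FASTA file and write the clustering file."""
    with open(fasta_path) as fasta:
        clusters = clusters_from_fasta_lines(fasta)
    with open(out_path, "w") as out:
        out.write(format_clusters(clusters))
```

For example, fasta_to_clusters_file("seqs.fa", "clusters.tsv") would write a clusters.tsv suitable for passing to --cdhit_clusters, provided the headers follow the assumed naming.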

Once you have your 3 files, run this:

ariba prepareref --cdhit_clusters clusters.tsv --fasta seqs.fa --all_coding no prepareref_out

and then put a copy of your profile file inside the output directory:

cp my_profile.txt prepareref_out/pubmlst.profile.txt

... it must be called pubmlst.profile.txt inside there to make ariba do MLST calling.

MostafaYA · 2017-07-24T11:08:34Z

It worked. Thanks

slvrshot · 2021-01-01T05:23:42Z

Is there a script that can easily convert the FASTA headers of all the alleles for each locus into this clustering file? I have a few ideas, but they're probably not very efficient.

MostafaYA closed this as completed Jul 24, 2017

MostafaYA mentioned this issue Nov 30, 2017

MLST sequences are shorter than their reference #207

Open
