What about MLST not hosted at pubmlst.org? #185

Closed
MostafaYA opened this issue Jul 19, 2017 · 3 comments

@MostafaYA

Hi,
The MLST scheme I work on is not hosted at PubMLST. How can I format it to run with ARIBA?

@martinghunt
Contributor

martinghunt commented Jul 20, 2017

You'll need 3 files:

  1. A FASTA file of your allele sequences.
  2. A file that defines the MLST scheme.
  3. A file to define the clustering of the sequences.

If you run ariba pubmlstget "Mycobacterium abscessus" out (or substitute another organism of your choice), you can see example files in the out directory.

The MLST scheme is a tab-delimited file that looks like this:

$ head -n 3 out/ref_db/pubmlst.profile.txt
ST	Mab_argH	Mab_cya	Mab_gnd	Mab_murC	Mab_pta	Mab_purH	Mab_rpoB	clonal_complex
1	2	1	1	2	2	3	1	3
2	4	1	1	2	7	4	4	8

(You don't need the final clonal_complex column; it's only present in some schemes, and ARIBA ignores it.)
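As a rough illustration of that layout, here is a minimal Python sketch that reads a profile file, treats every column other than ST and clonal_complex as a locus, and collects each ST's allele numbers. The function name read_profile is hypothetical (it is not part of ARIBA), and the sketch assumes ST is the first column, as in the example above.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Columns that are not loci. clonal_complex only appears in some schemes,
# and per the comment above ARIBA ignores it.
NON_LOCUS_COLUMNS = {"ST", "clonal_complex"}


def read_profile(handle: TextIO) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Return (locus names, {ST: {locus: allele number}}) from a tab-delimited profile."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames
    # Assumption: ST is the first column, as in the example above.
    if not header or header[0] != "ST":
        raise ValueError("expected the first header column to be 'ST'")
    loci = [name for name in header if name not in NON_LOCUS_COLUMNS]
    profiles = {row["ST"]: {locus: row[locus] for locus in loci} for row in reader}
    return loci, profiles


if __name__ == "__main__":
    # The first three lines of the example profile from this thread.
    example = (
        "ST\tMab_argH\tMab_cya\tMab_gnd\tMab_murC\tMab_pta\tMab_purH\tMab_rpoB\tclonal_complex\n"
        "1\t2\t1\t1\t2\t2\t3\t1\t3\n"
        "2\t4\t1\t1\t2\t7\t4\t4\t8\n"
    )
    loci, profiles = read_profile(io.StringIO(example))
    print(len(loci))                  # 7
    print(profiles["2"]["Mab_rpoB"])  # 4
```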

The clustering file will be the fun one to make. It needs to look like this:

$ cat out/clusters.tsv
Mab_argH.1	Mab_argH.2	Mab_argH.3	Mab_argH.4	Mab_argH.5	Mab_argH.6
Mab_cya.1	Mab_cya.2	Mab_cya.3	Mab_cya.4	Mab_cya.5
Mab_gnd.1	Mab_gnd.2	Mab_gnd.3	Mab_gnd.4	Mab_gnd.5	Mab_gnd.6	Mab_gnd.7
Mab_murC.1	Mab_murC.2	Mab_murC.3	Mab_murC.4	Mab_murC.5	Mab_murC.6	Mab_murC.7	Mab_murC.8
Mab_pta.1	Mab_pta.10	Mab_pta.11	Mab_pta.2	Mab_pta.3	Mab_pta.4	Mab_pta.5	Mab_pta.6	Mab_pta.7	Mab_pta.8	Mab_pta.9
Mab_purH.1	Mab_purH.2	Mab_purH.3	Mab_purH.4	Mab_purH.5	Mab_purH.6	Mab_purH.7

i.e. one line per cluster, with all the allele names for a given gene on one line, tab-delimited.
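Because this file is easy to get wrong by hand, a small check can help. The Python sketch below (read_clusters is a hypothetical helper, not an ARIBA function) parses a clusters file and raises an error if one line mixes alleles from different genes or if a gene is split across lines. It assumes allele names follow the gene.number pattern used above, so the gene is everything before the last '.'.

```python
import io
from typing import Dict, List, TextIO


def read_clusters(handle: TextIO) -> Dict[str, List[str]]:
    """Parse a clusters file (one tab-delimited line per gene) into {gene: [alleles]}.

    Assumes allele names look like '<gene>.<number>', as in the example above,
    so the gene name is everything before the last '.'.
    """
    clusters: Dict[str, List[str]] = {}
    for line_number, line in enumerate(handle, start=1):
        alleles = [name for name in line.rstrip("\r\n").split("\t") if name]
        if not alleles:
            continue  # ignore blank lines
        genes = {allele.rsplit(".", 1)[0] for allele in alleles}
        if len(genes) != 1:
            raise ValueError(f"line {line_number} mixes genes: {sorted(genes)}")
        gene = genes.pop()
        if gene in clusters:
            raise ValueError(f"gene {gene} is split across more than one line")
        clusters[gene] = alleles
    return clusters


if __name__ == "__main__":
    example = "Mab_cya.1\tMab_cya.2\tMab_cya.3\nMab_pta.1\tMab_pta.10\tMab_pta.2\n"
    for gene, alleles in read_clusters(io.StringIO(example)).items():
        print(gene, len(alleles))
    # Mab_cya 3
    # Mab_pta 3
```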

Once you have your 3 files, run this:

ariba prepareref --cdhit_clusters clusters.tsv --fasta seqs.fa --all_coding no prepareref_out
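Before running that command, it may help to confirm that every name in clusters.tsv has a sequence in seqs.fa and vice versa. This is a sketch under the assumption that the cluster names must match the FASTA sequence IDs, taken here as the first whitespace-delimited word of each header line; the helper names are hypothetical.

```python
import io
from typing import Set, TextIO, Tuple


def fasta_ids(handle: TextIO) -> Set[str]:
    """Return sequence IDs, taken as the first word after '>' on each header line."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def cluster_names(handle: TextIO) -> Set[str]:
    """Return every allele name listed anywhere in a tab-delimited clusters file."""
    names = set()
    for line in handle:
        names.update(name for name in line.rstrip("\r\n").split("\t") if name)
    return names


def compare(fasta: TextIO, clusters: TextIO) -> Tuple[Set[str], Set[str]]:
    """Return (names in clusters but not in FASTA, IDs in FASTA but not in clusters)."""
    seq_ids = fasta_ids(fasta)
    clustered = cluster_names(clusters)
    return clustered - seq_ids, seq_ids - clustered


if __name__ == "__main__":
    # Toy example data, not real allele sequences.
    fasta = io.StringIO(">Mab_cya.1\nACGT\n>Mab_cya.2\nACGA\n>Mab_cya.3\nACGG\n")
    clusters = io.StringIO("Mab_cya.1\tMab_cya.2\tMab_cya.4\n")
    missing, unclustered = compare(fasta, clusters)
    print(missing)      # {'Mab_cya.4'}
    print(unclustered)  # {'Mab_cya.3'}
```

With real files this would be something like compare(open("seqs.fa"), open("clusters.tsv")); two empty sets mean the names line up.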

and then put a copy of your profile file inside the output directory:

cp my_profile.txt prepareref_out/pubmlst.profile.txt

It must be named pubmlst.profile.txt inside that directory for ARIBA to do MLST calling.
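A slightly more defensive version of that cp step could first confirm that each locus column in the profile has a line in the clusters file, and only then copy the profile under the required name. The sketch below assumes the same gene.number allele naming as above and that clonal_complex is the only non-locus column besides ST; install_profile and the other helpers are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Set

# ARIBA only does MLST calling if the profile has this name inside the
# prepareref output directory (per the comment above).
PROFILE_NAME = "pubmlst.profile.txt"


def profile_loci(profile: Path) -> Set[str]:
    """Locus columns from the profile header, skipping ST and clonal_complex."""
    with profile.open() as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    return {col for col in header if col not in {"ST", "clonal_complex"}}


def cluster_genes(clusters: Path) -> Set[str]:
    """Gene names from a clusters file, assuming '<gene>.<number>' allele names."""
    with clusters.open() as handle:
        return {
            line.strip().split("\t")[0].rsplit(".", 1)[0]
            for line in handle
            if line.strip()
        }


def install_profile(profile: Path, clusters: Path, prepareref_dir: Path) -> Path:
    """Check profile loci against the clusters file, then copy the profile into place."""
    missing = profile_loci(profile) - cluster_genes(clusters)
    if missing:
        raise ValueError(f"profile loci with no line in {clusters}: {sorted(missing)}")
    target = prepareref_dir / PROFILE_NAME
    shutil.copyfile(profile, target)
    return target


if __name__ == "__main__":
    # Self-contained demo in a temporary directory with toy files.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "prepareref_out").mkdir()
        (root / "my_profile.txt").write_text("ST\tMab_argH\tMab_cya\n1\t2\t1\n")
        (root / "clusters.tsv").write_text("Mab_argH.1\tMab_argH.2\nMab_cya.1\n")
        target = install_profile(
            root / "my_profile.txt", root / "clusters.tsv", root / "prepareref_out"
        )
        print(target.name)  # pubmlst.profile.txt
```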

@MostafaYA
Author

It worked. Thanks!

@slvrshot

slvrshot commented Jan 1, 2021

Is there a script that can easily convert the FASTA headers of all the alleles for each locus into this clustering file? I have a few ideas, but they're probably not very efficient.
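One possible approach, assuming every allele's FASTA ID has the form gene, separator, number (for example Mab_pta.10), is to group IDs on the part before the last separator, which gives one line per gene. The Python sketch below reads the headers in a single pass and writes a clusters file in the format shown earlier. The separator is a parameter because some schemes may use '_' instead of '.'; the script name and functions are hypothetical, not part of ARIBA. The earlier example lists Mab_pta.10 before Mab_pta.2, which suggests order within a line is not significant, so the sketch sorts numerically only for readability.

```python
import io
import sys
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def clusters_from_fasta(handle: TextIO, sep: str = ".") -> Dict[str, List[str]]:
    """Group FASTA IDs into {gene: [allele IDs]} by splitting each ID on its last `sep`."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for line in handle:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue
        allele = fields[0]  # ID = first word of the header
        gene, found, _ = allele.rpartition(sep)
        if not found:
            raise ValueError(f"no {sep!r} in sequence ID {allele!r}")
        clusters[gene].append(allele)
    return dict(clusters)


def allele_sort_key(allele: str, sep: str = ".") -> Tuple[int, int, str]:
    """Sort numeric suffixes numerically (2 before 10); anything else goes last."""
    suffix = allele.rpartition(sep)[2]
    return (0, int(suffix), "") if suffix.isdecimal() else (1, 0, suffix)


def write_clusters(clusters: Dict[str, List[str]], out: TextIO, sep: str = ".") -> None:
    """Write one tab-delimited line per gene, genes in alphabetical order."""
    for gene in sorted(clusters):
        alleles = sorted(clusters[gene], key=lambda a: allele_sort_key(a, sep))
        out.write("\t".join(alleles) + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: python make_clusters.py seqs.fa clusters.tsv
        with open(sys.argv[1]) as fasta, open(sys.argv[2], "w") as out:
            write_clusters(clusters_from_fasta(fasta), out)
    else:
        # Toy demo (not real allele sequences) when no file arguments are given.
        demo = io.StringIO(">Mab_pta.10\nACGT\n>Mab_pta.2\nACGA\n>Mab_cya.1\nACGG\n")
        write_clusters(clusters_from_fasta(demo), sys.stdout)
```

Saved as, say, make_clusters.py, it could be run as python make_clusters.py seqs.fa clusters.tsv, and the output passed to prepareref via --cdhit_clusters.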
