Input Files format
Peng Jia edited this page Dec 19, 2020 · 1 revision
A reference file is needed in the scan module. The version and contig names of the reference must match those used for your BAM files, and the reference must be saved in uncompressed FASTA (.fa) format.
A microsatellite file is needed in all modules of MSIsensor-pro. It contains the chromosome, location, repeat unit, repeat unit length, and other information for each microsatellite.
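As a rough pre-flight check for the reference requirement, the sketch below lists the contig names in a FASTA file and rejects gzip-compressed input. The function name `fasta_contig_names` is hypothetical and not part of MSIsensor-pro. Comparing the names against the BAM header (for example the output of `samtools view -H`) is left to the reader.

```python
def fasta_contig_names(path):
    """Return contig names from an uncompressed FASTA file (hypothetical helper)."""
    # gzip files start with the magic bytes 0x1f 0x8b.
    with open(path, "rb") as handle:
        if handle.read(2) == b"\x1f\x8b":
            raise ValueError(f"{path} looks gzip-compressed; an uncompressed FASTA is required")

    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # The contig name is the first whitespace-delimited token after '>'.
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names
```

If the BAM files use names such as `chr1` while the reference uses `1`, the two lists will not match and the reference should be replaced with the one used for alignment.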
Example:
chromosome | location | repeat_unit_length | repeat_unit_binary | repeat_times | left_flank_binary | right_flank_binary | repeat_unit_bases | left_flank_bases | right_flank_bases | threshold | supportSamples |
---|---|---|---|---|---|---|---|---|---|---|---|
chr1 | 3780974 | 1 | 0 | 11 | 221 | 321 | A | ATCTC | CCAAC | 0.080981 | 30 |
chr1 | 3784993 | 1 | 0 | 13 | 885 | 758 | A | TCTCC | GTTCG | 0.007576 | 18 |
chr1 | 3836468 | 1 | 3 | 24 | 342 | 80 | T | CCCCG | ACCAA | 0.061750 | 19 |
chr1 | 3872414 | 1 | 0 | 13 | 834 | 545 | A | TCAAG | GAGAC | 0.02842 | 28 |
chr1 | 4712522 | 3 | 20 | 7 | 662 | 421 | CCA | GGCCG | CGGCC | 0.024391 | 3 |
Note:
- Columns named *_binary hold a binary encoding of the DNA bases, using A=00, C=01, G=10, and T=11. The table shows the resulting value as a decimal integer.
- threshold is the instability baseline for slippage at the site. It is calculated in the baseline module and applied in the pro module.
- supportSamples is the number of samples with sufficient read coverage.
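The 2-bit encoding in the first note can be sketched as follows. This is an illustration, not MSIsensor-pro's own code: the function names are hypothetical, and the assumption that the leftmost base occupies the highest bits is inferred from the example rows, which it reproduces.

```python
# 2-bit code from the note above: A=00, C=01, G=10, T=11.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_bases(bases):
    """Pack a DNA string into an integer, two bits per base, leftmost base first."""
    value = 0
    for base in bases.upper():
        value = (value << 2) | BASE_CODE[base]
    return value


def decode_bases(value, length):
    """Unpack `length` bases from an integer produced by encode_bases."""
    bases = []
    for _ in range(length):
        bases.append(CODE_BASE[value & 3])  # lowest two bits are the last base
        value >>= 2
    return "".join(reversed(bases))


# Values taken from the example table.
print(encode_bases("ATCTC"))  # 221 (left_flank_binary of the first row)
print(encode_bases("CCA"))    # 20  (repeat_unit_binary of the last row)
print(decode_bases(321, 5))   # CCAAC (right_flank_bases of the first row)
```

The length has to be passed to `decode_bases` because leading `A` bases encode as zero bits and would otherwise be lost.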
BAM files must be sorted, and their index files are required.
A configure file is needed in the baseline module. The first column is the sample name and the second is the absolute path of the sample's BAM file.
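A quick way to confirm that an index is present before running the tool is sketched below. It assumes the two common BAI naming conventions (`sample.bam.bai` and `sample.bai`). The helper name is hypothetical, and the check does not verify that the BAM is actually sorted or look for CSI indexes.

```python
import os


def find_bam_index(bam_path):
    """Return the path of a BAI index next to the BAM file, or None if none is found."""
    candidates = [
        bam_path + ".bai",                       # sample.bam.bai
        os.path.splitext(bam_path)[0] + ".bai",  # sample.bai
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

A return value of `None` suggests the BAM file still needs to be indexed, for example with `samtools index`.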
Example:
case1 /path/to/normal/case1_sorted.bam
case2 /path/to/normal/case2_sorted.bam
case3 /path/to/normal/case3_sorted.bam
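The configure file can be sanity-checked before running the baseline module. The sketch below assumes the two columns are separated by whitespace (spaces or tabs), as in the example above, so paths containing spaces are not supported. The function name `read_configure_file` is hypothetical.

```python
import os


def read_configure_file(path):
    """Parse a configure file into a list of (sample_name, bam_path) tuples."""
    samples = []
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split()
            if len(fields) != 2:
                raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
            name, bam_path = fields
            if not os.path.isabs(bam_path):
                raise ValueError(f"line {line_no}: BAM path must be absolute: {bam_path}")
            samples.append((name, bam_path))
    return samples
```

Each returned path can then be checked for existence and for an accompanying index file before the baseline is built.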