File Format

CYZ_Torry edited this page Jan 30, 2022 · 14 revisions

Input File Format

3dg File

A 3dg file stores the spatial position of each genomic bin. It is formatted as follows (fields separated by \t):

chr start end x y z is_valid

chr is the chromosome. start and end give the genomic position. x, y and z give the spatial position. is_valid denotes whether the bin is valid (1 for valid, 0 for invalid). Usually, bins with the fewest contacts in the Hi-C map are regarded as invalid. If you cannot determine the validity, add 1 at the end of each row.
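As an illustration of the layout described above, here is a minimal Python sketch that parses 3dg rows. The function name read_3dg, the dictionary keys, and the two example rows are hypothetical and not part of the tool itself; the sketch only assumes seven tab-separated columns per row.

```python
import csv
import io


def read_3dg(handle):
    """Parse tab-separated 3dg rows into a list of dictionaries."""
    bins = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 7:
            raise ValueError(f"expected 7 columns, got {len(row)}: {row}")
        chrom, start, end, x, y, z, is_valid = row
        bins.append({
            "chr": chrom,
            "start": int(start),
            "end": int(end),
            "x": float(x),
            "y": float(y),
            "z": float(z),
            "is_valid": is_valid == "1",  # 1 for valid, 0 for invalid
        })
    return bins


# Hypothetical example rows, made up for illustration only.
example = (
    "chr1\t0\t20000\t1.5\t-0.25\t3.0\t1\n"
    "chr1\t20000\t40000\t1.75\t0.0\t2.5\t0\n"
)
bins = read_3dg(io.StringIO(example))
valid_bins = [b for b in bins if b["is_valid"]]
```

With a real file, the same function could be called as read_3dg(open(path, newline="")).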

Index File

The index file is used throughout the entire workflow. It indexes the genomic bins for faster searching. It is formatted as follows (fields separated by \t):

chr start end index

chr is the chromosome. start and end give the genomic position. index is the index of the genomic bin. This file can be created using bedtools, as follows:

bedtools makewindows -g REF/GENOME/PATH -w RESOLUTION | awk -v OFS="\t" 'BEGIN{i=0}{print $1, $2, $3, i; i+=1}' > OUT_FILE
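For readers who want to see what the command above produces, the following Python sketch builds the same kind of table. It assumes fixed-width windows, a final window truncated at the chromosome end, and a running index that starts at 0 and continues across chromosomes. The function name make_index and the chromosome sizes are hypothetical.

```python
def make_index(chrom_sizes, resolution):
    """Return (chr, start, end, index) rows for fixed-width genomic bins."""
    rows = []
    index = 0  # running index across all chromosomes, starting at 0
    for chrom, size in chrom_sizes:
        for start in range(0, size, resolution):
            end = min(start + resolution, size)  # last bin may be shorter
            rows.append((chrom, start, end, index))
            index += 1
    return rows


# Hypothetical chromosome sizes, for illustration only.
rows = make_index([("chr1", 50000), ("chr2", 30000)], 20000)
lines = ["\t".join(str(field) for field in row) for row in rows]
```

Joining lines with newlines and writing them to a file gives a tab-separated table in the chr start end index layout.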

Marker File

A marker file is a bed file describing the enrichment of a marker along the genome. We recommend using the fold change over control file, because it preserves the enrichment information for the whole genome. The file is in normal bed format (fields separated by \t), as follows:

chr start end value

chr is the chromosome. start and end give the genomic position. value is the enrichment.

Output File Format

den_dtp

den_dtp is a tsv file storing the density and DisTP. It is formatted as follows:

chr start end index is_valid x_loc y_loc z_loc density DisTP

chr is the chromosome. start and end give the genomic position. is_valid denotes whether the bin is valid, and is the same as the is_valid value in the input. x_loc, y_loc and z_loc are the spatial indices of the bin (see Methods in the paper).

hist

After the D2 map, we place the bins on a density-DisTP matrix. The matrix is two-dimensional, with density on the x-axis and DisTP on the y-axis. For efficient storage, we use np.reshape to flatten the matrix to one dimension and write it to the hist file. The first two lines of the hist file describe the x axis and y axis of the matrix. The third line is the header for the remaining lines.
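The flattening step can be illustrated with a small NumPy sketch. It shows only the np.reshape round trip, not the actual hist file layout, which is not fully specified here. The matrix shape, the counts, and the choice of DisTP as rows and density as columns are assumptions made for the example.

```python
import numpy as np

# Hypothetical matrix size: 4 density bins (x-axis), 3 DisTP bins (y-axis).
n_density, n_distp = 4, 3

# Assumption: rows correspond to DisTP and columns to density.
matrix = np.arange(n_density * n_distp).reshape(n_distp, n_density)

# Flatten to one dimension for storage.
flat = np.reshape(matrix, -1)

# Recover the two-dimensional matrix from the flat values and the axis sizes.
restored = np.reshape(flat, (n_distp, n_density))
```

Restoring the matrix requires knowing both axis lengths, which is presumably why the hist file records the x axis and y axis before the data.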
