Gavin Ha edited this page Sep 19, 2018 · 6 revisions

Files output by ichorCNA

  • <sampleID>.seg
    • Segments called by the Viterbi algorithm. Format is compatible with IGV.
  • <sampleID>.seg.txt
    • Same as <sampleID>.seg but also includes subclonal status of segments (0=clonal, 1=subclonal). Format not compatible with IGV.
  • <sampleID>.cna.seg
    • Estimated copy number, log ratio, and subclone status for each bin/window.
  • <sampleID>.params.txt
    • Final converged parameters for optimal solution. Also contains table of converged parameters for all solutions.
  • <sampleID>.correctedDepth.txt
    • Log2 ratio of each bin/window after correction for GC and mappability biases.
  • <sampleID>.RData
    • Saved R image after ichorCNA has finished. Results for all solutions will be included.
  • <sampleID>/
    • Directory of plots
  • <sampleID>/<sampleID>_CNA_chr#.pdf
    • Copy number plot for each individual chromosome.
  • <sampleID>/<sampleID>_bias.pdf
    • Plots illustrating data before/after GC and mappability bias correction.
  • <sampleID>/<sampleID>_correct.pdf
    • Genome wide plot of data before/after GC and mappability bias correction.
  • <sampleID>/<sampleID>_genomeWide_n##-p#.pdf
    • Genome wide plot of data annotated for estimated copy number, tumor fraction, and ploidy for the solution initialized with n and p (n = non-tumor proportion; p = tumor ploidy).
  • <sampleID>/<sampleID>_genomeWide.pdf
    • Genome wide plot of data annotated for estimated copy number, tumor fraction, and ploidy for the optimal solution.
  • <sampleID>/<sampleID>_tpdf.pdf
    • Plot of the Student's t-distributions for each copy number state, using converged parameters from the optimal solution.
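As a quick check that a run produced everything listed above, the file names can be assembled from the sample ID. This is a minimal sketch, not part of ichorCNA: the function names `expected_outputs` and `missing_outputs` are hypothetical, it assumes all outputs sit under one output directory, and it treats the `#` placeholders in the plot names as wildcards.

```python
from pathlib import Path


def expected_outputs(out_dir, sample_id):
    """Return (fixed_files, plot_globs) for one sample, following the list above."""
    base = Path(out_dir)          # assumption: all outputs share one directory
    plot_dir = base / sample_id   # <sampleID>/ directory of plots
    fixed_files = [
        base / f"{sample_id}.seg",
        base / f"{sample_id}.seg.txt",
        base / f"{sample_id}.cna.seg",
        base / f"{sample_id}.params.txt",
        base / f"{sample_id}.correctedDepth.txt",
        base / f"{sample_id}.RData",
        plot_dir / f"{sample_id}_bias.pdf",
        plot_dir / f"{sample_id}_correct.pdf",
        plot_dir / f"{sample_id}_genomeWide.pdf",
        plot_dir / f"{sample_id}_tpdf.pdf",
    ]
    # The '#' placeholders vary per chromosome / per solution, so match by glob.
    plot_globs = [
        f"{sample_id}_CNA_chr*.pdf",
        f"{sample_id}_genomeWide_n*-p*.pdf",
    ]
    return fixed_files, plot_globs


def missing_outputs(out_dir, sample_id):
    """List expected files (or glob patterns) that are not present on disk."""
    fixed_files, plot_globs = expected_outputs(out_dir, sample_id)
    missing = [str(p) for p in fixed_files if not p.exists()]
    plot_dir = Path(out_dir) / sample_id
    for pattern in plot_globs:
        if not any(plot_dir.glob(pattern)):
            missing.append(str(plot_dir / pattern))
    return missing
```

Calling `missing_outputs("results", "tumor1")` returns an empty list when every expected file is present; the directory and sample names here are placeholders.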

Important Parameter Definitions in <sampleID>.params.txt

Tumor Fraction

  • Estimated fraction of tumor-derived DNA. Equivalent to purity in bulk tumor analysis.

Tumor Ploidy

  • Average number of copies of the tumor-derived genome.
    • Note that the overall sample ploidy is 2 * (1 - tumor.fraction) + tumor.fraction * tumor.ploidy
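The sample ploidy formula above can be worked through in a few lines. This is an illustrative sketch only; the function name `sample_ploidy` is hypothetical, and the normal (non-tumor) component is taken to be diploid, as the constant 2 in the formula implies.

```python
def sample_ploidy(tumor_fraction, tumor_ploidy):
    """Overall sample ploidy: a mixture of diploid normal DNA and tumor DNA."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    # 2 * (1 - tumor.fraction) + tumor.fraction * tumor.ploidy
    return 2.0 * (1.0 - tumor_fraction) + tumor_fraction * tumor_ploidy


# Example: 10% tumor fraction with a triploid tumor gives roughly 2.1.
print(round(sample_ploidy(0.10, 3.0), 3))
```

At low tumor fractions the sample ploidy stays close to 2 even when the tumor ploidy is far from diploid.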

Subclone Fraction

  • Fraction of tumor-derived DNA that is subclonal

Fraction Genome Subclonal

  • Fraction of all bins that are subclonal

Fraction CNA Subclonal

  • Fraction of copy number altered bins that are subclonal

GC-Map correction MAD

  • Measure of the noise in the data following GC-content and mappability bias correction. Computed as the median absolute deviation (MAD) of the differences between adjacent bins.
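The description above can be sketched as follows. This is an approximation for illustration, not ichorCNA's own code: the function name `adjacent_diff_mad` is hypothetical, the input is assumed to be a list of corrected log ratios in genomic order with no missing values, and the result is the unscaled MAD (whether ichorCNA applies a scaling constant is not stated here).

```python
from statistics import median


def adjacent_diff_mad(log_ratios):
    """Median absolute deviation of the differences between adjacent bins."""
    # Difference between each bin and the bin that follows it.
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    if not diffs:
        raise ValueError("need at least two bins")
    center = median(diffs)
    # Unscaled MAD: median of absolute deviations from the median difference.
    return median(abs(d - center) for d in diffs)
```

Because it works on bin-to-bin differences, this measure is largely insensitive to real copy number changes, which shift long runs of bins together, and reflects local noise instead.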

Parameters for all solutions

n_est

  • Listed for all solutions. The estimated fraction of non-tumor-derived (normal) DNA. Equivalent to 1 - tumor.fraction.

phi_est

  • Listed for all solutions. The estimated tumor ploidy (average number of copies of the tumor-derived DNA).

loglik

  • Listed for all solutions. The complete log-likelihood of each solution after EM convergence. The optimal solution is the one with the highest log-likelihood among the solutions that pass the pre-defined filtering criteria.
    For example, a solution in which the proportion of the genome altered by subclonal events exceeds a set threshold is excluded. See the ichorCNA arguments for details on the criteria that can be set.
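The selection rule described above (filter first, then take the highest log-likelihood) can be sketched as below. The `Solution` class, the `pick_optimal` function, the `frac_genome_subclonal` field, and the 0.5 threshold are all hypothetical placeholders chosen for illustration; ichorCNA's actual criteria and default values are set through its arguments.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    n_est: float                  # estimated non-tumor (normal) fraction
    phi_est: float                # estimated tumor ploidy
    loglik: float                 # log-likelihood after EM convergence
    frac_genome_subclonal: float  # fraction of bins called subclonal

    @property
    def tumor_fraction(self):
        # n_est is equivalent to 1 - tumor.fraction
        return 1.0 - self.n_est


def pick_optimal(solutions, max_frac_genome_subclonal=0.5):
    """Return the highest-loglik solution that passes the filter, or None."""
    # Hypothetical filter: drop solutions with too much subclonal genome.
    eligible = [s for s in solutions
                if s.frac_genome_subclonal <= max_frac_genome_subclonal]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s.loglik)
```

Note that a solution with the highest log-likelihood overall can still lose to a lower-scoring one if it fails a filter.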

Interpreting genome wide plots

Genome wide plots show the log2 ratio copy number for each bin in the genome. These data points have already been corrected for GC-content and mappability bias, and normalized against a panel of normals or a matched normal sample if one was used. The color of each data point corresponds to the estimated integer copy number. The color mapping is:

  • 1 copy = dark green
  • 2 copies = blue
  • 3 copies = brown
  • 4+ copies = red

The segment medians are also plotted as horizontal lines. A segment predicted to be clonal is drawn in the same color as its copy number event; a light green segment represents a subclonal prediction. The estimated tumor fraction and ploidy are printed at the top of the plot. For more details on a particular segment, look in the <sampleID>.seg or <sampleID>.seg.txt file. For the specific values of each data point, look in the <sampleID>.cna.seg file.
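For readers reproducing these plots, the color legend described in this section can be written as a small lookup. This is a sketch of the legend only, not ichorCNA's plotting code; the function names are hypothetical, and copy numbers not listed above (such as 0) return `None` because no color is given for them here.

```python
# Colors for data points, keyed by estimated integer copy number.
POINT_COLORS = {1: "dark green", 2: "blue", 3: "brown"}


def point_color(copies):
    """Color of a data point; 4 or more copies are all drawn in red."""
    if copies >= 4:
        return "red"
    return POINT_COLORS.get(copies)  # None if the legend lists no color


def segment_color(copies, subclonal):
    """Color of a segment median line: light green if subclonal, else as the event."""
    return "light green" if subclonal else point_color(copies)
```

For example, `segment_color(3, subclonal=False)` gives `"brown"`, while the same segment called subclonal is drawn in light green.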