Title :
Reference-free inference of tumor phylogenies from single-cell sequencing data
Author :
Subramanian, Ananth ; Schwartz, R.
Author_Institution :
Dept. of Biol. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Effective management and treatment of cancer are greatly complicated by the rapid evolution and resulting heterogeneity of tumors. In prior work, we showed that phylogenetic study of cell populations in single tumors provides a way to make sense of this heterogeneity and identify robust features of the evolutionary processes of single tumors. The introduction of single-cell sequencing has shown great promise for advancing single-tumor phylogenetics, but the volume and high noise of these data present many challenges for studying tumor evolution, especially with regard to the chromosome abnormalities that typically dominate tumor evolution. We propose a reference-free approach to mining genome sequence reads to allow predictive classification of tumors into heterogeneous types and to reconstruct models of their evolution. The approach extracts k-mer counts from single-cell tumor sequences, using differences in normalized k-mer frequencies as a proxy for overall evolutionary distance between distinct cells. The approach is computationally more efficient in time and space than standard protocols for deriving phylogenetic markers, which rely on first aligning sequence reads to a reference genome and then processing the data downstream to extract meaningful progression markers and use them to construct phylogenetic trees. The approach also provides a way to bypass some of the challenges that the massive genome rearrangement typical of tumor genomes presents for reference-based methods. To handle the unique challenges of single-cell sequencing data, we have applied a series of noise correction measures intended to account for biases due to the sequencing technology. We illustrate the method using publicly available single-cell tumor sequencing data. Phylogenies built from these k-mer spectrum distance matrices yield splits that are statistically significant when tested for their ability to partition cells at different stages of cancer.
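The core idea in the abstract (k-mer counts per cell, normalized to frequencies, with frequency differences serving as a distance proxy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the choice of k, the L1 (sum of absolute differences) metric, and the toy reads are all assumptions made for illustration, and the noise correction measures the abstract mentions are not modeled.

```python
from collections import Counter
from itertools import combinations


def kmer_frequencies(reads, k):
    """Count k-mers over all reads of one cell and normalize to frequencies."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def spectrum_distance(freq_a, freq_b):
    """L1 distance between two normalized k-mer spectra (assumed metric)."""
    keys = freq_a.keys() | freq_b.keys()
    return sum(abs(freq_a.get(key, 0.0) - freq_b.get(key, 0.0)) for key in keys)


def distance_matrix(cells, k=3):
    """Pairwise spectrum distances for a dict mapping cell name -> list of reads."""
    names = sorted(cells)
    spectra = {name: kmer_frequencies(cells[name], k) for name in names}
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = spectrum_distance(spectra[a], spectra[b])
        matrix[a][b] = matrix[b][a] = d
    return names, matrix


if __name__ == "__main__":
    # Hypothetical toy reads; real input would be sequencing reads per cell.
    toy_cells = {
        "cell_A": ["ACGTACGT", "ACGTTT"],
        "cell_B": ["ACGTACGA", "ACGTTA"],
        "cell_C": ["GGGGCCCC", "GGGCCC"],
    }
    names, matrix = distance_matrix(toy_cells, k=3)
    for name in names:
        print(name, [round(matrix[name][other], 3) for other in names])
```

A matrix of this kind could then be passed to any distance-based tree-building method, such as neighbor joining; the abstract does not specify which method is used.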
Keywords :
bioinformatics; cancer; cellular biophysics; data mining; evolution (biological); genetics; genomics; tumours; cancer stages; cancer treatment; cell partitioning; cell populations; chromosome abnormalities; data downstream; evolutionary processes; first aligning sequence reads; genome sequence mining; heterogeneous types; k-mer counts; k-mer spectrum distance matrices; massive genome rearrangement; meaningful progression markers; noise correction measures; normalized k-mer frequencies; overall evolutionary distance; phylogenetic markers; phylogenetic study; phylogenetic trees; predictive classification; proxy; reference genome; reference-based methods; reference-free inference; sequencing technology; single tumors; single-cell sequencing data; single-cell tumor sequences; single-tumor phylogenetics; standard protocols; tumor evolution; tumor genomes; tumor heterogeneity; tumor single cell sequencing data; Bioinformatics; Genomics; Phylogeny; Sequential analysis; Tumors; evolutionary trees; single cell sequencing; tumor phylogeny
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4799-5786-6
DOI :
10.1109/ICCABS.2014.6863944