Title :
LGH: A Fast and Accurate Algorithm for Single Individual Haplotyping Based on a Two-Locus Linkage Graph
Author :
Minzhu Xie ; Jianxin Wang ; Xin Chen
Author_Institution :
Sch. of Phys. & Math. Sci., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
Phased haplotype information is crucial in our complete understanding of differences between individuals at the genetic level. Given a collection of DNA fragments sequenced from a homologous pair of chromosomes, the problem of single individual haplotyping (SIH) aims to reconstruct a pair of haplotypes using a computer algorithm. In this paper, we encode the information of aligned DNA fragments into a two-locus linkage graph and approach the SIH problem by vertex labeling of the graph. In order to find a vertex labeling with the minimum sum of weights of incompatible edges, we develop a fast and accurate heuristic algorithm. It starts with detecting error-tolerant components by an adapted breadth-first search. A proper labeling of vertices is then identified for each component, with which sequencing errors are further corrected and edge weights are adjusted accordingly. After contracting each error-tolerant component into a single vertex, the above procedure is iterated on the resulting condensed linkage graph until error-tolerant components are no longer detected. The algorithm finally outputs a haplotype pair based on the vertex labeling. Extensive experiments on simulated and real data show that our algorithm is more accurate and faster than five existing algorithms for single individual haplotyping.
Keywords :
DNA; genetics; molecular biophysics; DNA fragments; LGH; SIH problem; chromosomes; computer algorithm; condensed linkage graph; edge weights; error-tolerant component; genetic level; haplotype pair; heuristic algorithm; homologous pair; phased haplotype information; single individual haplotyping; two-locus linkage graph; vertex labeling; Bioinformatics; Biological cells; DNA; Genomics; Labeling; Sequential analysis; Single individual haplotyping; algorithm; error correction; next-generation sequencing;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2015.2430352