• DocumentCode
    1151669
  • Title
    Accuracy Assessment of Diploid Consensus Sequences
  • Author
    Kim, Jong Hyun ; Waterman, Michael S. ; Li, Lei M.
  • Author_Institution
    Dept. of Comput. Sci. & the Molecular & Comput. Biol. Program, Univ. of Southern California, Los Angeles, CA
  • Volume
    4
  • Issue
    1
  • fYear
    2007
  • Firstpage
    88
  • Lastpage
    97
  • Abstract
    If the origins of fragments are known in genome sequencing projects, it is straightforward to reconstruct diploid consensus sequences. In reality, however, fragment origins are not known. Although methods have been proposed to reconstruct haplotypes from genome sequencing projects, an accuracy assessment is required to evaluate the confidence of the estimated diploid consensus sequences. In this paper, we define a confidence score for diploid consensus sequences. Computing this score requires the likelihood of an assembly, and to calculate it we propose an algorithm whose running time is linear in the number of polymorphic sites. The likelihood calculation and confidence score are used to improve haplotype estimation in two ways. First, low-scoring phases are disconnected. Second, instead of using the nominal frequency 1/2, the haplotype frequency is estimated to reflect the actual contribution of each haplotype. Our method was evaluated on simulated data whose polymorphism rate (1.2 percent) was based on Ciona intestinalis. The results indicate that our algorithm is highly accurate: the true positive rate of the haplotype estimation was greater than 97 percent.
  • Keywords
    biology computing; genetics; maximum likelihood estimation; molecular biophysics; molecular configurations; polymorphism; Ciona intestinalis; diploid consensus sequences; genome sequencing; haplotype frequency; haplotype reconstruction; linear time algorithm; maximum likelihood estimation; polymorphic sites; polymorphism; Assembly; Bioinformatics; Cloning; Computational biology; Frequency estimation; Genomics; Intestines; Organisms; Redundancy; Sequences; Haplotype; diploid; polymorphism; shotgun sequencing; Algorithms; Animals; Ciona intestinalis; Computational Biology; Computer Simulation; Consensus Sequence; Diploidy; Gene Frequency; Haplotypes; Likelihood Functions; Markov Chains; Models, Statistical; Polymorphism, Genetic; Probability; Sequence Analysis, DNA
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2007.1007
  • Filename
    4104462