• DocumentCode
    2724549
  • Title

    A Probability Similarity Scoring Schema Incorporating Positional Trends in Information Content for DNA Motifs Comparison

  • Author

    Tian, Bin ; Gong, Xiu-Jun

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
  • fYear
    2012
  • fDate
    11-13 Aug. 2012
  • Firstpage
    2052
  • Lastpage
    2055
  • Abstract
    Motif comparison plays a key role in clustering redundant motifs and in mapping motifs to transcription factors from previously characterized motif databases. Most existing algorithms decompose the similarity of two motifs into the sum of the similarities of aligned positions, under a position-independence assumption. However, this approach is unreasonable when the two motifs differ greatly in length. In this paper, we present a novel feature extraction method that extracts statistical information on positional information content and pairwise nucleotide dependencies. We then combine these two aspects of information into one uniform formula called the probability similarity scoring schema (PS3). Results on a simulated dataset generated from the JASPAR database demonstrate that our method outperforms others, and experiments on a real dataset from human kidney tissue show that our method finds many motifs that are found not only in human tissue but also in related species such as mouse and rat, which indicates that it is a possible approach for elucidating DNA motifs by employing cross-species sequence conservation.
  • Keywords
    DNA; biological tissues; feature extraction; kidney; medical computing; pattern clustering; probability; redundancy; statistical analysis; DNA motif comparison; DNA motif elucidation; JASPAR motif database; PS3; cross-species sequence conservation; feature extraction method; human kidney tissue; information content; motif mapping; motif similarity decomposition; mouse; pairwise nucleotide dependencies; position information content; position-independence assumption; positional trends; probability similarity scoring schema; rat; redundant motif clustering; statistical information extraction; transcription factors; Bioinformatics; Clustering algorithms; DNA; Databases; Educational institutions; Humans; Kidney; Motifs comparison; Pairwise nucleotide dependencies; Positions information content; Probability Similarity Scoring Schema;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Service System (CSSS), 2012 International Conference on
  • Conference_Location
    Nanjing
  • Print_ISBN
    978-1-4673-0721-5
  • Type

    conf

  • DOI
    10.1109/CSSS.2012.510
  • Filename
    6394828