DocumentCode
2724549
Title
A Probability Similarity Scoring Schema Incorporating Positional Trends in Information Content for DNA Motifs Comparison
Author
Tian, Bin ; Gong, Xiu-Jun
Author_Institution
Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
fYear
2012
fDate
11-13 Aug. 2012
Firstpage
2052
Lastpage
2055
Abstract
Motif comparison plays a key role in clustering redundant motifs and in mapping motifs to transcription factors from previously characterized motif databases. Most existing algorithms decompose the similarity of two motifs into the sum of the similarities of aligned positions, under a position-independence assumption. However, it is unreasonable to compare two motifs of vastly different lengths in this way. In this paper, we present a novel feature extraction method that extracts statistical information about positional information content and pairwise nucleotide dependencies. We then combine these two kinds of information into one uniform formula, called the probability similarity scoring schema (PS3). Results on a simulated dataset generated from the JASPAR database demonstrate that our method outperforms others, and experiments on a real dataset from human kidney tissue show that our method finds many motifs that occur not only in human tissue but also in related species such as mouse and rat, which indicates that it is a possible approach for elucidating DNA motifs by employing cross-species sequence conservation.
Keywords
DNA; biological tissues; feature extraction; kidney; medical computing; pattern clustering; probability; redundancy; statistical analysis; DNA motif comparison; DNA motif elucidation; JASPAR motif database; PS3; cross-species sequence conservation; feature extraction method; human kidney tissue; information content; motif mapping; motif similarity decomposition; mouse; pairwise nucleotide dependencies; position information content; position-independence assumption; positional trends; probability similarity scoring schema; rat; redundant motif clustering; statistical information extraction; transcription factors; Bioinformatics; Clustering algorithms; DNA; Databases; Educational institutions; Humans; Kidney; Motifs comparison; Pairwise nucleotide dependencies; Positions information content; Probability Similarity Scoring Schema;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science & Service System (CSSS), 2012 International Conference on
Conference_Location
Nanjing
Print_ISBN
978-1-4673-0721-5
Type
conf
DOI
10.1109/CSSS.2012.510
Filename
6394828