• DocumentCode
    952127
  • Title
    DNA Motif Representation with Nucleotide Dependency
  • Author
    Chin, Francis; Leung, Henry C M
  • Author_Institution
    Univ. of Hong Kong, Hong Kong
  • Volume
    5
  • Issue
    1
  • fYear
    2008
  • Firstpage
    110
  • Lastpage
    119
  • Abstract
    Discovering novel binding-site motifs is important for understanding gene regulatory networks. Motifs are generally represented either by matrices (a position weight matrix (PWM) or position-specific scoring matrix (PSSM)) or by strings. However, these representations cannot model biological binding sites well because they fail to capture nucleotide interdependence. Many researchers have pointed out that the nucleotides of a DNA binding site cannot be treated independently, for example, in the binding sites of zinc finger proteins. This paper introduces a new representation called the scored position specific pattern (SPSP), which generalizes the matrix and string representations and takes into account the dependent occurrences of neighboring nucleotides. Although the problem of discovering the optimal motif in the SPSP representation is proved to be NP-hard, we introduce a heuristic algorithm called SPSP Finder, which can effectively find optimal motifs in most simulated cases and in some real cases for which existing popular motif-finding software, such as Weeder, MEME, and AlignACE, fails.
  • Keywords
    DNA; biocomputing; bonds (chemical); cellular biophysics; computer aided software engineering; genetics; molecular biophysics; proteins; AlignACE; DNA binding site; DNA motif; MEME; SPSP finder; Weeder; gene regulatory networks; heuristic algorithm; motif-finding software; nucleotide; position specific scoring matrix; position weight matrix; strings; zinc finger; Computing Methodologies; Design Methodology; Pattern Recognition; Pattern analysis; Algorithms; Animals; Base Sequence; Binding Sites; Computer Simulation; Conserved Sequence; Drosophila; Pattern Recognition, Automated; Promoter Regions (Genetics); Regulatory Sequences, Nucleic Acid; Saccharomyces cerevisiae; Sequence Analysis, DNA
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2007.70220
  • Filename
    4359878