• DocumentCode
    1220205
  • Title

    An O(N2) algorithm for discovering optimal Boolean pattern pairs

  • Author

    Bannai, Hideo ; Hyyrö, Heikki ; Shinohara, Ayumi ; Takeda, Masayuki ; Nakai, Kenta ; Miyano, Satoru

  • Author_Institution
    Inst. of Medical Sci., Tokyo Univ.
  • Volume
    1
  • Issue
    4
  • fYear
    2004
  • Firstpage
    159
  • Lastpage
    170
  • Abstract
    We consider the problem of finding the optimal combination of string patterns, which characterizes a given set of strings that have a numeric attribute value assigned to each string. Pattern combinations are scored based on the correlation between their occurrences in the strings and the numeric attribute values. The aim is to find the combination of patterns which is best with respect to an appropriate scoring function. We present an O(N2) time algorithm for finding the optimal pair of substring patterns combined with Boolean functions, where N is the total length of the sequences. The algorithm looks for all possible Boolean combinations of the patterns, e.g., patterns of the form p nland notq, which indicates that the pattern pair is considered to occur in a given string s, if p occurs in s, and q does not occur in s. An efficient implementation using suffix arrays is presented, and we further show that the algorithm can be adapted to find the best k-pattern Boolean combination in O(Nk) time. The algorithm is applied to mRNA sequence data sets of moderate size combined with their turnover rates for the purpose of finding regulatory elements that cooperate, complement, or compete with each other in enhancing and/or silencing mRNA decay
  • Keywords
    Boolean functions; biology computing; genetics; molecular biophysics; O(N2) time algorithm; k-pattern Boolean combination; mRNA decay; mRNA sequence; numeric attribute values; optimal Boolean pattern pairs; pattern combinations; string patterns; suffix arrays; Adaptive arrays; Bioinformatics; Biological information theory; Biology computing; Boolean functions; DNA; Data mining; Genomics; Organisms; Sequences; Boolean patterns; Pattern discovery; suffix array.; suffix tree; Algorithms; Base Sequence; Computational Biology; DNA; Genes, Fungal; Humans; Models, Statistical; Models, Theoretical; Pattern Recognition, Automated; RNA, Messenger; Sequence Analysis, DNA; Software;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    ieee
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2004.36
  • Filename
    1388181