  • DocumentCode
    1000008
  • Title
    Inferring correlation between database queries: analysis of protein sequence patterns
  • Author
    Guigó, Roderic; Smith, Temple F.
  • Author_Institution
    Dept. of Biostat., Harvard Univ., Cambridge, MA, USA
  • Volume
    15
  • Issue
    10
  • fYear
    1993
  • fDate
    10/1/1993 12:00:00 AM
  • Firstpage
    1030
  • Lastpage
    1041
  • Abstract
    Given a subset P of a database, the problem of finding the query φ in a given database attribute having the closest extension to P is addressed. In the particular case that is outlined, P is the set of protein sequences in a protein sequence database matching a given protein sequence pattern, whereas φ is a query in the annotation of the database. Ideally, φ is the description of a biological function. If the extension of φ is very similar to P, an association between the pattern and the biological function described by the query may be inferred. An algorithm that efficiently searches the query space when negation is not considered is developed. Since the query language is a first-order language, the query space may be mapped into a set algebra in which a measure of stochastic dependence (an asymptotic approximation of the correlation coefficient) is used as a measure of set similarity. The algorithm uses the algebraic properties of such a measure to reduce the time required to search the query space. A prototype implementation of the algorithm has been tested on different collections of protein sequence patterns.
  • Keywords
    algebra; biology computing; database theory; proteins; query processing; set theory; annotation query; asymptotic approximation; correlation coefficient; correlation inference; database queries; first-order language; protein sequence database; protein sequence pattern analysis; query language; query space; set algebra; set similarity measure; stochastic dependence measurement; Biological information theory; Biomedical measurements; Cancer; Data analysis; Databases; Helium; Pattern analysis; Protein sequence; Sequences; Stochastic processes
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/34.254060
  • Filename
    254060
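
The idea summarized in the abstract, scoring how closely the extension of an annotation query matches a pattern's match set P, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the phi coefficient (the Pearson correlation of two set-membership indicators) stands in for the paper's stochastic dependence measure, the function names `phi` and `best_query` are hypothetical, and the search is a plain exhaustive scan over conjunctions of annotation terms without negation, not the pruned search the paper develops.

```python
from itertools import combinations
from math import sqrt


def phi(a, b, n):
    """Pearson correlation between the membership indicators of sets a and b
    in a universe of n items (an assumed stand-in similarity measure)."""
    pa, pb, pab = len(a) / n, len(b) / n, len(a & b) / n
    denom = sqrt(pa * (1 - pa) * pb * (1 - pb))
    # An empty or universal set has zero variance; treat it as uncorrelated.
    return (pab - pa * pb) / denom if denom else 0.0


def best_query(target, annotations, n, max_terms=2):
    """Score every conjunction of up to max_terms annotation terms
    (no negation) and return (best score, terms of the best query)."""
    best = (float("-inf"), ())
    for k in range(1, max_terms + 1):
        for terms in combinations(sorted(annotations), k):
            # Extension of a conjunction = intersection of the term extensions.
            extension = set.intersection(*(annotations[t] for t in terms))
            best = max(best, (phi(target, extension, n), terms))
    return best


# Toy data (hypothetical): 10 sequences, pattern P matches sequences 1-3.
P = {1, 2, 3}
annotations = {
    "kinase": {1, 2, 3, 4},
    "membrane": {1, 2, 3, 7, 8},
    "zinc": {5, 6},
}
score, terms = best_query(P, annotations, n=10)
print(terms, round(score, 3))
```

In this toy data the conjunction of "kinase" and "membrane" has exactly the extension P, so it scores highest. The exhaustive scan grows combinatorially with the number of terms, which is the cost the paper's algebraic pruning is meant to reduce.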