• DocumentCode
    2506499
  • Title

    Indexing weighted-sequences in large databases

  • Author

    Wang, Haixun ; Perng, Chang-Shing ; Fan, Wei ; Park, Sanghyun ; Yu, Philip S.

  • Author_Institution
    IBM Thomas J. Watson Res. Center, Hawthorne, NY, USA
  • fYear
    2003
  • fDate
    5-8 March 2003
  • Firstpage
    63
  • Lastpage
    74
  • Abstract
    We present an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure where each element in the sequence is associated with a weight. A series of network events, for instance, is a weighted-sequence in that each event has a timestamp. Querying a large sequence database by events' occurrence patterns is a first step towards understanding the temporal causal relationships among the events. The index structure proposed enables us to efficiently retrieve from the database all subsequences, possibly noncontiguous, that match a given query sequence both by events and by weights. The index method also takes into consideration the nonuniform frequency distribution of events in the sequence data. In addition, our method finds a broad range of applications in indexing scientific data consisting of multiple numerical columns for discovery of correlations among these columns. For instance, indexing a DNA microarray that records expression levels of genes under different conditions enables us to search for genes whose responses to various experimental perturbations follow a given pattern. We demonstrate, using real-world data sets, that our method is effective and efficient.
  • Keywords
    DNA; database indexing; query formulation; query processing; statistical distributions; very large databases; DNA microarray; index structure; large databases; nonuniform frequency distribution; real-world data sets; temporal causal relationship; weighted-sequences; Computer science; Data analysis; Data engineering; Databases; Engineering management; Indexes; Indexing; Information retrieval; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2003. Proceedings. 19th International Conference on
  • Print_ISBN
    0-7803-7665-X
  • Type

    conf

  • DOI
    10.1109/ICDE.2003.1260782
  • Filename
    1260782