• DocumentCode
    2404976
  • Title
    OSSM: a segmentation approach to optimize frequency counting
  • Author
    Leung, Carson Kai-Sang; Ng, Raymond T.; Mannila, Heikki
  • Author_Institution
    British Columbia Univ., Vancouver, BC, Canada
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    583
  • Lastpage
    592
  • Abstract
    Computing the frequency of a pattern is one of the key operations in data mining algorithms. We describe a simple yet powerful way of speeding up any form of frequency counting satisfying the monotonicity condition. Our method, the optimized segment support map (OSSM), is a lightweight structure which partitions the collection of transactions into m segments, so as to reduce the number of candidate patterns that require frequency counting. We study the following problems: (1) what is the optimal number of segments to be used; and (2) given a user-determined m, what is the best segmentation/composition of the m segments? For Problem 1, we provide a thorough analysis and a theorem establishing the minimum value of m for which there is no accuracy lost in using the OSSM. For Problem 2, we develop various algorithms and heuristics, which efficiently generate OSSMs that are compact and effective, to help facilitate segmentation.
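The idea of a segment support map can be illustrated with a minimal sketch. It assumes, as an illustration and not as the authors' algorithm, that each segment stores per-item counts and that the support of a pattern is bounded above by summing, over segments, the smallest count among the pattern's items. The function names and the naive equal-size contiguous split are hypothetical; the paper's contribution is choosing a better segmentation, which this sketch does not attempt.

```python
from collections import Counter
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def build_segment_support_map(transactions: List[Transaction], m: int) -> List[Counter]:
    """Split transactions into at most m contiguous segments (naive, not optimized)
    and record how many transactions in each segment contain each item."""
    n = len(transactions)
    size = max(1, -(-n // m))  # ceiling division, at least one transaction per segment
    ssm = []
    for start in range(0, n, size):
        counts = Counter()
        for t in transactions[start:start + size]:
            counts.update(t)  # each item counted once per transaction
        ssm.append(counts)
    return ssm


def support_upper_bound(ssm: List[Counter], pattern: Transaction) -> int:
    """Upper bound on the support of a non-empty pattern: within a segment, the
    pattern cannot occur more often than its rarest item does."""
    return sum(min(seg[item] for item in pattern) for seg in ssm)


def exact_support(transactions: List[Transaction], pattern: Transaction) -> int:
    """Reference count obtained by scanning every transaction."""
    return sum(1 for t in transactions if pattern <= t)


txns = [frozenset("ab"), frozenset("a"), frozenset("b"), frozenset("ab")]
pattern = frozenset("ab")
one_segment = support_upper_bound(build_segment_support_map(txns, 1), pattern)
two_segments = support_upper_bound(build_segment_support_map(txns, 2), pattern)
print(one_segment, two_segments, exact_support(txns, pattern))
```

In this toy data, a minimum support threshold of 3 would let the two-segment map discard the candidate {a, b} without scanning the transactions, whereas the single-segment map would not. This is the sense in which more, or better chosen, segments reduce the number of candidates that need counting.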
  • Keywords
    data mining; data structures; pattern recognition; very large databases; heuristics; large databases; monotonicity condition; optimized segment support map; pattern frequency counting; performance analysis; Association rules; Frequency; Heuristic algorithms; Lightweight structures; Optimization methods; Partitioning algorithms; Upper bound
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2002. Proceedings. 18th International Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-1531-2
  • Type
    conf
  • DOI
    10.1109/ICDE.2002.994776
  • Filename
    994776