• DocumentCode
    1961807
  • Title

    Dynamic miss-counting algorithms: finding implication and similarity rules with confidence pruning

  • Author

    Fujiwara, Shinji ; Ullman, Jeffrey D. ; Motwani, Rajeev

  • Author_Institution
    Dept. of Comput. Sci., Stanford Univ., CA, USA
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    501
  • Lastpage
    511
  • Abstract
    Dynamic miss-counting (DMC) algorithms are proposed which find all implication and similarity rules with confidence pruning but without support pruning. To handle data sets with a large number of columns, we propose dynamic pruning techniques that can be applied during data scanning. DMC counts the numbers of rows in which each pair of columns disagree instead of counting the number of hits. DMC deletes a candidate as soon as the number of misses exceeds the maximum number of misses allowed for that pair. We also propose several optimization techniques that reduce the required memory size significantly. We evaluated our algorithms by using four data sets, viz. Web access logs, a Web page-link graph, news documents and a dictionary. These data sets have between 74,000 and 700,000 items. Experiments show that DMC can find high-confidence rules for such large data sets efficiently
  • Keywords
    data mining; dictionaries; information resources; optimisation; very large databases; Web page-link graph; World Wide Web access logs; confidence pruning; data scanning; data set columns; dictionary; dynamic miss-counting algorithms; dynamic pruning techniques; high-confidence rules; implication rule discovery; large data sets; memory size reduction; news documents; optimization techniques; row counting; similarity rule discovery; Association rules; Collaboration; Computer science; Data mining; Dictionaries; Filtering; Frequency; Heuristic algorithms;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering, 2000. Proceedings. 16th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-0506-6
  • Type

    conf

  • DOI
    10.1109/ICDE.2000.839449
  • Filename
    839449