• DocumentCode
    3140664
  • Title

    Mining association algorithm with threshold based on ROC analysis

  • Author

    Kawahara, Minoru ; Kawano, Hiroyuki

  • Author_Institution
    Data Process. Center, Kyoto Univ., Japan
  • fYear
    2001
  • fDate
    6-6 Jan. 2001
  • Abstract
    The mining association algorithm is one of the most important data mining algorithms for deriving association rules at high speed from huge databases. However, the algorithm tends to derive rules that contain noise, such as stopwords, so some systems remove the noise using noise filters. We have been improving the algorithm and using it to develop navigation systems for semi-structured data, and we also use a dictionary to remove noise from the derived association rules. To derive effective rules, it is very important to determine system parameters such as the threshold values of minimum support and minimum confidence. We have adapted ROC analysis to the algorithm in our navigation systems and have evaluated the performance of the derived rules. In this paper, we import the parameters from the ROC analysis into the algorithm in order to propose extended mining association algorithms. Moreover, we evaluate the performance of our proposed algorithms using an experimental database and show how they can derive effective association rules. We also show that our proposed algorithms can remove stopwords automatically from raw data.
  • Keywords
    data analysis; data mining; noise; online front-ends; software performance evaluation; very large databases; Apriori algorithm; ROC analysis; algorithm performance evaluation; association rules derivation; automatic stopword removal; data mining algorithm; dictionary; information navigation systems; large databases; minimum confidence; minimum support; mining association algorithm; noise filters; noise removal; receiver operating characteristic; rule performance; semi-structured data; system parameters; threshold values; Algorithm design and analysis; Association rules; Data mining; Data processing; Databases; Dictionaries; Electronic mail; Filters; Navigation; Performance analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System Sciences, 2001. Proceedings of the 34th Annual Hawaii International Conference on
  • Conference_Location
    Maui, HI, USA
  • Print_ISBN
    0-7695-0981-9
  • Type

    conf

  • DOI
    10.1109/HICSS.2001.926303
  • Filename
    926303
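
The abstract refers to the minimum support and minimum confidence thresholds without defining them. As a rough illustration only, the sketch below computes both measures for single-item rules over a toy transaction list and filters by the two thresholds. It is not the authors' extended algorithm, and it does not reproduce the ROC-derived parameters, which the record does not specify. The function names, the toy data, and the threshold values are all assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for one-item rules.

    Hypothetical helper: keeps a rule x -> y only if the pair {x, y} meets
    min_support and support({x, y}) / support({x}) meets min_confidence.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            c = s / support({x}, transactions)
            if c >= min_confidence:
                rules.append((x, y, s, c))
    return rules


# Toy data (assumed): each transaction is the set of words in one document.
transactions = [frozenset(t) for t in (
    {"data", "mining", "the"},
    {"data", "mining"},
    {"roc", "analysis", "the"},
    {"data", "the"},
)]

rules = pair_rules(transactions, min_support=0.5, min_confidence=0.6)
for x, y, s, c in rules:
    print(f"{x} -> {y}  support={s:.2f}  confidence={c:.2f}")
```

With these assumed thresholds the stopword "the" still appears in the surviving rules, which is the kind of noise the abstract says fixed thresholds tend to let through.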