• DocumentCode
    467707
  • Title

    Hypergraph Based Document Categorization: Frequent Itemsets vs Hypercliques

  • Author

    Hu, Tian-Ming ; Ouyang, Ji ; Qu, Chao ; Sung, Sam Yuan

  • Author_Institution
    DongGuan Univ. of Technol., Dongguan
  • Volume
    2
  • fYear
    2007
  • fDate
    19-22 Aug. 2007
  • Firstpage
    824
  • Lastpage
    829
  • Abstract
    This paper describes a new hypergraph formulation for document categorization, where hyperclique patterns, strongly affiliated documents in this case, are used as hyperedges. Compared to frequent itemsets, the objects in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine or Jaccard similarity measure. Since hypergraph partitioning is mainly based on vertex similarity on the hyperedge, hypercliques may serve as better quality hyperedges. In addition, due to the additional confidence constraint, we can cover more items in the mined patterns while keeping the pattern size reasonable. Hence, the difficulty in partitioning dense hypergraphs, which is often encountered in frequent itemset based hypergraph partitioning, is alleviated considerably. Finally, experiments with real-world datasets show that, with hyperclique patterns as hyperedges, we can improve the clustering results in terms of various external validation measures.
  • Keywords
    document handling; graph theory; pattern clustering; Jaccard similarity; clustering; cosine similarity; document categorization; frequent itemsets; global pairwise similarity; hypercliques; hyperedges; hypergraph partitioning; mined patterns; vertex similarity; Association rules; Clustering algorithms; Cybernetics; Frequency; Itemsets; Machine learning; Machine learning algorithms; Partitioning algorithms; Pattern analysis; Transaction databases; Document categorization; Frequent itemset; Hyperclique; Hypergraph partitioning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2007 International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-0973-0
  • Electronic_ISBN
    978-1-4244-0973-0
  • Type

    conf

  • DOI
    10.1109/ICMLC.2007.4370256
  • Filename
    4370256
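
  • Illustrative sketch (not part of the original record)
    The abstract states that objects in a hyperclique pattern have a guaranteed level of pairwise cosine similarity. The sketch below illustrates that property on toy data. It assumes the standard h-confidence definition from the hyperclique literature (support of the pattern divided by the largest single-item support), which the abstract itself does not spell out. The function names, the toy transactions, and the labels d1..d4 are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def h_confidence(transactions, pattern):
    """Assumed definition: supp(pattern) / max single-item support."""
    max_single = max(support(transactions, [i]) for i in pattern)
    return support(transactions, pattern) / max_single if max_single else 0.0


def cosine(transactions, a, b):
    """Cosine similarity of two items viewed as binary occurrence vectors."""
    denom = sqrt(support(transactions, [a]) * support(transactions, [b]))
    return support(transactions, [a, b]) / denom if denom else 0.0


# Hypothetical toy data: each transaction lists the documents that
# share one term, so the "items" being grouped are documents.
transactions = [
    {"d1", "d2", "d3"},
    {"d1", "d2", "d3"},
    {"d1", "d2"},
    {"d3", "d4"},
    {"d4"},
]

pattern = ("d1", "d2", "d3")
hc = h_confidence(transactions, pattern)
pairwise = {
    pair: cosine(transactions, *pair) for pair in combinations(pattern, 2)
}

print(f"h-confidence of {pattern}: {hc:.3f}")
for pair, sim in pairwise.items():
    print(f"cosine{pair} = {sim:.3f}")
```

    On this toy data every pairwise cosine value is at least the h-confidence of the pattern, which is the kind of lower bound the abstract refers to. A frequent itemset selected by support alone carries no such bound.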