Title :
Hypergraph Based Document Categorization: Frequent Itemsets vs Hypercliques
Author :
Hu, Tian-Ming ; Ouyang, Ji ; Qu, Chao ; Sung, Sam Yuan
Author_Institution :
DongGuan Univ. of Technol., Dongguan
Abstract :
This paper describes a new hypergraph formulation for document categorization, where hyperclique patterns, strongly affiliated documents in this case, are used as hyperedges. Unlike those in a frequent itemset, the objects in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine or Jaccard similarity measure. Since hypergraph partitioning is mainly based on vertex similarity on the hyperedge, hypercliques may serve as better quality hyperedges. Moreover, due to the additional confidence constraint, we can cover more items in the mined patterns while keeping the pattern size reasonable. Hence, the difficulty in partitioning dense hypergraphs, which is often encountered in frequent itemset based hypergraph partitioning, is alleviated considerably. Finally, experiments with real-world datasets show that, with hyperclique patterns as hyperedges, we can improve the clustering results in terms of various external validation measures.
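The similarity guarantee mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of h-confidence from the hyperclique literature (support of the pattern divided by the largest support of any single item in it) and, following the abstract, treats documents as items and words as transactions. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def h_confidence(itemset, transactions):
    """Assumed definition: supp(P) / max over items i in P of supp({i})."""
    max_single = max(support([i], transactions) for i in itemset)
    return support(itemset, transactions) / max_single if max_single > 0 else 0.0


def min_pairwise_cosine(itemset, transactions):
    """Smallest cosine similarity between any two items (binary data).

    Assumes every item occurs in at least one transaction.
    """
    return min(
        support([a, b], transactions)
        / sqrt(support([a], transactions) * support([b], transactions))
        for a, b in combinations(itemset, 2)
    )


# Hypothetical toy data: each transaction is one word, holding the ids
# of the documents in which that word occurs.
transactions = [
    {"d1", "d2", "d3"},
    {"d1", "d2", "d3"},
    {"d1", "d2"},
    {"d3", "d4"},
    {"d4"},
]
pattern = ("d1", "d2", "d3")

hc = h_confidence(pattern, transactions)
mc = min_pairwise_cosine(pattern, transactions)
print(f"h-confidence = {hc:.3f}, min pairwise cosine = {mc:.3f}")
```

In this toy case the smallest pairwise cosine is no lower than the h-confidence, which is the kind of lower bound that a minimum h-confidence threshold is meant to enforce. A frequent itemset selected by support alone carries no such bound.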
Keywords :
document handling; graph theory; pattern clustering; Jaccard similarity; clustering; cosine similarity; document categorization; frequent itemsets; global pairwise similarity; hypercliques; hyperedges; hypergraph partitioning; mined patterns; vertex similarity; Association rules; Clustering algorithms; Cybernetics; Frequency; Itemsets; Machine learning; Machine learning algorithms; Partitioning algorithms; Pattern analysis; Transaction databases; Document categorization; Frequent itemset; Hyperclique; Hypergraph partitioning;
Conference_Title :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
DOI :
10.1109/ICMLC.2007.4370256