DocumentCode
467707
Title
Hypergraph Based Document Categorization: Frequent Itemsets vs Hypercliques
Author
Hu, Tian-Ming ; Ouyang, Ji ; Qu, Chao ; Sung, Sam Yuan
Author_Institution
DongGuan Univ. of Technol., Dongguan
Volume
2
fYear
2007
fDate
19-22 Aug. 2007
Firstpage
824
Lastpage
829
Abstract
This paper describes a new hypergraph formulation for document categorization in which hyperclique patterns (here, sets of strongly affiliated documents) are used as hyperedges. Unlike the objects in a frequent itemset, the objects in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another, as measured by the cosine or Jaccard similarity measure. Since hypergraph partitioning relies mainly on the similarity of the vertices within each hyperedge, hypercliques may serve as higher-quality hyperedges. In addition, because of the additional confidence constraint, the mined patterns can cover more items while keeping the pattern size reasonable. Hence, the difficulty of partitioning dense hypergraphs, which is often encountered in frequent-itemset-based hypergraph partitioning, is considerably alleviated. Finally, experiments with real-world datasets show that using hyperclique patterns as hyperedges improves the clustering results in terms of various external validation measures.
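The abstract contrasts hyperclique patterns with frequent itemsets through an "additional confidence constraint". The minimal Python sketch below illustrates that constraint using the h-confidence measure as commonly defined in the hyperclique literature, hconf(P) = supp(P) / max over i in P of supp({i}). This definition is an assumption, since the record itself does not state it. For any two items i, j in P, supp({i, j}) >= supp(P) and sqrt(supp({i}) supp({j})) <= max(supp({i}), supp({j})), so their cosine similarity is at least hconf(P). That is the kind of pairwise guarantee the abstract refers to. The toy transactions, item names, and thresholds are hypothetical. In the paper the items (hypergraph vertices) are documents, but what plays the role of a transaction here is left unspecified.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each transaction is a set of item IDs.
# In the paper's setting the items (hypergraph vertices) are documents;
# the choice of transactions here is an assumption of this sketch.
transactions = [
    {"d1", "d2", "d3"},
    {"d1", "d2"},
    {"d1", "d2", "d3"},
    {"d3", "d4"},
    {"d1", "d4"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def h_confidence(pattern, transactions):
    """Assumed h-confidence: supp(P) divided by the largest single-item support in P."""
    max_single = max(support({i}, transactions) for i in pattern)
    return support(pattern, transactions) / max_single


def cosine(i, j, transactions):
    """Cosine similarity of two items' binary occurrence vectors."""
    return support({i, j}, transactions) / sqrt(
        support({i}, transactions) * support({j}, transactions)
    )


def is_hyperclique(pattern, transactions, min_supp, min_hconf):
    """A pattern qualifies if it passes both the support and h-confidence thresholds."""
    return (support(pattern, transactions) >= min_supp
            and h_confidence(pattern, transactions) >= min_hconf)


pattern = {"d1", "d2", "d3"}
hc = h_confidence(pattern, transactions)
print(f"h-confidence of {sorted(pattern)}: {hc:.2f}")
for i, j in combinations(sorted(pattern), 2):
    # Every pairwise cosine is lower-bounded by the pattern's h-confidence
    # (small epsilon guards against floating-point rounding).
    assert cosine(i, j, transactions) >= hc - 1e-12
    print(i, j, round(cosine(i, j, transactions), 3))

# {d1, d4} is "frequent" at a 0.2 support threshold, but its items are weakly
# affiliated, so the h-confidence constraint filters it out.
print(support({"d1", "d4"}, transactions), h_confidence({"d1", "d4"}, transactions))
```

The last line shows the contrast the abstract draws. A support threshold alone admits {d1, d4}, whose h-confidence is only 0.25. Adding the h-confidence threshold keeps only patterns whose members are mutually similar, which is what makes them plausible hyperedges for partitioning.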
Keywords
document handling; graph theory; pattern clustering; Jaccard similarity; clustering; cosine similarity; document categorization; frequent itemsets; global pairwise similarity; hypercliques; hyperedges; hypergraph partitioning; mined patterns; vertex similarity; Association rules; Clustering algorithms; Cybernetics; Frequency; Itemsets; Machine learning; Machine learning algorithms; Partitioning algorithms; Pattern analysis; Transaction databases; Document categorization; Frequent itemset; Hyperclique; Hypergraph partitioning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location
Hong Kong
Print_ISBN
978-1-4244-0973-0
Electronic_ISBN
978-1-4244-0973-0
Type
conf
DOI
10.1109/ICMLC.2007.4370256
Filename
4370256