Title :
Text Clustering using Frequent Contextual Termset
Author :
Akhriza, Tubagus Mohammad ; Ma, Yinghua ; Li, Jianhua
Author_Institution :
Sch. of Commun. & Inf. Syst., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
We introduce the frequent contextual term set (FCT) as an alternative approach to term set construction for text clustering, in which term sets are produced from the interestingness of documents. Compared to state-of-the-art term sets, the proposed approach has several advantages: (1) it is more efficient in term set production; (2) it is more effective in storing the vocabulary shared among documents, which expresses the context among documents; and (3) it is better suited to discovering the specificity of a dataset. To utilize FCT, we also introduce frequent contextual term set based hierarchical clustering (FCTHC), which adopts the concept of centroids from K-means, with some key differences. The experiments show that FCT is an appropriate pattern for performing text clustering and that FCTHC provides a flexible approach to cluster construction.
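For orientation, the sketch below shows the conventional frequent-itemset style of term set production (Apriori-like, levelwise support counting) that the abstract compares FCT against. It is not the FCT or FCTHC method, whose details are not given in this record. The function name `frequent_termsets`, the `min_support` fraction, the `max_size` cap, and the toy documents are all hypothetical illustrations, and tokenization, stemming, and stop-word removal are assumed to have already happened.

```python
from collections import Counter
from itertools import combinations


def frequent_termsets(docs, min_support, max_size=2):
    """Generic Apriori-style mining (NOT the FCT method from the paper).

    Returns a dict mapping each frequent term set (frozenset) to its support,
    i.e. the fraction of documents that contain every term in the set.
    """
    doc_sets = [frozenset(d) for d in docs]
    n = len(doc_sets)
    min_count = min_support * n
    result = {}

    # Level 1: single terms whose document frequency meets the threshold.
    counts = Counter(t for d in doc_sets for t in d)
    current = {frozenset([t]) for t, c in counts.items() if c >= min_count}
    for s in current:
        result[s] = counts[next(iter(s))] / n

    k = 1
    while current and k < max_size:
        k += 1
        # Candidate generation: join frequent (k-1)-sets into k-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        next_level = set()
        for c in candidates:
            count = sum(1 for d in doc_sets if c <= d)
            if count >= min_count:
                result[c] = count / n
                next_level.add(c)
        current = next_level
    return result


# Hypothetical toy corpus, already tokenized.
docs = [
    ["cluster", "text", "term"],
    ["cluster", "text"],
    ["term", "mining"],
    ["cluster", "term", "text"],
]
fs = frequent_termsets(docs, min_support=0.5)
for termset, support in sorted(fs.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(termset), support)
```

With a 50% threshold, "mining" (one document out of four) is discarded, while pairs such as {cluster, text} (support 0.75) survive. Mining every frequent combination this way is costly as the vocabulary grows, which is the kind of term set production cost the abstract says FCT improves on.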
Keywords :
pattern clustering; text analysis; vocabulary; centroid concept; dataset specificity discovery; document interestingness; frequent contextual term set based hierarchical clustering; k-means; term set construction; term set production; text clustering; vocabulary storage; Clustering algorithms; Context; Data mining; Equations; Itemsets; Merging; Production; Frequent Contextual Termset; Frequent Itemset; Text clustering;
Conference_Title :
Information Management, Innovation Management and Industrial Engineering (ICIII), 2011 International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-61284-450-3
DOI :
10.1109/ICIII.2011.86