DocumentCode
560327
Title
Text Clustering using Frequent Contextual Termset
Author
Akhriza, Tubagus Mohammad ; Ma, Yinghua ; Li, Jianhua
Author_Institution
Sch. of Commun. & Inf. Syst., Shanghai Jiao Tong Univ., Shanghai, China
Volume
1
fYear
2011
fDate
26-27 Nov. 2011
Firstpage
339
Lastpage
342
Abstract
We introduce the frequent contextual term set (FCT), an alternative concept of term set construction for text clustering that is produced from the interestingness of documents. Compared with state-of-the-art term sets, the proposed approach has several advantages: (1) it is more efficient in term set production, (2) it is more effective in storing the vocabulary shared amongst documents, which expresses the context amongst documents, and (3) it is better suited to discovering the specificity of a dataset. To utilize FCT, we also introduce frequent contextual term set based hierarchical clustering (FCTHC), which adopts the concept of centroids from K-means with some main differences. The experiment shows that FCT is an appropriate pattern for performing text clustering and that FCTHC provides a flexible approach to cluster construction.
Keywords
pattern clustering; text analysis; vocabulary; centroid concept; dataset specificity discovery; document interestingness; frequent contextual term set based hierarchical clustering; k-means; term set construction; term set production; text clustering; vocabulary storage; Clustering algorithms; Context; Data mining; Equations; Itemsets; Merging; Production; Frequent Contextual Termset; Frequent Itemset; Text clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Management, Innovation Management and Industrial Engineering (ICIII), 2011 International Conference on
Conference_Location
Shenzhen
Print_ISBN
978-1-61284-450-3
Type
conf
DOI
10.1109/ICIII.2011.86
Filename
6115455