DocumentCode :
2774235
Title :
A WordNet-Based Semantic Model for Enhancing Text Clustering
Author :
Shehata, Shady
Author_Institution :
Univ. of Waterloo, Waterloo, ON, Canada
fYear :
2009
fDate :
6-6 Dec. 2009
Firstpage :
477
Lastpage :
482
Abstract :
Most text mining techniques are based on word and/or phrase analysis of the text. Statistical analysis of term (word or phrase) frequency captures the importance of a term within a document. However, to achieve a more accurate analysis, the underlying mining technique should identify terms that capture the semantics of the text, from which the importance of a term in a sentence and in the document can be derived. Incorporating semantic features from the WordNet lexical database is one of many approaches that have been tried to improve the accuracy of text clustering techniques. A new semantic-based model that analyzes documents based on their meaning is introduced. The proposed model analyzes terms and their corresponding synonyms and/or hypernyms at the sentence and document levels. If two documents contain different words that are semantically related, the proposed model can still measure the semantic-based similarity between the two documents. The similarity between documents relies on a new semantic-based similarity measure, which is applied to the matching concepts between documents. Experiments using the proposed semantic-based model in text clustering are conducted. Experimental results demonstrate that the newly developed semantic-based model substantially enhances the clustering quality of sets of documents.
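The idea in the abstract, that two documents with different but semantically related words can still be measured as similar, can be illustrated with a minimal sketch. This is not the paper's similarity measure, which the abstract does not specify. The sketch assumes a tiny hand-written lexicon (`TOY_LEXICON`, hypothetical) standing in for WordNet synonym and hypernym lookups, an arbitrary `hypernym_weight` of 0.5, and plain cosine similarity over the resulting concept counts.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet lookups:
# word -> (canonical synonym concept, hypernym concept).
TOY_LEXICON = {
    "car": ("automobile", "vehicle"),
    "automobile": ("automobile", "vehicle"),
    "truck": ("truck", "vehicle"),
    "dog": ("dog", "animal"),
    "puppy": ("dog", "animal"),
}


def term_vector(text):
    """Plain term-frequency vector: one count per surface word."""
    return Counter(text.lower().split())


def concept_vector(text, hypernym_weight=0.5):
    """Map each known word to its concept and add a down-weighted hypernym.

    Words missing from the toy lexicon are kept as their own concept.
    The weight 0.5 is an assumption chosen for illustration only.
    """
    vec = Counter()
    for word in text.lower().split():
        if word in TOY_LEXICON:
            concept, hypernym = TOY_LEXICON[word]
            vec[concept] += 1.0
            vec[hypernym] += hypernym_weight
        else:
            vec[word] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    d1, d2 = "the car stopped", "the automobile stopped"
    print("term-based   :", cosine(term_vector(d1), term_vector(d2)))
    print("concept-based:", cosine(concept_vector(d1), concept_vector(d2)))
```

In this sketch, synonyms collapse onto one concept, so "car" and "automobile" match exactly, while "car" and "truck" share only the down-weighted hypernym "vehicle" and receive a partial score. A real implementation would draw synsets and hypernyms from WordNet and would need word-sense disambiguation, which this toy lexicon sidesteps.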
Keywords :
data mining; statistical analysis; text analysis; WordNet lexical database; document analysis; semantic model; semantic-based similarity; term frequency; text clustering; text mining; Clustering algorithms; Clustering methods; Conferences; Data mining; Frequency; Natural language processing; Spatial databases; Statistical analysis; Text mining; Unsupervised learning
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5384-9
Electronic_ISBN :
978-0-7695-3902-7
Type :
conf
DOI :
10.1109/ICDMW.2009.86
Filename :
5360452