DocumentCode :
1955651
Title :
Text Clustering Based on Domain Ontology and Latent Semantic Analysis
Author :
Yaxiong Li ; Jianqiang Zhang ; Dan Hu
Author_Institution :
Network Management Center, Xianning University, Xianning, China
fYear :
2010
fDate :
28-30 Dec. 2010
Firstpage :
219
Lastpage :
222
Abstract :
One key step in text mining is text categorization, i.e., grouping texts with the same or similar content so that texts with different content can be distinguished. However, traditional statistical approaches based on word frequency, such as the vector space model (VSM), fail to capture the complex meaning of texts. This paper introduces a domain ontology and constructs a new conceptual vector space model in the pre-processing stage of text clustering, replacing the initial lexicon-text matrix of latent semantic analysis with a concept-text matrix. In the clustering analysis stage, the model uses semantic similarity, which partially overcomes the difficulty of evaluating text similarity accurately and effectively when only the frequency of words and phrases in the text is taken into account. Experimental results indicate that this method helps improve text clustering results.
Keywords :
data mining; matrix algebra; ontologies (artificial intelligence); pattern clustering; text analysis; vectors; clustering analysis; concept-text matrix; domain ontology; latent semantic analysis; lexicon-text matrix; text categorization; text clustering; text mining; vector space model; Clustering algorithms; Feature extraction; Matrix decomposition; Ontologies; Semantics; Support vector machine classification; Concept-Text Matrix; Domain Ontology; Latent Semantic Analysis; Text Clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
Type :
conf
DOI :
10.1109/IALP.2010.55
Filename :
5681616