DocumentCode :
2372446
Title :
An Evaluation of the Formal Concept Analysis-Based Document Vector on Document Clustering
Author :
Jehng, Jihn-Chang ; Chou, Shihchieh ; Cheng, Chin-Yi ; Heh, Jia-Sheng
Author_Institution :
Inst. of Human Resource Manage., Nat. Central Univ., Jhongli, Taiwan
fYear :
2011
fDate :
20-23 June 2011
Firstpage :
207
Lastpage :
210
Abstract :
In conventional approaches, documents are represented by vectors whose dimensions correspond to the terms extracted from a document set. These approaches, called bag-of-term approaches, ignore the conceptual relationships between terms such as synonyms, hypernyms and hyponyms. In the past, researchers have applied thesauri such as WordNet to solve this problem. However, thesauri such as WordNet are developed for general purposes and are of limited use in specific domains. Therefore, an automatically built ontology of terms is desirable. In our previous study, we proposed a method that applies formal concept analysis (FCA), an automatic ontology building method, to extract the term relationships from a document set, and then uses the extracted information as an ontology of terms to represent the documents as concept vectors. To evaluate the usability and effectiveness of the proposed method for information retrieval related applications, we applied the concept vectors generated for the documents to document clustering. In this study, we use bisecting k-means clustering and hierarchical agglomerative clustering as the platforms with which to evaluate our method.
Keywords :
document handling; information retrieval; ontologies (artificial intelligence); pattern clustering; vectors; automatic ontology building method; bag-of-term approach; bisecting k-means clustering; document clustering; document vector; formal concept analysis; hierarchical agglomerative clustering; information retrieval; term relationship extraction; Clustering algorithms; Context; Data mining; Information retrieval; Lattices; Ontologies; Thesauri; Concept vector; Document clustering; Document vector; Formal concept analysis; Term Ontology;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Science and Its Applications (ICCSA), 2011 International Conference on
Conference_Location :
Santander
Print_ISBN :
978-1-4577-0142-9
Type :
conf
DOI :
10.1109/ICCSA.2011.57
Filename :
5959620