Title :
Semantic explorative evaluation of document clustering algorithms
Author :
Hung Son Nguyen ; Sinh Hoa Nguyen ; Wojciech Swieboda
Author_Institution :
Inst. of Math., Univ. of Warsaw, Warsaw, Poland
Abstract :
In this paper, we investigate the problem of analyzing the quality of clustering results using semantic annotations given by experts. We propose a novel approach to constructing an evaluation measure, based on the Minimal Description Length (MDL) principle. The proposed measure, called SEE (Semantic Evaluation by Exploration), improves on existing evaluation methods such as the Rand Index and Normalized Mutual Information by fixing some of their weaknesses. We illustrate the proposed evaluation method on freely accessible biomedical research articles from Pubmed Central (PMC), many of which are annotated by experts using the Medical Subject Headings (MeSH) thesaurus. This paper is part of the research on designing and developing a dialog-based semantic search engine for the SONCA system, which is part of the SYNAT project. We compare different semantic techniques for search result clustering using the proposed measure.
Keywords :
document handling; indexing; pattern clustering; search engines; MDL principle; MeSH thesaurus; PMC; Pubmed Central; SEE; SONCA system; SYNAT project; dialog-based semantic search engine; document clustering algorithms; freely accessible biomedical research; medical subject headings thesaurus; minimal description length principle; normalized mutual information; quality analysis; rand index; search result clustering; semantic annotations; semantic evaluation by exploration; semantic explorative evaluation; Biomedical measurement; Clustering algorithms; Decision trees; Extraterrestrial measurements; Indexes; Moon; Semantics;
Conference_Title :
Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on
Conference_Location :
Kraków