Title :
Term Clustering and Confidence Measurement in Document Clustering
Author :
Csorba, Kristóf ; Vajk, István
Author_Institution :
Dept. of Autom. & Appl. Inf., Budapest Univ. of Technol. & Econ., Budapest
Abstract :
A novel topic-based document clustering technique is presented for situations where there is no need to assign every document to a cluster. Under such conditions the clustering system can provide a much cleaner result by declining to classify documents with an ambiguous topic. This is achieved by computing a confidence measurement for every classification result and discarding documents whose confidence value is below a predefined lower limit. The system therefore returns a classification for a document only if it is sufficiently confident; otherwise, the document is marked as "unsure". Besides this ability, the confidence measurement allows the use of much stronger term filtering, performed by a novel supervised term cluster creation and term filtering algorithm, which is also presented in this paper.
Keywords :
classification; document handling; information filtering; learning (artificial intelligence); pattern clustering; ambiguous topic; confidence measurement; document classification; document clustering system; supervised term cluster creation; supervised term filtering algorithm; Automation; Feature extraction; Filtering algorithms; Frequency; Informatics; Information filtering; Information filters; Paper technology; Performance evaluation; Supervised learning; confidence; document clustering; supervised learning; term cluster creation;
Conference_Title :
Computational Cybernetics, 2006. ICCC 2006. IEEE International Conference on
Conference_Location :
Budapest
Print_ISBN :
1-4244-0071-6
Electronic_ISBN :
1-4244-0072-4
DOI :
10.1109/ICCCYB.2006.305694