Title :
An Automatic Document Classifier System based on Naïve Bayes Classifier and Ontology
Author :
Chang, Yi-Hsing ; Huang, Hsiu-Yi
Author_Institution :
Dept. of Inf. Manage., Southern Taiwan Univ., Yungkang
Abstract :
This paper proposes an automatic document classifier system based on ontology and the naive Bayes classifier. First, experts establish a keyword synonym table to narrow the range of keywords and keep them consistent. Formal concept analysis is then used to build a knowledge ontology from the complex relations between categories and attributes. Finally, the ontology is applied to a naive Bayes classifier to obtain the automatic document classifier system. To assess classification effectiveness, 319 documents divided into 11 categories are used, of which 224 are training documents and 95 are testing documents, with the F1-measure as the assessment criterion. The experimental results show that nine of the 11 categories reach 80% classification effectiveness, whereas the other two categories reach over 60%. Overall, the average classification effectiveness across the 11 categories is about 89%. Thus, the proposed system can classify documents effectively.
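The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial naive Bayes model with add-one smoothing, leaves out the formal-concept-analysis ontology step entirely, and uses a hypothetical synonym table (`SYNONYMS`) and hypothetical function names (`normalize`, `train`, `predict`, `f1_score`). It shows only how a synonym table can canonicalize keywords before classification and how a per-category F1-measure is computed.

```python
import math
from collections import Counter, defaultdict

# Hypothetical synonym table (an assumption, not taken from the paper):
# maps keyword variants to one canonical keyword.
SYNONYMS = {
    "nb": "naive_bayes",
    "bayes": "naive_bayes",
    "fca": "formal_concept_analysis",
}


def normalize(tokens, synonyms=SYNONYMS):
    """Lowercase each token and replace it with its canonical keyword."""
    return [synonyms.get(t.lower(), t.lower()) for t in tokens]


def train(docs):
    """Count documents per class and keywords per class.

    docs is a list of (tokens, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        for t in normalize(tokens):
            word_counts[label][t] += 1
            vocab.add(t)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for t in normalize(tokens):
            if t not in vocab:
                continue  # ignore keywords never seen in training
            score += math.log(
                (word_counts[label][t] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f1_score(true_labels, predicted_labels, category):
    """F1 = 2PR / (P + R) for one category; 0.0 if there are no true positives."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For instance, training on two toy documents with `train([(["NB", "classifier"], "ml"), (["ontology", "FCA"], "kr")])` and then calling `predict` on `["bayes"]` maps the keyword to `naive_bayes` and selects the class whose training documents contained that canonical keyword.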
Keywords :
Bayes methods; document handling; ontologies (artificial intelligence); pattern classification; F1-measure; assessment criteria; automatic document classifier system; formal concept analysis; keyword synonymous table; naive Bayes classifier; ontology; Cybernetics; Electronic mail; Frequency; Information management; Information retrieval; Machine learning; Ontologies; System testing; Terminology; Text categorization; Document classification; Formal concept analysis; Naïve Bayes classifier; Ontology;
Conference_Title :
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2095-7
Electronic_ISBN :
978-1-4244-2096-4
DOI :
10.1109/ICMLC.2008.4620948