DocumentCode
2260278
Title
Automatic Evaluation of Document Classification Using N-Gram Statistics
Author
Choi, Dongjin ; Ko, Byeongkyu ; Lee, Eunji ; Hwang, Myunggwon ; Kim, Pankoo
Author_Institution
Dept. of Comput. Eng., Chosun Univ., Gwangju, South Korea
fYear
2012
fDate
26-28 Sept. 2012
Firstpage
739
Lastpage
742
Abstract
Due to the development of World Wide Web technologies, people now live amid a flood of trillions of web pages, and the size of the Web continues to grow dramatically. As a result, it is becoming more difficult to find web documents relevant to what users want to read. Classifying documents into predefined categories is one of the most important tasks in the field of Natural Language Processing. Over the years, many statistical and linguistic approaches have been applied to improve on traditional classification machines. However, the problem remains unsolved: there is not yet a perfect solution that enables machines to understand human language, and we have to consider every possibility for making machines think as humans do. In this paper, we propose a method for classifying textual documents using n-gram co-occurrence statistics, which have great potential for finding similarities between given documents. We also compare our proposed method with the traditional method suggested by Keselj. This paper covers only simple approaches, and more sophisticated experiments are still needed. However, the performance of this method is better than that of the Keselj approach.
Keywords
Web sites; computational linguistics; natural language processing; pattern classification; statistical analysis; text analysis; Keselj approach; Web documents; Web pages; World Wide Web technologies; classification machine; document classification automatic evaluation; linguistical approach; n-gram co-occurrence statistics; natural language processing field; statistical approach; textural document classification; Bioinformatics; Computer vision; Computers; Data mining; Humans; Semantics; Training; N-gram; Natural Language Processing; document classification; formatting;
fLanguage
English
Publisher
IEEE
Conference_Titel
Network-Based Information Systems (NBiS), 2012 15th International Conference on
Conference_Location
Melbourne, VIC
Print_ISBN
978-1-4673-2331-4
Type
conf
DOI
10.1109/NBiS.2012.96
Filename
6354916