DocumentCode
1795932
Title
Automated document classification for news article in Bahasa Indonesia based on term frequency inverse document frequency (TF-IDF) approach
Author
Hakim, Ari Aulia ; Erwin, Alva ; Eng, Kho I. ; Galinium, Maulahikmah ; Muliady, Wahyu
Author_Institution
Fac. of Eng. & Inf. Technol., Swiss German Univ., Tangerang, Indonesia
fYear
2014
fDate
7-8 Oct. 2014
Firstpage
1
Lastpage
4
Abstract
The exponential growth of data may lead to an era of information explosion, in which most data cannot be managed easily. Text mining is believed to be one way to avoid that outcome, and text classification, which assigns articles to several predefined categories, is one of its applications. In this research, the classifier implements the TF-IDF algorithm. TF-IDF computes a word's weight from the frequency of the word (TF) and the number of files in which the word can be found (IDF). Because the IDF reflects how many files contain a term, it controls the weight of each word: a word found in many files is considered unimportant. The TF-IDF-based classifier classified news articles in Bahasa Indonesia with a high accuracy of 98.3%.
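The weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant the authors used, so the formula here (raw term count multiplied by the natural log of total documents over documents containing the term) is an assumption, as are the function name `tf_idf` and the toy Indonesian corpus. It is not the paper's implementation.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw count * ln(N / df), where N is the
    number of documents and df the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # term frequency within this document
        weights.append({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy corpus (not data from the paper).
docs = [
    "harga saham naik di bursa".split(),
    "tim sepak bola menang di kandang".split(),
    "harga minyak turun di pasar".split(),
]
weights = tf_idf(docs)
```

In this toy corpus the word "di" occurs in every document, so its weight is zero, while "saham", which occurs in only one document, receives the largest weight in the first document. This matches the abstract's point that a word found in many files is treated as unimportant.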
Keywords
data mining; electronic publishing; pattern classification; text analysis; Bahasa Indonesia; TF-IDF approach; automated document classification; news article classification; term frequency inverse document frequency approach; text mining; Accuracy; Classification algorithms; Computers; Dictionaries; Explosions; Text categorization; Text Classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology and Electrical Engineering (ICITEE), 2014 6th International Conference on
Conference_Location
Yogyakarta
Print_ISBN
978-1-4799-5302-8
Type
conf
DOI
10.1109/ICITEED.2014.7007894
Filename
7007894