• DocumentCode
    1795932
  • Title

    Automated document classification for news article in Bahasa Indonesia based on term frequency inverse document frequency (TF-IDF) approach

  • Author

    Hakim, Ari Aulia ; Erwin, Alva ; Eng, Kho I. ; Galinium, Maulahikmah ; Muliady, Wahyu

  • Author_Institution
    Fac. of Eng. & Inf. Technol., Swiss German Univ., Tangerang, Indonesia
  • fYear
    2014
  • fDate
    7-8 Oct. 2014
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The exponential growth of data may lead us into an era of information explosion, in which most data cannot be managed easily. Text mining is believed to help prevent the world from entering that era. One text mining task that may help is text classification, which assigns articles to several predefined categories. In this research, the classifier implements the TF-IDF algorithm. TF-IDF computes the weight of a word by considering the frequency of the word (TF) and the number of files in which the word can be found (IDF). Because the IDF reflects how many files contain a term, it can control the weight of each word: a word that is found in many files is considered unimportant. In this research, TF-IDF produced a classifier that classified news articles in Bahasa Indonesia with a high accuracy of 98.3%.
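The weighting described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which TF-IDF variant was used, so the formulas here (TF as relative frequency, IDF as the natural log of the number of documents divided by the document frequency) are assumptions, and the function name `tf_idf` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed variant (not from the paper):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized sample documents.
docs = [
    ["harga", "saham", "naik", "di", "bursa"],
    ["tim", "bola", "menang", "di", "stadion"],
    ["harga", "tiket", "bola", "naik", "di", "stadion"],
]
weights = tf_idf(docs)
```

In this sample, "di" occurs in every document, so its IDF is ln(1) = 0 and its weight is zero, which matches the abstract's point that a word found in many files is treated as unimportant. A term such as "saham", which occurs in only one document, receives a positive weight.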
  • Keywords
    data mining; electronic publishing; pattern classification; text analysis; Bahasa Indonesia; TF-IDF approach; automated document classification; news article classification; term frequency inverse document frequency approach; text mining; Accuracy; Classification algorithms; Computers; Dictionaries; Explosions; Text categorization; Text Classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology and Electrical Engineering (ICITEE), 2014 6th International Conference on
  • Conference_Location
    Yogyakarta
  • Print_ISBN
    978-1-4799-5302-8
  • Type

    conf

  • DOI
    10.1109/ICITEED.2014.7007894
  • Filename
    7007894