• DocumentCode
    2965929
  • Title

    A Simple Study of Webpage Text Classification Algorithms for Arabic and English Languages

  • Author

    Al-Ghuribi, Sumaia Mohammed ; Alshomrani, Saleh

  • Author_Institution
    Fac. of Comput. & Inf. Technol., King Abdulaziz Univ., Jeddah, Saudi Arabia
  • fYear
    2013
  • fDate
    16-18 Dec. 2013
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Webpage text classification is an important problem that has been studied through different approaches and algorithms. It aims to assign a predefined category to a Webpage based on its content and linguistic features. It has many applications, such as word sense disambiguation, document indexing, text filtering, hierarchical categorization of Webpages, and document organization. This study is part of a work in progress in which we aim to develop a bilingual algorithm that classifies Arabic and English Webpage text accurately and efficiently in both languages. It provides a simple overview of many approaches that have been constructed for classifying Arabic and English Webpage documents. In this survey, the widely used algorithms for text classification are presented along with a comparison of recent research on classification for the Arabic and English languages, in order to conclude which algorithm is best suited to both Arabic and English.
  • Keywords
    Internet; classification; indexing; natural language processing; text analysis; Arabic Webpage text; Arabic language; English Language; English Webpage text; Webpage text classification algorithm; Webpages hierarchical categorization; bilanguages algorithm; document indexing; document organization; text filtering; word sense disambiguation; Accuracy; Classification algorithms; Decision trees; Niobium; Support vector machines; Text categorization; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IT Convergence and Security (ICITCS), 2013 International Conference on
  • Conference_Location
    Macao
  • Type

    conf

  • DOI
    10.1109/ICITCS.2013.6717784
  • Filename
    6717784