DocumentCode
2965929
Title
A Simple Study of Webpage Text Classification Algorithms for Arabic and English Languages
Author
Al-Ghuribi, Sumaia Mohammed ; Alshomrani, Saleh
Author_Institution
Fac. of Comput. & Inf. Technol., King Abdulaziz Univ., Jeddah, Saudi Arabia
fYear
2013
fDate
16-18 Dec. 2013
Firstpage
1
Lastpage
5
Abstract
Webpage text classification is an important problem that has been studied through different approaches and algorithms. It aims to assign a predefined category to a Webpage based on its content and linguistic features. It has many applications, such as word sense disambiguation, document indexing, text filtering, hierarchical Webpage categorization, and document organization. This study is part of a work in progress whose goal is to develop a bilingual algorithm for classifying Arabic and English Webpage text that performs accurately and efficiently in both languages. It provides a simple overview of many approaches that have been developed for classifying Arabic and English Webpage documents. This survey presents the widely used text classification algorithms and compares recent research on Arabic and English text classification, in order to determine which algorithm is best suited to both Arabic and English.
Keywords
Internet; classification; indexing; natural language processing; text analysis; Arabic Webpage text; Arabic language; English Language; English Webpage text; Webpage text classification algorithm; Webpages hierarchical categorization; bilanguages algorithm; document indexing; document organization; text filtering; word sense disambiguation; Accuracy; Classification algorithms; Decision trees; Support vector machines; Text categorization; Web pages;
fLanguage
English
Publisher
ieee
Conference_Titel
IT Convergence and Security (ICITCS), 2013 International Conference on
Conference_Location
Macao
Type
conf
DOI
10.1109/ICITCS.2013.6717784
Filename
6717784