Regional Information Center for Science and Technology - Language detection for classification and content-based web pages filtering

Conference record number:

720

Article title:

Language detection for classification and content-based web pages filtering

Authors:

Bashbaghi Saman (author), Khotanlou Hassan (author); Department of Computer Engineering

Number of pages:

Keywords:

Text classification , automatic language detection , web page filtering

Conference title:

Proceedings of the First National Conference on Rainfed Figs

Document language:

Persian

Persian abstract:

With the daily growth in the number of documents on the internet, automatic language detection is becoming increasingly important. In this paper, we use a language detection system to classify and filter immoral web pages based on their content. The system can detect the 10 languages most frequently used in immoral web pages, including Farsi. We introduce a new combined method consisting of three parts: a URL processor, a page encoding processor, and a text processor. To produce the final result, the system uses a voter that combines the outputs of these three parts. We used immoral web pages and labeled web pages as the input data set, both to build a linguistic model for each language and to evaluate the system. Our experiments show an accuracy of 95%.
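The abstract describes the system only at block level: three independent processors (URL, page encoding, text) whose guesses are merged by a voter. The Python sketch below illustrates that structure under stated assumptions and is not the authors' implementation. The TLD and charset hint tables (`TLD_HINTS`, `CHARSET_HINTS`), the character-trigram cosine model used for the text processor, and the voting weights (`WEIGHTS`) are all hypothetical choices made for illustration; the abstract does not say how each component or the voter actually works.

```python
import math
from collections import Counter
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical hint tables; the paper's actual rules are not given in the abstract.
TLD_HINTS = {"ir": "fa", "de": "de", "fr": "fr", "ru": "ru", "jp": "ja"}
CHARSET_HINTS = {"windows-1256": "fa", "windows-1251": "ru",
                 "shift_jis": "ja", "euc-kr": "ko", "gb2312": "zh"}

# Hypothetical voter weights: the text model counts a bit more than either hint,
# but URL and encoding together can outvote it.
WEIGHTS = {"url": 1.0, "encoding": 1.0, "text": 1.5}


def url_processor(url: str) -> Optional[str]:
    """Guess a language from the country-code top-level domain, if known."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    return TLD_HINTS.get(tld)


def encoding_processor(charset: Optional[str]) -> Optional[str]:
    """Guess a language from the declared page charset, if it is language-specific."""
    if not charset:
        return None
    return CHARSET_HINTS.get(charset.strip().lower())


def trigram_profile(text: str) -> Counter:
    """Character-trigram counts; also used to build each per-language model."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def text_processor(text: str, models: Dict[str, Counter]) -> Optional[str]:
    """Return the language whose trigram model is most similar to the page text."""
    if not text.strip() or not models:
        return None
    profile = trigram_profile(text)
    scores = {lang: cosine(profile, model) for lang, model in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def vote(url: str, charset: Optional[str], text: str,
         models: Dict[str, Counter]) -> Optional[str]:
    """Weighted vote over the three processors; None if no processor has a guess.

    Ties go to the source tallied first (URL, then encoding, then text).
    """
    guesses = {
        "url": url_processor(url),
        "encoding": encoding_processor(charset),
        "text": text_processor(text, models),
    }
    tally: Counter = Counter()
    for source, lang in guesses.items():
        if lang is not None:
            tally[lang] += WEIGHTS[source]
    if not tally:
        return None
    return max(tally, key=tally.get)
```

In use, each entry of `models` would be built with `trigram_profile` from labeled training pages for one language, and `vote(url, charset, page_text, models)` returns a language code or `None`. Returning `None` when every processor abstains leaves room for a separate fallback, which the abstract does not describe.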

Conference document number:

3608842

Year of publication:

1393 (Solar Hijri)

From page:

To page:

Link to this document:

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=36&DC=86375