Conference record number:
720
Paper title:
Language detection for classification and content-based web pages filtering
Authors:
Bashbaghi Saman (author), Khotanlou Hassan (author), Department of Computer Engineering
Keywords:
Text classification , automatic language detection , web page filtering
Conference title:
Proceedings of the First National Conference on Rainfed Figs
Persian abstract:
As the number of documents on the internet grows daily,
automatic language detection is becoming more important. In this
paper we use a language detection system to classify and filter
immoral web pages based on their content. The system detects the
10 languages most used in immoral web pages, including Farsi.
We introduce a new combined method that consists of three parts:
a URL processor, a page encoding processor, and a text processor.
To produce a final result, the system has a voter that combines
the outputs of these three parts. We used immoral web pages and
labeled web pages as the input data set, both to build a
linguistic model for each language and to evaluate the system.
Our experiments show 95% accuracy.
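
The abstract names three processors and a voter but does not say how the voter combines their outputs. The following is a minimal sketch of one possible scheme, a weighted vote, and should not be read as the authors' method. The lookup tables, the stopword-overlap text model, the weights, and all function names are assumptions made only for illustration.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

# Illustrative lookup tables (assumptions, not the paper's data).
TLD_HINTS = {"ir": "fa", "de": "de", "fr": "fr"}
CHARSET_HINTS = {"windows-1256": "fa", "iso-8859-15": "fr"}
STOPWORDS = {
    "en": {"the", "and", "of", "is"},
    "de": {"der", "und", "die", "ist"},
    "fr": {"le", "et", "les", "est"},
}


def url_processor(url: str) -> Optional[str]:
    """Guess a language from the top-level domain of the URL."""
    host = urlparse(url).hostname or ""
    return TLD_HINTS.get(host.rsplit(".", 1)[-1])


def encoding_processor(charset: Optional[str]) -> Optional[str]:
    """Guess a language from the declared page encoding."""
    return CHARSET_HINTS.get((charset or "").lower())


def text_processor(text: str) -> Optional[str]:
    """Guess a language by stopword overlap (a stand-in for a real model)."""
    words = set(text.lower().split())
    scores = {lang: len(words & sw) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def vote(url: str, charset: Optional[str], text: str,
         weights=(1.0, 1.0, 2.0)) -> Optional[str]:
    """Combine the three guesses with a weighted vote.

    The weights are hypothetical; the text processor is weighted higher
    here only because page text is usually the most direct evidence.
    """
    guesses = (url_processor(url), encoding_processor(charset),
               text_processor(text))
    tally = Counter()
    for lang, weight in zip(guesses, weights):
        if lang is not None:  # a processor may abstain
            tally[lang] += weight
    return tally.most_common(1)[0][0] if tally else None
```

A processor that cannot decide returns `None` and simply abstains, so the voter still produces an answer when only one or two parts have evidence.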
Conference document number:
3608842