Title of article :
A Web Page Classification Technique with Textual Content Analysis Using NN-PCA for Objectionable Web Page Classification
Author/Authors :
Patel, Deepshikha (author), Singh Chauhan, Prashant (author)
Issue Information :
Periodical with serial number, year 2013
Abstract :
As the Internet has expanded rapidly in recent years, information can be found easily and quickly. A lot of useful information exists on the Internet, but there is also harmful information, such as pornography and other adult content, that is not appropriate for all users. This is particularly problematic when children are able to access objectionable material with ease. Pornographic web content is one of the most harmful resources polluting the healthy minds of children and teenagers. Several content-based analysis approaches have been proposed to prevent children from accessing objectionable and other offensive material. Internet users have begun to protect themselves and their wards by using so-called web content filters, which allow access to legitimate content and disallow access to objectionable, illegal, pornographic, and other problematic content. This paper proposes a new content-based classification scheme for filtering objectionable web documents. The proposed method uses a neural network whose input is obtained by Principal Component Analysis (PCA). Each web page is represented by a term weighting scheme. Because the number of unique words in the collection is large, PCA is used to select the most relevant features for classification. The resulting feature vectors are then used as the input to the neural network for classification. We conducted experiments on four data sets containing different web pages to test the performance of the classifier in various scenarios. The experimental evaluation demonstrates that the proposed method provides outstanding classification accuracy for objectionable document classification.
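The pipeline outlined in the abstract (term weighting, then PCA, then a neural network) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the TF-IDF weighting formula, the power-iteration PCA, the single hidden layer of four sigmoid units, the learning rate, and the six-document toy corpus with its labels are all hypothetical choices made only to keep the example small and dependency-free.

```python
import math
import random
from collections import Counter


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tfidf(docs):
    """Weight each term by term frequency times a smoothed inverse document frequency."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    df = Counter(t for doc in tokens for t in set(doc))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return [[Counter(doc)[t] / len(doc) * idf[t] for t in vocab] for doc in tokens]


def pca_fit(rows, k, iters=100, seed=0):
    """Approximate the top-k principal directions by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            scores = [dot(x, v) for x in centered]
            # w = X^T X v: the unnormalised covariance matrix applied to v
            w = [sum(s * x[j] for s, x in zip(scores, centered)) for j in range(d)]
            for c in components:  # remove directions already found
                proj = dot(w, c)
                w = [a - proj * b for a, b in zip(w, c)]
            norm = math.sqrt(dot(w, w)) or 1.0
            v = [a / norm for a in w]
        components.append(v)
    return mean, components


def pca_transform(rows, mean, components):
    """Project centred rows onto the principal directions."""
    return [[dot([a - m for a, m in zip(r, mean)], c) for c in components] for r in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))


def forward(model, x):
    w_hidden, w_out = model
    xb = list(x) + [1.0]  # append bias input
    h = [sigmoid(dot(w, xb)) for w in w_hidden] + [1.0]
    return xb, h, sigmoid(dot(w_out, h))


def train_mlp(X, y, hidden=4, epochs=1000, lr=0.5, seed=0):
    """One-hidden-layer network, per-sample gradient descent on cross-entropy loss."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(len(X[0]) + 1)] for _ in range(hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    model = (w_hidden, w_out)
    for _ in range(epochs):
        for x, target in zip(X, y):
            xb, h, out = forward(model, x)
            err = out - target  # loss gradient w.r.t. the output pre-activation
            for j in range(hidden):
                grad_h = err * w_out[j] * h[j] * (1.0 - h[j])
                for i in range(len(xb)):
                    w_hidden[j][i] -= lr * grad_h * xb[i]
            for j in range(hidden + 1):
                w_out[j] -= lr * err * h[j]
    return model


# Hypothetical toy corpus: 1 = page to block, 0 = page to allow.
DOCS = [
    "casino bonus win cash now",
    "win cash casino jackpot bonus",
    "jackpot casino cash prize",
    "school science lesson notes",
    "science homework lesson school",
    "math lesson school notes homework",
]
LABELS = [1, 1, 1, 0, 0, 0]

weights = tfidf(DOCS)
mean, components = pca_fit(weights, k=2)
features = pca_transform(weights, mean, components)
model = train_mlp(features, LABELS)
preds = [1 if forward(model, f)[2] >= 0.5 else 0 for f in features]
print(preds)
```

Note that PCA as sketched here projects the term weights onto a few combined directions instead of picking individual words, which is one common reading of "selecting the most relevant features"; the paper may do this differently. A realistic system would also evaluate on held-out pages, not on the training pages as this toy run does.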
Journal title :
International Journal of Electronics Communication and Computer Engineering