DocumentCode :
526653
Title :
Associative web document classification based on word mixed weight
Author :
Li, Xingyi ; Lan, Jun ; Shi, Huaji
Author_Institution :
Dept. of Comput. Sci. & Telecommun. Eng., Jiangsu Univ., Zhenjiang, China
Volume :
3
fYear :
2010
fDate :
9-11 July 2010
Firstpage :
578
Lastpage :
581
Abstract :
Classification based on association rules has two shortcomings when applied to web documents. First, it processes the web document as plain text, ignoring the HTML tag information of the web page. Second, the items of the association rules are either just the words in the web page, with no weight attached, or words weighted by frequency alone, ignoring the importance of a word's location in the web document. This paper therefore proposes a new, efficient method. It calculates each word's mixed weight from HTML tag features, then mines classification rules based on the mixed weight to classify web pages. Experimental results show that this approach performs better than traditional associative classification methods.
Keywords :
Internet; Web sites; classification; data mining; document handling; HTML tags information; Web page classification rules; associated classification method; association rules; associative Web document classification; word frequency; word mixed weight; Artificial neural networks; HTML; Niobium; Variable speed drives; HTML tags; mixed weight; web document classification
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-5537-9
Type :
conf
DOI :
10.1109/ICCSIT.2010.5564804
Filename :
5564804