DocumentCode
441630
Title
A Novel Content and Style Based Measurement of Web Pages Distance
Author
Zhang, Q.P. ; Liang, M. ; Lai, L.L.
Author_Institution
Dept. of Computer Science and Engineering, Fudan University, Shanghai 200433, China; E-MAIL: qpzhang@fudan.edu.cn
Volume
1
fYear
2005
fDate
18-21 Aug. 2005
Firstpage
429
Lastpage
435
Abstract
Many web-based systems now use machine learning techniques to build more intelligent mechanisms for organizing, indexing, and retrieving web content, and a growing number of research efforts and applications need a sound way to calculate the distance between web pages. Commonly proposed methods are suited to extracting the differences between the HTML documents of web pages, but their results cannot indicate the actual distance between pages in terms of their content and their appearance as rendered in web browsers. To make the measurement results more practical, this paper proposes content distance, style distance, and hybrid distance. The efficiency of these measures is demonstrated through several classical experiments.
Keywords
Web mining; Web page; cluster; distance function; Computer science; Content based retrieval; Distance measurement; HTML; Internet; Machine learning; Markup languages; Multimedia databases; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location
Guangzhou, China
Print_ISBN
0-7803-9091-1
Type
conf
DOI
10.1109/ICMLC.2005.1526985
Filename
1526985