DocumentCode
2876866
Title
A Method for Judging Web-page Type
Author
Xue Hong-Jun; Chen Tao; Xue Li-Min
Author_Institution
Dept. of Inf. Warfare Study, Naval Command Coll., Nanjing, China
fYear
2012
fDate
17-18 Nov. 2012
Firstpage
91
Lastpage
93
Abstract
This paper introduces the concept of information entropy for judging web-page types, combined with the approach put forward by Roadrunner: topic pages are first pre-purified, and a proportional relation is then used to judge the type of each page. On a set of typical pages taken from the home pages of large websites, the method reaches an average precision of 96.7%, which lays a foundation for further information-extraction work.
Keywords
Web sites; data mining; entropy; text analysis; Roadrunner; Web site; Web-page type; information entropy; proportional relation; Accuracy; Educational institutions; HTML; Noise reduction; Web pages; HTML Parser; web-page segmentation
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Security (CIS), 2012 Eighth International Conference on
Conference_Location
Guangzhou
Print_ISBN
978-1-4673-4725-9
Type
conf
DOI
10.1109/CIS.2012.28
Filename
6405873