Title :
A Method for Judging Web-page Type
Author :
Xue Hong-Jun ; Chen Tao ; Xue Li-Min
Author_Institution :
Dept. of Inf. Warfare Study, Naval Command Coll., Nanjing, China
Abstract :
This paper introduces the concept of information entropy for judging web-page types, combined with the method put forward by Roadrunner: topic pages are pre-purified, and a proportional relation is then used to judge the type of each page. On some typical pages from the home pages of large websites, the average precision reaches 96.7%, which lays a foundation for further information extraction work.
Keywords :
Web sites; data mining; entropy; text analysis; Roadrunner; Web site; Web-page type; information entropy; proportional relation; Accuracy; Educational institutions; HTML; Noise reduction; Web pages; HTML Parser; web-page segmentation
Conference_Title :
Computational Intelligence and Security (CIS), 2012 Eighth International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
978-1-4673-4725-9
DOI :
10.1109/CIS.2012.28