Title :
Web Object Mining Based on Entropy Pruning
Author :
Gao, Kun ; Liu, Rui ; Wang, Deqing
Author_Institution :
State Key Laboratory of Software Development Environment, Beihang University, Beijing, China
Abstract :
Currently, a large amount of the information on the Web is presented as structured objects, so mining object information from the Web is of great importance for Web data management. The MDR algorithm is a fully automatic method for mining data records. It can successfully detect data records in pages, but its results are not ideal because it cannot eliminate the noise present in a page. This paper introduces information entropy theory into the MDR algorithm and presents a new Web object mining method. First, a Web page is divided into blocks, and a semantic DOM tree is constructed for each block. Next, the entropy value of the Web page is computed, and topic regions are located with an entropy pruning algorithm. Finally, data records are mined from the topic regions. Experiments demonstrate the effectiveness and practicality of the method.
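The abstract does not define the entropy measure or the pruning rule, so the following is only a minimal sketch of what entropy-guided pruning over a DOM tree could look like, not the authors' method. It assumes (hypothetically) that each node is scored by the Shannon entropy of its children's tag names, and that a node with many structurally uniform children (low entropy) is reported as a candidate topic region while other subtrees are searched further. The names `Node`, `child_tag_entropy`, `find_topic_regions`, `max_entropy`, and `min_children` are all illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DOM-like node: a tag name plus child nodes."""
    tag: str
    children: list = field(default_factory=list)


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def child_tag_entropy(node):
    # ASSUMPTION: the per-node measure is the entropy of the child tag-name
    # distribution; the paper's actual definition may differ.
    return shannon_entropy(child.tag for child in node.children)


def find_topic_regions(node, max_entropy=0.5, min_children=3):
    """Collect candidate topic regions by top-down, entropy-guided pruning.

    ASSUMPTION (hypothetical criterion): a node with at least `min_children`
    children whose child-tag entropy is at most `max_entropy` looks like a
    list of repeated records, so it is reported and its subtree is pruned
    from further search; every other node is descended into.
    """
    if len(node.children) >= min_children and child_tag_entropy(node) <= max_entropy:
        return [node]
    regions = []
    for child in node.children:
        regions.extend(find_topic_regions(child, max_entropy, min_children))
    return regions


if __name__ == "__main__":
    page = Node("body", [
        # Mixed block (e.g. navigation): high child-tag entropy, searched further.
        Node("div", [Node("a"), Node("span"), Node("img")]),
        # Repeated rows: zero child-tag entropy, reported as a topic region.
        Node("table", [Node("tr"), Node("tr"), Node("tr"), Node("tr")]),
    ])
    print([r.tag for r in find_topic_regions(page)])  # ['table']
```

In this sketch, the mixed `div` (entropy log2 3 ≈ 1.58 bits) is not reported, while the `table` of identical rows is; data records would then be mined only inside the returned regions, which is where an MDR-style record detector would be applied.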
Keywords :
Internet; data mining; entropy; MDR algorithm; Web data management; Web information; Web object mining; Web page; data records mining method; entropy pruning; information entropy theory; semantic DOM tree; Information entropy; Metasearch; Object detection; Ontologies; Programming; Testing; Web pages; World Wide Web; Web object; information extraction
Conference_Title :
2008 Fourth International Conference on Semantics, Knowledge and Grid (SKG '08)
Conference_Location :
Beijing
Print_ISBN :
978-0-7695-3401-5
Electronic_ISBN :
978-0-7695-3401-5
DOI :
10.1109/SKG.2008.20