DocumentCode
2113365
Title
A Semantic DOM Approach for Webpage Information Extraction
Author
Fei, Yulian ; Luo, Zongwei ; Xu, Yun ; Zhang, Winston
Author_Institution
Comput. Sci. & Inf. Eng. Inst., Zhejiang Gongshang Univ., Hangzhou, China
fYear
2009
fDate
20-22 Sept. 2009
Firstpage
1
Lastpage
5
Abstract
With the development of electronic technology and e-commerce, Web page technology has attracted considerable research effort and has recently become a hot topic. This paper proposes a semantic DOM (SDOM) approach for extracting information from e-commerce Web pages. By combining content and structure information, the approach achieves good precision and recall, as shown in our experiments on listpage and tablepage data sets.
Keywords
Web sites; electronic commerce; Web page information extraction; document object model; e-commerce; listpage dataset; semantic DOM approach; tablepage data set; Computer science; Data mining; Facebook; HTML; Information services; Internet; Machine learning; Tree data structures; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Management and Service Science, 2009. MASS '09. International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-4638-4
Electronic_ISBN
978-1-4244-4639-1
Type
conf
DOI
10.1109/ICMSS.2009.5302541
Filename
5302541