DocumentCode
3046016
Title
Information Extraction Based on Table Area Locating for E-Commerce Websites
Author
Ouyang, Liubo ; Dong, Rui ; Zou, Beiji
Author_Institution
Software Sch., Hunan Univ., Changsha, China
Volume
4
fYear
2009
fDate
19-21 May 2009
Firstpage
441
Lastpage
445
Abstract
Efficient extraction of merchandise information is a key technology for e-commerce search engines. By analyzing the characteristics of Web tables in the HTML pages of e-commerce Websites, this article proposes the notion of table area locating and decomposes merchandise information extraction into three key processes: searching for preparative core areas (PCA), locating the core area (CA), and extracting attribute values from the core area. It then presents an algorithm for locating the core area and an algorithm for extracting attribute names and values. The approach was tested on HTML pages from various e-commerce Websites. The results indicate that it can locate the merchandise information area and extract attribute names and values efficiently, and that it achieves better precision and recall.
Keywords
Web sites; electronic commerce; hypermedia markup languages; information retrieval; search engines; CA; HTML page; PCA; e-commerce Website; locating core area; merchandise information extraction; preparative core area; search engine; table area locating; Algorithm design and analysis; Character recognition; Classification tree analysis; Data mining; Electronic commerce; HTML; Information analysis; Merchandise; Pattern recognition; Search engines; Area location; DOM tree; Information extraction; Web Tables
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems, 2009. GCIS '09. WRI Global Congress on
Conference_Location
Xiamen
Print_ISBN
978-0-7695-3571-5
Type
conf
DOI
10.1109/GCIS.2009.310
Filename
5209246