DocumentCode :
2281393
Title :
Spatial Relation Based Object Extraction from the World Wide Web
Author :
Jingmin, Hao ; Lejian, Liao ; HeDi
Author_Institution :
Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing
Volume :
3
fYear :
2008
fDate :
9-12 Dec. 2008
Firstpage :
94
Lastpage :
97
Abstract :
Statistical observation shows that Web information about objects of the same type has regular spatial distribution characteristics, even across different Web sites: the spatial distance between components of one object is always smaller than the distance between different objects. A novel method is presented that extracts objects from the World Wide Web based on the spatial configuration of Web documents. It performs object extraction as a fully automatic, bottom-up process. The method relies primarily on the distribution characteristics of Web information and is independent of the underlying document representation, such as HTML code. Experiments show that the proposed method works well even when the HTML structure differs greatly from the layout structure, and the results are encouraging.
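The abstract's central observation, that components of one object lie closer together on the rendered page than components of different objects, can be illustrated with a minimal sketch. It assumes each piece of page content has already been rendered to an axis-aligned bounding box (the hypothetical `Block` type) and merges boxes bottom-up with single-linkage grouping under a hand-picked distance `threshold`. Both the input format and the grouping rule are illustrative assumptions, not the authors' published algorithm.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Block:
    """Hypothetical input: a rendered content fragment and its bounding box in page coordinates."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def gap(a: Block, b: Block) -> float:
    """Euclidean distance between two axis-aligned boxes (0 if they overlap or touch)."""
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return hypot(dx, dy)


def group_objects(blocks: list[Block], threshold: float) -> list[list[str]]:
    """Bottom-up single-linkage grouping: blocks closer than `threshold` (assumed) join one object."""
    parent = list(range(len(blocks)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose spatial gap is below the threshold.
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if gap(blocks[i], blocks[j]) < threshold:
                parent[find(i)] = find(j)

    # Collect block labels per connected group, in order of first appearance.
    groups: dict[int, list[str]] = {}
    for i, b in enumerate(blocks):
        groups.setdefault(find(i), []).append(b.label)
    return list(groups.values())


if __name__ == "__main__":
    page = [
        Block("title-A", 0, 0, 100, 10),
        Block("price-A", 0, 12, 60, 20),
        Block("title-B", 0, 60, 100, 70),
        Block("price-B", 0, 72, 60, 80),
    ]
    print(group_objects(page, threshold=5))
    # prints [['title-A', 'price-A'], ['title-B', 'price-B']]
```

Because grouping uses only rendered coordinates, the sketch never inspects HTML, which mirrors the abstract's claim of independence from the underlying document representation. In practice the threshold would have to be chosen so that intra-object gaps fall below it and inter-object gaps above it.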
Keywords :
Web sites; document handling; information retrieval; Web site; World Wide Web; documentation representation; object extraction; spatial distribution characteristic; Data mining; Documentation; HTML; Humans; Information technology; Intelligent agent; Intelligent structures; Laboratories; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-0-7695-3496-1
Type :
conf
DOI :
10.1109/WIIAT.2008.371
Filename :
4740735