Title :
Building Web Page Logical Structure Model towards Effective Metadata Extraction
Author :
Zhou, Baoyao ; Zhang, Ming
Author_Institution :
Hewlett-Packard Labs. China, Beijing, China
Abstract :
Web pages are typical semi-structured data. Several tree-based models have been proposed to describe the semantic content structure of web pages in order to facilitate further content analysis. However, most existing models present only the segmentation hierarchy of content blocks rather than the semantic relationships among them. In this work, we propose a novel web page semantic structure model, called the Logical Structure Model, which presents more comprehensive structural information about web pages. Based on this model, hidden patterns in web content can be revealed more easily. The proposed model has been used to help identify course metadata in our Online Course Organization project, which aims to build an online course portal that serves course information obtained from the Web.
Keywords :
Web design; computer aided instruction; content management; educational courses; meta data; semantic Web; Web content; Web page semantic structure model; content analysis; content block; course information; course metadata; logical structure model; metadata extraction; online course portal; segmentation hierarchy; semantic content structure; semantic relationship; semistructure data; structure information; tree-based model; Blogs; Buildings; Computer science; Data mining; Data structures; HTML; Industrial relations; Portals; Technological innovation; Web pages; web metadata extraction; web page logical structure model;
Conference_Title :
Web Conference (APWEB), 2010 12th International Asia-Pacific
Conference_Location :
Busan
Print_ISBN :
978-0-7695-4012-2
Electronic_ISBN :
978-1-4244-6600-9
DOI :
10.1109/APWeb.2010.81