DocumentCode
3023130
Title
A learning approach to discovering Web page semantic structures
Author
Feng, Junlan ; Haffner, Patrick ; Gilbert, Mazin
fYear
2005
fDate
29 Aug.-1 Sept. 2005
Firstpage
1055
Abstract
This paper proposes a learning approach for discovering the semantic structure of Web pages. The task includes partitioning the text on a Web page into information blocks and identifying their semantic categories. We employed two machine learning techniques, AdaBoost and SVMs, to learn from a labeled Web page corpus. We evaluated our approach on general Web pages from the World Wide Web and obtained encouraging results. This work can benefit a number of Web-driven applications, such as search engines, Web-based question answering, Web-based data mining, and voice-enabled Web navigation.
Keywords
Web sites; learning (artificial intelligence); support vector machines; text analysis; Adaboost; Web navigation; Web page semantic structure; Web-based data mining; Web-based question answering; World Wide Web; machine learning; search engine; semantic category; support vector machine; text partitioning; Data mining; HTML; Humans; Image segmentation; Machine learning; Navigation; Partitioning algorithms; Search engines; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
ISSN
1520-5263
Print_ISBN
0-7695-2420-6
Type
conf
DOI
10.1109/ICDAR.2005.19
Filename
1575705