DocumentCode :
3023130
Title :
A learning approach to discovering Web page semantic structures
Author :
Feng, Junlan ; Haffner, Patrick ; Gilbert, Mazin
fYear :
2005
fDate :
29 Aug.-1 Sept. 2005
Firstpage :
1055
Abstract :
This paper proposes a learning approach for discovering the semantic structure of Web pages. The task includes partitioning the text on a Web page into information blocks and identifying their semantic categories. We employed two machine learning techniques, AdaBoost and support vector machines (SVMs), to learn from a labeled Web page corpus. We evaluated our approach on general Web pages from the World Wide Web and obtained encouraging results. This work can benefit a number of Web-driven applications, such as search engines, Web-based question answering, Web-based data mining, and voice-enabled Web navigation.
Keywords :
Web sites; learning (artificial intelligence); support vector machines; text analysis; Adaboost; Web navigation; Web page semantic structure; Web-based data mining; Web-based question answering; World Wide Web; machine learning; search engine; semantic category; support vector machine; text partitioning; Data mining; HTML; Humans; Image segmentation; Machine learning; Navigation; Partitioning algorithms; Search engines; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
ISSN :
1520-5263
Print_ISBN :
0-7695-2420-6
Type :
conf
DOI :
10.1109/ICDAR.2005.19
Filename :
1575705