DocumentCode
589898
Title
Improving navigation page detection by using DOM-based block text identification
Author
Li Yue ; Dong Shou-bin ; Zheng Xiang ; Ma Bin-Hua
Author_Institution
Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
fYear
2012
fDate
21-23 Nov. 2012
Firstpage
129
Lastpage
134
Abstract
The Internet changes rapidly, so web pages need to be classified for different uses. According to user purpose, web pages can be classified into navigation pages and content pages. Detecting navigation pages is useful for web crawling, topical detection, and similar tasks. In this paper, we use a DOM-based block text identification method to improve navigation page detection. Experimental results suggest that our method is more effective than prior methods.
Keywords
Internet; pattern classification; text analysis; DOM-based block text identification; Web crawling; Web page classification; content page; navigation page detection; topical detection; Abstracts; Bars; Business; HTML; Navigation; Noise; Web pages; DOM; block text identification; navigation pages; web pages classification
fLanguage
English
Publisher
IEEE
Conference_Titel
ICT and Knowledge Engineering (ICT & Knowledge Engineering), 2012 10th International Conference on
Conference_Location
Bangkok
ISSN
2157-0981
Print_ISBN
978-1-4673-2316-1
Type
conf
DOI
10.1109/ICTKE.2012.6408541
Filename
6408541