DocumentCode :
2315791
Title :
ContentEx: A framework for automatic content extraction programs
Author :
Song, Linhai ; Cheng, Xueqi ; Guo, Yan ; Liu, Yue ; Ding, Guodong
Author_Institution :
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
fYear :
2009
fDate :
8-11 June 2009
Firstpage :
188
Lastpage :
190
Abstract :
Web pages are often decorated with extraneous information, such as navigation bars, branding banners, JavaScript, and advertisements. This information may distract users from the content they are actually interested in and may reduce the effectiveness of many advanced Web applications. Automatic content extraction has many applications, ranging from providing data for Web mining to enabling better access to the Web on mobile devices. In this paper, we propose ContentEx, a framework for automatic content extraction programs, which we use to organize the code of such programs and to facilitate the development of related solutions. We also describe how we extract content from forum pages within this framework to meet the requirements of our actual application.
Keywords :
Internet; data mining; mobile computing; Web mining; Web page; automatic content extraction program; contentEx framework; forum page; mobile device; Algorithm design and analysis; Bars; Cellular phones; Computers; Data mining; Databases; Java; Navigation; Web mining; Web pages; ContentEx; automatic content extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligence and Security Informatics, 2009. ISI '09. IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4171-6
Electronic_ISBN :
978-1-4244-4173-0
Type :
conf
DOI :
10.1109/ISI.2009.5137298
Filename :
5137298