DocumentCode :
1904482
Title :
Web 2.0 content extraction
Author :
Waqar, Mohannmad ; Khan, Zeeshan Shafi
Author_Institution :
Center of Excellence in Inf. Assurance, King Saud Univ., Riyadh, Saudi Arabia
fYear :
2010
fDate :
8-11 Nov. 2010
Firstpage :
1
Lastpage :
3
Abstract :
This paper presents a simple, efficient and extensible solution for content extraction from Web 2.0. Web 2.0 is regarded as the second generation of web technologies. It has undoubtedly had a significant impact in enriching the end-user experience and allowing programmers to write more interactive, desktop-like applications for the web. However, it has also introduced new issues for researchers in the field of information retrieval and has made retrieving information from the web more difficult, time-consuming and challenging. Web pages contain a lot of clutter besides the original article. Several methods have been developed to extract the main content. However, these methods were originally designed around the traditional model of the web and would fail to work on Web 2.0 content. Given the evident popularity of Web 2.0, the volume of Web 2.0 content on the web will rise sharply in the coming years. In this paper we propose a new solution to this problem, based on open source components, which will make Web 2.0 content extraction more efficient and reduce the use of precious system resources. The paper also presents a high-level logical design for implementing such a system through available open source components.
Keywords :
Internet; data mining; information retrieval; Web 2.0 content extraction; Web technology; high level logical design; open source component; Browsers; Engines; HTML; IP networks; Loading; Presses; Servers
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Internet Technology and Secured Transactions (ICITST), 2010 International Conference for
Conference_Location :
London
Print_ISBN :
978-1-4244-8862-9
Electronic_ISBN :
978-0-9564263-6-9
Type :
conf
Filename :
5678554