DocumentCode :
260930
Title :
Learning based web crawl forum
Author :
Hemakumar, K. ; Prakash, B.
fYear :
2014
fDate :
27-28 Feb. 2014
Firstpage :
1
Lastpage :
7
Abstract :
The main objective of this project is to crawl relevant forum content from the web with minimal overhead. Forum threads usually contain the information content that forum crawlers target. The proposed system learns URL patterns across multiple sites and automatically finds a forum's entry page given any page from the forum. Forums differ in layout and style, and a generic crawler that blindly follows duplicate links and uninformative pages will crawl duplicate pages. Test results show that the proposed system achieves effectiveness and coverage on a large set of test forums.
Keywords :
data mining; social networking (online); URL patterns; duplicate links; forum content; forum crawlers; forum entry page; forum layouts; forum styles; forum threads; generic crawler; information content; learning based Web crawl forum; uninformative page; Crawlers; Data mining; Educational institutions; Feature extraction; Indexes; Internet; Uniform resource locators; EIT path; ITF regex; URL type; forum crawling; page classification; page type
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Communication and Embedded Systems (ICICES), 2014 International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4799-3835-3
Type :
conf
DOI :
10.1109/ICICES.2014.7033889
Filename :
7033889