DocumentCode :
260923
Title :
Crawling the page flipping links
Author :
Priya, K. ; Dhanalakshmi, S.
Author_Institution :
Dept. of CSE, Arunai Eng. Coll., Thiruvannamalai, India
fYear :
2014
fDate :
27-28 Feb. 2014
Firstpage :
1
Lastpage :
6
Abstract :
A supervised web-scale forum crawler aims to crawl relevant forum content from the web with minimal overhead. Forum threads contain the information content that forum crawlers target. Although forums have different layouts or styles and are powered by different forum software packages, they always have similar constant navigation paths, connected by specific URL types, that direct users from entry pages to thread pages. Based on this observation, we reduce the web forum crawling problem to a URL-type recognition problem and show how to learn accurate and effective regular expression patterns of constant navigation paths from automatically created training sets, using aggregated results from weak page type classifiers. Robust page type classifiers can be trained from as few as five annotated forums and applied to a large set of unseen forums. The results show that Focus achieved over 98 percent effectiveness and 97 percent coverage on a large set of test forums powered by over 150 different forum software packages. The results of applying Focus to more than 100 community sites suggest that the concept of constant navigation paths could apply to other social media sites.
Keywords :
Internet; pattern classification; social networking (online); software packages; URL-type recognition problem; Web forum crawling problem; constant navigation path; constant navigation paths; forum software packages; page flipping links crawling; page type classifiers; social media site; weak page type classifiers; Crawlers; Data mining; Educational institutions; Indexes; Training; Uniform resource locators; Web pages; EIT path; ITF regex; URL pattern learning; URL type; forum crawling; page classification (PC); page type
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Communication and Embedded Systems (ICICES), 2014 International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4799-3835-3
Type :
conf
DOI :
10.1109/ICICES.2014.7033885
Filename :
7033885