Title :
Structure and Content Based Blog Pages Identification
Author :
Yu, Feng ; Zheng, Dequan ; Zhao, Tiejun ; Cheng, Xiao
Author_Institution :
Sch. of Comput. & Inf. Eng., Harbin Univ. of Commerce, Harbin
Abstract :
Blogs have become increasingly popular with the rapid development of the Internet. Extracting content from blog pages and discovering blog communities both require an automatic way to distinguish blog pages from ordinary Web pages. This paper describes some basic concepts and ideas in the blog domain and proposes a blog page identification method based on both the structure and the content of blog pages. Experiments show that the method achieves high precision.
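The abstract does not spell out which structural or content features the authors use. As a rough illustration only, a structure-and-content identifier might combine markup cues (comment sections, archive links, feed links, permalinks) with content cues (several dated entries, blog-related vocabulary). The sketch below is a hypothetical rule-based scorer using only the Python standard library. The cue lists, the names `blog_features` and `is_blog_page`, and the threshold of 3 are all assumptions for illustration, not the paper's method.

```python
import re
from html.parser import HTMLParser

# Hypothetical structural cues, matched against tag attribute values.
# Illustrative assumptions only; not the features used in the paper.
STRUCTURE_PATTERNS = {
    "comment_section": re.compile(r"comment", re.I),
    "archive_links": re.compile(r"archive", re.I),
    "feed_link": re.compile(r"rss|atom", re.I),
    "permalink": re.compile(r"permalink", re.I),
}

# Hypothetical content cues: words that often accompany blog posts.
CONTENT_KEYWORDS = {"blog", "posted", "trackback", "tags"}

# ISO-style dates, used as a rough signal of dated blog entries.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


class _PageScanner(HTMLParser):
    """Collects selected attribute values and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.attr_text = []
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        # Keep only attributes that tend to reveal page structure.
        for name, value in attrs:
            if name in ("class", "id", "href", "rel", "type") and value:
                self.attr_text.append(value)

    def handle_data(self, data):
        self.body_text.append(data)


def blog_features(html: str) -> dict:
    """Return a dict of boolean structure and content features."""
    scanner = _PageScanner()
    scanner.feed(html)
    attrs = " ".join(scanner.attr_text)
    text = " ".join(scanner.body_text).lower()
    words = set(re.findall(r"[a-z]+", text))

    # Structural features: does any attribute value match each cue?
    features = {name: bool(p.search(attrs)) for name, p in STRUCTURE_PATTERNS.items()}
    # Content features: at least two dated entries, at least two blog words.
    features["dated_entries"] = len(DATE_RE.findall(text)) >= 2
    features["blog_vocabulary"] = len(CONTENT_KEYWORDS & words) >= 2
    return features


def is_blog_page(html: str, threshold: int = 3) -> bool:
    """Label a page as a blog page if enough features fire (assumed threshold)."""
    return sum(blog_features(html).values()) >= threshold
```

The fixed threshold keeps the sketch self-contained; since the record's index terms mention support vector machine classification, a trained classifier over these boolean features would be a natural replacement for the hand-set cutoff.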
Keywords :
Internet; Web sites; Web pages; blog pages identification; Business; Fuzzy systems; Information services; Knowledge engineering; Navigation; Support vector machine classification; Support vector machines; Blog; Blog Structure and Content; Broad Blog; Narrow Blog;
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
DOI :
10.1109/FSKD.2008.371