DocumentCode
480709
Title
Cool Blog Identification Using Topic-Based Models
Author
Sriphaew, Kritsada ; Takamura, Hiroya ; Okumura, Manabu
Author_Institution
Precision & Intell. Lab., Tokyo Inst. of Technol., Yokohama
Volume
1
fYear
2008
fDate
9-12 Dec. 2008
Firstpage
402
Lastpage
406
Abstract
Among the huge number of blogs on the Internet, only some are considered to have great content and to be worth exploring. We call such blogs cool blogs and attempt to identify them. To solve the cool blog identification problem, we make three assumptions about cool blogs: (1) cool blogs tend to have definite topics, (2) cool blogs tend to have a sufficient number of blog entries, and (3) cool blogs tend to have a certain level of topic consistency among their blog entries. Corresponding to these assumptions, we respectively extract a mixture of topic probabilities using a topic model, exploit the number of blog entries in each blog, and calculate the topic consistency among blog entries using distance functions over topic probabilities. We show the benefits of the proposed assumptions through these features. A feature unification model is also presented to achieve the highest effectiveness. Experimental results on Japanese blog data show that applying the proposed assumptions improves the classification results.
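The three feature groups named in the abstract can be illustrated with a short sketch. It is not the authors' implementation. It assumes that each blog entry already has a topic distribution from a topic model such as LDA, which is trained elsewhere and not shown. Under that assumption, the blog-level topic mixture is taken as the average of the entries' distributions. Topic consistency is measured as the mean pairwise Jensen-Shannon divergence, one possible choice of distance function. The features are unified by plain concatenation. All function names are hypothetical, and the paper's actual distance functions and feature unification model may differ.

```python
import math
from itertools import combinations
from typing import List, Sequence

# One topic distribution per blog entry: each inner list sums to 1 over K topics.
# Assumption: these come from a topic model (e.g., LDA) trained elsewhere.
TopicDist = Sequence[float]


def kl_divergence(p: TopicDist, q: TopicDist) -> float:
    """KL(p || q); terms with p_i == 0 contribute 0. Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: TopicDist, q: TopicDist) -> float:
    """Jensen-Shannon divergence: symmetric, always finite, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def blog_topic_mixture(entries: List[TopicDist]) -> List[float]:
    """Assumption (1), definite topics: average the entries' topic distributions."""
    k = len(entries[0])
    return [sum(e[t] for e in entries) / len(entries) for t in range(k)]


def mean_pairwise_divergence(entries: List[TopicDist]) -> float:
    """Assumption (3), topic consistency: mean JS divergence over all entry pairs.

    Lower values mean the entries share topics more consistently. A blog with a
    single entry has no pairs, so 0.0 is returned by convention.
    """
    pairs = list(combinations(entries, 2))
    if not pairs:
        return 0.0
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


def blog_features(entries: List[TopicDist]) -> List[float]:
    """Concatenate the three feature groups into one vector for a classifier.

    Assumption (2), sufficient entries, is represented by the raw entry count.
    Plain concatenation is a hypothetical stand-in for the paper's feature
    unification model.
    """
    return (
        blog_topic_mixture(entries)
        + [float(len(entries))]
        + [mean_pairwise_divergence(entries)]
    )


if __name__ == "__main__":
    focused = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]
    scattered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(blog_features(focused))
    print(blog_features(scattered))
```

Jensen-Shannon divergence is used here because it is symmetric and remains finite when one entry assigns zero probability to a topic that another entry uses. Other distances over topic probabilities, such as cosine or Hellinger distance, could be substituted in `mean_pairwise_divergence` without changing the rest of the pipeline.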
Keywords
Internet; Web sites; classification; probability; cool blog identification; distance functions; topic probabilities; topic-based models; Books; Data mining; Feature extraction; Information retrieval; Information services; Intelligent agent; Laboratories; Cool blog; topic consistency; topic-based model
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
Conference_Location
Sydney, NSW
Print_ISBN
978-0-7695-3496-1
Type
conf
DOI
10.1109/WIIAT.2008.401
Filename
4740482