  • DocumentCode
    480709
  • Title
    Cool Blog Identification Using Topic-Based Models
  • Author
    Sriphaew, Kritsada; Takamura, Hiroya; Okumura, Manabu
  • Author_Institution
    Precision & Intell. Lab., Tokyo Inst. of Technol., Yokohama
  • Volume
    1
  • fYear
    2008
  • fDate
    9-12 Dec. 2008
  • Firstpage
    402
  • Lastpage
    406
  • Abstract
    Among the huge number of blogs on the Internet, only some are considered to have great content and to be worth exploring. We call such blogs cool blogs and attempt to identify them. To solve the cool blog identification problem, we consider three assumptions about cool blogs: (1) cool blogs tend to have definite topics, (2) cool blogs tend to have a sufficient number of blog entries, and (3) cool blogs tend to have a certain level of topic consistency among their blog entries. Corresponding to these assumptions, we respectively extract a mixture of topic probabilities using a topic model, exploit the number of blog entries of each blog, and calculate the topic consistency among blog entries using distance functions over topic probabilities. We show the benefits of the proposed assumptions through these features. A feature unification model is also presented to achieve the highest effectiveness. The experimental results on Japanese blog data show that applying the proposed assumptions improves the classification results.
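    The three features named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the topic model or the distance functions, so the sketch assumes per-entry topic distributions are already available (for example, inferred by a topic model such as LDA), takes the mean of the entry distributions as the blog's topic mixture, and measures topic consistency as the mean pairwise Jensen-Shannon divergence between entries. The names `blog_features` and `js_divergence` are hypothetical, and simply concatenating the features is a stand-in, not the paper's feature unification model.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by ln 2, finite even with zero entries."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def blog_features(entry_topics):
    """Build a hypothetical feature vector for one blog.

    entry_topics: list of per-entry topic distributions, each summing to 1.
    Returns [topic mixture..., number of entries, topic inconsistency].
    """
    if not entry_topics:
        raise ValueError("a blog needs at least one entry")
    n = len(entry_topics)
    k = len(entry_topics[0])

    # Assumption (1): the blog-level topic mixture is the average entry distribution.
    mixture = [sum(entry[t] for entry in entry_topics) / n for t in range(k)]

    # Assumption (3): mean pairwise JS divergence; lower means more consistent topics.
    pairs = list(combinations(entry_topics, 2))
    inconsistency = (
        sum(js_divergence(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0
    )

    # Assumption (2): the raw number of entries is appended as its own feature.
    return mixture + [float(n), inconsistency]


if __name__ == "__main__":
    entries = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]
    print(blog_features(entries))
```

    Jensen-Shannon divergence is chosen here only because it is symmetric and stays finite when a topic has zero probability in one entry; other distance functions over topic probabilities could be substituted without changing the structure of the sketch.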
  • Keywords
    Internet; Web sites; classification; probability; cool blog identification; distance functions; topic probabilities; topic-based models; Books; Data mining; Feature extraction; Information retrieval; Information services; Intelligent agent; Laboratories; Cool blog; topic consistency; topic-based model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-0-7695-3496-1
  • Type
    conf
  • DOI
    10.1109/WIIAT.2008.401
  • Filename
    4740482