Author :
Kim, Su-Do ; Kim, Sung-Hwan ; Cho, Hwan-Gue
Author_Institution :
Center of U-Port IT Res. & Educ., Pusan Nat. Univ., Busan, South Korea
Abstract :
A blog provides commentary, news, or other content on a particular subject, and an important feature of many blogs is their interactive format. Sometimes a topic sparks a heated debate and an article becomes a political or sociological issue; most articles, however, attract little attention from users. How, then, can we predict the popularity of an article in advance, and what is the standard for popularity? In this paper, we propose a methodology for predicting the popularity of an article. We draw an analogy between a virtual temperature and the popularity of online articles, and define a discrete four-level temperature scale (explosive, hot, warm, and cold) according to the number of reviews an article has received once it reaches its saturated state. Our goal is to predict the final temperature of articles submitted to the Internet Web-blog space. An experimental data set was collected from articles submitted to "SEOPRISE", a well-known Korean political discussion blog visited by more than 50,000 users per day. The hit count is used as the predictor of popularity, analogous to the number of viewers as a measure of a movie's popularity. We calculated the saturation point from the variation of the hit count over an article's lifetime. Based on the correlation coefficient between the hourly hit count and the hit count at the saturation point, we derived a regression model that predicts an article's popularity temperature in terms of its hit count at the saturation point. Using only the first 30 minutes of information, we can predict the popularity temperature of Internet discussion articles with more than 70% accuracy. Because predictions for the explosive category performed poorly, the overall results were worse than we expected. For the hot, warm, and cold categories, prediction accuracy exceeds 86% from the first 30 minutes and 90% from the first 70 minutes.
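The abstract names the ingredients of the method (a saturation point, a correlation coefficient, a regression model, and a four-level temperature scale) but not their exact definitions. The Python sketch below is one minimal way to wire them together under stated assumptions, not the authors' implementation: the saturation rule in `saturation_hits`, the ordinary least-squares line in `fit_line`, the `THRESHOLDS` category boundaries, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical category boundaries on the saturated hit count; the abstract
# does not state the real thresholds, so these values are placeholders.
THRESHOLDS: List[Tuple[float, str]] = [(10000, "explosive"), (3000, "hot"), (1000, "warm")]


def saturation_hits(cumulative: Sequence[int], ratio: float = 0.05) -> int:
    """Assumed saturation rule: after the peak hourly increment, return the
    cumulative hit count at the start of the first hour whose increment falls
    below `ratio` times that peak."""
    increments = [b - a for a, b in zip(cumulative, cumulative[1:])]
    if not increments:
        raise ValueError("need at least two hourly samples")
    peak = max(increments)
    peak_at = increments.index(peak)
    for i in range(peak_at + 1, len(increments)):
        if increments[i] < ratio * peak:
            return cumulative[i]
    return cumulative[-1]  # never saturated within the observed window


def _centered_sums(xs, ys):
    """Means and centered sums of squares/products shared by r and OLS."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def pearson(xs, ys) -> float:
    """Pearson correlation between early and saturated hit counts."""
    _, _, sxx, syy, sxy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept (assumed model form)."""
    mx, my, sxx, _, sxy = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


def temperature(hits: float) -> str:
    """Map a (predicted) saturated hit count to a discrete temperature."""
    for bound, label in THRESHOLDS:
        if hits >= bound:
            return label
    return "cold"


if __name__ == "__main__":
    # Invented toy data: hit count after 30 minutes vs. hit count at saturation.
    early = [100, 200, 300]
    final = [1050, 2050, 3050]
    slope, intercept = fit_line(early, final)
    print(pearson(early, final))
    # Classify a new article that has 400 hits after 30 minutes.
    print(temperature(slope * 400 + intercept))
```

In this sketch the regression is fitted on historical articles (early hit count against saturated hit count), and a new article is classified from its predicted saturated count; a real study would choose the early time window and the thresholds from data.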
Keywords :
Internet; Web sites; regression analysis; social sciences; Internet Web blog space; Korea; SEOPRISE; Web blog articles; discrete temperature scale; interactive format; online popularity measurement tool; political discussion blog; political issue; regression model; sociological issue; virtual temperature; Blogs; Correlation; Error analysis; Explosives; Mathematical model; Motion pictures; Predictive models; Discussion Blog; Hit count; Popularity; Prediction;