DocumentCode
1811194
Title
Are the URLs really popular in microblog messages?
Author
Cui, Anqi ; Zhang, Min ; Liu, Yiqun ; Ma, Shaoping
Author_Institution
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear
2011
fDate
15-17 Sept. 2011
Firstpage
1
Lastpage
5
Abstract
Microblogging services attract people and companies who want to share their ideas and interests. Because the text of a microblog message is limited in length, people post URLs that link to other websites for detailed information. Hence, URLs that attract more attention spread widely and represent popular information. However, not all of these URLs are useful. Many are spam URLs posted by automated agents or pushed automatically by services on other websites. Based on the features of popular URLs, we divide them into four categories and propose a clustering and classification algorithm to distinguish spam URLs from the genuinely popular ones. Comparative experiments are conducted on English (Twitter) and Chinese (Sina Weibo) messages. We conclude that more than half of the popular URLs are spam. Most of them are pushed from other websites; even the genuinely popular ones gain much attention from the pushing services. Although the proportions of URLs in Twitter and Sina Weibo messages differ, the characteristics of the spam URLs are similar. Our method detects spam URLs and their authors efficiently without annotations, and is helpful for both research and business on microblogs.
Keywords
Web sites; pattern classification; pattern clustering; unsolicited e-mail; Sina Weibo message; Twitter message; classification algorithm; clustering algorithm; microblog message; microblogging service; spam URL detection; Classification algorithms; Internet; Robots; Twitter; Unsolicited electronic mail; Videos; Microblogging; Sina Weibo; Twitter; spam URL; text mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-61284-203-5
Type
conf
DOI
10.1109/CCIS.2011.6045021
Filename
6045021