• DocumentCode
    1811194
  • Title
    Are the URLs really popular in microblog messages?
  • Author
    Cui, Anqi; Zhang, Min; Liu, Yiqun; Ma, Shaoping
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • fYear
    2011
  • fDate
    15-17 Sept. 2011
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Microblogging services are attracting people and companies to share their ideas and interests. Since microblog messages are limited in length, people post URLs that link to other websites for detailed information. Hence, URLs that attract more attention are spread widely and represent popular information. However, not all of these URLs are useful. Many of them are spam URLs posted automatically, either by automated agents or by push services from other websites. Based on the features of the popular URLs, we divide them into four categories and propose a clustering and classification algorithm to distinguish spam URLs from the really popular ones. Comparative experiments are conducted on English (Twitter) and Chinese (Sina Weibo) messages. We conclude that more than half of the popular URLs are spam. Most of them are pushed from other websites, and even the really popular ones gain much of their attention from push services. Although the proportions of URLs in Twitter and Sina Weibo messages differ, the characteristics of the spam URLs are similar. Our method efficiently detects spam URLs and their authors without requiring annotations, and is helpful for both research and business on microblogs.
  • Keywords
    Web sites; pattern classification; pattern clustering; unsolicited e-mail; Sina Weibo message; Twitter message; classification algorithm; clustering algorithm; microblog message; microblogging service; spam URL detection; Classification algorithms; Internet; Robots; Twitter; Unsolicited electronic mail; Videos; Microblogging; Sina Weibo; Twitter; spam URL; text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-61284-203-5
  • Type
    conf
  • DOI
    10.1109/CCIS.2011.6045021
  • Filename
    6045021