• DocumentCode
    660864
  • Title
    Automatic Labeling of Training Data for Collecting Tweets for Ambiguous TV Program Titles
  • Author
    Erdmann, Michael ; Ward, Elizabeth Sally ; Ikeda, Ken-ichi ; Hattori, Gen-Ya ; Ono, C. ; Takishima, Y.
  • Author_Institution
    KDDI R&D Labs., Inc., Ohara, Japan
  • fYear
    2013
  • fDate
    8-14 Sept. 2013
  • Firstpage
    796
  • Lastpage
    802
  • Abstract
    Twitter is a popular medium for sharing opinions on TV programs, and the analysis of TV-related tweets is attracting a lot of interest. However, collecting all tweets that contain a given TV program title yields a large number of unrelated tweets, because many TV program titles are ambiguous. Using supervised learning, TV-related tweets can be collected with high accuracy. The goal of our proposed method is to automate the labeling process, in order to eliminate the cost of data labeling without sacrificing classification accuracy. When creating the training data, we use only tweets of unambiguous TV program titles. To decide whether a TV program title is ambiguous, we automatically determine whether it can be used as a common expression or named entity. In two experiments, in which we collected tweets for 32 ambiguous TV program titles, we achieved the same (78.2%) or even higher (79.1%) classification accuracy with automatically labeled training data as with manually labeled data, while effectively eliminating labeling costs.
  • Keywords
    learning (artificial intelligence); pattern classification; social networking (online); television; TV related tweets; ambiguous TV program titles; automatically labeled training data; classification accuracy; labeling costs; manually labeled data; supervised learning; tweet collection; Accuracy; Electronic publishing; Encyclopedias; Feature extraction; Internet; TV
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Social Computing (SocialCom), 2013 International Conference on
  • Conference_Location
    Alexandria, VA
  • Type
    conf
  • DOI
    10.1109/SocialCom.2013.119
  • Filename
    6693416