• DocumentCode
    1912839
  • Title
    Development of a Website to Collect and Provide Questions about Book Titles Posted in Blogs and on Twitter
  • Author
    Arai, Shunsuke ; Tsuji, Keita
  • Author_Institution
    Grad. Sch. of Libr., Inf. & Media Studies, Univ. of Tsukuba, Tsukuba, Japan
  • fYear
    2012
  • fDate
    20-22 Sept. 2012
  • Firstpage
    9
  • Lastpage
    13
  • Abstract
    Some people post questions about book titles in their blogs or on Twitter. A website that automatically collects such questions and asks for answers would let people who know the answers respond efficiently. We have therefore developed a method to semi-automatically collect questions from blogs and tweets, and we have built a website to display these questions. The proposed data collection method consists of two steps: (1) submitting words that are characteristic of questions to a search engine in order to obtain blog articles and tweets that are likely to contain questions, and (2) using automatic text classification to extract the articles and tweets that actually contain questions. For step (1), we extract characteristic words from 400 articles and tweets. In step (2), we adopt four classification methods (support vector machine (SVM), Naive Bayes, decision tree, and boosting) and compare their effectiveness using 1,900 articles and tweets. We find that (1) the characteristic words "taitoru-ga-omoidase-nai" (Japanese for "cannot remember the title") produce the best precision (16% for Google Blog Search and 13% for Twitter Search) and (2) boosting produces the best classification for blogs and the decision tree for Twitter (F values of 0.943 and 0.941, respectively). When we displayed 30 articles and 31 tweets containing questions on our website, six and five of them, respectively, received satisfactory answers.
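The two steps and the F-value comparison in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `collect_candidates` and `precision_recall_f` are hypothetical, step (1) is approximated by filtering an in-memory list rather than querying a search engine, and the F value is assumed to be the usual balanced F1 (the abstract does not state which variant was used).

```python
from typing import Iterable, List, Sequence, Tuple

# Phrase the abstract reports as giving the best search precision.
QUERY_PHRASE = "taitoru-ga-omoidase-nai"


def collect_candidates(posts: Iterable[str], phrase: str = QUERY_PHRASE) -> List[str]:
    """Stand-in for step (1): keep posts that contain the characteristic phrase.

    Assumption: the paper submits the phrase to a search engine; this sketch
    filters an in-memory list instead.
    """
    return [post for post in posts if phrase in post]


def precision_recall_f(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float, float]:
    """Score step (2) output against manual labels (True = contains a question).

    Assumption: F is the balanced F1, 2PR / (P + R).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_value = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_value
```

Any of the four classifiers named in the abstract could supply the `predicted` labels; computing the same score for each on a shared labeled set is one way to compare them.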
  • Keywords
    Bayes methods; decision trees; pattern classification; question answering (information retrieval); search engines; social networking (online); text analysis; Google Blog Search; Twitter Search; Web site; article extraction; automatic text classification; blogs; book titles; boosting; characteristic words; classification method; decision tree; naive Bayes; search engine; support vector machine; taitoru-ga-omoidase-nai; tweet extraction; Blogs; Boosting; Search engines; Support vector machines; Text categorization; Twitter; Writing; Blog; Q&A; Reference Services; Text Classification; Twitter;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Advanced Applied Informatics (IIAI-AAI), 2012 IIAI International Conference on
  • Conference_Location
    Fukuoka
  • Print_ISBN
    978-1-4673-2719-0
  • Type
    conf
  • DOI
    10.1109/IIAI-AAI.2012.12
  • Filename
    6337237