Title :
Development of a Website to Collect and Provide Questions about Book Titles Posted in Blogs and on Twitter
Author :
Arai, Shunsuke ; Tsuji, Keita
Author_Institution :
Grad. Sch. of Libr., Inf. & Media Studies, Univ. of Tsukuba, Tsukuba, Japan
Abstract :
Some people post questions about book titles on their blogs or on Twitter. A website that automatically collects such questions and solicits answers would let people who know the answers respond efficiently. We therefore developed a method to semi-automatically collect such questions from blogs and tweets, and built a website to display them. The collection method consists of two steps: (1) submitting words that are characteristic of questions to a search engine in order to obtain blog articles and tweets that are likely to contain questions, and (2) applying automatic text classification to extract the articles and tweets that actually contain questions. For step (1), we extracted characteristic words from 400 articles and tweets. For step (2), we compared the effectiveness of four classification methods (support vector machine (SVM), Naive Bayes, decision tree, and boosting) on 1,900 articles and tweets. We found that (1) the characteristic phrase "taitoru-ga-omoidase-nai" (Japanese for "I cannot remember the title") yielded the best precision (16% for Google Blog Search and 13% for Twitter Search) and (2) the boosting and decision tree methods gave the best classification for blogs and Twitter, respectively (F values of 0.943 and 0.941). When we displayed 30 articles and 31 tweets containing questions on our website, six and five of them, respectively, received satisfactory answers.
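The precision and F values reported in the abstract can be illustrated with a minimal sketch. It assumes that "F value" means the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `precision_recall_f` and the toy labels are hypothetical and are not taken from the paper's data.

```python
def precision_recall_f(gold, predicted):
    """Return (precision, recall, F1) for two parallel lists of booleans.

    gold[i] is True when item i really contains a book-title question;
    predicted[i] is True when the search step or classifier selected it.
    Assumption: F is the balanced F1 score, not a weighted F-beta.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # correctly selected
    fp = sum(1 for g, p in pairs if not g and p)    # selected, but not a question
    fn = sum(1 for g, p in pairs if g and not p)    # question that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy labels for five items (not the paper's data).
toy_gold = [True, True, False, False, True]
toy_predicted = [True, False, False, True, True]
toy_precision, toy_recall, toy_f1 = precision_recall_f(toy_gold, toy_predicted)
print(round(toy_precision, 3), round(toy_recall, 3), round(toy_f1, 3))
```

Under this reading, the 16% precision reported for Google Blog Search would mean that roughly 16 of every 100 retrieved blog articles actually contained a question, which is why a classification step follows the search step.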
Keywords :
Bayes methods; decision trees; pattern classification; question answering (information retrieval); search engines; social networking (online); text analysis; Google Blog Search; Twitter Search; Web site; article extraction; automatic text classification; blogs; book titles; boosting; characteristic words; classification method; decision tree; naive Bayes; search engine; support vector machine; taitoru-ga-omoidase-nai; tweet extraction; Blogs; Boosting; Search engines; Support vector machines; Text categorization; Twitter; Writing; Blog; Q&A; Reference Services; Text Classification; Twitter;
Conference_Title :
Advanced Applied Informatics (IIAI-AAI), 2012 IIAI International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
978-1-4673-2719-0
DOI :
10.1109/IIAI-AAI.2012.12