Development of a Website to Collect and Provide Questions about Book Titles Posted in Blogs and on Twitter

Author

Arai, Shunsuke ; Tsuji, Keita

Author_Institution

Grad. Sch. of Libr., Inf. & Media Studies, Univ. of Tsukuba, Tsukuba, Japan

fYear

2012

fDate

20-22 Sept. 2012

Firstpage

9

Lastpage

13

Abstract

Some people post questions about book titles on their blogs or on Twitter. If a website automatically collects such questions and solicits answers, people who know the answers can respond efficiently. We therefore developed a method to semi-automatically collect questions from blog articles and tweets, and built a website to display them. The data collection method consists of two steps: (1) submitting words that are characteristic of such questions to a search engine to obtain blog articles and tweets that are likely to contain questions, and (2) applying automatic text classification to extract the articles and tweets that actually contain questions. For step (1), we extracted characteristic words from 400 articles and tweets. For step (2), we adopted four classification methods (support vector machine (SVM), Naive Bayes, decision tree, and boosting) and compared their effectiveness using 1,900 articles and tweets. We found that (1) the characteristic phrase "taitoru-ga-omoidase-nai" (Japanese for "I cannot remember the title") yielded the best precision (16% for Google Blog Search and 13% for Twitter Search) and (2) boosting and decision tree produced the best classification for blogs and Twitter, respectively (F values of 0.943 and 0.941). When we displayed 30 articles and 31 tweets containing questions on our website, six and five of them, respectively, received satisfactory answers.
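
As a hedged illustration of the two measures quoted in the abstract, the sketch below computes precision and a balanced F value in plain Python. The abstract does not define its F value; the standard harmonic mean of precision and recall (F1) is assumed here. The function names and all counts are hypothetical and are not taken from the paper.

```python
def precision(relevant_retrieved: int, retrieved: int) -> float:
    """Fraction of retrieved posts that actually contain a question."""
    if retrieved == 0:
        return 0.0
    return relevant_retrieved / retrieved


def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Fraction of all question posts that were retrieved."""
    if relevant_total == 0:
        return 0.0
    return relevant_retrieved / relevant_total


def f_value(p: float, r: float) -> float:
    """Balanced F measure (assumed F1): harmonic mean of precision and recall."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Hypothetical counts for one classifier run (not data from the paper):
# 50 posts labeled "question", 45 of them correct, 60 question posts overall.
p = precision(45, 50)
r = recall(45, 60)
print(round(p, 3), round(r, 3), round(f_value(p, r), 3))
```

The same `precision` helper applies to step (1): the share of search results for a query phrase that turn out to contain a question.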

Keywords

Bayes methods; decision trees; pattern classification; question answering (information retrieval); search engines; social networking (online); text analysis; Google Blog Search; Twitter Search; Web site; article extraction; automatic text classification; blogs; book titles; boosting; characteristic words; classification method; decision tree; naive Bayes; search engine; support vector machine; taitoru-ga-omoidase-nai; tweet extraction; Blogs; Boosting; Search engines; Support vector machines; Text categorization; Twitter; Writing; Blog; Q&A; Reference Services; Text Classification; Twitter;

fLanguage

English

Publisher

IEEE

Conference_Titel

Advanced Applied Informatics (IIAI-AAI), 2012 IIAI International Conference on

Conference_Location

Fukuoka

Print_ISBN

978-1-4673-2719-0

Type

conf

DOI

10.1109/IIAI-AAI.2012.12

Filename

6337237