Regional Information Center for Science and Technology (RICeST) - Comparing Tweet Classifications by Authors' Hashtags, Machine Learning, and Human Annotators

DocumentCode :

3740085

Title :

Comparing Tweet Classifications by Authors' Hashtags, Machine Learning, and Human Annotators

Author :

Chifumi Nishioka;Ansgar Scherp;Klaas Dellschaft

Author_Institution :

ZBW - Leibniz Inf. Centre for Econ., Germany

Volume :

fYear :

2015

Firstpage :

Lastpage :

Abstract :

In recent years, many papers have been published on using machine learning to classify postings on microblogging platforms like Twitter, e.g., to help users find tweets that interest them. Typically, the automatic classification results are then evaluated against a gold standard classification which consists of either (i) the hashtags of the tweets' authors, or (ii) manual annotations by independent human annotators. In this paper, we show that there are fundamental differences between these two kinds of gold standard classifications, i.e., human annotators are more likely to classify tweets like other human annotators than like the tweets' authors. Furthermore, we discuss how these differences may influence the evaluation of automatic classifications, such as those obtained with Latent Dirichlet Allocation (LDA). We argue that researchers who conduct machine learning experiments for tweet classification should pay particular attention to the kind of gold standard they use. One may even argue that hashtags are not appropriate as a gold standard for tweet classification.

Keywords :

"Twitter","Tagging","Gold","Standards","Resource management","Electronic mail","Probability distribution"

Publisher :

IEEE

Conference_Title :

Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on

Type :

conf

DOI :

10.1109/WI-IAT.2015.69

Filename :

7396781

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3740085