DocumentCode :
3740085
Title :
Comparing Tweet Classifications by Authors' Hashtags, Machine Learning, and Human Annotators
Author :
Chifumi Nishioka;Ansgar Scherp;Klaas Dellschaft
Author_Institution :
ZBW - Leibniz Information Centre for Economics, Germany
Volume :
1
fYear :
2015
Firstpage :
67
Lastpage :
74
Abstract :
In recent years, many papers have been published on using machine learning to classify postings on microblogging platforms such as Twitter, e.g., to help users find tweets that interest them. Typically, the automatic classification results are evaluated against a gold standard classification consisting of either (i) the hashtags assigned by the tweets' authors or (ii) manual annotations by independent human annotators. In this paper, we show that there are fundamental differences between these two kinds of gold standard: human annotators are more likely to classify tweets the way other human annotators do than the way the tweets' authors do. Furthermore, we discuss how these differences may influence the evaluation of automatic classifications, such as those produced by Latent Dirichlet Allocation (LDA). We argue that researchers who conduct machine learning experiments on tweet classification should pay particular attention to the kind of gold standard they use. One may even argue that hashtags are not appropriate as a gold standard for tweet classification.
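The abstract compares how closely different label sources agree. It does not say which agreement measure the paper uses, so the following is only a hypothetical sketch. It assumes that each tweet receives exactly one categorical label per source (author hashtag, annotator 1, annotator 2) and uses Cohen's kappa, a standard chance-corrected agreement measure, to compare every pair of sources. The function names and the toy labels are illustrative assumptions; they are not data or code from the paper.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of tweets on which both sources agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each source's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources use the same single label for every tweet
    return (observed - expected) / (1 - expected)


def pairwise_agreement(sources):
    """Kappa for every pair of label sources, keyed by sorted source names."""
    return {
        (s, t): cohen_kappa(sources[s], sources[t])
        for s, t in combinations(sorted(sources), 2)
    }


# Hypothetical toy labels for six tweets (made up for illustration only).
sources = {
    "hashtags":    ["sports", "politics", "tech", "tech", "sports", "music"],
    "annotator_1": ["sports", "politics", "politics", "tech", "sports", "politics"],
    "annotator_2": ["sports", "politics", "politics", "tech", "sports", "music"],
}

for pair, kappa in pairwise_agreement(sources).items():
    print(pair, round(kappa, 3))
```

The same `cohen_kappa` helper could also score an automatic classifier, for example LDA topic assignments mapped to labels, against each gold standard in turn. Comparing those two scores shows how much the evaluation result depends on which gold standard is chosen.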
Keywords :
"Twitter","Tagging","Gold","Standards","Resource management","Electronic mail","Probability distribution"
Publisher :
IEEE
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on
Type :
conf
DOI :
10.1109/WI-IAT.2015.69
Filename :
7396781