DocumentCode :
2194563
Title :
ALPOS: A Machine Learning Approach for Analyzing Microblogging Data
Author :
Zhang, Dan ; Liu, Yan ; Lawrence, Richard D. ; Chenthamarakshan, Vijil
Author_Institution :
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
fYear :
2010
fDate :
13 Dec. 2010
Firstpage :
1265
Lastpage :
1272
Abstract :
With the development of the Internet, the growing volume of information posted on microblogging sites like Twitter calls for efficient information filtering. In conventional text classification problems, it is assumed that the feature vectors extracted from the available documents are sufficient to learn good classifiers. However, this conventional approach is unlikely to work for Twitter because of the limited number of characters in each tweet. At a higher level, each tweet can be viewed as an abbreviated abstraction of a longer document, of which we have only a partial observation. To solve the problem caused by partial observations, we introduce a novel domain adaptation/transfer learning approach called Assisted Learning for Partial Observation (ALPOS). The basic idea is to use a large number of multi-labeled examples (source domain) to improve learning on the partial observations (target domain). In particular, we learn a hidden, higher-level abstraction space that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error of a classification model learned in the hidden space from the known labels of the source domain. The partial observations in the target domain are then mapped to the same hidden space for recovery and classification. We compare the performance of this method with existing approaches on synthetic data and the well-known Reuters-21578 dataset. We also present experimental results on Twitter classification.
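The joint objective described in the abstract can be illustrated with a small sketch. This is not the authors' formulation, which the abstract does not spell out. It assumes a linear hidden space, squared-error losses for both reconstruction and classification, ridge regularization, and alternating least squares as the optimizer. The function names `fit_shared_hidden_space` and `map_partial_observation` and the parameters `k`, `lam`, and `mu` are hypothetical.

```python
import numpy as np


def fit_shared_hidden_space(X, Y, k=3, lam=1.0, mu=1e-2, n_iter=30, seed=0):
    """Alternating least squares on the (assumed) joint objective

        ||X - H W||^2 + lam * ||Y - H V||^2 + mu * (||H||^2 + ||W||^2 + ||V||^2)

    X: (n, d) source-domain documents, Y: (n, c) multi-label indicator matrix.
    H: (n, k) hidden representations, W: (k, d) reconstruction map,
    V: (k, c) linear classifier in the hidden space.
    Returns W, V, and the objective value before and after each sweep.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    c = Y.shape[1]
    H = rng.standard_normal((n, k))
    W = rng.standard_normal((k, d))
    V = rng.standard_normal((k, c))
    I = np.eye(k)
    # Stack both targets so the H update is a single ridge regression.
    T = np.hstack([X, np.sqrt(lam) * Y])

    def objective():
        return (np.sum((X - H @ W) ** 2)
                + lam * np.sum((Y - H @ V) ** 2)
                + mu * (np.sum(H ** 2) + np.sum(W ** 2) + np.sum(V ** 2)))

    history = [objective()]
    for _ in range(n_iter):
        # Update H with W and V fixed.
        B = np.hstack([W, np.sqrt(lam) * V])                # (k, d + c)
        H = np.linalg.solve(B @ B.T + mu * I, B @ T.T).T    # (n, k)
        # Update W and V with H fixed; the two problems are independent.
        HtH = H.T @ H
        W = np.linalg.solve(HtH + mu * I, H.T @ X)
        V = np.linalg.solve(lam * HtH + mu * I, lam * (H.T @ Y))
        history.append(objective())
    return W, V, history


def map_partial_observation(x, observed, W, V, mu=1e-2):
    """Map a partially observed document into the hidden space.

    x: (d,) feature vector; only entries where `observed` is True are used.
    Returns the hidden code h, the recovered full document h @ W,
    and the label scores h @ V.
    """
    W_obs = W[:, observed]                                  # (k, d_obs)
    h = np.linalg.solve(W_obs @ W_obs.T + mu * np.eye(W.shape[0]),
                        W_obs @ x[observed])
    return h, h @ W, h @ V
```

In this sketch a target-domain example is passed to `map_partial_observation` together with a boolean mask of its observed features. The returned scores can be thresholded per label, and the recovered vector is an estimate of the unobserved full document under the linear assumption.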
Keywords :
Internet; data analysis; feature extraction; information filtering; learning (artificial intelligence); pattern classification; social networking (online); text analysis; ALPOS; Internet; Reuters-21578 dataset; Twitter; assisted learning; document reconstruction error; domain transfer learning approach; feature vector extraction; higher level abstraction space; information filtering; machine learning approach; microblogging data analysis; multilabeled example; partial observation; text classification; Assisted Learning for Partial Observation; Text Classification; Transfer learning; Twitter;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
Type :
conf
DOI :
10.1109/ICDMW.2010.154
Filename :
5693439