Title :
Twitter news classification using SVM
Author :
Dilrukshi, Inoshika ; De Zoysa, Kasun ; Caldera, Amitha
Author_Institution :
Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka
Abstract :
With the development of web blogs and social networks, many news providers share their news headlines on various websites and web blogs. In Sri Lanka, many news groups now share their news headlines on microblogging services such as Twitter. These data may carry valuable information relevant to many areas of social research. The purpose of this research is therefore to classify news into different groups so that a user can identify the most popular news group in a given country at a given time. Short messages were extracted from the Twitter microblog, using several active news groups as sources. Each short message was manually classified into one of 12 groups. These labeled data were used to train the machine learning technique. The words of each short message were treated as features, and a feature vector was created using the bag-of-words approach in order to create the instances. The data were trained using the SVM (Support Vector Machine) machine learning technique. The main reason for using SVM in the current study is that SVM supports high-dimensional data; the current research is a high-dimensional problem because a large number of features are collected from the short messages. Cross-validation was performed in order to avoid bias in the data. The performance of the system is taken to be its effectiveness, so precision and recall values are calculated to measure it. Fβ was calculated to obtain a single-value measure. The results show that the system performs well for most groups. However, the development-government group does not perform well with SVM.
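The feature construction and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenization, the example messages, and the group labels "sports" and "education" are all assumptions made for illustration, and the SVM training and cross-validation steps are left out. The sketch builds a bag-of-words count vector for a short message and computes precision, recall, and Fβ for one group using the standard definition Fβ = (1 + β²)·P·R / (β²·P + R).

```python
from collections import Counter


def build_vocabulary(messages):
    """Map every distinct lowercase word in the messages to a column index."""
    words = sorted({w for m in messages for w in m.lower().split()})
    return {w: i for i, w in enumerate(words)}


def bag_of_words(message, vocab):
    """Return a count vector over the vocabulary; unseen words are ignored."""
    counts = Counter(message.lower().split())
    vector = [0] * len(vocab)
    for word, index in vocab.items():
        vector[index] = counts.get(word, 0)
    return vector


def precision_recall_fbeta(true_labels, predicted_labels, group, beta=1.0):
    """Compute precision, recall and F-beta for a single group (class)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == group and p == group)
    fp = sum(1 for t, p in pairs if t != group and p == group)
    fn = sum(1 for t, p in pairs if t == group and p != group)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    fbeta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical short messages and labels, for illustration only.
messages = ["rain floods colombo", "colombo school exams"]
vocab = build_vocabulary(messages)
vector = bag_of_words("rain rain colombo", vocab)

true_labels = ["sports", "sports", "education", "education"]
predicted_labels = ["sports", "education", "education", "education"]
precision, recall, f1 = precision_recall_fbeta(
    true_labels, predicted_labels, group="sports", beta=1.0
)
```

In a full pipeline, vectors like `vector` would be the instances passed to an SVM trainer, and the per-group metrics would be computed on each cross-validation fold.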
Keywords :
data mining; electronic messaging; electronic publishing; information retrieval; learning (artificial intelligence); pattern classification; social networking (online); support vector machines; SVM machine learning techniques; Sri Lanka; Web blogs; Web sites; bag-of-words approach; data training; feature vector; group development government; high dimensional data; short message classification; short message extraction; social networks; support vector machine; twitter microblog; twitter news classification; Accidents; Blogs; Computers; Education; Support vector machines; Vectors; SVM; Text classification; Web mining;
Conference_Title :
Computer Science & Education (ICCSE), 2013 8th International Conference on
Conference_Location :
Colombo
Print_ISBN :
978-1-4673-4464-7
DOI :
10.1109/ICCSE.2013.6553926