DocumentCode :
1799975
Title :
Topic classification in Romanian blogosphere
Author :
Vasile, Adrian ; Radulescu, Roxana ; Pavaloiu, Ionel-Bujorel
Author_Institution :
Dept. of Electr. Eng., Univ. Politeh. of Bucharest, Bucharest, Romania
fYear :
2014
fDate :
25-27 Nov. 2014
Firstpage :
131
Lastpage :
134
Abstract :
In this paper we analyze the performance of several classification methods applied to the Romanian blogosphere. Blogs are difficult for humans and machines alike to categorize, because they are written in a changeable style. In the early days of the web, directories maintained by humans could not keep up with the millions of websites; likewise, blog directories cannot keep up with the explosive growth of the blogosphere. This paper investigates the efficacy of using machine learning to categorize blogs written in the Romanian language and belonging to the Romanian blogosphere. We design a text classification experiment to categorize Romanian blogs into nine topics. The baseline feature is unigrams weighted by TF-IDF. We analyze the corpus, the features, and the resulting data.
Keywords :
Web sites; learning (artificial intelligence); pattern classification; text analysis; Romanian blogosphere; Romanian language; TF-IDF; blog directories; machine learning; term frequency-inverse document frequency; text classification experiment; topic classification; unigram feature; Accuracy; Blogs; Classification algorithms; Data mining; Frequency measurement; Media; Training; Blogs; Classification algorithms; Machine learning; Web mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Network Applications in Electrical Engineering (NEUREL), 2014 12th Symposium on
Conference_Location :
Belgrade
Print_ISBN :
978-1-4799-5887-0
Type :
conf
DOI :
10.1109/NEUREL.2014.7011480
Filename :
7011480