Title :
Topic classification in Romanian blogosphere
Author :
Vasile, Adrian ; Radulescu, Roxana ; Pavaloiu, Ionel-Bujorel
Author_Institution :
Dept. of Electr. Eng., Univ. Politeh. of Bucharest, Bucharest, Romania
Abstract :
In this paper we analyze the performance of several classification methods applied to the Romanian blogosphere. Blogs are difficult for humans and machines alike to categorize because they are written in a changeable style. In the early days of the web, directories maintained by humans could not keep up with the millions of websites; likewise, blog directories cannot keep up with the explosive growth of the blogosphere. This paper investigates the efficacy of using machine learning to categorize blogs from the Romanian blogosphere that are written in the Romanian language. We design a text classification experiment to categorize Romanian blogs into nine topics. The baseline feature is unigrams weighted by TF-IDF. We analyze the corpus, the features, and the resulting data.
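As an illustration of the baseline feature named in the abstract, the following is a minimal sketch of TF-IDF weighting over unigrams. The abstract does not state which TF-IDF variant, tokenizer, or toolkit the authors used, so everything here is an assumption: term frequency is taken as the relative frequency within a document, inverse document frequency as ln(N / df), tokens are split on whitespace, and the function name tfidf_unigrams is hypothetical.

```python
import math
from collections import Counter


def tfidf_unigrams(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumed variant: tf = count / document length, idf = ln(N / df).
    This is not necessarily the weighting used in the paper.
    """
    # Naive whitespace tokenization; a real system would need
    # Romanian-aware tokenization and diacritics handling.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up example posts, for illustration only.
posts = ["politica guvern politica", "fotbal meci", "guvern meci fotbal"]
weights = tfidf_unigrams(posts)
```

Under these assumptions, a term that occurs in every document receives weight zero, while a term concentrated in a single post (such as "politica" above) receives the largest weight. The resulting sparse vectors could then be passed to any standard classifier.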
Keywords :
Web sites; learning (artificial intelligence); pattern classification; text analysis; Romanian blogosphere; Romanian language; TF-IDF; blog directories; machine learning; term frequency-inverse document frequency; text classification experiment; topic classification; unigram feature; Accuracy; Blogs; Classification algorithms; Data mining; Frequency measurement; Media; Training; Web mining
Conference_Title :
Neural Network Applications in Electrical Engineering (NEUREL), 2014 12th Symposium on
Conference_Location :
Belgrade
Print_ISBN :
978-1-4799-5887-0
DOI :
10.1109/NEUREL.2014.7011480