DocumentCode :
2961482
Title :
Defining and Evaluating Blog Characteristics
Author :
Tellez, F.P. ; Pinto, David ; Cardiff, John ; Rosso, Paolo
Author_Institution :
Social Media Res. Group, Inst. of Technol. Tallaght, Dublin, Ireland
fYear :
2009
fDate :
9-13 Nov. 2009
Firstpage :
97
Lastpage :
102
Abstract :
The analysis of Weblogs has become a popular area of natural language processing. Due to their specific characteristics, such as shortness, vocabulary size and nature, etc., it can be difficult to achieve good results using automated clustering techniques. In particular, their nature can vary considerably, both in length and in breadth of topic. Without a priori knowledge of the nature of a blog, it is difficult to achieve accurate clustering results. In this paper, we present a framework for the assessment of a set of corpus features that will provide us with insight into their nature from a number of perspectives, including shortness, broadness and class imbalance. This in turn allows us to assess the relative hardness of the clustering task and to identify components that can improve the accuracy of the clustering task. We furthermore present the results of some experiments in which we analyzed the features of two sample blog corpora, and we compared the results with other kinds of short texts.
Keywords :
Web sites; natural language processing; blog characteristic; clustering task; weblogs; Abstracts; Artificial intelligence; Government; Information analysis; Information services; Internet; Publishing; Vocabulary; Blogs; Characterization; Short text
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Artificial Intelligence, 2009. MICAI 2009. Eighth Mexican International Conference on
Conference_Location :
Guanajuato
Print_ISBN :
978-0-7695-3933-1
Type :
conf
DOI :
10.1109/MICAI.2009.21
Filename :
5372711