Title :
Using empirical risk minimization to detect community structure in the blogosphere
Author :
Huang, Jiaxuan ; Huang, Hongsen
Author_Institution :
Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
Abstract :
Detecting community structure in the blogosphere presents several obstacles. Blog owners may update their content frequently, so the blogosphere can grow very large within a short period of time, and processing such a huge amount of data with traditional methods can be very expensive. Moreover, few blogs can be clearly identified as members of a specific community from their own characteristics; most must be judged by their relationships with neighboring blogs, using centrality metrics. Recently, a new method that combines active learning and semi-supervised learning has performed well in improving the speed and accuracy of machine learning on large-scale data. In this paper, we employ this method to solve the community clustering problem on a vast and complex data set. We aim to show that this method labels and clusters large-scale data better than the traditional approach by comparing the results of the two. Afterward, we may make some improvements and use it for community detection in the blogosphere.
Keywords :
learning (artificial intelligence); minimisation; risk management; set theory; social networking (online); active learning; blogosphere; centrality metrics; community clustering problem; community structure; community structure detection; complex data set; empirical risk minimization; machine learning; semi-supervised learning; Accuracy; Communities; Dolphins; Internet; Web sites
Conference_Title :
Intelligent Systems and Knowledge Engineering (ISKE), 2010 International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4244-6791-4
DOI :
10.1109/ISKE.2010.5680843