Title :
Selecting Nodes with Inhomogeneous Profile for Labeling for Network-Based Semi-supervised Learning
Author :
Araujo, Bilza ; Liang Zhao
Author_Institution :
Dept. of Comput. Sci., ICMC - Univ. of São Paulo, São Carlos, Brazil
Abstract :
Network-based Semi-Supervised Learning (NbSSL) propagates labels in affinity networks by taking advantage of the network topology, much as information spreads in trust networks. In NbSSL, not only the unlabeled data instances but also the labeled ones can bias the classification performance. Herein, we present results and a discussion of this phenomenon. Even the suitability of the free parameters of the NbSSL algorithms varies according to the available labeled data. We therefore propose a method for selecting representative data instances for labeling in NbSSL. In our view, the representativeness of a node is related to how inhomogeneous its profile is with respect to the whole network. The proposed method uses complex network centrality measures to identify which nodes present an inhomogeneous profile. We perform this study by applying three NbSSL algorithms to Girvan-Newman and Lancichinetti-Fortunato-Radicchi modular networks. In the former, the nodes with a high clustering coefficient are good representatives of the data, whereas in the latter the nodes with high betweenness are the good representatives. A high clustering coefficient means that the node lies in a densely connected motif (clique), whereas a high betweenness means that the node lies between modular structures, interconnecting them. These results show that NbSSL performance can be improved by selecting representative data instances for manual labeling.
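The two centrality measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how nodes are ranked, so picking the top-k nodes by raw score is an assumption, as are the function names (`build_adjacency`, `clustering_coefficient`, `betweenness`, `select_for_labeling`) and the toy graph, which is two triangles joined through a bridge node rather than a Girvan-Newman or Lancichinetti-Fortunato-Radicchi benchmark.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    coeff = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeff[v] = 0.0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeff[v] = 2.0 * links / (k * (k - 1))
    return coeff


def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm, unweighted)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in BFS visiting order
        preds = {v: [] for v in adj}     # shortest-path predecessors
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # accumulated pair dependencies
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once per endpoint, so halve the totals.
    return {v: c / 2.0 for v, c in score.items()}


def select_for_labeling(adj, measure, k):
    """Assumed selection rule: the k highest-scoring nodes, ties broken by node id."""
    scores = measure(adj)
    return sorted(adj, key=lambda v: (-scores[v], v))[:k]


# Toy graph (an assumption): triangles {0,1,2} and {4,5,6} joined via node 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = build_adjacency(edges)
by_clustering = select_for_labeling(adj, clustering_coefficient, 4)
by_betweenness = select_for_labeling(adj, betweenness, 1)
```

On this toy graph the clustering-based ranking favors nodes that sit inside a triangle, while the betweenness-based ranking favors the bridge node that interconnects the two groups, which mirrors the contrast the abstract draws between the two measures.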
Keywords :
data mining; learning (artificial intelligence); network theory (graphs); pattern classification; pattern clustering; topology; Girvan-Newman modular network; Lancichinetti-Fortunato-Radicchi modular network; NbSSL labeling; affinity-networks; classification performance; clustering coefficient; complex networks centrality measures; data mining; inhomogeneous profile; manual labeling; modular structures; network topology; network-based semisupervised learning; node identification; node selection; representative data instance selection; trust networks; unlabeled data instances; Clustering algorithms; Complex networks; Labeling; Machine learning algorithms; Nonhomogeneous media; Semisupervised learning; Standards; classification; complex networks; data mining; semi-supervised learning;
Conference_Title :
Computational Intelligence and 11th Brazilian Congress on Computational Intelligence (BRICS-CCI & CBIC), 2013 BRICS Congress on
Conference_Location :
Ipojuca
DOI :
10.1109/BRICS-CCI-CBIC.2013.77