DocumentCode :
671607
Title :
Detecting and labeling representative nodes for network-based semi-supervised learning
Author :
Araujo, Bilza ; Liang Zhao
Author_Institution :
Dept. of Comput. Sci., Univ. of São Paulo, São Carlos, Brazil
fYear :
2013
fDate :
4-9 Aug. 2013
Firstpage :
1
Lastpage :
8
Abstract :
Network-based Semi-Supervised Learning (NBSSL) propagates labels in networks constructed from the original vector-based data sets, taking advantage of the network topology. However, NBSSL classification performance often varies according to the representativeness of the labeled data instances. Herein, we address this issue. We adopt heuristic criteria, based on complex-network centrality measures, for selecting data items for manual labeling. The numerical analyses are performed on Girvan and Newman homogeneous networks and Lancichinetti-Fortunato-Radicchi heterogeneous networks. Counterintuitively, we found that highly connected nodes (hubs) are usually not representative, in the sense that random samples perform as well as them or even better. Contrary to expectation, nodes with a high clustering coefficient are good representatives of the data in homogeneous networks. In heterogeneous networks, on the other hand, nodes with high betweenness are the good representatives. A high clustering coefficient means that the node lies in a densely connected motif (clique), and a high betweenness means that the node lies between modular structures, interconnecting them. Moreover, aggregating the complex-network measures through Principal Component Analysis, we observed that the second principal component (Z2) exhibits potentially promising properties. It appears that Z2 is able to extract discriminative characteristics that allow good representatives of the data to be found. Our results reveal that the performance of NBSSL can be significantly improved by finding and labeling representative data instances.
Keywords :
complex networks; learning (artificial intelligence); pattern classification; pattern clustering; principal component analysis; Girvan and Newman homogeneous networks; Lancichinetti-Fortunato-Radicchi heterogeneous networks; NBSSL classification performance; clustering coefficient; heuristic criteria; interconnecting modular structures; network topology; network-based semisupervised learning; principal components analysis; representative node detection; representative node labeling; vector-based data sets; Benchmark testing; Clustering algorithms; Complex networks; Indexes; Labeling; Numerical analysis; Principal component analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location :
Dallas, TX
ISSN :
2161-4393
Print_ISBN :
978-1-4673-6128-6
Type :
conf
DOI :
10.1109/IJCNN.2013.6706948
Filename :
6706948