Title :
Classification of big velocity data via cross-domain Canonical Correlation Analysis
Author :
Bo Zhang ; Zhong-Zhi Shi
Author_Institution :
Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China
Abstract :
Many classification techniques work well only under the common assumption that the training and test data are drawn from the same feature space and the same distribution. However, big velocity data usually violate this assumption. For example, in web-document classification, new documents emerge continuously every day. Transfer learning aims to leverage the knowledge in labeled source domains to predict unlabeled data in a target domain whose distribution differs from that of the source domains. One important research direction in transfer learning, which has attracted wide attention and study, exploits the correspondence between pivot features and all the other, domain-specific features from different domains to extract relevant features that may reduce the difference between domains. However, limitations caused by ambiguous meanings across different domains prevent these algorithms from improving further. To tackle this problem, we propose a cross-domain canonical correlation analysis algorithm, called CD-CCA, which applies Canonical Correlation Analysis (CCA) to transfer learning. CD-CCA learns a semantic space of multi-view correspondences from each of the different domains and transfers knowledge through multi-view dimensionality reduction. Experimental results on 144×6 classification problems from 20Newsgroups show that CD-CCA can significantly improve prediction accuracy.
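The abstract builds on standard linear Canonical Correlation Analysis, which finds paired projections of two views of the same samples so that the projected views are maximally correlated. The sketch below is a generic, regularized CCA in NumPy, not the authors' CD-CCA; the function names `cca` and `_inv_sqrt`, the ridge term `reg`, and the synthetic two-view data are illustrative assumptions. It whitens each view's covariance and takes the SVD of the whitened cross-covariance, whose singular values are the canonical correlations.

```python
import numpy as np


def _inv_sqrt(cov: np.ndarray) -> np.ndarray:
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(x: np.ndarray, y: np.ndarray, k: int, reg: float = 1e-6):
    """Regularized linear CCA between two views whose rows are the same samples.

    Returns projection matrices (wx, wy) and the top-k canonical correlations.
    `reg` is an assumed ridge term that keeps the covariances invertible.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    kx, ky = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # SVD of the whitened cross-covariance yields the canonical directions;
    # its singular values are the canonical correlations (sorted descending).
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    wx = kx @ u[:, :k]
    wy = ky @ vt.T[:, :k]
    return wx, wy, s[:k]


# Hypothetical demo: two views that share one latent factor z plus noise.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
x = np.hstack([z + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 4))])
y = np.hstack([rng.normal(size=(n, 3)), -z + 0.1 * rng.normal(size=(n, 1))])
wx, wy, corr = cca(x, y, k=2)
print(np.round(corr, 2))
```

In a cross-domain setting, one might (as an assumption, not the paper's exact construction) take the two views to be pivot features and domain-specific features of the same documents pooled from source and target domains, project both domains' documents onto the shared low-dimensional space with `wx`/`wy`, and then train an ordinary classifier on the projected labeled source data. The first canonical correlation in the demo is close to 1 because both views carry the shared factor `z`.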
Keywords :
Big Data; learning (artificial intelligence); pattern classification; statistical analysis; 20Newsgroups; Web-document classification; big velocity data; classification techniques; cross-domain canonical correlation analysis; data classification; dimensionality reduction; labeled source domains; multiview correspondences; semantic space learning; transfer learning; unlabeled data prediction; Algorithm design and analysis; Classification algorithms; Correlation; Feature extraction; Mutual information; Semantics; Vectors; big velocity data classification; canonical correlation analysis; transfer learning;
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691612