DocumentCode :
3144853
Title :
Knowledge transfer with low-quality data: A feature extraction issue
Author :
Quanz, Brian ; Huan, Jun ; Mishra, Meenakshi
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Kansas, Lawrence, KS, USA
fYear :
2011
fDate :
11-16 April 2011
Firstpage :
769
Lastpage :
779
Abstract :
Effectively utilizing readily available auxiliary data to improve predictive performance on new modeling tasks is a key problem in data mining. In this research, the goal is to transfer knowledge between sources of data, particularly when ground truth information for the new modeling task is scarce or expensive to collect, so that leveraging any auxiliary sources of data becomes a necessity. For seamless knowledge transfer among tasks, effective representation of the data is a critical but not yet fully explored research area for the data engineer and data miner. Here we present a technique based on the idea of sparse coding, which essentially attempts to find an embedding for the data by assigning feature values based on subspace cluster membership. We modify the idea of sparse coding by focusing on the identification of shared clusters between data sets when source and target data may have different distributions. In our paper, we point out cases where a direct application of sparse coding will lead to a failure of knowledge transfer. We then present the details of our extension to sparse coding, which incorporates distribution distance estimates for the embedded data, and show that the proposed algorithm can overcome the shortcomings of the sparse coding algorithm on synthetic data and achieve improved predictive performance on a real-world chemical toxicity transfer learning task.
Keywords :
data mining; data structures; encoding; feature extraction; knowledge management; learning (artificial intelligence); pattern clustering; chemical toxicity transfer learning task; data cluster; data engineer; data mining; data representation; data source; embedded data; feature extraction; knowledge transfer; low quality data; sparse coding; subspace cluster membership; synthetic data; Encoding; Equations; Estimation; Feature extraction; Kernel; Knowledge transfer; Optimization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2011 IEEE 27th International Conference on
Conference_Location :
Hannover
ISSN :
1063-6382
Print_ISBN :
978-1-4244-8959-6
Electronic_ISBN :
1063-6382
Type :
conf
DOI :
10.1109/ICDE.2011.5767917
Filename :
5767917