Regional Information Center for Science and Technology - Dimensionality Reduction on Heterogeneous Feature Space

DocumentCode :

2984618

Title :

Dimensionality Reduction on Heterogeneous Feature Space

Author :

Shi, Xiaoxiao ; Yu, Paul

Author_Institution :

Comput. Sci. Dept., Univ. of Illinois at Chicago, Chicago, IL, USA

fYear :

2012

fDate :

10-13 Dec. 2012

Firstpage :

635

Lastpage :

644

Abstract :

Combining correlated data sources may help improve the learning performance of a given task. For example, in recommendation problems, one can combine (1) a user profile database (e.g., gender, age, etc.), (2) users' log data (e.g., clickthrough data, purchasing records, etc.), and (3) users' social network (useful in social targeting) to build a recommendation model. All these data sources provide informative but heterogeneous features. For instance, a user profile database usually has nominal features reflecting users' backgrounds, log data provides term-based features about users' historical behaviors, and a social network database has graph relational features. Given multiple heterogeneous data sources, one important challenge is to find a unified feature subspace that captures the knowledge from all sources. To this end, we propose a principle of collective component analysis (CoCA) to handle dimensionality reduction across a mixture of vector-based features and graph relational features. The CoCA principle is to find a feature subspace with maximal variance under two constraints. First, there should be consensus among the projections from different feature spaces. Second, the similarity between connected data (in any of the network databases) should be maximized. The optimal solution is obtained by solving an eigenvalue problem. Moreover, we discuss how to use prior knowledge to distinguish informative data sources and optimally weight them in CoCA. Since no previous model can be directly applied to this problem, we devised a straightforward comparison method that performs dimension reduction on the concatenation of the data sources. Three sets of experiments show that CoCA substantially outperforms the comparison method.
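The abstract does not give CoCA's equations, so the sketch below does not reproduce the authors' method. Under stated assumptions, it illustrates two ideas the abstract names. The first is the concatenation baseline, assumed here to be PCA after z-scoring each source. The second is a hypothetical graph-regularized variant that adds a term rewarding similar projections for connected samples and, like CoCA, reduces to an eigenvalue problem. The function names `standardize_and_concat` and `project`, the weight `lam`, and the synthetic data are illustrative assumptions. The consensus constraint across per-source projections and the prior-knowledge source weighting are not modeled.

```python
import numpy as np


def standardize_and_concat(sources):
    """Z-score each source's columns, then place the sources side by side."""
    blocks = []
    for X in sources:
        X = np.asarray(X, dtype=float)
        std = X.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
        blocks.append((X - X.mean(axis=0)) / std)
    return np.hstack(blocks)  # n samples x (d1 + d2 + ...) features


def project(Z, k, A=None, lam=0.0):
    """Return the top-k projection of centered data Z and the d x k basis W.

    With A=None this is plain PCA on Z (the assumed concatenation baseline).
    With a symmetric adjacency matrix A (hypothetical variant), it maximizes
        w^T (C + lam * Z^T A Z / n) w  subject to  w^T w = 1,
    where w^T Z^T A Z w = sum_{i,j} A[i, j] (z_i . w)(z_j . w) rewards
    giving linked samples projections of the same sign and size.
    """
    n = Z.shape[0]
    M = Z.T @ Z / n  # covariance matrix C of the centered features
    if A is not None:
        A = np.asarray(A, dtype=float)
        M = M + lam * (Z.T @ A @ Z) / n
    M = (M + M.T) / 2  # symmetrize against round-off before eigh
    vals, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    W = vecs[:, np.argsort(vals)[::-1][:k]]  # keep the k largest
    return Z @ W, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    profile = rng.normal(size=(6, 3))  # synthetic stand-ins for two vector sources
    logs = rng.normal(size=(6, 4))
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (3, 4)]:  # synthetic undirected "social" edges
        A[i, j] = A[j, i] = 1.0
    Z = standardize_and_concat([profile, logs])
    Y_base, _ = project(Z, k=2)
    Y_graph, W = project(Z, k=2, A=A, lam=0.5)
    print(Y_base.shape, Y_graph.shape, W.shape)
```

Because each objective is a Rayleigh quotient of a symmetric matrix, its maximizers under the unit-norm constraint are the leading eigenvectors, so a single `eigh` call suffices. Nominal features such as gender would first need an encoding (for example, one-hot), which is also an assumption of this sketch.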

Keywords :

data mining; principal component analysis; social networking (online); vectors; CoCA; collective component analysis; correlated data sources; dimensionality reduction; graph relational features; heterogeneous feature space; maximal variance; multiple heterogeneous data sources; recommendation problem; social network database; social targeting; term-based feature; user historical behavior; user log data; user profile database; user social network; vector-based feature; Data models; Databases; Eigenvalues and eigenfunctions; Noise; Optimization; Principal component analysis; Social network services;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining (ICDM), 2012 IEEE 12th International Conference on

Conference_Location :

Brussels

ISSN :

1550-4786

Print_ISBN :

978-1-4673-4649-8

Type :

conf

DOI :

10.1109/ICDM.2012.30

Filename :

6413864

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2984618