Regional Information Center for Science and Technology - Combining Super-Structuring and Abstraction on Sequence Classification

DocumentCode :

2773031

Title :

Combining Super-Structuring and Abstraction on Sequence Classification

Author :

Silvescu, Adrian ; Caragea, Cornelia ; Honavar, Vasant

Author_Institution :

Yahoo! Labs., Sunnyvale, CA, USA

fYear :

2009

fDate :

6-9 Dec. 2009

Firstpage :

986

Lastpage :

991

Abstract :

We present an approach to adapting the data representation used by a learner on sequence classification tasks. Our approach, which exploits the complementary strengths of super-structuring (constructing complex features by combining existing features) and abstraction (grouping similar features to generate more abstract features), yields smaller yet still accurate models. Super-structuring increases the predictive accuracy of the learned models by enriching the data representation (and hence increases the complexity of the learned models), whereas abstraction reduces the number of model parameters by simplifying the data representation. The results of our experiments on two data sets drawn from macromolecular sequence classification applications show that adapting the data representation by combining super-structuring and abstraction makes it possible to construct predictive models that use a significantly smaller number of features (by one to three orders of magnitude) than those obtained using super-structuring alone, without sacrificing predictive accuracy. Our experiments also show that simplifying the data representation using abstraction yields better-performing models than those obtained using feature selection.
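
The two operations named in the abstract can be illustrated with a minimal sketch. It assumes that super-structuring means counting contiguous k-grams and that abstraction means merging k-grams whose symbols fall into the same group. The `GROUPS` table, the function names, and the sample sequence are hypothetical and chosen only for illustration; the record does not say how the authors decide which features are similar.

```python
from collections import Counter

# Hypothetical symbol grouping, for illustration only (not taken from the paper).
GROUPS = {"A": "h", "V": "h", "L": "h", "I": "h",
          "K": "p", "R": "p", "D": "n", "E": "n"}


def super_structure(seq, k):
    """Super-structuring: build complex features (contiguous k-grams) from single symbols."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def abstract(kgram_counts, groups):
    """Abstraction: merge k-grams that map to the same sequence of symbol groups."""
    merged = Counter()
    for kgram, count in kgram_counts.items():
        # Symbols without a group are kept unchanged.
        key = "".join(groups.get(ch, ch) for ch in kgram)
        merged[key] += count
    return merged


kgrams = super_structure("AVLKRDE", 2)
abstracted = abstract(kgrams, GROUPS)
print(len(kgrams), len(abstracted))  # prints "6 5"
```

In this toy case the 2-grams `AV` and `VL` collapse into one abstract feature `hh`, so the feature count drops while the total number of counted occurrences stays the same.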

Keywords :

data structures; feature extraction; adapting data representation; combining existing features; combining super structuring; constructing complex features; data representation; data sets drawn; grouping similar features; learned models complexity; learner sequence classification tasks; macromolecular sequence classification; sequence classification; smaller number features; Accuracy; Amino acids; Biological system modeling; Computational biology; Computer science; Data mining; Diversity reception; Predictive models; Proteins; Sequences; abstraction; feature selection; super-structuring;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on

Conference_Location :

Miami, FL

ISSN :

1550-4786

Print_ISBN :

978-1-4244-5242-2

Electronic_ISBN :

1550-4786

Type :

conf

DOI :

10.1109/ICDM.2009.130

Filename :

5360344

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2773031