DocumentCode :
2709713
Title :
Enhancing the Stability of Spectral Ordering with Sparsification and Partial Supervision: Application to Paleontological Data
Author :
Mavroeidis, Dimitrios ; Bingham, Ella
Author_Institution :
Dept. of Inf., Athens Univ. of Econ. & Bus., Athens
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
462
Lastpage :
471
Abstract :
Recent studies have demonstrated the prospects of data mining algorithms for addressing the task of seriation in paleontological data (i.e. the age-based ordering of the sites of excavation). A prominent approach is spectral ordering that computes a similarity measure between the sites and orders them such that similar sites become adjacent and dissimilar sites are placed far apart. In the paleontological domain, the similarity measure is based on the mammal genera whose remains are retrieved at each site of excavation. Although spectral ordering achieves good performance in the seriation task, it ignores the background knowledge that is naturally present in the domain, as paleontologists can derive the ages of the sites of excavation within some accuracy. On the other hand, the age information is uncertain, so the best approach would be to combine the background knowledge with the information on mammal co-occurrences. Motivated by this kind of partial supervision we propose a novel semi-supervised spectral ordering algorithm. Our algorithm modifies the Laplacian matrix used in spectral ordering, such that domain knowledge of the ordering is taken into account. Also, it performs feature selection (sparsification) by discarding features that contribute most to the unwanted variability of the data in bootstrap sampling. The theoretical properties of the proposed algorithm are thoroughly analyzed and it is demonstrated that the proposed framework enhances the stability of the spectral ordering output and induces computational gains.
Keywords :
bootstrapping; data mining; geophysics computing; matrix algebra; palaeontology; Laplacian matrix; age-based ordering; bootstrap sampling; data mining algorithms; excavation sites; feature selection; mammal cooccurrences; mammal genera; paleontological data; semisupervised spectral ordering algorithm; similarity measure; spectral ordering stability; Algorithm design and analysis; Data mining; Eigenvalues and eigenfunctions; Informatics; Information technology; Laplace equations; Reliability theory; Sampling methods; Stability analysis; Uncertainty; Laplacian; eigengap; feature selection; spectral ordering; supervision
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location :
Pisa
ISSN :
1550-4786
Print_ISBN :
978-0-7695-3502-9
Type :
conf
DOI :
10.1109/ICDM.2008.120
Filename :
4781141