Title :
Nonparametric Estimation of the Number of Unique Sequences in Biological Samples
Author :
Xu, Changjiang ; Xu, Luzhou ; Yu, Fahong ; Tan, Weihong ; Moroz, Leonid L. ; Li, Jian
Author_Institution :
Dept. of Telecommun. Eng., Nanjing Univ. of Posts & Telecommun.
Abstract :
Large-scale determination of uniquely expressed genes (or mRNAs) in specific cells and tissues is a challenging problem in computational and functional genomics. We consider nonparametric approaches for estimating the number of unique, nonredundant sequences in biological samples. By introducing the moments of species' abundance in a population, we analyze the relative abundance of species in the population and present a lower bound estimator and a so-called medial estimator for the number of distinct species in the population. The lower bound estimator is applicable to populations with small coefficients of variation (CV). The medial estimator works well for populations with relatively large CV, especially gene expression data. Simulation analysis shows that the medial estimator performs better than existing methods. Finally, we apply our nonparametric approaches to estimate the number of expressed mRNAs in a normal colon epithelial tissue as well as the number of unique clones in an amplified cDNA sample prepared from the CNS of the sea slug Aplysia.
Keywords :
DNA; biological tissues; genetics; sequences; statistical analysis; amplified cDNA sample; biological samples; coefficients of variation; computational genomics; functional genomics; gene expression data; lower bound estimator; mRNA; medial estimator; nonparametric estimation; normal colon epithelial tissue; sea-slug Aplysia; species abundance; unique sequences; uniquely expressed genes; Analytical models; Bioinformatics; Biology computing; Cloning; Colon; Data analysis; Gene expression; Genomics; Large-scale systems; Performance analysis; Aplysia; expressed sequence tags; genomics; nonparametric estimation; relative abundance of species; transcriptome
Journal_Title :
Signal Processing, IEEE Transactions on
DOI :
10.1109/TSP.2006.880211