DocumentCode
772603
Title
Nonstationary Analysis of Coding and Noncoding Regions in Nucleotide Sequences
Author
Bouaynaya, Nidhal ; Schonfeld, Dan
Author_Institution
Dept. of Syst. Eng., Univ. of Arkansas, Little Rock, AR
Volume
2
Issue
3
fYear
2008
fDate
6/1/2008 12:00:00 AM
Firstpage
357
Lastpage
364
Abstract
Previous statistical analyses of DNA sequences revealed that noncoding regions exhibit long-range power-law correlations, whereas coding regions behave like random sequences or sustain short-range correlations. A great deal of debate on the presence or absence of long-range correlations in nucleotide sequences, and more specifically in coding regions, has ensued. These results were obtained using signal processing techniques for stationary signals and statistical tools for signals with slowly varying trends superimposed on stationary signals. However, it can be verified using statistical tests that genomic sequences are nonstationary, and that the nature of their nonstationarity varies and is often much more complex than a simple trend. In this paper, we bring to bear new tools for analyzing nonstationary signals that have emerged in the statistical and signal processing communities over the past few years. We use these methods to shed new light on, and help resolve, two issues: i) the existence of long-range correlations in DNA sequences and ii) whether they are present in both coding and noncoding segments or only in the latter. It turns out that the statistical differences between coding and noncoding segments are much more subtle than previously thought on the basis of stationary analysis. In particular, both coding and noncoding sequences exhibit long-range correlations, as indicated by a $1/f^{\beta(n)}$ evolutionary (i.e., time-dependent) spectrum. However, we use an index of randomness, which we derive from the Hilbert transform, to demonstrate that coding segments, although not random as previously suspected, are often "closer" to random sequences than noncoding segments. Moreover, we analytically justify the use of the Hilbert spectrum by proving that demodulating narrowband nonstationary signals with the Hilbert transform incurs only a small error.
Keywords
DNA; Hilbert transforms; biology computing; correlation methods; demodulation; encoding; genetics; molecular biophysics; sequences; signal processing; statistical analysis; DNA sequence; Hilbert transform; demodulation error; genomic sequence; nonstationary coding/noncoding region analysis; nucleotide sequence; power law correlation; random index; signal processing technique; Bioinformatics; Genomics; Narrowband; Random sequences; Signal analysis; Signal resolution; Testing; AM-FM signals; empirical mode decomposition; evolutionary periodogram; long-range correlations; nonstationary time-series analysis
fLanguage
English
Journal_Title
Selected Topics in Signal Processing, IEEE Journal of
Publisher
IEEE
ISSN
1932-4553
Type
jour
DOI
10.1109/JSTSP.2008.923852
Filename
4550547