• DocumentCode
    772603
  • Title
    Nonstationary Analysis of Coding and Noncoding Regions in Nucleotide Sequences
  • Author
    Bouaynaya, Nidhal; Schonfeld, Dan
  • Author_Institution
    Dept. of Syst. Eng., Univ. of Arkansas, Little Rock, AR
  • Volume
    2
  • Issue
    3
  • fYear
    2008
  • fDate
    6/1/2008
  • Firstpage
    357
  • Lastpage
    364
  • Abstract
    Previous statistical analyses of DNA sequences revealed that noncoding regions exhibit long-range power-law correlations, whereas coding regions behave like random sequences or sustain only short-range correlations. A great deal of debate has since ensued on the presence or absence of long-range correlations in nucleotide sequences, and more specifically in coding regions. These results were obtained using signal processing techniques for stationary signals and statistical tools for signals with slowly varying trends superimposed on stationary signals. However, statistical tests confirm that genomic sequences are nonstationary, and that the nature of their nonstationarity varies and is often much more complex than a simple trend. In this paper, we bring to bear new tools for analyzing nonstationary signals that have emerged in the statistical and signal processing communities over the past few years. These methods are used to shed new light on, and help resolve, the questions of i) whether long-range correlations exist in DNA sequences and ii) whether they are present in both coding and noncoding segments or only in the latter. It turns out that the statistical differences between coding and noncoding segments are much more subtle than previously thought on the basis of stationary analysis. In particular, both coding and noncoding sequences exhibit long-range correlations, as indicated by a 1/f^β(n) evolutionary (i.e., time-dependent) spectrum. However, using an index of randomness derived from the Hilbert transform, we demonstrate that coding segments, although not random as previously suspected, are often "closer" to random sequences than noncoding segments are. Moreover, we analytically justify the use of the Hilbert spectrum by proving that the Hilbert transform yields a small demodulation error for narrowband nonstationary signals.
  • Keywords
    DNA; Hilbert transforms; biology computing; correlation methods; demodulation; encoding; genetics; molecular biophysics; sequences; signal processing; statistical analysis; DNA sequence; Hilbert transform; demodulation error; genomic sequence; nonstationary coding/noncoding region analysis; nucleotide sequence; power law correlation; random index; signal processing technique; Bioinformatics; Genomics; Narrowband; Random sequences; Signal analysis; Signal processing; Signal resolution; Statistical analysis; Testing; AM-FM signals; empirical mode decomposition; evolutionary periodogram; long-range correlations; nonstationary time-series analysis
  • fLanguage
    English
  • Journal_Title
    IEEE Journal of Selected Topics in Signal Processing
  • Publisher
    IEEE
  • ISSN
    1932-4553
  • Type
    jour
  • DOI
    10.1109/JSTSP.2008.923852
  • Filename
    4550547