Abstract :
The conversion of symbolic sequences into complex genomic signals is a methodology useful both for analyzing large-scale features of chromosomes and for studying mutations in small and medium-sized pathogen genomes. Regularities in the distribution of nucleotides and of nucleotide pairs have been found, showing that a genome has an ordered large-scale structure despite its low compressibility. These long-range regularities show that, from a structural point of view, a genome resembles less a "plain text", which simply expresses semantics in accordance with certain grammar rules, and more a "poem", which also obeys additional rules of symmetry that give it "rhythm" and "rhyme". The structural restrictions of genomic sequences are reflected in the regularities observed in the corresponding genomic signals. Mutations, such as those seen in the genomic signals of various pathogen strains, tend to compensate for each other, so that the overall regularities are conserved. Recombination acts over a rather extended scale, in contrast to SNPs, which are localized, and it conserves the nucleotide-pair imbalance (unwrapped phase). Because of the structural restrictions on nucleotide sequences, SNPs do not appear in isolation but in correlated groups, sometimes at quite large distances.
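As an illustration of the conversion step, the following minimal sketch maps a nucleotide string to a complex signal and computes its unwrapped phase. The abstract does not state the mapping, so the quadrant assignment below (A = 1+j, C = -1-j, G = -1+j, T = 1-j) is an assumption made for this example, and the function names are hypothetical.

```python
import cmath
import math

# ASSUMPTION: one nucleotide per quadrant of the complex plane.
# The abstract does not specify the mapping; this choice is illustrative only.
NUCLEOTIDE_TO_COMPLEX = {
    "A": complex(1, 1),
    "C": complex(-1, -1),
    "G": complex(-1, 1),
    "T": complex(1, -1),
}


def to_genomic_signal(sequence):
    """Convert a nucleotide string into a list of complex samples."""
    return [NUCLEOTIDE_TO_COMPLEX[base] for base in sequence.upper()]


def unwrapped_phase(signal):
    """Return the phase of each sample with 2*pi discontinuities removed.

    Whenever the raw phase jumps by more than pi between consecutive
    samples, a multiple of 2*pi is added so that the step stays within
    (-pi, pi). Steps of exactly pi (A<->C or G<->T under the assumed
    mapping) are inherently ambiguous and are not treated specially here.
    """
    phases = []
    offset = 0.0
    previous_raw = None
    for sample in signal:
        raw = cmath.phase(sample)  # principal value in (-pi, pi]
        if previous_raw is not None:
            delta = raw - previous_raw
            if delta > math.pi:
                offset -= 2 * math.pi
            elif delta < -math.pi:
                offset += 2 * math.pi
        phases.append(raw + offset)
        previous_raw = raw
    return phases


if __name__ == "__main__":
    for value in unwrapped_phase(to_genomic_signal("AGCTA")):
        print(round(value / math.pi, 2), "pi")
```

Under this assumed mapping, each A→G, G→C, C→T, or T→A step advances the unwrapped phase by π/2, and the reverse steps decrease it by the same amount, so the slope of the unwrapped phase tracks the imbalance between these two groups of nucleotide pairs.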
Keywords :
biology computing; cellular biophysics; genetics; molecular biophysics; molecular configurations; neural nets; principal component analysis; PCA; artificial neural nets; chromosomes; mutations; nucleotide genomic signals prediction; symbolic sequences; Artificial neural networks; Bioinformatics; Discrete Fourier transforms; Genetic mutations; Genomics; Large-scale systems; Neural networks; Pathogens; Signal analysis