Title :
Data mining T-RFLP profiles from urban water system sampling using self-organizing maps
Author :
Mounce, Stephen R. ; Jensen, Henriette S. ; Biggs, Catherine A. ; Boxall, Joby B.
Author_Institution :
Pennine Water Group, Univ. of Sheffield, Sheffield, UK
Abstract :
Descriptions of urban water system microbiological properties can range from single parameters such as microbial biomass to multiparameter qualitative and quantitative data describing biochemical profiles, measurements of enzyme activities, and molecular analyses of microbial communities. Whilst most of the hydraulic and physico-chemical variables are quite well understood, measures of microbiological processes have so far been more difficult to use as part of decision support tools. The methods commonly used to assess the microbial quality of water and wastewater are mainly culture-dependent, and they underestimate the actual microbial diversity within the system. To circumvent this limitation, DNA-based molecular techniques are now being used to analyze environmental samples. In the past few decades, technological innovations have led to a new biological research paradigm, one that is data-intensive and computer-driven. A range of data-driven tools have been applied to explore the interrelationships between various types of variables, and a number of studies have used Artificial Neural Networks (ANNs) to probe such complex data sets. This paper demonstrates how Kohonen self-organizing maps (SOM) can be used for data mining of microbiological data sources from urban water systems. Genetic signatures were obtained from samples by terminal restriction fragment length polymorphism (T-RFLP) analysis and post-processed with the T-Align software tool, and their dimensionality was then reduced with Principal Component Analysis (PCA). The resulting datasets were analyzed with SOM networks, and additional characteristics were used to label the maps. Initial results show that the visual output of the SOM analysis provides a rapid and intuitive means of exploring hypotheses, increasing the understanding and interpretation of microbial ecology.
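The PCA-then-SOM pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the T-Align step is assumed to have already produced an aligned sample-by-fragment matrix of relative abundances, and the data below are synthetic, hypothetical profiles drawn from a Dirichlet distribution. PCA is computed via SVD, and the SOM is a plain online Kohonen map with a Gaussian neighbourhood and linearly decaying learning rate. The function names (`pca_reduce`, `train_som`, `map_samples`), grid size, and training parameters are illustrative assumptions.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the leading principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)                      # centre each T-RF bin (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # PCA scores, one row per sample

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train an online Kohonen SOM on a rows x cols rectangular grid."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]        # random training sample
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)  # best-matching unit
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))              # Gaussian neighbourhood
        weights += lr * h[..., None] * (x - weights)           # pull units towards x
    return weights

def map_samples(data, weights):
    """Return the (row, col) best-matching unit for each sample."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return [tuple(int(v) for v in np.unravel_index(i, weights.shape[:2]))
            for i in np.argmin(d, axis=1)]

# Hypothetical example: 20 synthetic profiles over 50 T-RF bins, in two groups.
rng = np.random.default_rng(42)
alpha_a = np.ones(50); alpha_a[:10] = 10.0       # group A dominated by early bins
alpha_b = np.ones(50); alpha_b[40:] = 10.0       # group B dominated by late bins
profiles = np.vstack([rng.dirichlet(alpha_a, size=10), rng.dirichlet(alpha_b, size=10)])
labels = ["A"] * 10 + ["B"] * 10                 # extra characteristics used for map labelling

scores = pca_reduce(profiles, n_components=5)
som = train_som(scores, rows=6, cols=6)
bmus = map_samples(scores, som)
for label, unit in zip(labels, bmus):
    print(label, unit)
```

Plotting the labelled best-matching units on the 6 x 6 grid gives the kind of visual map described above; the choice of five principal components and a small grid is arbitrary here and would need tuning on real T-RFLP data.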
Keywords :
DNA; biology computing; data mining; decision support systems; environmental science computing; principal component analysis; self-organising feature maps; wastewater; water supply; ANN; DNA-based molecular techniques; Kohonen self-organizing maps; PCA; SOM networks; T-Align software tool; TRFLP; artificial neural networks; biochemical profiles; biological research paradigm; culture-dependent methods; data driven tools; data mining T-RFLP profiles; decision support tools; enzyme activity measurement; hydraulic variables; microbial biomass; microbial community molecular analyses; microbial diversity; microbial ecology; microbial wastewater quality; microbial water quality; physico-chemical variables; principal component analysis; terminal restriction fragment length polymorphisms; urban water system microbiological properties; urban water system sampling; Communities; Microorganisms; Neurons; Principal component analysis; Vectors; Wastewater; Water resources; PCA; SOM; T-RFLP; artificial neural network; bioinformatics; urban water systems;
Conference_Title :
Natural Computation (ICNC), 2012 Eighth International Conference on
Conference_Location :
Chongqing
Print_ISBN :
978-1-4577-2130-4
DOI :
10.1109/ICNC.2012.6234528