• DocumentCode
    2094472
  • Title
    Automatic Generation of Integration and Preprocessing Ontologies for Biomedical Sources in a Distributed Scenario
  • Author
    Anguita, Alberto ; Perez-Rey, David ; Crespo, José ; Maojo, Víctor
  • Author_Institution
    Biomed. Inf. Group, Univ. Politec. de Madrid, Madrid
  • fYear
    2008
  • fDate
    17-19 June 2008
  • Firstpage
    336
  • Lastpage
    341
  • Abstract
    Access to a large number of remote data sources has boosted research in biomedicine, where different biological and clinical research projects are based on collaborative efforts among international organizations. In this scenario, the authors have developed various methods and tools in the area of database integration, using an ontological approach. This paper describes a method to automatically generate preprocessing structures (ontologies) within an ontology-based KDD model. These ontologies are obtained from the analysis of data sources, searching for: (i) valid numerical ranges (using clustering techniques), (ii) different scales, (iii) synonym transformations based on known dictionaries, and (iv) typographical errors. To test the method, experiments were carried out with four biomedical databases (containing data on rheumatoid arthritis, gene expression patterns, biological processes and breast cancer patients), demonstrating the performance of the approach. This method supports experts in data analysis processes, facilitating the detection of inconsistencies.
  • Keywords
    data analysis; database management systems; distributed processing; medical information systems; ontologies (artificial intelligence); pattern clustering; automatic generation; biological processes; biomedical database; biomedical sources; breast cancer patients; clustering technique; collaborative efforts; data analysis; database integration; distributed scenario; gene expression patterns; ontology-based KDD model; preprocessing ontologies; remote data sources; rheumatoid arthritis; synonym transformation; typographical error; valid numerical ranges; Arthritis; Biological processes; Biological system modeling; Data analysis; Databases; Dictionaries; Gene expression; International collaboration; Ontologies; Testing; Integration; KDD; Ontologies; Preprocessing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer-Based Medical Systems, 2008. CBMS '08. 21st IEEE International Symposium on
  • Conference_Location
    Jyvaskyla
  • ISSN
    1063-7125
  • Print_ISBN
    978-0-7695-3165-6
  • Type
    conf
  • DOI
    10.1109/CBMS.2008.71
  • Filename
    4562013