• DocumentCode
    5686
  • Title

    Data Quality Control in Biodiversity Informatics: The Case of Species Occurrence Data

  • Author

    Koch Veiga, Allan ; Cartolano, E.A. ; Saraiva, A.M.

  • Author_Institution
    Escola Politec., Univ. de Sao Paulo (USP), Sao Paulo, Brazil
  • Volume
    12
  • Issue
    4
  • fYear
    2014
  • fDate
    Jun-14
  • Firstpage
    683
  • Lastpage
    693
  • Abstract
    To address the current environmental sustainability crisis, several studies on biodiversity and the environment have been conducted. These studies are based on the assessment and monitoring of biodiversity through the collection, storage, analysis, simulation, modeling, visualization and sharing of a significant volume of biodiversity data across broad temporal and spatial scales. Species occurrence data are a particularly important type of biodiversity data because they are widely used in various studies. For the analyses and models derived from these data to be reliable, however, the data must be of high quality. The aim of this work was therefore to study Data Quality (DQ) as applied to species occurrence data, in a way that allows DQ to be assessed and improved using mechanisms that prevent errors. For the most important data domains identified (taxonomic, geospatial and location), a DQ Assessment study was performed, in which important DQ dimensions (aspects) and the problems that affect these dimensions were identified, defined and interrelated. Based on this study, DQ mechanisms were identified that would improve DQ by reducing errors. Using the error-prevention approach, 13 mechanisms supporting the prevention of 8 DQ problems were identified, improving the accuracy, precision, completeness, consistency and source credibility of the taxonomic, geospatial and location data of species occurrences. This work showed that, with the development of certain computing mechanisms, preventing errors reduces DQ problems. Reducing particular problems in turn improves DQ in specific data domains for certain DQ dimensions.
  • Keywords
    bioinformatics; data analysis; data models; data visualisation; biodiversity informatics; computing mechanisms; data quality control; error preventing approach; Analytical models; Biodiversity; Contamination; Data models; Informatics; Quality control; Redundancy; Biodiversity Informatics; Data Quality; Data Quality Control; Species Occurrence;
  • fLanguage
    English
  • Journal_Title
    Latin America Transactions, IEEE (Revista IEEE America Latina)
  • Publisher
    IEEE
  • ISSN
    1548-0992
  • Type

    jour

  • DOI
    10.1109/TLA.2014.6868870
  • Filename
    6868870