• DocumentCode
    579017
  • Title

    An Automatic Approach for Duplicate Bibliographic Metadata Identification Using Classification

  • Author

    Borges, E.N. ; Becker, Kurt ; Heuser, C.A. ; Galante, Renata

  • Author_Institution
    Comput. Sci. Center, Fed. Univ. of Rio Grande, Rio Grande, Brazil
  • fYear
    2011
  • fDate
    9-11 Nov. 2011
  • Firstpage
    47
  • Lastpage
    53
  • Abstract
    References are the main descriptive metadata used by digital libraries of scientific articles. These references can be represented in several formats and styles. Considerable content variations can also occur in some metadata fields, such as title, author names, and publication venue. Duplicate records affect the quality of digital library services, so they need to be appropriately identified and treated. This paper presents an approach to identifying duplicated bibliographic metadata. We extend our previous work so that, instead of setting thresholds based on the scores returned by similarity functions, we use the scores to train classification algorithms that automatically identify duplicated references. The experiments show that the classifiers improve the quality of results by up to 11% compared with our unsupervised heuristic-based approach.
  • Keywords
    bibliographic systems; digital libraries; meta data; pattern classification; unsupervised learning; automatic approach; descriptive metadata; duplicate bibliographic metadata identification; metadata fields; scientific articles; unsupervised heuristic based approach; Algorithm design and analysis; Bioinformatics; Databases; Genomics; Libraries; Standards; classification algorithms; information representation; information management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science Society (SCCC), 2011 30th International Conference of the Chilean
  • Conference_Location
    Curico
  • ISSN
    1522-4902
  • Print_ISBN
    978-1-4673-1364-3
  • Type

    conf

  • DOI
    10.1109/SCCC.2011.8
  • Filename
    6363382
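
The idea stated in the abstract, feeding per-field similarity scores to a classifier instead of comparing them against hand-set thresholds, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy records are invented, `difflib.SequenceMatcher` stands in for whatever similarity functions the authors use, and a nearest-centroid rule stands in for their classification algorithms.

```python
from difflib import SequenceMatcher

# Fields compared between two references (assumed; the abstract names
# title, author names and publication venue as varying fields).
FIELDS = ("title", "authors", "venue")


def similarity_vector(a, b):
    """Return one similarity score in [0, 1] per field for two records."""
    return [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in FIELDS
    ]


def train_centroids(vectors, labels):
    """Fit a nearest-centroid classifier: the mean score vector per label."""
    centroids = {}
    for label in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, vector):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def dist(label):
        return sum((x - c) ** 2 for x, c in zip(vector, centroids[label]))
    return min(centroids, key=dist)


# Invented toy records; a/b and d/e are variants of the same reference.
a = {"title": "A toy study of duplicate detection",
     "authors": "Silva, A.; Costa, B.", "venue": "Toy Conference on Data"}
b = {"title": "Toy study of duplicate detection",
     "authors": "A. Silva and B. Costa", "venue": "Toy Conf. on Data"}
d = {"title": "Graph layouts for large networks",
     "authors": "Lima, C.", "venue": "Example Journal of Graphs"}
e = {"title": "Graph Layouts for Large Networks.",
     "authors": "C. Lima", "venue": "Example J. of Graphs"}

# Labeled training pairs: True means "duplicate".
pairs = [(a, b, True), (d, e, True), (a, d, False), (b, e, False)]
vectors = [similarity_vector(x, y) for x, y, _ in pairs]
labels = [label for _, _, label in pairs]
centroids = train_centroids(vectors, labels)

# Classify an unseen pair from its score vector alone.
print(predict(centroids, similarity_vector(a, e)))
```

The point of the design is that no threshold is chosen by hand: the decision boundary comes from the labeled pairs, so adding more labeled examples or swapping in a stronger classifier changes nothing in the scoring step.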