• DocumentCode
    3281665
  • Title
    Automatic Information Extraction in Semi-structured Official Journals
  • Author
    Filho, Valmir Macário ; Prudencio, Ricardo B. C. ; de Carvalho, F.A.T. ; Torres, Leandro R. ; Rodrigues, Luis ; Lima, Marcos G.
  • Author_Institution
    Center of Inf., Fed. Univ. of Pernambuco, Recife
  • fYear
    2008
  • fDate
    26-30 Oct. 2008
  • Firstpage
    51
  • Lastpage
    56
  • Abstract
    Information extraction systems are used to extract only relevant text information in digital repositories. The current work proposes an automatic system to extract information in semi-structured official journals. In our approach, given an input document, a Machine Learning (ML) algorithm classifies the document's fragments into class labels which correspond to the data fields to be extracted. The implemented system deployed different feature sets and algorithms used in the classification of the fragments. The system was evaluated through experiments on a sample containing 22770 lines of Pernambuco's Official Journal. The experiments performed revealed, in general, good results in terms of precision, which ranged from 70.14% to 98.63% depending on the feature set and algorithm used in the classification of the fragments.
  • Keywords
    classification; information retrieval; learning (artificial intelligence); text analysis; digital repositories; information extraction; machine learning; semistructured official journals; text information; Cities and towns; Data mining; Databases; Humans; Informatics; Information science; Machine learning; Machine learning algorithms; Neural networks; Performance evaluation; Semi-Structured text; official journals; text mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2008. SBRN '08. 10th Brazilian Symposium on
  • Conference_Location
    Salvador
  • ISSN
    1522-4899
  • Print_ISBN
    978-1-4244-3219-6
  • Electronic_ISBN
    1522-4899
  • Type
    conf
  • DOI
    10.1109/SBRN.2008.36
  • Filename
    4665891