• DocumentCode
    189130
  • Title
    Portuguese Part-of-Speech Tagging with Large Margin Structure Learning
  • Author
    Rezende Fernandes, Eraldo; Muller Rodrigues, Irving; Luiz Milidiu, Ruy
  • Author_Institution
    FACOM - UFMS, Campo Grande, Brazil
  • fYear
    2014
  • fDate
    18-22 Oct. 2014
  • Firstpage
    25
  • Lastpage
    30
  • Abstract
    Part-of-speech tagging is a fundamental task in many natural language processing systems. It consists of identifying the syntactic category, i.e. the part of speech, of each word in a sentence. Although state-of-the-art accuracy for this task is already around 97%, any improvement has an immediate impact on more complex tasks such as parsing, semantic role labeling and information extraction, so the task remains worth exploring. In this paper, we introduce a part-of-speech tagger based on the structure learning framework that reduces the lowest previously reported error on the Portuguese Mac-Morpho corpus by 7.8%. We also apply our tagger to a recently revised version of Mac-Morpho, where its accuracy is competitive with that of a semi-supervised neural network trained on Mac-Morpho plus a very large non-annotated corpus. Additionally, our system is simpler than previous systems and uses a very limited feature set. It employs a large-margin training criterion to derive a structure predictor that is more robust on unseen data.
  • Keywords
    grammars; identification technology; learning (artificial intelligence); natural language processing; neural nets; speech processing; Portuguese Mac-Morpho corpus; Portuguese part-of-speech tagging; information extraction; large margin training criteria; margin structure learning; natural language processing systems; nonannotated corpus; parsing; part-of-speech tagger; semantic role labeling; semisupervised neural network; structure learning framework; structure predictor; syntactic category; Accuracy; Hidden Markov models; Natural language processing; Prediction algorithms; Predictive models; Tagging; Training; Machine Learning; Natural Language Processing; POS Tagging; Structure Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems (BRACIS), 2014 Brazilian Conference on
  • Conference_Location
    Sao Paulo
  • Type
    conf
  • DOI
    10.1109/BRACIS.2014.16
  • Filename
    6984802
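The abstract says the tagger is trained with a large-margin criterion to obtain a structure predictor, but it gives no algorithmic detail. As a hedged illustration only, here is a minimal sketch of one common form of large-margin structure learning for tagging: a structured perceptron whose updates are driven by loss-augmented Viterbi decoding over first-order emission and transition features. The feature templates, the per-token margin, the function names (`features`, `viterbi`, `train`) and the toy Portuguese sentences with tags `ART`, `N`, `V` are all assumptions made for this sketch. They are not the authors' system, feature set, or data.

```python
from collections import defaultdict

# Minimal sketch of a large-margin (loss-augmented) structured perceptron for
# POS tagging. Hypothetical illustration only; not the system in the paper.

START = "<s>"  # pseudo-tag preceding the first token


def features(words, tags):
    """Count first-order features: (word, tag) emissions and (prev_tag, tag) transitions."""
    feats = defaultdict(int)
    prev = START
    for word, tag in zip(words, tags):
        feats[("emit", word, tag)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats


def viterbi(words, tagset, w, gold=None, margin=0.0):
    """Exact argmax tag sequence under weights w.

    If gold is given, `margin` is added for every token whose tag differs from
    the gold tag (loss-augmented decoding with a per-token Hamming loss).
    """
    if not words:
        return []

    def local(i, t):
        score = w[("emit", words[i], t)]
        if gold is not None and t != gold[i]:
            score += margin
        return score

    # best[i][t] = (score of best prefix ending in tag t at position i, previous tag)
    best = [{t: (w[("trans", START, t)] + local(0, t), None) for t in tagset}]
    for i in range(1, len(words)):
        column = {}
        for t in tagset:
            prev_tag, prev_score = max(
                ((p, best[i - 1][p][0] + w[("trans", p, t)]) for p in tagset),
                key=lambda pair: pair[1],
            )
            column[t] = (prev_score + local(i, t), prev_tag)
        best.append(column)

    # Backtrack from the highest-scoring final tag.
    tag = max(tagset, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def train(data, tagset, epochs=10, margin=1.0):
    """Perceptron updates toward gold, away from the loss-augmented prediction."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tagset, w, gold=gold, margin=margin)
            if pred != gold:
                for f, v in features(words, gold).items():
                    w[f] += v
                for f, v in features(words, pred).items():
                    w[f] -= v
    return w


# Hypothetical toy data (not taken from Mac-Morpho).
TAGSET = ["ART", "N", "V"]
DATA = [
    (["o", "gato", "dorme"], ["ART", "N", "V"]),
    (["a", "casa", "caiu"], ["ART", "N", "V"]),
    (["o", "cão", "late"], ["ART", "N", "V"]),
]

if __name__ == "__main__":
    weights = train(DATA, TAGSET, epochs=5)
    print(viterbi(["a", "gato", "late"], TAGSET, weights))  # ['ART', 'N', 'V']
```

During training, decoding adds `margin` to every mistagged position. An update therefore happens unless the gold sequence outscores every alternative by at least `margin` times its Hamming distance, which is what makes the criterion "large margin" rather than plain perceptron training. At test time, decoding uses the unaugmented score (`gold=None`).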