• DocumentCode
    1961007
  • Title
    Domain dependence of statistical named entity recognition and classification in Croatian texts
  • Author
    Agic, Zeljko; Bekavac, Bozo
  • Author_Institution
    Dept. of Inf. & Commun. Sci., Univ. of Zagreb, Zagreb, Croatia
  • fYear
    2013
  • fDate
    24-27 June 2013
  • Firstpage
    277
  • Lastpage
    282
  • Abstract
    The influence of text domain selection on statistical named entity recognition and classification in Croatian texts is investigated. Two datasets of Croatian newspaper texts from different text domains were manually annotated for named entities and used to train and test the Stanford NER system, which performs named entity recognition by sequence labeling with conditional random fields (CRF). State-of-the-art scores were observed in both domains. The experiment establishes a strong preference for systems trained on mixed text domains. The top-performing system achieved an overall F1-score of 0.876 on mixed-domain test sets, scoring 0.899 in one of the selected domains and 0.852 in the other. The best single-domain F1-scores were 0.910 and 0.858.
  • Keywords
    data mining; natural language processing; pattern classification; text analysis; Croatian newspaper texts; F1-score; Stanford NER system; domain dependence; statistical named entity classification; statistical named entity recognition; text domain mining; text domain selection; Accuracy; Data models; Organizations; Tagging; Testing; Text recognition; Training; Croatian language; named entity recognition; text domain
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Interfaces (ITI), Proceedings of the ITI 2013 35th International Conference on
  • Conference_Location
    Cavtat
  • ISSN
    1334-2762
  • Print_ISBN
    978-953-7138-30-1
  • Type
    conf
  • Filename
    6649038