• DocumentCode
    3674633
  • Title
    Semi-supervised approach based on co-occurrence coefficient for named entity recognition on Twitter
  • Author
    Van Cuong Tran;Dosam Hwang;Jason J. Jung
  • Author_Institution
    Department of Computer Engineering, Yeungnam University, South Korea
  • fYear
    2015
  • Firstpage
    141
  • Lastpage
    146
  • Abstract
    Data in Social Network Services (SNS) are usually short, contain insufficient information, and are often affected by noise, so popular Named Entity Recognition (NER) methods applied to these data can produce wrong results even if they perform well on well-formatted documents. Most NER methods are based on supervised learning techniques, which often require a large amount of training data to train a good classifier. Conditional Random Fields (CRF) is an example of a supervised learning method; it is a statistical modeling method that predicts labels for sequences of input samples. A weak point of these methods is that they perform well only on well-formatted sentences. However, proper sentences are not used frequently in SNS; for example, many tweets on Twitter are combinations of independent terms that implicitly belong to the context of a certain discussion topic. In this paper, we propose a method to extract named entities from social data using semi-supervised learning. It is an extension of the CRF method that adapts to this new challenge by segmenting the data according to its context rather than considering the entire dataset. In experiments, the method is applied to a dataset collected from Twitter, which includes 8,624 tweets for training (including 1,915 labeled tweets) and 1,690 tweets for testing. Our system produces a promising result, with an F-score of approximately 83.9%.
  • Keywords
    "Training","Twitter","Tagging","Context","Data mining","Clustering algorithms","Feature extraction"
  • Publisher
    IEEE
  • Conference_Titel
    Information and Computer Science (NICS), 2015 2nd National Foundation for Science and Technology Development Conference on
  • Print_ISBN
    978-1-4673-6639-7
  • Type
    conf
  • DOI
    10.1109/NICS.2015.7302179
  • Filename
    7302179