• DocumentCode
    3310696
  • Title
    Integrating feature ranking and clustering method to discover person relations in web news
  • Author
    Lihong Zhao ; Xiaojun Wan ; Yuqian Wu
  • Author_Institution
    Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
  • Volume
    3
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    1821
  • Lastpage
    1825
  • Abstract
    Extracting the social relation network of persons is challenging. Discovering the significant binary person relations embedded in web news is an appropriate starting point. Prior methods for this task, however, defined the relation types in advance, focused on a few limited types, and relied on large amounts of web information. This paper describes an unsupervised person relation extraction system. The system automatically extracts important person relations from a limited batch of web news, then clusters the instances of these relations and finds discriminative words to represent the different clusters. We use various feature ranking strategies for filtering instead of a simple bag-of-words representation. We present experimental evaluation results and give an overview of possible enhancements to the system.
  • Keywords
    Internet; information filtering; information resources; pattern clustering; social networking (online); unsupervised learning; Web information; Web news; bag-of-words representation; feature clustering method; feature ranking method; feature ranking strategies; information extraction; person relation discovery; social relation network extraction; unsupervised person relation extraction system; Data mining; Entropy; Feature extraction; Filtering; Natural language processing; Noise; Sun; Feature Ranking and Filtering; Unsupervised Person Relation Extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-180-9
  • Type
    conf
  • DOI
    10.1109/FSKD.2011.6019861
  • Filename
    6019861