• DocumentCode
    3106963
  • Title

    High-Performance Unsupervised Relation Extraction from Large Corpora

  • Author

    Rozenfeld, Binjamin ; Feldman, Ronen

  • Author_Institution
    Bar-Ilan Univ., Ramat-Gan
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    1032
  • Lastpage
    1037
  • Abstract
    We present URIES, an unsupervised relation identification and extraction system. The system automatically identifies interesting binary relations between entities in the input corpus, and then proceeds to extract a large number of instances of these relations. The system discovers relations by clustering frequently co-occurring pairs of entities, based on the contexts in which they appear. Its complex pattern-based representation of the contexts allows the clustering step to achieve very high precision, sufficient for the clusters to serve as sets of seeds for bootstrapping a high-recall relation extraction process. In a series of experiments we demonstrate the successful performance of URIES and compare it to two existing systems: a weakly supervised high-recall Web relation extraction system called SRES, and an unsupervised relation identification system that uses a simpler bag-of-words representation of contexts. The experiments show that URIES performs comparably to SRES, but without any supervision, and that this performance is due to the power of its complex context representation and to its novel candidate selection method.
  • Keywords
    Internet; knowledge acquisition; unsupervised learning; Web relation extraction system; bag-of-words representation; pattern-based representation; unsupervised relation extraction; unsupervised relation identification; Data mining; Gallium nitride; Humans; Knowledge engineering; Machine learning; Relays; Strontium;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2006. ICDM '06. Sixth International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2701-7
  • Type

    conf

  • DOI
    10.1109/ICDM.2006.82
  • Filename
    4053148