• DocumentCode
    124207
  • Title

    Parallel Community Detection for Cross-Document Coreference

  • Author

    Rahimian, Fatemeh ; Girdzijauskas, Sarunas ; Haridi, Seif

  • Author_Institution
    Swedish Inst. of Comput. Sci., KTH - R. Inst. of Technol., Stockholm, Sweden
  • Volume
    2
  • fYear
    2014
  • fDate
    11-14 Aug. 2014
  • Firstpage
    46
  • Lastpage
    53
  • Abstract
    This paper presents a highly parallel solution for cross-document coreference resolution, which can deal with billions of documents that exist in the current web. At the core of our solution lies a novel algorithm for community detection in large scale graphs. We operate on graphs which we construct by representing documents' keywords as nodes and the co-location of those keywords in a document as edges. We then exploit the particular nature of such graphs where coreferent words are topologically clustered and can be efficiently discovered by our community detection algorithm. The accuracy of our technique is considerably higher than that of the state of the art, while the convergence time is far shorter. In particular, we increase the accuracy for a baseline dataset by more than 15% compared to the best reported result so far. Moreover, we outperform the best reported result for a dataset provided for the Word Sense Induction task in SemEval 2010.
  • Keywords
    document handling; graph theory; natural language processing; SemEval 2010; cross-document coreference resolution; large scale graph; parallel community detection; word sense induction task; Accuracy; Clustering algorithms; Color; Communities; Context; Force; Measurement; community detection; coreference resolution; cross-document coreference; distributed algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Warsaw
  • Type

    conf

  • DOI
    10.1109/WI-IAT.2014.79
  • Filename
    6927606