• DocumentCode
    2352883
  • Title
    Correlating summarization of a pair of multilingual documents
  • Author
    Ji, Xiang; Zha, Hongyuan
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
  • fYear
    2003
  • fDate
    10-11 March 2003
  • Firstpage
    39
  • Lastpage
    46
  • Abstract
    With the emergence of an enormous number of documents in multiple languages, it is desirable to construct text mining methods that can compare such documents and highlight their similarities. In this paper, we explore the research issue of comparative summarization for a pair of multilingual documents. A bipartite-graph-based algorithm is proposed to correlate textual content across sources in different languages. The algorithm aligns the (sub)topics of a pair of multilingual documents and summarizes their correlation by sentence extraction. A pair of documents in different languages is modeled as a weighted bipartite graph. A mutual reinforcement principle is applied to identify a dense subgraph of this weighted bipartite graph. The sentences corresponding to the subgraph are well correlated in textual content and convey the dominant topic shared by the two documents. As a further enhancement, a bi-clustering algorithm can first be used to partition the bipartite graph into several clusters, each containing sentences from both documents. These clusters correspond to shared subtopics, and the mutual reinforcement principle can then be applied within each cluster to extract its topic sentences.
  • Keywords
    data mining; graph theory; natural languages; text analysis; biclustering algorithm; bipartite graph-based algorithm; comparative summarization; multilingual documents; multiple languages; mutual reinforcement principle; sentence extraction; text mining; textual content; weighted bipartite graph; Algorithm design and analysis; Bipartite graph; Computer science; Data mining; Explosions; Feature extraction; Natural languages; Pressing; Web pages; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Research Issues in Data Engineering: Multi-lingual Information Management, 2003. RIDE-MLIM 2003. Proceedings. 13th International Workshop on
  • ISSN
    1066-1395
  • Print_ISBN
    0-7803-7868-7
  • Type
    conf
  • DOI
    10.1109/RIDE.2003.1249844
  • Filename
    1249844
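The mutual reinforcement step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes both documents' sentences have already been mapped into a shared vocabulary (for example by translating one document first, a step the abstract does not describe). It uses bag-of-words cosine similarity as the bipartite edge weight. It also takes the top-k scoring sentences on each side as a stand-in for the dense subgraph. The names `build_bipartite`, `mutual_reinforcement`, and `top_k` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Matrix = List[List[float]]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(c * b[t] for t, c in a.items() if t in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_bipartite(doc_a: List[str], doc_b: List[str]) -> Matrix:
    """Edge weights W[i][j] between sentence i of doc_a and sentence j of doc_b.

    ASSUMPTION: both documents already share one vocabulary (e.g. one was
    translated beforehand); the cross-lingual mapping itself is not shown.
    """
    va = [Counter(s.lower().split()) for s in doc_a]
    vb = [Counter(s.lower().split()) for s in doc_b]
    return [[cosine(x, y) for y in vb] for x in va]


def _normalize(x: List[float]) -> List[float]:
    norm = math.sqrt(sum(t * t for t in x)) or 1.0  # guard against all-zero scores
    return [t / norm for t in x]


def mutual_reinforcement(W: Matrix, iters: int = 200,
                         tol: float = 1e-12) -> Tuple[List[float], List[float]]:
    """Scores u (doc A sentences) and v (doc B sentences) with u ~ W v, v ~ W^T u.

    A sentence scores highly when it is strongly linked to high-scoring
    sentences on the other side. Power iteration converges to the leading
    left/right singular vectors of W.
    """
    m, n = len(W), len(W[0])
    v = _normalize([1.0] * n)
    u = [0.0] * m
    for _ in range(iters):
        # Doc A scores: edge-weighted sum of doc B scores.
        u = _normalize([sum(W[i][j] * v[j] for j in range(n)) for i in range(m)])
        # Doc B scores: the same rule applied in the other direction.
        new_v = _normalize([sum(W[i][j] * u[i] for i in range(m)) for j in range(n)])
        converged = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    return u, v


def top_k(scores: List[float], k: int) -> List[int]:
    """ASSUMPTION: the k highest-scoring sentences stand in for the dense subgraph."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


if __name__ == "__main__":
    # Toy input; doc_a is assumed to be already translated into doc_b's language.
    doc_a = ["the election results were announced today",
             "weather was sunny in the capital",
             "the ruling party won the election"]
    doc_b = ["election results show the ruling party won",
             "a football match ended in a draw"]
    u, v = mutual_reinforcement(build_bipartite(doc_a, doc_b))
    print("doc A:", [doc_a[i] for i in top_k(u, 2)])
    print("doc B:", [doc_b[i] for i in top_k(v, 1)])
```

Because all edge weights are nonnegative, the scores stay nonnegative throughout the iteration, and the election-related sentences on both sides reinforce each other while the unrelated weather and football sentences receive low scores. The bi-clustering enhancement mentioned in the abstract is not covered by this sketch.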