• DocumentCode
    2383537
  • Title
    Document sentences as a small world
  • Author
    Balinsky, Helen; Balinsky, Alexander; Simske, Steven
  • Author_Institution
    Hewlett-Packard Labs., Bristol, UK
  • fYear
    2011
  • fDate
    9-12 Oct. 2011
  • Firstpage
    2583
  • Lastpage
    2588
  • Abstract
    In this paper we show how the well-known small-world topology can be constructed for an ordinary document from its actual structure. Sentences are represented by nodes, and two nodes are connected if and only if the corresponding sentences are neighbors or share at least one common keyword. The graph is built from a carefully selected set of keywords that depends on a single parameter. By varying this parameter (the level of meaningfulness), we transform the document-representing graph from a trivial path graph into a large random graph, and at some point during this transition the graph becomes a small world. This opens the possibility of applying many well-established ranking algorithms to the problem of ranking sentences and paragraphs in text documents; such rankings are crucial for document understanding, summarization, and information extraction. These graphs can also serve as a source of interesting small-world graphs for the theory of complex networks.
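A minimal Python sketch of the construction described in the abstract, under loudly stated assumptions: a naive tokenizer, a caller-supplied keyword set standing in for the paper's meaningfulness-based keyword selection (the `scores` dictionary and threshold `eps` below are hypothetical illustrations, not the authors' method), and PageRank chosen by us as one example of the "well-established ranking algorithms" the abstract mentions. Small-world behaviour is commonly diagnosed, following Watts and Strogatz, by a high average clustering coefficient combined with a short average path length, so the sketch reports both.

```python
import re
from collections import deque
from itertools import combinations


def tokenize(sentence):
    # Naive stand-in tokenizer (an assumption): lowercase alphabetic runs, no stemming.
    return set(re.findall(r"[a-z]+", sentence.lower()))


def build_sentence_graph(sentences, keywords):
    """Nodes are sentence indices; i and j are linked iff they are
    neighbours (|i - j| == 1) or share at least one word in `keywords`."""
    n = len(sentences)
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):  # consecutive sentences: the path-graph backbone
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    kw = [tokenize(s) & keywords for s in sentences]
    for i, j in combinations(range(n), 2):
        if kw[i] & kw[j]:  # at least one shared keyword
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_clustering(adj):
    # Mean local clustering coefficient; nodes of degree < 2 contribute 0.
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    # Mean BFS distance over ordered pairs; the path backbone keeps the
    # graph connected, so every pair is reachable.
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total / (n * (n - 1))


def pagerank(adj, damping=0.85, iters=100):
    # Power-iteration PageRank, used here only as one example of a
    # graph ranking algorithm that can be applied to sentences.
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in adj}
        for u, nbrs in adj.items():
            for w in nbrs:  # spread u's rank evenly over its neighbours
                new[w] += damping * rank[u] / len(nbrs)
        rank = new
    return rank


if __name__ == "__main__":
    doc = [
        "Graphs model relations between objects.",
        "A small world graph has short paths.",
        "Clustering is high in a small world.",
        "Random graphs have short paths but low clustering.",
        "Sentences can be ranked once the graph is built.",
    ]
    # Hypothetical keyword scores; in this sketch eps plays the role of the level.
    scores = {"graph": 3.0, "small": 2.0, "world": 2.0, "clustering": 1.5, "paths": 1.0}
    for eps in (10.0, 1.8, 0.5):
        keywords = {w for w, s in scores.items() if s > eps}
        g = build_sentence_graph(doc, keywords)
        print(eps, average_clustering(g), average_path_length(g))
    ranks = pagerank(build_sentence_graph(doc, set(scores)))
    print(sorted(ranks, key=ranks.get, reverse=True))
```

In this sketch, an empty keyword set yields exactly the path graph (clustering 0); lowering `eps` admits more keywords and adds edges, and in the limit where every sentence shares a keyword the graph becomes complete. Whether and where a small-world regime appears in between depends on the real keyword-selection rule, which this sketch does not reproduce.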
  • Keywords
    graph theory; text analysis; complex network; document sentences; document summarization; document understanding; document-representing graph; information extraction; paragraph ranking; random graph; sentence ranking; small world topology; text document; Collaboration; Data mining; Educational institutions; Network topology; Silicon; Social network services; Topology; Text mining; affiliation networks; semantic text features
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on
  • Conference_Location
    Anchorage, AK
  • ISSN
    1062-922X
  • Print_ISBN
    978-1-4577-0652-3
  • Type
    conf
  • DOI
    10.1109/ICSMC.2011.6084065
  • Filename
    6084065