• DocumentCode
    3036366
  • Title
    SOM-based methodology for building large text archives
  • Author
    Azcarraga, Arnulfo P. ; Yap, Teddy N., Jr.
  • Author_Institution
    Program for Research into Intelligent Systems, National University of Singapore, Singapore
  • fYear
    2001
  • fDate
    21-21 April 2001
  • Firstpage
    66
  • Lastpage
    73
  • Abstract
    Not only have self-organizing maps (SOMs), such as the WEBSOM, been shown to scale up to very large datasets, but these maps also allow for a novel mode of navigating through a large collection of text documents. The entire text collection is presented to a user as a regular map, where each point in the map is associated with a group of documents that are likely to be composed of similar terms and phrases. In addition, the closer two points are in the map, the more similar their respective associated documents are. Thus, once an interesting document is found in the map, the user just has to click around the vicinity of that document to retrieve other similar documents. A major drawback of SOMs, however, is the long training time required, especially for document collections where both the volume and the dimensionality are huge. Using a widely studied Reuters collection, we demonstrate how the size of the initial text collection is progressively and drastically reduced from the raw document collection to the final SOM-based text archive.
  • Keywords
    document handling; full-text databases; information resources; learning (artificial intelligence); self-organising feature maps; very large databases; Reuters collection; SOM; WEBSOM; document retrieval; large text archives; self-organizing maps; training time; very large datasets; Artificial neural networks; Electronic mail; Frequency; Humans; Intelligent systems; Navigation; Pattern recognition; Self organizing feature maps; Statistical analysis; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Systems for Advanced Applications, 2001. Proceedings. Seventh International Conference on
  • Conference_Location
    Hong Kong, China
  • Print_ISBN
    0-7695-0996-7
  • Type
    conf
  • DOI
    10.1109/DASFAA.2001.916366
  • Filename
    916366