• DocumentCode
    2646490
  • Title

    Applying Semantic Suffix Net to suffix tree clustering

  • Author

    Janruang, Jongkol ; Guha, Sumanta

  • Author_Institution
    Comput. Sci. & Inf. Manage. Program, Asian Inst. of Technol., Pathumthani, Thailand
  • fYear
    2011
  • fDate
    28-29 June 2011
  • Firstpage
    146
  • Lastpage
    152
  • Abstract
    In this paper we consider the problem of clustering snippets returned by search engines. We propose a technique that brings semantic similarity into the clustering process. Our technique improves on the well-known suffix tree clustering (STC) method, a highly efficient heuristic for clustering web search results. A weakness of STC, however, is that it cannot cluster semantically similar documents. To solve this problem, we propose a new data structure, called a Semantic Suffix Net (SSN), to represent the suffixes of a single string. A generalized semantic suffix net is created to represent the suffixes of a set of strings by using a new operator that partially combines nets. A key feature of this operator is that it finds a joint point by using semantic similarity and string matching; the combination of a pair of nets then begins at that joint point. This reduces the number of nodes and branches in a generalized semantic suffix net. The operator then uses the line of suffix links as a boundary to separate the net. The generalized semantic suffix net is then incorporated into the STC algorithm so that it can cluster semantically similar snippets. Experimental results show that the proposed algorithm improves upon conventional STC.
  • Keywords
    data structures; document handling; information retrieval; pattern clustering; search engines; string matching; trees (mathematics); STC algorithm; data structure; semantic similarity; semantic suffix net; suffix tree clustering; Algorithm design and analysis; Clustering algorithms; Joints; Pediatrics; Semantics; data mining; semantic web search results clustering; text mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining and Optimization (DMO), 2011 3rd Conference on
  • Conference_Location
    Putrajaya
  • ISSN
    2155-6938
  • Print_ISBN
    978-1-61284-211-0
  • Electronic_ISBN
    2155-6938
  • Type

    conf

  • DOI
    10.1109/DMO.2011.5976519
  • Filename
    5976519