• DocumentCode
    2708358
  • Title
    An Efficient Token-based Approach for Web-Snippet Clustering
  • Author
    Li, Jianchao; Yao, Tianfang
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2006
  • fDate
    1-3 Nov. 2006
  • Firstpage
    13
  • Lastpage
    13
  • Abstract
    Online clustering of the results returned by search engines has become prevalent in recent years. It addresses the problem that current search engines return too many records, which makes it difficult to find the desired information manually, especially when the query encompasses several subtopics. Clustering groups records into clusters and thereby makes it more convenient to retrieve information of interest. First, we propose an innovative approach that uses tokens as the basic units for clustering, which avoids segmentation for oriental languages and can be applied to any language. Second, we introduce a Directed Probability Graph (DPG) model that identifies meaningful phrases as cluster labels using statistical methods, without any external knowledge. The clustering procedure is performed without calculating the pairwise similarity between documents. As our experiments show, the clustering algorithm is very efficient and suitable for online Web-snippet clustering.
  • Keywords
    Web services; directed graphs; document handling; pattern clustering; probability; query processing; search engines; token networks; DPG; cluster label; clustering algorithm; directed probability graph; group record; information retrieval; innovative approach; manual search rendering; online Web snippet clustering; pairwise document; search engine; statistical method
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantics, Knowledge and Grid, 2006. SKG '06. Second International Conference on
  • Conference_Location
    Guilin
  • Print_ISBN
    0-7695-2673-X
  • Type
    conf
  • DOI
    10.1109/SKG.2006.21
  • Filename
    5727650