  • DocumentCode
    3111314
  • Title
    Detecting blog groups using vector space models for link structures
  • Author
    Sasaki, Yuichi ; Kurihara, Masahito
  • Author_Institution
    Grad. Sch. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo
  • fYear
    2008
  • fDate
    12-15 Oct. 2008
  • Firstpage
    980
  • Lastpage
    985
  • Abstract
    Vector space models for analyzing document similarity have often been used, in various ways, to group Web pages. However, they have not been used for analyzing link structures, partly because such structures are complex and links do not necessarily satisfy the similarity relation. If we can devise vector space models for link structures, we can combine them with the models for document similarity to develop a unified basis for grouping Web pages. In this paper, we present a vector space model for link structures based on the notion of link vectors, characteristic vectors designed specifically for link structures. We also discuss the extension of this model to the content-link vector space model, which can treat the document information and link information of Web pages in a unified way. Preliminary experiments show that the models perform well even when document information is ignored.
  • Keywords
    Web sites; document handling; information analysis; Web pages; blog groups; content-link vector space model; document information; document similarity analysis; link information; link structures analysis; link vectors; similarity relation; Bipartite graph; Frequency; Functional analysis; Information science; Information services; Internet; Space technology; Systems engineering and theory; community; grouping; link; weblog
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on
  • Conference_Location
    Singapore
  • ISSN
    1062-922X
  • Print_ISBN
    978-1-4244-2383-5
  • Electronic_ISBN
    1062-922X
  • Type
    conf
  • DOI
    10.1109/ICSMC.2008.4811408
  • Filename
    4811408