• DocumentCode
    1925201
  • Title
    Website clustering from query graph using social network analysis
  • Author
    Wang, Weiduo; Wu, Bin; Zhang, Zhonghui
  • Author_Institution
    Beijing Key Lab. of Intell. Telecommun. Software & Multimedia, Beijing Univ. of Posts & Telecommun., Beijing, China
  • fYear
    2010
  • fDate
    8-10 Aug. 2010
  • Firstpage
    439
  • Lastpage
    442
  • Abstract
    With the thorough advancement of informatization and the rapid development of the Internet, millions of websites now exist. Search engines have become the mediators that connect web users with websites. The query logs recorded daily contain a wealth of knowledge about the actions of search-engine users; as such, they hold valuable information about users' interests, preferences, and behavior, as well as their implicit feedback on search-engine results. We construct a novel query graph that takes the classification of queries into account and use that classification to build multi-dimensional vectors. We then adopt a social network analysis method to detect communities in the graph, thereby clustering websites. Website clustering can contribute to the detection of spam, pornographic, and politically sensitive websites, so it can be applied to website supervision.
  • Keywords
    Web sites; pattern clustering; query processing; search engines; social networking (online); Internet; Web site clustering; multidimensional vector; political sensitive Web site detection; pornographic Web site; query classification; query graph; query logs; search engines; social network analysis method; spam Web site; Communities; Economics; Education; Games; History; Image edge detection; Motion pictures; Query Logs; Social Network Analysis; Website Clustering; Websites supervision
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Emergency Management and Management Sciences (ICEMMS), 2010 IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-6064-9
  • Type
    conf
  • DOI
    10.1109/ICEMMS.2010.5563409
  • Filename
    5563409