  • DocumentCode
    620354
  • Title
    Research on web topic detection based on domain lexicon
  • Author
    Zhao Zhibin; Jia Yanfeng; Bao Yubin
  • Author_Institution
    School of Information Science and Engineering, Northeastern University, Shenyang, China
  • fYear
    2013
  • fDate
    25-27 May 2013
  • Firstpage
    3655
  • Lastpage
    3660
  • Abstract
    Web topic detection is a crucial prerequisite for web-based data integration and a key component of vertical search engines, so it has attracted considerable attention from both industry and academia. In this paper, we propose a domain-lexicon-based framework for web topic detection. In our framework, we first extract topical features from the web page. Next, we employ the Vector Space Model (VSM) and the Support Vector Machine (SVM) to compute the topical relevance between the web page features and the domain the user prefers, and from this decide whether the web page satisfies the user's request. VSM is suitable for domains whose lexicons need to be updated frequently; conversely, SVM is suitable for domains whose lexicons are relatively stable. We also explore a domain lexicon updating mechanism that maintains the accuracy and freshness of the domain lexicon. Finally, we conduct extensive experiments to evaluate the framework and analyze how the domain lexicon affects the topic judgments.
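    The VSM relevance step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a page is reduced to lowercase word counts, that the domain lexicon is a hypothetical mapping from terms to weights, and that a page is judged on-topic when the cosine similarity between the two vectors reaches an arbitrary threshold of 0.3. The tokenizer, the weights, and the threshold are all illustrative assumptions. The SVM path and the lexicon updating mechanism are not shown.

```python
import math
import re
from collections import Counter


def extract_features(text: str) -> Counter:
    """Simplified feature extraction (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_relevance(page: Counter, lexicon: dict[str, float]) -> float:
    """Cosine similarity between a page's term vector and a domain-lexicon vector.

    Terms missing from the page count as 0; Counter returns 0 for
    missing keys without inserting them.
    """
    dot = sum(page[term] * weight for term, weight in lexicon.items())
    page_norm = math.sqrt(sum(c * c for c in page.values()))
    lex_norm = math.sqrt(sum(w * w for w in lexicon.values()))
    if page_norm == 0 or lex_norm == 0:
        return 0.0  # an empty page or empty lexicon has no topical relevance
    return dot / (page_norm * lex_norm)


def is_on_topic(text: str, lexicon: dict[str, float], threshold: float = 0.3) -> bool:
    """Hypothetical decision rule: on-topic if relevance >= threshold."""
    return cosine_relevance(extract_features(text), lexicon) >= threshold


if __name__ == "__main__":
    # Hypothetical "education" lexicon with illustrative weights.
    education = {"education": 2.0, "school": 1.0, "student": 1.0}
    print(is_on_topic("School news: a student at the school won a prize", education))
    print(is_on_topic("Weather forecast for today", education))
```

    Because the lexicon is just a term-to-weight mapping, adding or reweighting terms takes effect immediately without retraining. This is consistent with the abstract's point that VSM suits domains whose lexicons change often.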
  • Keywords
    Internet; data integration; feature extraction; relevance feedback; search engines; support vector machines; text analysis; SVM; VSM; Web page features; Web topic detection; Web-based data integration; domain lexicon updating mechanism; domain lexicon-based framework; support vector machine; topical feature extraction; topical relevance computation; user preference; vector space model; vertical search engine; Educational institutions; Electronic mail; Vectors; Web pages; domain lexicon; text classification; topic detection; vertical search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 25th Chinese Control and Decision Conference (CCDC)
  • Conference_Location
    Guiyang
  • Print_ISBN
    978-1-4673-5533-9
  • Type
    conf
  • DOI
    10.1109/CCDC.2013.6561583
  • Filename
    6561583