DocumentCode
620354
Title
Research on web topic detection based on domain lexicon
Author
Zhao Zhibin ; Jia Yanfeng ; Bao Yubin
Author_Institution
Sch. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
fYear
2013
fDate
25-27 May 2013
Firstpage
3655
Lastpage
3660
Abstract
Web topic detection is a crucial prerequisite for web-based data integration and a key component of vertical search engines, so it has attracted considerable attention from both industry and academia. In this paper, we propose a domain-lexicon-based framework for web topic detection. In our framework, we first extract topical features from a web page. We then employ the Vector Space Model (VSM) and the Support Vector Machine (SVM) to compute the topical relevance between the page features and the user's preferred domain, in order to decide whether the page satisfies the user's request. VSM is suited to domains whose lexicons must be updated frequently; conversely, SVM is suited to domains whose lexicons are relatively stable. We also explore a domain lexicon updating mechanism that ensures the accuracy and freshness of the domain lexicon. Finally, we conduct extensive experiments to evaluate the framework and analyze how the domain lexicon affects the judgment results.
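As an illustration of the VSM relevance step described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that a page's topical features are plain lowercase word counts, that the domain lexicon is a hypothetical mapping from terms to weights, and that a page is judged on-topic when the cosine similarity between the two vectors reaches a hypothetical threshold of 0.3. The names `extract_features`, `cosine`, `is_on_topic` and `LEXICON` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical domain lexicon (term -> weight); not taken from the paper.
LEXICON = {"university": 1.0, "course": 0.8, "student": 0.8, "admission": 0.6}


def extract_features(text: str) -> Counter:
    """Hypothetical feature extraction: count lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def is_on_topic(page_text: str, lexicon: dict, threshold: float = 0.3) -> bool:
    """Judge a page relevant if its feature vector is close enough to the lexicon."""
    return cosine(extract_features(page_text), lexicon) >= threshold


if __name__ == "__main__":
    print(is_on_topic("The university offers a new course for every student", LEXICON))
    print(is_on_topic("The football match ended in a draw", LEXICON))
```

Because relevance is computed directly against the lexicon vector, adding or reweighting a lexicon term takes effect on the next call without any retraining step, which is one plausible reason VSM fits domains whose lexicons change often; an SVM, by contrast, must be retrained when its feature space changes.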
Keywords
Internet; data integration; feature extraction; relevance feedback; search engines; support vector machines; text analysis; SVM; VSM; Web page features; Web topic detection; Web-based data integration; domain lexicon updating mechanism; domain lexicon-based framework; support vector machine; topical feature extraction; topical relevance computation; user preference; vector space model; vertical search engine; Data integration; Educational institutions; Electronic mail; Feature extraction; Support vector machines; Vectors; Web pages; data integration; domain lexicon; text classification; topic detection; vertical search;
fLanguage
English
Publisher
ieee
Conference_Titel
Control and Decision Conference (CCDC), 2013 25th Chinese
Conference_Location
Guiyang
Print_ISBN
978-1-4673-5533-9
Type
conf
DOI
10.1109/CCDC.2013.6561583
Filename
6561583