Title :
Development of a framework for sub-topic discovery from the Web
Author :
Uluhan, Eray ; Badur, Bertan
Author_Institution :
Manage. Inf. Syst. Dept., Bogazici Univ., Istanbul
Abstract :
The motivation behind discovering sub-topics, or topic-specific keywords, from Web pages is to help a user who lacks knowledge and experience about a topic find its important concepts without much effort. Typically, a Web user would start by querying search engines, visiting some pages, and spending a lot of time deciding what is important about the topic and what is not. In this study, we try to mine the important sub-topics or key concepts of a given topic automatically from HTML-based Web pages. Starting with a search query, the system gathers the top-ranking pages returned by a search engine and selects the informative pages among them. These pages are processed further to extract important phrases, and data mining techniques are then applied to these phrases to find candidate sub-topics. Each candidate phrase is scored based on its relevance to the search query over the Web space. Using the proposed technique, the user should be able to quickly learn the sub-topics or key concepts of a topic without going through the ordeal of browsing a large number of non-informative pages returned by the search engine.
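The abstract does not say which phrase extraction, data mining, or scoring methods are used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the page texts have already been fetched and stripped of HTML, treats word n-grams as candidate phrases, and uses the number of pages containing a phrase as a stand-in score. The names `candidate_phrases` and `rank_subtopics`, the stopword list, and the `min_pages` threshold are all hypothetical and do not come from the paper.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text, max_len=3):
    """Return the set of word n-grams (1..max_len words) found in the text.

    N-grams that start or end with a stopword are skipped.
    """
    words = re.findall(r"[a-z]+", text.lower())
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.add(" ".join(gram))
    return phrases


def rank_subtopics(pages, query, min_pages=2):
    """Rank candidate phrases by how many pages contain them.

    Page frequency is a hypothetical stand-in for the relevance score
    mentioned in the abstract. Phrases made up only of query words are
    dropped, since they restate the topic itself.
    """
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    page_counts = Counter()
    for text in pages:
        # Each page contributes at most 1 to a phrase's count.
        page_counts.update(candidate_phrases(text))
    ranked = [
        (phrase, count)
        for phrase, count in page_counts.items()
        if count >= min_pages and not set(phrase.split()) <= query_words
    ]
    # Highest page count first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Calling `rank_subtopics(page_texts, "data mining")` on a list of plain-text pages returns `(phrase, page_count)` pairs, most widely shared phrases first. A real system would replace the page-frequency score with a measure of relevance to the query, which the abstract mentions but does not define.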
Keywords :
Internet; data mining; search engines; HTML based Web pages; World Wide Web; querying search engine; topic specific keyword discovery; Africa; Cities and towns; HTML; Indexing; Information retrieval; Web mining; Web pages
Conference_Title :
Management of Engineering & Technology, 2008. PICMET 2008. Portland International Conference on
Conference_Location :
Cape Town
Print_ISBN :
978-1-890843-17-5
Electronic_ISBN :
978-1-890843-18-2
DOI :
10.1109/PICMET.2008.4599696