Title :
Mining the Web for generating thematic metadata from textual data
Author :
Huang, Chien-Chung ; Chuang, Shui-Lung ; Chien, Lee-Feng
Author_Institution :
Acad. Sinica, Taipei, Taiwan
fDate :
30 March-2 April 2004
Abstract :
Conventional tools for automatic metadata creation mostly extract named entities or patterns from texts and annotate them with information about persons, locations, dates, and so on. However, this kind of entity-type information is often too primitive for more advanced intelligent applications such as concept-based search. Here, we try to generate semantically deep metadata with limited human intervention. The main idea behind our approach is to use Web mining and categorization techniques to create thematic metadata. The proposed approach comprises three computational modules: feature extraction, HCQF (hier-concept query formulation), and text instance categorization. The feature extraction module sends the names of text instances to Web search engines, and the returned highly ranked search-result pages are used to describe them.
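The feature extraction and categorization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the search engine is replaced by a small hard-coded dictionary (FAKE_SEARCH_RESULTS), the snippets in it are invented for illustration, and the names search, feature_vector, cosine, and categorize are hypothetical. Category queries here are simply the category names, which is only a stand-in for HCQF, whose details the abstract does not give. Cosine similarity over bag-of-words counts is likewise an assumed choice of classifier.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a Web search engine: query -> result snippets.
# All snippets are invented for illustration only.
FAKE_SEARCH_RESULTS = {
    "Lucene": [
        "lucene is a text search engine library",
        "indexing and retrieval of documents with lucene",
    ],
    "information retrieval": [
        "information retrieval studies search engines and text ranking",
        "retrieval of documents using web search",
    ],
    "molecular biology": [
        "molecular biology studies genes proteins and cells",
        "dna and protein research",
    ],
}


def search(query):
    """Return the snippets of the top-ranked results (stubbed, no network)."""
    return FAKE_SEARCH_RESULTS.get(query, [])


def feature_vector(query):
    """Describe a query by the term counts of its search-result snippets."""
    counts = Counter()
    for snippet in search(query):
        counts.update(re.findall(r"[a-z]+", snippet.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(instance, categories):
    """Assign the instance to the category with the most similar features."""
    inst_vec = feature_vector(instance)
    scores = {c: cosine(inst_vec, feature_vector(c)) for c in categories}
    return max(scores, key=scores.get), scores


label, scores = categorize("Lucene", ["information retrieval", "molecular biology"])
print(label, scores)
```

In this sketch the text instance and each candidate category are both described by the words of their search results, so the thematic label is chosen by comparing the two descriptions rather than by inspecting the instance name itself.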
Keywords :
Internet; data mining; feature extraction; meta data; query formulation; search engines; text analysis; Web mining; Web search engine; concept-based search; feature extraction; hier-concept query formulation; text instance categorization; thematic metadata generation; Application software; Computer science; Data mining; Feature extraction; Humans; Organizing; Search engines; Text categorization; Web mining; Web search;
Conference_Titel :
Data Engineering, 2004. Proceedings. 20th International Conference on
Print_ISBN :
0-7695-2065-0
DOI :
10.1109/ICDE.2004.1320065