Title :
Text data mining: discovery of important keywords in the cyberspace
Author :
Arimura, Hiroki ; Abe, Junichiro ; Fujino, Ryoichi ; Sakamoto, Hiroshi ; Shimozono, S. ; Arikawa, Setsuo
Author_Institution :
Dept. of Inf., Kyushu Univ., Fukuoka, Japan
Abstract :
This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of finding, within this whole class, the patterns that optimize a given statistical measure over a large collection of unstructured texts. For this class of patterns, we develop fast and robust text mining algorithms based on techniques from computational geometry and string matching. Finally, we successfully apply the developed text mining algorithms in experiments on interactive document browsing in a large text database and keyword discovery from Web bases.
Keywords :
computational geometry; data mining; information retrieval; string matching; Web bases; Web mining; combinatorial patterns; cyberspace; interactive document browsing; keyword discovery; large text database; optimized pattern discovery framework; proximity phrase association patterns; statistical measure; text data mining; unstructured texts; Computer networks; HTML; Informatics; Particle measurements; Pattern matching; Robustness; Text mining; Transaction databases
Conference_Title :
Digital Libraries: Research and Practice, 2000 Kyoto, International Conference on.
Conference_Location :
Kyoto
Print_ISBN :
0-7695-1022-1
DOI :
10.1109/DLRP.2000.942178