Title :
Overview of Mondou Web search engine using text mining and information visualizing technologies
Author :
Kawano, Hiroyuki
Author_Institution :
Dept. of Syst. Sci., Kyoto Univ., Japan
Abstract :
As the volume of Web pages on the Internet increases rapidly, it is becoming hard for users to discover valuable Web resources. It is especially difficult for naive users to find informative pages with popular Web search engines, since they do not have background or domain knowledge about the status of Web systems. Therefore, many kinds of Web search engines have been developed to support the process of Web information retrieval. We are developing the Japanese Web search engine “Mondou (RCAAU)”. Though our engine belongs to the first generation of Web search engines, we have tried since 1995 to implement rapidly emerging data mining technologies in it. We are also implementing Java applets based on information visualization. The author presents a technical overview of the Mondou Web search engine. One of its most important techniques is a text mining algorithm based on primitive association rules. Mondou provides highly relevant feedback keywords to users in order to support their search steps. Using these associative keywords, users can modify the combination of keywords in the initial query. We also introduce the concept of an integrated query mechanism for different search engines based on KQML agents. Furthermore, in order to visualize the characteristics of search results, we are developing Java applets that display the ROC graph and the clusters of specific documents. We are also trying to improve the Web robots of the Mondou system from the viewpoint of data cleaning. Finally, we discuss the effectiveness and performance of our Web search engine.
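The abstract does not give the details of the association-rule algorithm, so the following is only a minimal sketch of how feedback keywords could be derived from simple "query keyword -> candidate keyword" rules. The function name `associative_keywords`, the thresholds `min_support` and `min_confidence`, and the toy document collection are all assumptions made for illustration; they are not taken from the Mondou system.

```python
from collections import Counter


def associative_keywords(docs, query, min_support=0.2, min_confidence=0.5):
    """Suggest keywords k for which the rule `query -> k` is strong enough.

    docs  : list of keyword lists, one per document (hypothetical input format)
    query : the keyword of the initial query
    Returns (keyword, support, confidence) tuples, best rules first.
    """
    total = len(docs)
    doc_sets = [set(d) for d in docs]

    # Documents that contain the query keyword.
    with_query = [d for d in doc_sets if query in d]
    if not with_query:
        return []

    # Count how often every other keyword co-occurs with the query keyword.
    co_counts = Counter(k for d in with_query for k in d if k != query)

    suggestions = []
    for keyword, count in co_counts.items():
        support = count / total               # share of all documents with both
        confidence = count / len(with_query)  # share of query documents with keyword
        if support >= min_support and confidence >= min_confidence:
            suggestions.append((keyword, support, confidence))

    # Highest confidence first, then highest support, then alphabetical.
    suggestions.sort(key=lambda t: (-t[2], -t[1], t[0]))
    return suggestions


if __name__ == "__main__":
    # Toy collection, invented for this example only.
    sample_docs = [
        ["java", "applet", "visualization"],
        ["java", "applet", "web"],
        ["java", "coffee"],
        ["web", "search", "engine"],
        ["java", "applet", "search"],
    ]
    for keyword, support, confidence in associative_keywords(sample_docs, "java"):
        print(keyword, support, confidence)
```

In this sketch, a user who starts with the query "java" would be offered "applet" as a feedback keyword and could add it to, or exclude it from, the initial query. The thresholds control how many keywords are suggested and would need tuning for a real collection.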
Keywords :
data mining; data visualisation; information resources; information retrieval; natural languages; search engines; text analysis; Internet; Japanese Web search engine; Java applets; KQML agents; Mondou Web search engine; ROC graph; Web information retrieval; Web pages; Web resources; Web robots; Web search engines; Web systems; associative keywords; data cleaning; domain knowledge; feedback keywords; information visualization; information visualizing technologies; informative pages; initial query; integrated query mechanism; naive users; primitive association rules; search steps; text mining; text mining algorithms; Association rules; Data visualization; Java; Web search
Conference_Title :
2000 Kyoto International Conference on Digital Libraries: Research and Practice
Conference_Location :
Kyoto
Print_ISBN :
0-7695-1022-1
DOI :
10.1109/DLRP.2000.942180