Title :
Clustering Synonymous English and Chinese Keywords for Cross-Language Queries
Author :
Chen, Rung-Ching ; Huang, Chung-yi ; Huang, Yu-Len
Author_Institution :
Chaoyang Univ. of Technol., Taichung
Abstract :
In this paper, we propose an automatic clustering method to find synonymous terms, including cross-language keywords, from Chinese and English thesis documents. First, Chinese and English keyword pairs are collected from an existing database. Then, the system calculates the support and confidence values of the keyword pairs. Next, keyword pairs with high confidence and support values are selected. Subsequently, a clustering algorithm merges the selected keyword pairs so that pairs with similar meanings fall into the same subset. Finally, applications such as cross-language or synonymous queries can be built on the resulting subsets of collected words. In the experiments, the system achieved 98.4% precision in identifying correct terms across 1220 keyword pair clusters from the collected subsets. The primary experimental results show that the system can provide effective information for users when making queries online.
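The pipeline described in the abstract (score keyword pairs, keep the high-scoring ones, merge them into clusters) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not define support, confidence, or the clustering algorithm, so the code assumes the usual association-rule definitions (support = fraction of documents containing the pair, confidence = pair count divided by the count of the Chinese keyword) and merges pairs that share a keyword using a union-find structure. The function names `score_pairs` and `cluster_pairs`, the input layout, and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import product


def score_pairs(documents):
    """Return {(zh, en): (support, confidence)} for co-occurring keywords.

    `documents` is a list of (chinese_keywords, english_keywords) tuples,
    one per thesis record (an assumed input layout).
    """
    n_docs = len(documents)
    pair_count = Counter()
    zh_count = Counter()
    for zh_keywords, en_keywords in documents:
        zh_set, en_set = set(zh_keywords), set(en_keywords)
        zh_count.update(zh_set)
        pair_count.update(product(zh_set, en_set))
    scores = {}
    for (zh, en), count in pair_count.items():
        # Assumed definitions: support over all documents,
        # confidence relative to the Chinese keyword.
        scores[(zh, en)] = (count / n_docs, count / zh_count[zh])
    return scores


def cluster_pairs(scores, min_support, min_confidence):
    """Merge selected pairs that share a keyword into clusters (sets)."""
    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for (zh, en), (support, confidence) in scores.items():
        if support >= min_support and confidence >= min_confidence:
            parent[find(zh)] = find(en)

    clusters = {}
    for term in list(parent):
        clusters.setdefault(find(term), set()).add(term)
    return list(clusters.values())
```

Under these assumptions, two Chinese keywords that each pair strongly with the same English keyword end up in one cluster, which is how a cluster can serve both cross-language and synonymous queries. The thresholds passed to `cluster_pairs` are placeholders; the abstract does not report the values used.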
Keywords :
natural language processing; pattern clustering; query processing; text analysis; Chinese keywords; English keywords; automatic clustering; cross-language queries; keyword pairs; synonymous keywords clustering; Abstracts; Clustering algorithms; Cybernetics; Data mining; Databases; Information management; Internet; Machine learning; Natural languages; Text categorization; Cross-language; Keyword clustering; Keyword pairs; Synonymous terms;
Conference_Title :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
DOI :
10.1109/ICMLC.2007.4370454