DocumentCode
3099320
Title
Theses cluster based on bilingual and synonymous keyword sets using mutual information
Author
Huang, Chung-yi ; Chen, Rung-Ching
Author_Institution
Dept. of Inf. Manage., Chaoyang Univ. of Technol., Wufong, Taiwan
Volume
5
fYear
2009
fDate
12-15 July 2009
Firstpage
2999
Lastpage
3004
Abstract
Searching published papers is an essential part of the research process. Because articles are written in various languages, precise queries are hard to achieve. In this paper, we propose an automatic theses clustering method based on bilingual and synonymous keyword sets that include Chinese and English keywords. We also provide a clustering computation to speed up operation. The system first automatically generates the bilingual and synonymous keyword sets and then clusters the theses based on them. The method not only overcomes the weakness of using digital dictionaries to solve clustering problems, but also limits the errors that arise when querying by bilingual and synonymous keywords. The system was implemented with a clustering computation technology to address the performance problems of traditional document clustering systems. By running many computer processes, the system not only saves a lot of time, but also attains high availability and effective load balancing. Preliminary experiments indicate that the system clusters theses effectively.
Keywords
data mining; dictionaries; text analysis; word processing; automatic theses clustering method; bilingual keyword; digital dictionary; error problem; mutual information; synonymous keyword sets; Classification tree analysis; Cybernetics; Databases; Dictionaries; Frequency; Machine learning; Mutual information; Natural languages; Wireless LAN; Wireless networks; Bilingual and synonymous keyword; Document clustering; Keyword set;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location
Baoding
Print_ISBN
978-1-4244-3702-3
Electronic_ISBN
978-1-4244-3703-0
Type
conf
DOI
10.1109/ICMLC.2009.5212598
Filename
5212598