DocumentCode :
3456290
Title :
The Keyless Technique of Text Data Mining and Its Application
Author :
Chen, Heng-sheng ; Su, Jaw-Sin
Author_Institution :
Dept. of Inf. Manage., Chinese Culture Univ., Taipei, Taiwan
fYear :
2009
fDate :
7-9 Dec. 2009
Firstpage :
834
Lastpage :
840
Abstract :
A questionnaire may consist of two parts: the respondent characteristics used for classification and analysis, and the target data for the core problem being studied. Multiple-choice questions are the norm. To expand the content of a questionnaire, an open-ended written response may be added as a third part so that respondents can provide additional information. In this paper, we first describe how words are constructed in English and in Chinese, from letters (the alphabet) and characters to words and phrases, respectively. Second, the keyless research system is used to extract phrases with differing numbers of characters from a large collection of Chinese text documents. A χ² uniform distribution test is introduced to decide the appropriate number of words per phrase in the documents. Some Chinese phrases may be meaningless, so a screening process is used to delete the meaningless phrases and keep the useful ones. Synonyms are then identified by a synonym process. To show that the model can be used in practical field studies, students from the Chinese Culture University in Taiwan gave their opinions on teachers' effectiveness in the classroom in the open-ended written part of a questionnaire. About 50 top key words and 50 bottom key words were obtained to describe the quality of classroom teaching. We conclude that key words can be extracted from text documents even when they are not defined beforehand, that the frequency with which each key word appears in the documents can be obtained, and that the model can be applied to other fields.
Keywords :
classification; data mining; document handling; text analysis; Chinese Culture University; Chinese text documents; Chinese word; English word; Taiwan; classification; data analysis; keyless research system; open-ended written response; text data mining; uniform distribution test; χ² uniform distribution test; Data analysis; Data mining; Data structures; Databases; Electronic mail; Information management; Natural languages; Stock markets; System testing; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Innovative Computing, Information and Control (ICICIC), 2009 Fourth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4244-5543-0
Type :
conf
DOI :
10.1109/ICICIC.2009.359
Filename :
5412336