Title :
Extracting Thai compound nouns for paragraph extraction in Thai text
Author :
Suwanno, Nuttida ; Suzuki, Yoshimi ; Yamazaki, Haruaki
Author_Institution :
Dept. of Comput. Sci. & Media Eng., Yamanashi Univ., Kofu, Japan
Date :
30 Oct.-1 Nov. 2005
Abstract :
In this paper, we propose a method for Thai text summarization by paragraph extraction, based on extracted Thai compound nouns and a term weighting method using term frequency inverse document frequency (TF·IDF). Thai compound nouns are highly frequent and highly productive in Thai text, which suggests that they play an important role in summarization. Morphological analysis is used to identify Thai compound nouns, and all paragraphs are ranked by the sum of their term weighting scores. The cosine similarity between paragraphs is calculated in order to select the important paragraphs. The results show that our proposed method, with an F-score of 0.469 for a 45% summary, is the most effective approach among all experiments.
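The pipeline described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes each paragraph has already been reduced to a list of terms (for example, compound nouns found by morphological analysis, which is not shown), that IDF is computed over the paragraphs of a single document, and that cosine similarity is used to skip paragraphs too similar to ones already selected. The abstract does not specify any of these details. The function names and the `max_sim` threshold are hypothetical; the default `ratio=0.45` mirrors the 45% summary mentioned above.

```python
import math
from collections import Counter


def tfidf_vectors(paragraphs):
    """Return one {term: TF*IDF weight} dict per paragraph.

    Assumption: each paragraph is treated as a "document" for IDF.
    """
    n = len(paragraphs)
    # Document frequency: number of paragraphs containing each term.
    df = Counter(term for terms in paragraphs for term in set(terms))
    vectors = []
    for terms in paragraphs:
        tf = Counter(terms)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_paragraphs(paragraphs, ratio=0.45, max_sim=0.8):
    """Pick about `ratio` of the paragraphs by summed term weight.

    Assumption (hypothetical selection rule): a paragraph is skipped
    when its cosine similarity to an already chosen one exceeds `max_sim`.
    Returns the chosen paragraph indices in document order.
    """
    vectors = tfidf_vectors(paragraphs)
    scores = [sum(v.values()) for v in vectors]
    budget = max(1, round(ratio * len(paragraphs)))
    ranked = sorted(range(len(paragraphs)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == budget:
            break
        if all(cosine(vectors[i], vectors[j]) <= max_sim for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

Calling `select_paragraphs(term_lists)` on a list of per-paragraph term lists returns the indices of the paragraphs to keep. Summing raw weights favors longer paragraphs; whether the original method normalizes for length is not stated in the abstract.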
Keywords :
information retrieval; natural languages; statistical analysis; text analysis; Thai compound nouns; Thai text summarization; morphological analysis; paragraph extraction; term frequency inverse document frequency; term weighting method; Computer science; Data mining; Frequency; Information retrieval; Natural languages; Power capacitors; Singular value decomposition; Testing;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598818