Regional Information Center for Science and Technology - Evaluation of Partition-Based Text Clustering Techniques to Categorize Indic Language Documents

DocumentCode :

3076814

Title :

Evaluation of Partition-Based Text Clustering Techniques to Categorize Indic Language Documents

Author :

Meedeniya, D.A. ; Perera, A.S.

Author_Institution :

Dept. of Comput. Sci. & Eng., Univ. of Moratuwa, Moratuwa

fYear :

2009

fDate :

6-7 March 2009

Firstpage :

1497

Lastpage :

1500

Abstract :

The wide availability of electronic data has led to vast interest in text analysis, information retrieval and text categorization methods. To provide a better service, there is a need for non-English document analysis and categorization systems comparable to those currently available for English text documents. This study focuses on categorizing Indic language documents. The main techniques examined are data pre-processing and document clustering. The approach makes use of a transformation based on the text frequency and the inverse document frequency, which enhances the clustering performance. It is based on latent semantic analysis, k-means clustering and Gaussian mixture model clustering. A text corpus categorized by human readers is used to test the validity of the suggested approach. The technique introduced in this work enables the processing of text documents written in Sinhala, and empowers citizens and organizations to do their daily work efficiently.
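The weighting-then-clustering pipeline named in the abstract might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it reads "text frequency" as term frequency (an assumption), uses a toy English corpus instead of Sinhala text, omits the latent semantic analysis and Gaussian mixture model steps, and all function names (`tf_idf`, `kmeans`, and the helpers) are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one L2-normalized sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors (equals cosine similarity when both are normalized)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    """Sum the member vectors and L2-normalize the result to get a centroid direction."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Cosine-similarity k-means. Assumption: the first k documents seed the centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each centroid from its current members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


# Toy corpus (illustrative only): two sports and two finance documents.
corpus = [
    "cricket match team score",
    "bank loan interest rate",
    "team score cricket win",
    "interest rate bank market",
]
labels = kmeans(tf_idf([text.split() for text in corpus]), k=2)
print(labels)  # [0, 1, 0, 1]
```

Seeding the centroids with the first k documents keeps the toy run deterministic; a real experiment would more likely use random or k-means++ initialization and compare the resulting clusters against the human-assigned categories.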

Keywords :

Gaussian processes; natural languages; pattern clustering; text analysis; Gaussian mixture model clustering; Indic language document categorization; data pre-processing; document clustering; information retrieval; inverse document frequency; k-means clustering; latent semantic analysis; partition-based text clustering technique; text categorization; text frequency; Availability; Computer science; Data engineering; Frequency; Humans; Information retrieval; Natural languages; Testing; Text analysis; Text categorization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Advance Computing Conference, 2009. IACC 2009. IEEE International

Conference_Location :

Patiala

Print_ISBN :

978-1-4244-2927-1

Electronic_ISBN :

978-1-4244-2928-8

Type :

conf

DOI :

10.1109/IADCC.2009.4809239

Filename :

4809239

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3076814