Title :
Developing a Persian chunker using a hybrid approach
Author :
Kian, Soheila ; Akhavan, Tara ; Shamsfard, Mehrnoush
Author_Institution :
Electrical & Comput. Eng. Dept., Shahid Beheshti Univ., Tehran, Iran
Abstract :
Text segmentation is the process of recognizing the boundaries of text constituents, such as sentences, phrases, and words. This paper focuses on phrase segmentation, also known as chunking. The task poses different problems in different natural languages, depending on their linguistic features and writing conventions. We discuss these problems and their solutions, particularly for Persian, and present our system for Persian phrase segmentation. The system uses a hybrid method for automatic chunking of Persian texts: it first applies a rule-based approach to create a tagged corpus for training a neural network, and then uses a multilayer perceptron neural network and Fuzzy C-Means clustering to chunk new sentences. Experimental results show an average precision of 85.7% for the chunking results.
Keywords :
image segmentation; linguistics; logic programming; perceptrons; Fuzzy C; Persian chunker development; Persian phrase segmentation; hybrid approach; linguistic features; neural network perceptron; phrase segmentation chunking; rule based approach; text constituents boundaries; text segmentation; Fuzzy neural networks; Multi-layer neural network; Multilayer perceptrons; Natural languages; Neural networks; Text recognition; Writing;
Conference_Title :
Computer Science and Information Technology, 2009. IMCSIT '09. International Multiconference on
Conference_Location :
Mragowo
Print_ISBN :
978-1-4244-5314-6
DOI :
10.1109/IMCSIT.2009.5352723