DocumentCode
2261043
Title
A method for stemming and eliminating common words for Persian text summarization
Author
Berenjkoob, Marzieh ; Mehri, Razieh ; Khosravi, Hadi ; Nematbakhsh, Mohammad Ali
Author_Institution
Dept. of Comput. Eng., Univ. of Isfahan, Isfahan, Iran
fYear
2009
fDate
24-27 Sept. 2009
Firstpage
1
Lastpage
6
Abstract
With the rapid growth of documents and electronic texts in the Persian language, fast methods for retrieving texts from large document collections are crucial. Persian text summarization, which conveys the main concept of a text in minimal size, is an effective solution. One step in Persian text summarization is stemming words and eliminating common words. The aim of this research is to stem the words of Persian documents so that they can be used more efficiently in text summarization; the proposed method eliminates common words and stems keywords. A combination of existing techniques in the word network was used to create a Persian database using the Dehkhoda dictionary. The summarization algorithm is based on statistical techniques: each sentence is assigned an importance weight, and the sentences with higher weights are used for the summary. Comparing our results with those of other algorithms on Persian texts, we conclude that our technique extracts the roots of words with greater precision.
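The pipeline the abstract outlines (eliminate common words, stem the remaining keywords, weight each sentence, keep the highest-weighted sentences) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word set, the suffix list, the suffix-stripping `stem` function, and the average-term-frequency weight are all assumptions chosen for illustration, whereas the paper relies on a database built from the Dehkhoda dictionary. The demo uses English toy sentences, and Persian-specific tokenization is not handled.

```python
import re
from collections import Counter


def stem(word, suffixes):
    """Strip one suffix (longest tried first), keeping a stem of at least 2 characters.

    Hypothetical stand-in for a dictionary-backed stemmer.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def keywords(sentence, stop_words, suffixes):
    """Lowercase and tokenize, drop common words, then stem what remains."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [stem(t, suffixes) for t in tokens if t not in stop_words]


def summarize(sentences, stop_words, suffixes, k=1):
    """Return the k highest-weighted sentences in their original order.

    Assumed weight: mean document-wide frequency of a sentence's stemmed keywords.
    """
    per_sentence = [keywords(s, stop_words, suffixes) for s in sentences]
    freq = Counter(t for kws in per_sentence for t in kws)
    weights = [
        sum(freq[t] for t in kws) / len(kws) if kws else 0.0
        for kws in per_sentence
    ]
    top = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    docs = ["The cats chase the mouse.", "A cat sleeps.", "Rain falls on the roof."]
    print(summarize(docs, stop_words={"the", "a", "on"}, suffixes=("s",), k=1))
```

Because `stop_words` and `suffixes` are plain parameters, a Persian stop list and suffix inventory could be passed in without changing the code; stemming before counting is what lets "cats" and "cat" contribute to the same frequency.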
Keywords
natural language processing; statistical analysis; text analysis; Dehkhoda dictionary; Persian language; Persian text summarization; common words elimination; common words stemming; statistical technique; Data mining; Databases; Dictionaries; Frequency measurement; Information retrieval; Natural language processing; Ontologies; Statistical analysis; Text recognition; Database; Text Summarization; common words; stemming
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location
Dalian
Print_ISBN
978-1-4244-4538-7
Electronic_ISBN
978-1-4244-4540-0
Type
conf
DOI
10.1109/NLPKE.2009.5313836
Filename
5313836