• DocumentCode
    2261043
  • Title

    A method for stemming and eliminating common words for Persian text summarization

  • Author

    Berenjkoob, Marzieh ; Mehri, Razieh ; Khosravi, Hadi ; Nematbakhsh, Mohammad Ali

  • Author_Institution
    Dept. of Comput. Eng., Univ. of Isfahan, Isfahan, Iran
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    With the rapid growth of documents and electronic texts in the Persian language, fast methods for retrieving texts from large document collections are crucial. Persian text summarization, which conveys the main concept of a text in minimal size, is an effective solution. One step in Persian text summarization is stemming and eliminating common words. The aim of this research is to stem words in Persian documents so that they can be used more efficiently in text summarization; the proposed method eliminates common words and stems keywords. A combination of existing techniques in the word network was used to create a Persian database based on the Dehkhoda dictionary. The summarization algorithm is based on statistical techniques: each sentence is assigned an importance weight, and the sentences with higher weights are used for the summary. Comparing against the results of other algorithms on Persian texts, we concluded that our technique extracts the roots of words with greater precision.
  • Keywords
    natural language processing; statistical analysis; text analysis; Dehkhoda dictionary; Persian language; Persian text summarization; common words elimination; common words stemming; statistical technique; Data mining; Databases; Dictionaries; Frequency measurement; Information retrieval; Natural language processing; Ontologies; Statistical analysis; Text recognition; Database; Text Summarization; common words; stemming;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2009.5313836
  • Filename
    5313836
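
The pipeline outlined in the abstract (eliminate common words, stem the remaining keywords, weight each sentence statistically, keep the highest-weighted sentences) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the stop-word list, the suffix list, the helper names (`stem`, `keywords`, `summarize`) and the weighting formula (average corpus frequency of a sentence's stemmed keywords) are all assumptions, whereas the paper derives its resources from the Dehkhoda dictionary and does not state its formula in the abstract.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources for illustration only;
# the paper builds its database from the Dehkhoda dictionary.
STOP_WORDS = {"و", "در", "به", "از", "که", "این", "را", "با", "است"}
SUFFIXES = ("ترین", "های", "ها", "تر")  # checked in this order, longest first


def stem(word):
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def keywords(sentence):
    """Tokenize, drop common words, and stem what remains."""
    tokens = re.findall(r"\w+", sentence)
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def summarize(sentences, k=1):
    """Return the k highest-weighted sentences in their original order."""
    # Frequency of each stemmed keyword over the whole text.
    freq = Counter(w for s in sentences for w in keywords(s))

    def weight(sentence):
        # Assumed weighting: mean frequency of the sentence's keywords.
        kws = keywords(sentence)
        return sum(freq[w] for w in kws) / len(kws) if kws else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: weight(sentences[i]),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Averaging by keyword count is one possible way to avoid favoring long sentences; a plain sum, or a TF-IDF style weight, would fit the same structure.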