• DocumentCode
    2635014
  • Title
    A Novel Algorithm for Normalizing Noisy Arabic Text
  • Author
    Al-Shammari, Eiman Tamah
  • Author_Institution
    Kuwait Univ., Kuwait
  • Volume
    4
  • fYear
    2009
  • fDate
    March 31 2009-April 2 2009
  • Firstpage
    477
  • Lastpage
    482
  • Abstract
    This paper introduces an algorithm for normalizing noisy text that focuses exclusively on the Arabic language. Although many approaches to Arabic text processing have been proposed, none so far has focused on noisy Arabic text. The paper also introduces a new similarity measure for stemming noisy Arabic documents. Such a measure is needed because the common stemming rules cannot be applied to noisy texts, which do not conform to known grammatical rules and contain various spelling mistakes. The proposed normalization algorithm therefore groups words automatically after applying the similarity measure. To validate the algorithm, the new normalization technique is evaluated using the under-stemming error reduction technique introduced by Paice.
  • Keywords
    grammars; natural language processing; text analysis; Arabic document text processing; Arabic language; grammatical rule; noisy Arabic text normalization algorithm; similarity measure; under-stemming error reduction technique; Books; Computer science; Indexing; Information retrieval; Noise reduction; Search engines; Text processing; Vocabulary; Web search; Arabic; Stemming;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Engineering, 2009 WRI World Congress on
  • Conference_Location
    Los Angeles, CA
  • Print_ISBN
    978-0-7695-3507-4
  • Type
    conf
  • DOI
    10.1109/CSIE.2009.952
  • Filename
    5171042