• DocumentCode
    1507057
  • Title
    Automatic Stochastic Arabic Spelling Correction With Emphasis on Space Insertions and Deletions
  • Author
    Alkanhal, Mohamed I. ; Al-Badrashiny, Mohamed A. ; Alghamdi, Mansour M. ; Al-Qabbany, Abdulaziz O.
  • Author_Institution
    Comput. Res. Inst. (CRI), King Abdulaziz City for Sci. & Technol. (KACST), Riyadh, Saudi Arabia
  • Volume
    20
  • Issue
    7
  • fYear
    2012
  • Firstpage
    2111
  • Lastpage
    2122
  • Abstract
    This paper presents a stochastic approach to correcting misspellings in Arabic text. A context-based two-layer system automatically corrects misspelled words in large datasets. The first layer produces a list of possible alternatives for each misspelled word, ranked by Damerau-Levenshtein edit distance. The same layer also considers merged and split words resulting from deletion and insertion of the space character. The correct alternative for each misspelled word is then selected stochastically, based on the maximum marginal probability, via A* lattice search and m-gram probability estimation. A large dataset was used to build and test the system. The test results show that performance improves as the training set grows, reaching an F1 score of 97.9% for detection and 92.3% for correction.
  • Keywords
    natural language processing; probability; stochastic processes; text analysis; Arabic text; Damerau-Levenshtein edit distance; automatic stochastic Arabic spelling correction; context-based two-layer system; large datasets; lattice search; m-gram probability estimation; maximum marginal probability; misspelled words; misspelling correction; space character deletion; space character insertion; Context; Dictionaries; Helium; Noise measurement; Semantics; System performance; Training; A* lattice search; Arabic language processing; space deletion errors; space insertion errors; spelling correction; statistical disambiguation; word distance;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2197612
  • Filename
    6193415
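
The first layer described in the abstract (ranking alternatives by Damerau-Levenshtein distance and proposing space-related fixes) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `osa_distance`, `candidates`, and `merge_candidate`, the `max_distance` threshold, the fixed cost of 1 for a space edit, and the toy English lexicon are all hypothetical, and the distance shown is the optimal string alignment variant of Damerau-Levenshtein. The context-based selection step (A* lattice search with m-gram probabilities) is not shown.

```python
def osa_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance, optimal string alignment variant."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def candidates(word, lexicon, max_distance=2):
    """Return (cost, alternative) pairs for one token, lowest cost first.

    Assumption: a deleted space is charged a cost of 1, like any single edit.
    """
    ranked = []
    for entry in lexicon:
        dist = osa_distance(word, entry)
        if dist <= max_distance:
            ranked.append((dist, entry))
    # Space deletion: the token may be two lexicon words run together.
    for k in range(1, len(word)):
        left, right = word[:k], word[k:]
        if left in lexicon and right in lexicon:
            ranked.append((1, left + " " + right))
    return sorted(ranked)


def merge_candidate(left, right, lexicon):
    """Space insertion: two adjacent tokens may be one split lexicon word."""
    joined = left + right
    return joined if joined in lexicon else None


if __name__ == "__main__":
    toy_lexicon = {"the", "cat", "then", "tea", "spelling"}
    print(candidates("teh", toy_lexicon))
    print(candidates("thecat", toy_lexicon))
    print(merge_candidate("spel", "ling", toy_lexicon))
```

In a complete system of the kind the abstract outlines, candidate lists like these would form a lattice over the sentence, and a language model would pick the most probable path; here the candidates are ordered by edit cost alone.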