Title :
Chinese-Uighur Sentence Alignment Based on Hybrid Strategy with Mistake Spread Suppression
Author :
Tian, Shengwei ; Ibrahim, Turgun ; Umal, Hasan ; Yu, Long
Author_Institution :
Coll. of Inf. Sci. & Eng., Xinjiang University, Urumqi, China
Abstract :
This paper proposes a hybrid algorithm based on mistake spread suppression to align Chinese-Uighur sentences. To address the spread of mistakes that affects length-based alignment algorithms, the paper presents a new suppression strategy. The strategy requires neither Chinese word segmentation nor part-of-speech tagging. Using punctuation characteristics, sentence length, and Chinese-Uighur correspondence information, sentence pairs with a 1:1 alignment pattern are identified as anchor points to suppress the spread of mistakes. Between anchor points, a hybrid strategy based on both length and punctuation is used to align sentences. Experimental results verify that anchor points are identified with high precision and that the spread of alignment mistakes is effectively restrained. The hybrid alignment algorithm also avoids the high time complexity of word-based alignment algorithms. In addition, it outperforms traditional alignment algorithms, reducing the alignment mistake ratio from 4.8% to 2.3%.
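The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the punctuation mapping `PUNCT_CLASS`, the length ratio `RATIO`, the thresholds, the penalty values, and all function names are hypothetical choices made only for illustration. Stage one accepts a 1:1 anchor only when a distinctive punctuation signature and a compatible length single out exactly one candidate; stage two aligns the sentences between anchors with a small dynamic program whose cost combines length and punctuation.

```python
from typing import List, Tuple

# Assumed mapping of Chinese and Arabic-script punctuation to shared classes.
PUNCT_CLASS = {
    "。": ".", ".": ".", "？": "?", "?": "?", "؟": "?", "！": "!", "!": "!",
    "，": ",", ",": ",", "،": ",", "；": ";", ";": ";", "؛": ";",
}
RATIO = 3.0       # assumed Uighur characters per Chinese character (illustrative)
SKIP_COST = 1.0   # assumed cost of leaving a sentence unaligned (1:0 or 0:1)
PATTERNS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
Bead = Tuple[List[int], List[int]]


def signature(sentence: str) -> str:
    """Sequence of punctuation classes occurring in the sentence."""
    return "".join(PUNCT_CLASS[ch] for ch in sentence if ch in PUNCT_CLASS)


def pair_cost(zh: str, ug: str) -> float:
    """Relative length mismatch plus a penalty for differing punctuation."""
    expected = len(zh) * RATIO
    length_term = abs(len(ug) - expected) / max(expected, len(ug), 1)
    punct_term = 0.0 if signature(zh) == signature(ug) else 0.5
    return length_term + punct_term


def find_anchors(zh: List[str], ug: List[str], max_cost: float = 0.3) -> List[Tuple[int, int]]:
    """Pick monotone 1:1 anchors that are unambiguous among the remaining candidates."""
    anchors, last_j = [], -1
    for i, z in enumerate(zh):
        sig = signature(z)
        if len(sig) < 2:  # a lone full stop is too common to be distinctive
            continue
        cands = [j for j in range(last_j + 1, len(ug))
                 if signature(ug[j]) == sig and pair_cost(z, ug[j]) <= max_cost]
        if len(cands) == 1:
            anchors.append((i, cands[0]))
            last_j = cands[0]
    return anchors


def align_span(zh: List[str], ug: List[str]) -> List[Bead]:
    """Length-and-punctuation dynamic program over one span between anchors."""
    n, m, inf = len(zh), len(ug), float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in PATTERNS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = SKIP_COST
                else:
                    step = pair_cost("".join(zh[i:ni]), "".join(ug[j:nj]))
                    step += 0.0 if (di, dj) == (1, 1) else 0.1  # mild prior for 1:1
                if best[i][j] + step < best[ni][nj]:
                    best[ni][nj] = best[i][j] + step
                    back[ni][nj] = (i, j)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


def align(zh: List[str], ug: List[str]) -> List[Bead]:
    """Align each span between consecutive anchors independently."""
    result, pi, pj = [], 0, 0
    # A sentinel past the last sentence closes the final span.
    for ai, aj in find_anchors(zh, ug) + [(len(zh), len(ug))]:
        for z_idx, u_idx in align_span(zh[pi:ai], ug[pj:aj]):
            result.append(([pi + k for k in z_idx], [pj + k for k in u_idx]))
        if ai < len(zh):  # a real anchor, not the sentinel
            result.append(([ai], [aj]))
        pi, pj = ai + 1, aj + 1
    return result
```

Because each span is aligned on its own, a wrong decision inside one span cannot shift the alignment of sentences beyond the next anchor, which is the intuition behind suppressing mistake spread. Each dynamic program also runs over a short span rather than the whole text.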
Keywords :
computational complexity; natural language processing; Chinese-Uighur sentence alignment; bilingual corpora; hybrid alignment algorithm; mistake spread suppression; pattern sentence pair; time complexity alignment algorithm; Computational Intelligence Society; Data mining; Dictionaries; Educational institutions; Information retrieval; Information science; Large-scale systems; Paper technology; Pattern matching; Tagging; Hybrid Strategy; Sentence Alignment
Conference_Title :
Environmental Science and Information Application Technology, 2009. ESIAT 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3682-8
DOI :
10.1109/ESIAT.2009.208