DocumentCode
3587489
Title
Noisy SMS text normalization model
Author
Jose, Greety; Raj, Nisha S.
Author_Institution
Dept. of Comput. Sci., SCMS Sch. of Eng. & Technol., Ernakulam, India
fYear
2014
Firstpage
1
Lastpage
6
Abstract
Today, digital media such as social networks, chat rooms, and forums have become important channels for information sharing. Users share their knowledge and emotions in their own language, creating a novel syntax that conveys messages as pithily as possible. Such noisy text is characterized by unusual forms such as abbreviations, phonetic translations, and short forms, which has led to the emergence of text normalization. Cleaning noisy text has become important for the adequate development and deployment of NLP (Natural Language Processing) services such as text-to-speech and automatic translation. In this paper we introduce a channel-based normalization model for cleaning noisy text. The normalization is based on the type of distortion: grapheme distortion, abbreviation, or phoneme distortion. The model identifies the type of distortion in a noisy word and replaces the word using the corresponding channel list. Preliminary evaluation shows that the channel model normalizes noisy words to their standard forms with 96.43% accuracy.
Keywords
natural language processing; social networking (online); text analysis; automatic translation; channel model; chat rooms; digital media; forums; grapheme distortion; information sharing; natural language processing services; noisy SMS text normalization model; noisy text; noisy word; phoneme distortion; social networks; text normalization; text-to-speech; Adaptation models; Computational modeling; Databases; Hidden Markov models; Natural language processing; Noise measurement; Standards; Machine Translation; Natural Language Processing; Noisy words; Social Media; Text Normalization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Convergence of Technology (I2CT), 2014 International Conference for
Print_ISBN
978-1-4799-3758-5
Type
conf
DOI
10.1109/I2CT.2014.7092164
Filename
7092164