DocumentCode :
258693
Title :
Lexical normalization model for noisy SMS text
Author :
Jose, Greety ; Raj, Nisha S.
Author_Institution :
Dept. of Comput. Sci., SCMS Sch. of Eng. & Technol., Ernakulam, India
fYear :
2014
fDate :
17-18 Dec. 2014
Firstpage :
57
Lastpage :
62
Abstract :
In day-to-day life, digitally mediated interactions and communications are an important constituent. The rapid growth of electronic communications such as e-mails, microblogs, SMS and chats has produced extensively noisy forms of text, predominantly among young urbanites. The tremendous growth of noise in text is due to a variety of factors, such as the small number of characters allowed per message (160 characters per SMS and 140 characters per tweet), the invention of new abbreviations, the use of non-standard orthographic forms, and phonetic substitution. In this paper we introduce a lexical normalization model for cleaning noisy text. The normalization is based on a channelized database. The model captures user interaction to improve its accuracy. Preliminary evaluation shows that the channel model normalizes noisy words to their standard forms with 97.5% accuracy.
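The abstract does not describe the internal structure of the channelized database. The sketch below illustrates the general idea of lookup-based lexical normalization with user feedback. It assumes a plain noisy-to-standard token table, which is a hypothetical stand-in and not the authors' model. The names `LookupNormalizer`, `record_correction` and `accuracy` are illustrative only. The `accuracy` helper shows how a token-level figure such as the reported 97.5% would be computed: correctly normalized tokens divided by total tokens.

```python
from collections import Counter, defaultdict


class LookupNormalizer:
    """Hypothetical lookup-table normalizer (not the paper's channel model)."""

    def __init__(self, seed):
        # seed: assumed mapping from noisy token to its standard form
        self.table = {k.lower(): v for k, v in seed.items()}
        # per noisy token, counts of standard forms confirmed by users
        self.feedback = defaultdict(Counter)

    def normalize_token(self, token):
        # Tokens missing from the table are returned unchanged
        return self.table.get(token.lower(), token)

    def normalize(self, text):
        # Whitespace tokenization is a simplifying assumption
        return " ".join(self.normalize_token(t) for t in text.split())

    def record_correction(self, noisy, standard):
        # Capture a user interaction; the most frequently confirmed
        # standard form becomes the table entry for that noisy token
        key = noisy.lower()
        self.feedback[key][standard] += 1
        self.table[key] = self.feedback[key].most_common(1)[0][0]


def accuracy(normalizer, gold_pairs):
    # Fraction of (noisy, standard) pairs normalized correctly
    hits = sum(normalizer.normalize_token(n) == s for n, s in gold_pairs)
    return hits / len(gold_pairs)


norm = LookupNormalizer({"gr8": "great", "u": "you", "2moro": "tomorrow"})
print(norm.normalize("c u 2moro"))   # c you tomorrow
norm.record_correction("c", "see")
print(norm.normalize("c u 2moro"))   # see you tomorrow
```

In this design, user corrections update the table incrementally rather than requiring retraining. This is one simple way a model could "capture user interaction" as the abstract states. The paper's actual mechanism may differ.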
Keywords :
electronic messaging; text analysis; E-mails; channelized database; electronic communications; lexical normalization model; micro blogs; natural language processing; noisy SMS text; phonetic substitution; short message services; Computational modeling; Databases; Hidden Markov models; Natural language processing; Noise; Noise measurement; Standards; Lexical Normalization; Machine Translation; Natural Language Processing; Noisy words; Non-noisy word; SMS; Social Media; Text Normalization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Systems and Communications (ICCSC), 2014 First International Conference on
Conference_Location :
Trivandrum
Print_ISBN :
978-1-4799-6012-5
Type :
conf
DOI :
10.1109/COMPSC.2014.7032621
Filename :
7032621