DocumentCode
2179331
Title
Toward text message normalization: Modeling abbreviation generation
Author
Pennell, Deana ; Liu, Yang
Author_Institution
Comput. Sci. Dept., Univ. of Texas at Dallas, Dallas, TX, USA
fYear
2011
fDate
22-27 May 2011
Firstpage
5364
Lastpage
5367
Abstract
This paper describes a text normalization system for deletion-based abbreviations in informal text. We propose statistical classifiers that learn the probability of deleting a given character, using features based on the character's context, its position in the word and in its containing syllable, and its function within the word. To make the system robust to different and previously unseen abbreviations of a word, we use the classifiers' predictions to generate multiple abbreviation hypotheses for each word. We then reverse these mappings so that English words can be recovered from their abbreviations. Three knowledge sources are used to disambiguate word candidates: abbreviation likelihood, length, and language model scores. Our results show that this approach is feasible and warrants further exploration.
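A minimal sketch of the pipeline outlined in the abstract, written for illustration only and not taken from the paper. Assumptions: per-character deletion probabilities are passed in directly as lists, standing in for the classifiers' predictions; characters are treated as deleted independently; hypotheses come from brute-force enumeration of deletion patterns, which only suits short words; and the disambiguation score is a hypothetical weighted sum of log abbreviation likelihood, a unigram log-probability table in place of a full language model, and a penalty on the number of deleted characters. All function names, weights, and probabilities are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def abbreviation_hypotheses(word, delete_probs, top_n=5):
    """Score every deletion-based abbreviation of `word` and keep the top N.

    delete_probs[i] is an assumed P(delete word[i]), standing in for a
    classifier prediction. Deletions are assumed independent. Different
    deletion patterns that yield the same string have their probabilities summed.
    """
    assert len(word) == len(delete_probs)
    scores = defaultdict(float)
    for mask in itertools.product((False, True), repeat=len(word)):
        abbr = "".join(c for c, drop in zip(word, mask) if not drop)
        if not abbr or abbr == word:
            continue  # skip the empty string and the unabbreviated word
        p = 1.0
        for drop, q in zip(mask, delete_probs):
            p *= q if drop else 1.0 - q
        scores[abbr] += p
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


def build_reverse_index(lexicon_probs, top_n=5):
    """Reverse word -> abbreviation hypotheses into abbreviation -> [(word, prob)]."""
    index = defaultdict(list)
    for word, probs in lexicon_probs.items():
        for abbr, p in abbreviation_hypotheses(word, probs, top_n):
            index[abbr].append((word, p))
    return index


def rank_candidates(abbr, index, unigram_logprob, w_abbr=1.0, w_lm=1.0, w_len=0.1):
    """Rank candidate words for `abbr` with a hypothetical linear score.

    Combines log abbreviation likelihood, a unigram log-probability
    (default -20.0 for unknown words), and a penalty per deleted character.
    """
    scored = []
    for word, p in index.get(abbr, []):
        score = (w_abbr * math.log(p)
                 + w_lm * unigram_logprob.get(word, -20.0)
                 - w_len * (len(word) - len(abbr)))
        scored.append((score, word))
    return [word for _, word in sorted(scored, reverse=True)]


if __name__ == "__main__":
    lexicon = {"bat": [0.05, 0.7, 0.05], "boat": [0.05, 0.6, 0.6, 0.05]}
    index = build_reverse_index(lexicon)
    lm = {"boat": -3.0, "bat": -6.0}  # hypothetical unigram log-probabilities
    print(rank_candidates("bt", index, lm, w_lm=0.0))  # ['bat', 'boat']
    print(rank_candidates("bt", index, lm))            # ['boat', 'bat']
```

In this toy setup both "bat" and "boat" map to "bt". Abbreviation likelihood alone favors "bat", while the assumed unigram scores tip the ranking toward "boat", which shows why combining several knowledge sources can help disambiguation.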
Keywords
electronic messaging; probability; speech synthesis; text analysis; word processing; English word; SMS; abbreviation likelihood; character context; deletion-based abbreviation; toward text message normalization; Computational modeling; Context; Decoding; Error analysis; Hidden Markov models; Mathematical model; Twitter; abbreviation modeling; noisy text processing; text normalization
fLanguage
English
Publisher
ieee
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location
Prague
ISSN
1520-6149
Print_ISBN
978-1-4577-0538-0
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2011.5947570
Filename
5947570