Title :
Automatic ranking of swear words using word embeddings and pseudo-relevance feedback
Author :
Luis Fernando D'Haro;Rafael E. Banchs
Author_Institution :
Human Language Technologies, A*STAR, Singapore
Abstract :
This paper describes a method for automatically ranking a dictionary of swear words by their level of rudeness. The final ranking is generated by combining two baseline rankings: 1) one based on the normalized accumulated cosine similarity between the word embedding of each swear word and those of its n-best closest neighbors, and 2) one produced by a pseudo-relevance feedback and bootstrapping algorithm. The proposed methods are trained on dialogues extracted from movie scripts and evaluated against a list of swear words manually ranked into 5 categories by four different annotators. The Spearman correlation coefficient between the rankings generated by the proposed system and a consolidated gold standard is similar to the values obtained among the different human annotators, indicating that the proposed method is a good alternative to the manual process.
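The first baseline ranking in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so several things here are assumptions: the accumulated similarity is normalized by dividing by the number of neighbors, words are sorted by descending score, and the names `cosine`, `neighbor_score`, `rank_words` and the toy vectors are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbor_score(word, embeddings, n_best=2):
    """Accumulate cosine similarity over the n-best closest neighbors.

    Assumption: normalization means dividing by the number of neighbors used.
    """
    sims = sorted(
        (cosine(embeddings[word], vec)
         for other, vec in embeddings.items() if other != word),
        reverse=True,
    )
    top = sims[:n_best]
    return sum(top) / len(top) if top else 0.0


def rank_words(words, embeddings, n_best=2):
    """Order words by descending neighbor score (ordering is an assumption)."""
    scores = {w: neighbor_score(w, embeddings, n_best) for w in words}
    return sorted(words, key=lambda w: scores[w], reverse=True)


# Toy 2-d vectors, made up purely for illustration.
toy_embeddings = {
    "w_a": [1.0, 0.0],
    "w_b": [0.0, 1.0],
    "ctx1": [1.0, 0.0],
    "ctx2": [1.0, 0.0],
}
ranking = rank_words(["w_a", "w_b"], toy_embeddings)
```

In this toy setup `w_a` has two identical neighbors and so ranks ahead of `w_b`, which has none. A real system would use embeddings trained on the movie dialogues and would combine this ranking with the pseudo-relevance feedback ranking, which is not sketched here.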
Keywords :
"Dictionaries","Motion pictures","Context","Internet","Encyclopedias","Electronic publishing"
Conference_Title :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
DOI :
10.1109/APSIPA.2015.7415386