• DocumentCode
    3752137
  • Title

    Automatic ranking of swear words using word embeddings and pseudo-relevance feedback

  • Author

    Luis Fernando D'Haro;Rafael E. Banchs

  • Author_Institution
    Human Language Technologies, A*STAR, Singapore
  • fYear
    2015
  • Firstpage
    815
  • Lastpage
    820
  • Abstract
    This paper describes a method for automatically ranking a dictionary of swear words by their level of rudeness. The final ranking is generated by combining two baseline rankings: 1) one based on the normalized accumulated cosine similarity between the word embedding of the swear word and the n-best list of its closest neighbors, and 2) one based on a pseudo-relevance feedback and bootstrapping algorithm. The proposed methods are trained on dialogues extracted from movie scripts and evaluated against a list of swear words manually ranked into 5 categories by four different annotators. The Spearman correlation coefficient between the rankings generated by the proposed system and a consolidated gold standard reaches a value similar to those obtained among the different human annotators, indicating that the proposed method is a good alternative to the manual process.
  • Keywords
    "Dictionaries","Motion pictures","Context","Internet","Encyclopedias","Electronic publishing"
  • Publisher
    IEEE
  • Conference_Titel
    Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
  • Type

    conf

  • DOI
    10.1109/APSIPA.2015.7415386
  • Filename
    7415386
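
The abstract names two computations that can be illustrated in a few lines: an accumulated cosine similarity between a word and its n closest neighbors, and a Spearman correlation between two rankings. The following is a minimal sketch of one possible reading, not the authors' implementation. The function names, the division by n as the normalization, and the tie-free Spearman formula are all assumptions; the abstract does not specify them, and a gold standard with only 5 categories would contain ties that this simplified formula does not handle.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def accumulated_similarity(word, embeddings, n):
    """Mean cosine similarity between `word` and its n closest neighbors.

    Assumption: "normalized" is read here as dividing the accumulated
    similarity by the number of neighbors actually used.
    """
    sims = sorted(
        (cosine(embeddings[word], vec)
         for other, vec in embeddings.items() if other != word),
        reverse=True,
    )
    top = sims[:n]
    return sum(top) / len(top)


def spearman(xs, ys):
    """Spearman correlation for two score lists, assuming no tied values."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            result[i] = rank
        return result

    rx, ry = ranks(xs), ranks(ys)
    m = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (m * (m * m - 1))


if __name__ == "__main__":
    # Hypothetical toy embeddings with placeholder words, for illustration only.
    toy = {"w1": [1.0, 0.0], "w2": [0.9, 0.1], "w3": [0.0, 1.0]}
    scores = {w: accumulated_similarity(w, toy, n=2) for w in toy}
    print(scores)
    print(spearman([0.1, 0.5, 0.9], [1, 2, 3]))
```

How such a score would map to a rudeness level, and how it would be combined with the pseudo-relevance feedback ranking, is described in the paper itself and is not reproduced here.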