Title :
Heuristic Methods for Filtering Newly Coined Profanities Using Phylogenetic Analysis
Author :
Yoon, Taijin ; Park, Sun-Young ; Chung, WooKeun ; Cho, Hwan-Gue
Author_Institution :
Dept. of Comput. Sci., Pusan Nat. Univ., Busan, South Korea
Abstract :
We previously proposed a smart filtering system for newly coined profanities that uses approximate string searching and sequence alignment. However, the number of coined profanities is very large: the game portal Nexon, for example, maintains a forbidden-word list of 60,000 words, so even our system requires too much computation time for use in a real-time chat system. The profanity database therefore needs to be managed by discarding redundant entries and dividing the remainder into groups by priority. In this paper, we propose a management algorithm for a profanity database. We apply phylogenetic analysis to build evolution trees and classify profanities, and then compare each input word against the root of a group. This reduces the number of database entries from 6302 to 2229.
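The idea of comparing an input word against group roots instead of the whole list can be illustrated with a minimal sketch. It is not the authors' algorithm: a greedy edit-distance grouping stands in for the phylogenetic tree construction, and the names `edit_distance`, `group_by_root`, `is_blocked`, the `threshold` value, and the placeholder word list are all assumptions made for illustration.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def group_by_root(words: List[str], threshold: int = 1) -> Dict[str, List[str]]:
    """Greedy grouping (assumed stand-in for tree building): a word joins the
    first root within `threshold` edits, otherwise it becomes a new root."""
    groups: Dict[str, List[str]] = {}
    for w in words:
        for root in groups:
            if edit_distance(w, root) <= threshold:
                groups[root].append(w)
                break
        else:
            groups[w] = [w]
    return groups


def is_blocked(word: str, roots: List[str], threshold: int = 1) -> bool:
    """Compare the input only against group roots, not every stored variant."""
    return any(edit_distance(word, r) <= threshold for r in roots)


# Placeholder tokens stand in for real forbidden words.
variants = ["darn", "d4rn", "darnn", "heck", "h3ck", "hecc"]
groups = group_by_root(variants)
roots = list(groups)
```

In this toy run, six stored variants collapse to two roots, so a lookup performs two distance computations instead of six. The reduction from 6302 to 2229 entries reported in the abstract is the same kind of saving, obtained there by the authors' own method.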
Keywords :
behavioural sciences computing; evolutionary computation; information filtering; natural language processing; string matching; text analysis; word processing; coined profanities filtering; evolution tree; heuristic filtering method; phylogenetic analysis; profanity database; sequence alignment; smart filtering system; string searching; Algorithm design and analysis; Databases; Extraterrestrial measurements; Games; Phylogeny; USA Councils; Edit Distance; Profanity; Sequence Alignment;
Conference_Title :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2010 International Conference on
Conference_Location :
Huangshan
Print_ISBN :
978-1-4244-8434-8
Electronic_ISBN :
978-0-7695-4235-5
DOI :
10.1109/CyberC.2010.70