• DocumentCode
    679955
  • Title

    Authorship detection of SMS messages using unigrams

  • Author

    Ragel, Roshan ; Herath, P. ; Senanayake, Upul

  • Author_Institution
    Dept. of Comput. Eng., Univ. of Peradeniya, Peradeniya, Sri Lanka
  • fYear
    2013
  • fDate
    17-20 Dec. 2013
  • Firstpage
    387
  • Lastpage
    392
  • Abstract
    SMS messaging is a popular medium of communication. Because of its popularity and privacy, it can be used for many illegal purposes. Additionally, since SMSes are part of day-to-day life, they can be used as evidence in many legal disputes. Since a cellular phone might be accessible to people close to the owner, it is important to establish that the sender of a message is indeed the owner of the phone. For this purpose, the straightforward solution seems to be the use of popular stylometric methods. However, in comparison with the data used for stylometry in the literature, SMSes have unusual characteristics that make it hard or impossible to apply these methods in a conventional way. Our target is to come up with a method of authorship detection for SMS messages that can still give a usable accuracy. We argue that, among the methods of author attribution, the best method that can be applied to SMS messages is an n-gram method. To support this claim, we tested two different methods of distribution comparison with varying amounts of training and testing data. We specifically compare how well our algorithms work with a small amount of testing data and a large number of candidate authors (which we believe to be the real-world scenario) against controlled tests with fewer authors and selected SMSes with a large number of words. To counter the lack of information in a single SMS message, we propose stacking several SMSes together.
  • Keywords
    cellular radio; electronic messaging; SMS messages; SMS messaging; author attribution; authorship detection; candidate authors; cellular phone; n-gram method; popular media; popular stylometric methods; stacking; straight forward solutions; stylometry; unigrams; Accuracy; Databases; Measurement; Pragmatics; Testing; Training; Vectors; SMS messaging; author attributing; stylometry unigrams;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial and Information Systems (ICIIS), 2013 8th IEEE International Conference on
  • Conference_Location
    Peradeniya
  • Print_ISBN
    978-1-4799-0908-7
  • Type

    conf

  • DOI
    10.1109/ICIInfS.2013.6732015
  • Filename
    6732015
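
The abstract describes comparing unigram distributions and stacking several SMSes together before attribution, but it does not name the two comparison methods the authors used. The following is only a minimal sketch of that general idea, under stated assumptions: cosine similarity stands in for the unspecified distribution comparison, tokens are lowercase whitespace-separated words, and the function names (`unigram_profile`, `cosine_similarity`, `attribute`) and the toy messages are hypothetical, not taken from the paper.

```python
from collections import Counter
import math


def unigram_profile(messages):
    """Relative frequency of each lowercase, whitespace-separated token.

    Passing several messages at once is the "stacking" step: their tokens
    are pooled into a single distribution.
    """
    counts = Counter(tok for msg in messages for tok in msg.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse unigram distributions.

    Assumption: this is a stand-in; the abstract does not say which
    distribution comparisons the authors evaluated.
    """
    dot = sum(w * q.get(tok, 0.0) for tok, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def attribute(stacked_messages, author_profiles):
    """Return the candidate author whose profile is closest to the stack."""
    test_profile = unigram_profile(stacked_messages)
    return max(
        author_profiles,
        key=lambda author: cosine_similarity(test_profile, author_profiles[author]),
    )


# Toy training data (hypothetical): two candidate authors.
training = {
    "alice": ["hey r u coming 2nite", "c u l8r", "r u ok"],
    "bob": ["Are you coming tonight?", "See you later.", "Are you okay?"],
}
profiles = {author: unigram_profile(msgs) for author, msgs in training.items()}

# Stack two short test messages into one sample before comparing.
guess = attribute(["r u there", "c u soon"], profiles)
print(guess)
```

In this toy run the stacked test sample shares tokens only with the first author's profile, so that author is selected. With real data, the number of stacked messages and the choice of comparison measure would need to be tuned and evaluated.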