• DocumentCode
    166250
  • Title

    A fused forensic text comparison system using lexical features, word and character N-grams

  • Author

    Ishihara, Sayaka

  • Author_Institution
    Dept. of Linguistics, Australian Nat. Univ., Canberra, ACT, Australia
  • fYear
    2014
  • fDate
    24-27 Sept. 2014
  • Firstpage
    2762
  • Lastpage
    2768
  • Abstract
    This study investigates the degree to which the performance of a likelihood ratio (LR)-based forensic text comparison (FTC) system improves when logistic-regression fusion is applied to LRs that were separately estimated by three different procedures, involving lexical features, word-based N-grams and character-based N-grams. This study uses predatory chatlog messages. Each group of messages is modelled using 500 words. The performance of the FTC system is assessed in terms of its validity (= accuracy) and reliability (= precision) using the log-likelihood-ratio cost (Cllr) and 95% credible intervals (CI), respectively. It is demonstrated that 1) out of the three procedures, the lexical features procedure performed best in terms of Cllr; and that 2) the fused system outperformed all three of the single procedures. The Cllr value of the fused system is better than that of the procedure with lexical features by 0.14. It is also reported that the validity and reliability of a system are negatively correlated; the fused system that yielded the best result in terms of Cllr has the worst CI value.
  • Keywords
    digital forensics; feature extraction; logistics; natural language processing; regression analysis; text analysis; FTC system; character n-grams; fused forensic text comparison system; lexical features; likelihood ratio-based forensic text comparison system; log-likelihood-ratio cost; logistic-regression fusion; predatory chatlog messages; word n-grams; Calibration; Databases; Forensics; Kernel; Probability; Reliability; Vectors; 95% credible intervals; N-grams; Tippett plot; forensic text comparison; likelihood ratio
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
  • Conference_Location
    New Delhi
  • Print_ISBN
    978-1-4799-3078-4
  • Type

    conf

  • DOI
    10.1109/ICACCI.2014.6968504
  • Filename
    6968504
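
  • Note
    The abstract assesses validity with the log-likelihood-ratio cost (Cllr). As a rough illustration only, the sketch below computes Cllr using its commonly cited definition; the record does not give the formula, so that definition is assumed here. The function name `cllr` and its two arguments are hypothetical, and the inputs are plain likelihood ratios rather than log LRs. This is not the paper's own implementation.

```python
import math


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood-ratio cost (assumed standard definition).

    same_author_lrs: LRs from comparisons where both texts share an author.
    different_author_lrs: LRs from comparisons of texts by different authors.
    Both are sequences of positive likelihood ratios (not log LRs).
    """
    # Same-author comparisons are penalised when the LR is small.
    same_penalty = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_penalty /= len(same_author_lrs)

    # Different-author comparisons are penalised when the LR is large.
    diff_penalty = sum(math.log2(1 + lr) for lr in different_author_lrs)
    diff_penalty /= len(different_author_lrs)

    return 0.5 * (same_penalty + diff_penalty)


if __name__ == "__main__":
    # A system that always outputs LR = 1 carries no information.
    print(cllr([1.0, 1.0], [1.0, 1.0]))
    # LRs pointing strongly in the correct direction give a lower cost.
    print(cllr([100.0, 50.0], [0.01, 0.02]))
```

    Lower values indicate better validity; under this definition a system that always returns LR = 1 scores exactly 1.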