• DocumentCode
    3254977
  • Title
    Document Clustering for Forensic Computing: An Approach for Improving Computer Inspection
  • Author
    Da Cruz Nassif, Luís Filipe ; Hruschka, Eduardo Raul
  • Author_Institution
    Brazilian Fed. Police Dept., Sao Paulo, Brazil
  • Volume
    1
  • fYear
    2011
  • fDate
    18-21 Dec. 2011
  • Firstpage
    265
  • Lastpage
    268
  • Abstract
    In computer forensic analysis, hundreds of thousands of files are usually examined. Many of those files consist of unstructured text, which is difficult for computer examiners to analyze. In this context, automated methods of analysis are of great interest. In particular, algorithms for clustering documents can facilitate the discovery of new and useful knowledge from the documents under analysis. We present an approach that applies clustering algorithms to forensic analysis of computers seized in police investigations. We illustrate the proposed approach by carrying out experiments with five clustering algorithms (K-means, K-medoids, Single Link, Complete Link, and Average Link) applied to five datasets obtained from computers seized in real-world investigations. In addition, two relative validity indexes were used to automatically estimate the number of clusters. Related studies in the literature are significantly more limited than our study. Our experiments show that the Average Link and Complete Link algorithms provide the best results for our application domain. If suitably initialized, partitional algorithms (K-means and K-medoids) can also yield very good results. Finally, we also present and discuss practical results that can be useful for researchers and practitioners of forensic computing.
  • Keywords
    computer forensics; document handling; pattern clustering; police data processing; K-means clustering algorithm; K-medoids clustering algorithm; average link clustering algorithm; complete link clustering algorithm; computer forensic analysis; computer inspection improvement; document clustering; forensic computing; police investigations; single link clustering algorithm; Algorithm design and analysis; Clustering algorithms; Computational efficiency; Computers; Forensics; Partitioning algorithms; TV; Forensic computing; clustering; text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    978-1-4577-2134-2
  • Type
    conf
  • DOI
    10.1109/ICMLA.2011.59
  • Filename
    6146981