Title :
Document Clustering for Forensic Computing: An Approach for Improving Computer Inspection
Author :
Da Cruz Nassif, Luís Filipe ; Hruschka, Eduardo Raul
Author_Institution :
Brazilian Fed. Police Dept., Sao Paulo, Brazil
Abstract :
In computer forensic analysis, hundreds of thousands of files are usually examined. Many of those files consist of unstructured text, which is difficult for computer examiners to analyze. In this context, automated methods of analysis are of great interest. In particular, algorithms for clustering documents can facilitate the discovery of new and useful knowledge from the documents under analysis. We present an approach that applies clustering algorithms to forensic analysis of computers seized in police investigations. We illustrate the proposed approach through experiments with five clustering algorithms (K-means, K-medoids, Single Link, Complete Link, and Average Link) applied to five datasets obtained from computers seized in real-world investigations. In addition, two relative validity indexes were used to automatically estimate the number of clusters. Related studies in the literature are significantly more limited than our study. Our experiments show that the Average Link and Complete Link algorithms provide the best results for our application domain. If suitably initialized, partitional algorithms (K-means and K-medoids) can also yield very good results. Finally, we also present and discuss practical results that can be useful for researchers and practitioners of forensic computing.
Keywords :
computer forensics; document handling; pattern clustering; police data processing; K-means clustering algorithm; K-medoids clustering algorithm; average link clustering algorithm; complete link clustering algorithm; computer forensic analysis; computer inspection improvement; document clustering; forensic computing; police investigations; single link clustering algorithm; Algorithm design and analysis; Clustering algorithms; Computational efficiency; Computers; Forensics; Partitioning algorithms; TV; Forensic computing; clustering; text mining;
Conference_Title :
Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
978-1-4577-2134-2
DOI :
10.1109/ICMLA.2011.59