DocumentCode
3254977
Title
Document Clustering for Forensic Computing: An Approach for Improving Computer Inspection
Author
Da Cruz Nassif, Luís Filipe ; Hruschka, Eduardo Raul
Author_Institution
Brazilian Fed. Police Dept., Sao Paulo, Brazil
Volume
1
fYear
2011
fDate
18-21 Dec. 2011
Firstpage
265
Lastpage
268
Abstract
In computer forensic analysis, hundreds of thousands of files are usually examined. Many of those files consist of unstructured text, which is difficult for computer examiners to analyze. In this context, automated methods of analysis are of great interest. In particular, algorithms for clustering documents can facilitate the discovery of new and useful knowledge from the documents under analysis. We present an approach that applies clustering algorithms to forensic analysis of computers seized in police investigations. We illustrate the proposed approach by carrying out experiments with five clustering algorithms (K-means, K-medoids, Single Link, Complete Link, and Average Link) applied to five datasets obtained from computers seized in real-world investigations. In addition, two relative validity indexes were used to automatically estimate the number of clusters. Related studies in the literature are significantly more limited than our study. Our experiments show that the Average Link and Complete Link algorithms provide the best results for our application domain. If suitably initialized, partitional algorithms (K-means and K-medoids) can also yield very good results. Finally, we also present and discuss practical results that can be useful for researchers and practitioners of forensic computing.
Keywords
computer forensics; document handling; pattern clustering; police data processing; K-means clustering algorithm; K-medoids clustering algorithm; average link clustering algorithm; complete link clustering algorithm; computer forensic analysis; computer inspection improvement; document clustering; forensic computing; police investigations; single link clustering algorithm; Algorithm design and analysis; Clustering algorithms; Computational efficiency; Computers; Forensics; Partitioning algorithms; TV; Forensic computing; clustering; text mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
Conference_Location
Honolulu, HI
Print_ISBN
978-1-4577-2134-2
Type
conf
DOI
10.1109/ICMLA.2011.59
Filename
6146981