DocumentCode
3217154
Title
File format identification and information extraction
Author
Dhanalakshmi, R. ; Chellappan, C.
Author_Institution
Dept. of Comput. Sci. & Eng., Anna Univ., Chennai, India
fYear
2009
fDate
9-11 Dec. 2009
Firstpage
1497
Lastpage
1501
Abstract
The world has witnessed explosive growth in the information available on the World Wide Web (WWW) over the last decade, and people use it more than ever before. With the growing importance of privacy, security, and the wise use of computational resources, the corresponding technologies increasingly face the problem of file type detection. Reliably identifying computer file types is difficult, especially when malicious intent is involved. A file's type is recorded as an extension to its name in the disk directory, but when a file is deleted its directory entry may be overwritten, making its type difficult to identify, which is a serious issue in computer forensics. If a file fragment retains header information that identifies its type, the problem can be solved. However, identification becomes more complex when the fragment comes from the middle of a file or when the header information is deleted or unavailable. In addition, the content of those files, which may be of greater importance, needs to be extracted. Hence file type identification and information management are of paramount importance, especially for the diffusion, reuse, and information extraction of vast existing databases. This paper focuses on identifying file types, addressing scenarios in which the file type has been changed by a malicious user, the file type is proprietary, or hardware and software have become obsolete.
Keywords
Internet; data structures; information retrieval; file format identification; file type detection; information extraction; Data mining; Databases; Explosives; Face detection; Forensics; Information management; Information security; Privacy; Web sites; World Wide Web; Fileprints; Header and Trailer; content structure
fLanguage
English
Publisher
IEEE
Conference_Titel
Nature & Biologically Inspired Computing, 2009. NaBIC 2009. World Congress on
Conference_Location
Coimbatore
Print_ISBN
978-1-4244-5053-4
Type
conf
DOI
10.1109/NABIC.2009.5393688
Filename
5393688