• DocumentCode
    3217154
  • Title
    File format identification and information extraction
  • Author
    Dhanalakshmi, R.; Chellappan, C.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Anna Univ., Chennai, India
  • fYear
    2009
  • fDate
    9-11 Dec. 2009
  • Firstpage
    1497
  • Lastpage
    1501
  • Abstract
    The world has witnessed explosive growth in the information available on the World Wide Web (WWW) over the last decade, and people use it more than ever before. With the ever-increasing importance of privacy, security, and the wise use of computational resources, the corresponding technologies increasingly face the problem of file type detection. Reliably identifying the true type of a computer file is a difficult task, especially when malicious intent is involved. The file name extension, which indicates the file type, is stored in the disk directory; when a file is deleted, however, its directory entry may be overwritten, making its type quite difficult to identify, which is a serious issue in computer forensics. If a file fragment retains header information that identifies its type, this problem may be solved. It is difficult, however, to identify the type of a fragment taken from the middle of a file, and identification becomes more complex still when the header information is deleted or unavailable. In addition, the content of those files, which may be of even greater importance, needs to be extracted. Hence, file type identification and information management are of paramount importance, especially for the diffusion, reuse, and extraction of information from vast existing databases. This paper focuses on identifying file types, addressing scenarios such as a file type changed by a malicious user, proprietary file types, and the obsolescence of hardware and software.
  • Keywords
    Internet; data structures; information retrieval; file format identification; file type detection; information extraction; Data mining; Databases; Explosives; Face detection; Forensics; Information management; Information security; Privacy; Web sites; World Wide Web; Fileprints; Header and Trailer; content structure
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Nature & Biologically Inspired Computing, 2009. NaBIC 2009. World Congress on
  • Conference_Location
    Coimbatore
  • Print_ISBN
    978-1-4244-5053-4
  • Type
    conf
  • DOI
    10.1109/NABIC.2009.5393688
  • Filename
    5393688