  • DocumentCode
    2264177
  • Title
    Applying Tesseract-OCR to detection of image spam mails
  • Author
    Yamakawa, Daisuke ; Yoshiura, Noriaki
  • Author_Institution
    Dept. of Inf. & Comput. Sci., Saitama Univ., Saitama, Japan
  • fYear
    2012
  • fDate
    25-27 Sept. 2012
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    This paper applies Tesseract-OCR, an optical character recognition program, to filtering image spam mails. Tesseract-OCR can be specialized for a particular language, and this paper specializes it for spam words. This specialization reduces the time and CPU power needed to check whether images in mails contain spam words. The paper evaluates the resulting Tesseract-OCR-based spam mail filter by experiment.
  • Keywords
    object detection; optical character recognition; unsolicited e-mail; Tesseract-OCR; image spam mails detection; optical character recognition software; spam mail filter; spam words; Character recognition; Drugs; Image recognition; Optical character recognition software; Postal services; Training; Unsolicited electronic mail; Computer Network Security; Image processing; Spam mail
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Network Operations and Management Symposium (APNOMS), 2012 14th Asia-Pacific
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4673-4494-4
  • Electronic_ISBN
    978-1-4673-4495-1
  • Type
    conf
  • DOI
    10.1109/APNOMS.2012.6356068
  • Filename
    6356068