• DocumentCode
    3245603
  • Title

    Automatic junk e-mail filtering based on latent content

  • Author

    Bellegarda, Jerome R. ; Naik, Devang ; Silverman, Kim E. A.

  • Author_Institution
    Spoken Language Group, Apple Comput. Inc., Cupertino, CA, USA
  • fYear
    2003
  • fDate
    30 Nov.-3 Dec. 2003
  • Firstpage
    465
  • Lastpage
    470
  • Abstract
    The explosion in unsolicited mass electronic mail (junk e-mail) over the past decade has sparked interest in automatic filtering solutions. Traditional techniques tend to rely on header analysis, keyword/keyphrase matching and analogous rule-based predicates, and/or some probabilistic model of text generation. This paper aims instead at deciding whether or not the latent subject matter of a message is consistent with the user's interests. The underlying framework is latent semantic analysis: each e-mail is automatically classified against two semantic anchors, one for legitimate and one for junk messages. Experiments show that this approach is competitive with the state of the art in e-mail classification, and potentially advantageous in real-world applications with high junk-to-legitimate ratios. The resulting technology was successfully released in August 2002 as part of the e-mail client bundled with the MacOS 10.2 operating system.
  • Keywords
    classification; text analysis; unsolicited e-mail; automatic junk e-mail filtering; e-mail classification; e-mail client; e-mail latent content; header analysis; junk-to-legitimate ratio; keyword/keyphrase matching; latent semantic analysis; rule-based predicates; semantic anchors; unsolicited mass electronic mail; Business; Costs; Databases; Electronic mail; Explosions; Filtering; Internet; Natural languages; Operating systems; Unsolicited electronic mail;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
  • Print_ISBN
    0-7803-7980-2
  • Type

    conf

  • DOI
    10.1109/ASRU.2003.1318485
  • Filename
    1318485
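
The abstract describes classifying each e-mail against two semantic anchors in a latent semantic space. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes raw word counts, a truncated SVD of a small term-by-document matrix, anchors computed as the per-class centroids of the training messages, and cosine similarity as the closeness measure. The function names (train, classify), the toy messages, and the choice k=2 are all hypothetical; the record above does not specify the weighting scheme, the dimensionality, or how the anchors are built.

```python
import numpy as np

def build_vocab(docs):
    # Map each distinct lowercase token to a row index.
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def count_vector(doc, vocab):
    # Raw term counts; words outside the vocabulary are ignored.
    v = np.zeros(len(vocab))
    for w in doc.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def train(docs, labels, k=2):
    vocab = build_vocab(docs)
    # Term-by-document matrix: one row per term, one column per message.
    W = np.column_stack([count_vector(d, vocab) for d in docs])
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k]                      # top-k left singular vectors
    doc_vecs = (U_k.T @ W).T            # training messages in the latent space
    labels = np.array(labels)
    # Assumption: each anchor is the centroid of its class in the latent space.
    anchors = {c: doc_vecs[labels == c].mean(axis=0) for c in ("legit", "junk")}
    return vocab, U_k, anchors

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def classify(msg, vocab, U_k, anchors):
    # Fold the new message into the latent space, then pick the closer anchor.
    # A message with no known words maps to the zero vector and falls back
    # to the first anchor ("legit").
    z = U_k.T @ count_vector(msg, vocab)
    return max(anchors, key=lambda c: cosine(z, anchors[c]))

# Toy training data (hypothetical).
docs = [
    "meeting agenda for the project review",
    "please review the project budget draft",
    "lunch meeting moved to friday",
    "win free money now",
    "free prize claim your money now",
    "cheap pills free offer now",
]
labels = ["legit", "legit", "legit", "junk", "junk", "junk"]

vocab, U_k, anchors = train(docs, labels, k=2)
print(classify("claim your free money", vocab, U_k, anchors))
print(classify("project meeting agenda", vocab, U_k, anchors))
```

In this sketch a message is never matched against keywords directly: it is projected into the latent space first, so the decision depends on which anchor its overall content lies closer to.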