Title :
Spam filtering by semantic indexing and duplicate detection
Author_Institution :
YMCA Univ. of Sci. & Technol., Faridabad, India
Abstract :
The Internet has become an integral part of everyday life, and email has become a powerful tool for exchanging ideas and information. Through email, users connect socially and commercially. However, mass mailing, i.e. spam, has become one of the biggest worldwide problems. There is therefore a need for a spam filter that can detect spam and stop mass mailing. In this paper, the Latent Semantic Indexing (LSI) approach is used for spam filtering. LSI is an indexing and retrieval method that uses singular value decomposition (SVD) to find relationships between terms and an unstructured collection of messages. LSI is based on the principle that terms used in the same contexts tend to have similar meanings. In addition, spam messages are often replicated in significant numbers. Detecting these duplicates is important because it lightens the mailbox and improves the efficiency of the spam filter. Duplicate Message Detection Techniques (DMDT) have been used to improve the efficiency and effectiveness of the spam filter.
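The two ideas in the abstract, dropping replicated messages and comparing messages in an SVD-derived latent space, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the term weighting, the rank k, the classifier, or how duplicates are detected. Raw term counts, a nearest-neighbour cosine rule, and a SHA-256 fingerprint of the normalized tokens are assumptions made here for illustration, and every function name is hypothetical. NumPy is assumed to be installed.

```python
import hashlib
import re

import numpy as np


def tokenize(text):
    """Lowercase a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fingerprint(text):
    """Hash the normalized tokens so trivially reformatted copies collide."""
    return hashlib.sha256(" ".join(tokenize(text)).encode("utf-8")).hexdigest()


def drop_duplicates(messages):
    """Keep the first message per fingerprint (assumed stand-in for DMDT)."""
    seen, unique = set(), []
    for msg in messages:
        fp = fingerprint(msg)
        if fp not in seen:
            seen.add(fp)
            unique.append(msg)
    return unique


def term_document_matrix(messages):
    """Raw term counts: one row per term, one column per message."""
    vocab = sorted({t for m in messages for t in tokenize(m)})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(messages)))
    for j, m in enumerate(messages):
        for t in tokenize(m):
            A[index[t], j] += 1.0
    return A, index


def lsi_fit(A, k):
    """Truncated SVD, A ~ U_k diag(s_k) V_k^T; returns U_k and message vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    docs_k = (Vt[:k, :] * s_k[:, None]).T  # one k-dimensional row per message
    return U_k, docs_k


def lsi_project(text, index, U_k):
    """Fold a new message into the latent space: q_k = U_k^T q."""
    q = np.zeros(U_k.shape[0])
    for t in tokenize(text):
        if t in index:  # terms outside the training vocabulary are ignored
            q[index[t]] += 1.0
    return U_k.T @ q


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def classify(text, index, U_k, docs_k, labels):
    """Label a message with the label of its most similar training message."""
    q_k = lsi_project(text, index, U_k)
    sims = [cosine(q_k, d) for d in docs_k]
    return labels[int(np.argmax(sims))]


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    raw = [
        "Win a FREE prize now",
        "win a free prize now!!!",  # same tokens as the first message
        "Cheap pills, free offer now",
        "Meeting agenda for Monday",
        "Project meeting notes and agenda",
    ]
    train = drop_duplicates(raw)
    labels = ["spam", "spam", "ham", "ham"]
    A, index = term_document_matrix(train)
    U_k, docs_k = lsi_fit(A, k=2)
    print(classify("free prize offer", index, U_k, docs_k, labels))
```

Removing duplicates before building the matrix keeps replicated spam from inflating term counts and shrinks the matrix passed to the SVD. A fingerprint of this kind only catches copies that differ in case, punctuation, or spacing; near-duplicates with altered wording would need a similarity-based check.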
Keywords :
indexing; information filtering; singular value decomposition; unsolicited e-mail; DMDT; Internet; LSI approach; SVD; duplicate message detection technique; electronic mail; email; indexing method; latent semantic indexing; retrieval method; spam filtering; Computational modeling; Filtering; Indexing; Large scale integration; Semantics; Unsolicited electronic mail; Latent Semantic Indexing (LSI); Spam; Spam Filter; singular value decomposition (SVD) and Duplicate Message Detection Techniques (DMDT);
Conference_Title :
Computing, Communication & Automation (ICCCA), 2015 International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-8889-1
DOI :
10.1109/CCAA.2015.7148489