Title :
Preventing False Positives in Content-Based Phishing Detection
Author :
Nakayama, Shinta ; Echizen, Isao ; Yoshiura, Hiroshi
Author_Institution :
Grad. Sch. of Human Commun., Univ. of Electro-Commun., Chofu, Japan
Abstract :
Content-based phishing detection extracts keywords from a target Web page, uses these keywords to retrieve the corresponding legitimate site, and detects phishing when the domain of the target page does not match that of the retrieved site. However, it often misidentifies a legitimate target site as a phishing site because the extracted keywords do not characterize the legitimate site with sufficient accuracy. Two keyword extraction methods are described: domain keyword extraction, which extracts keywords not only from the page displayed in the browser but also from the pages it links to, and time-invariant keyword extraction, which extracts keywords from the page and from previous versions of the page. Experiments using 172 legitimate sites showed that the false detection rate fell from 14.0% to 7.6%, while experiments using 172 phishing sites showed no change in the rate at which phishing pages were overlooked.
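The detection pipeline and the two extraction methods in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the frequency-based keyword ranking (a stand-in for TF-IDF), the in-memory `search_top_domain` function standing in for a real search engine, the "present in every previous version" criterion for time-invariant keywords, the exact-hostname comparison, and all toy data are hypothetical assumptions made for illustration.

```python
from __future__ import annotations

import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stopword list; a real system would use a proper one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "your", "is"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def top_keywords(texts: list[str], k: int = 5) -> list[str]:
    """Most frequent terms across texts (a simple stand-in for TF-IDF weighting)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [w for w, _ in counts.most_common(k)]


def domain_keywords(page: str, linked_pages: list[str], k: int = 5) -> list[str]:
    """Domain keyword extraction: pool the page with the pages it links to."""
    return top_keywords([page, *linked_pages], k)


def time_invariant_keywords(page: str, previous_versions: list[str], k: int = 5) -> list[str]:
    """Keep only terms found in the current page and in every previous version
    (an assumed persistence criterion), ranked by frequency in the current page."""
    persistent = set(tokenize(page))
    for old in previous_versions:
        persistent &= set(tokenize(old))
    counts = Counter(w for w in tokenize(page) if w in persistent)
    return [w for w, _ in counts.most_common(k)]


def search_top_domain(keywords: list[str], index: dict[str, set[str]]) -> str | None:
    """Hypothetical search engine: return the indexed domain sharing the most terms."""
    best, best_score = None, 0
    for domain, terms in index.items():
        score = len(set(keywords) & terms)
        if score > best_score:
            best, best_score = domain, score
    return best


def is_phishing(target_url: str, keywords: list[str], index: dict[str, set[str]]) -> bool:
    """Flag phishing when the retrieved domain differs from the target's domain.
    Exact hostname matching is a simplification (a "www." prefix would differ),
    and an empty search result is treated as "not flagged" by assumption."""
    target = urlparse(target_url).hostname or ""
    retrieved = search_top_domain(keywords, index)
    return retrieved is not None and retrieved != target


# Toy data (all hypothetical): a legitimate bank whose front page is dominated
# by a transient campaign that resembles an unrelated site.
INDEX = {
    "examplebank.com": {"examplebank", "account", "login", "transfer", "savings"},
    "prizeshop.com": {"summer", "campaign", "prize", "win", "gift"},
}
BANK_URL = "https://examplebank.com/"
BANK_PAGE = "Summer campaign! Win a prize, win a gift. Summer campaign prize. ExampleBank login"
LINKED = ["ExampleBank account login", "ExampleBank savings account transfer",
          "ExampleBank login transfer account"]
OLD_VERSIONS = ["ExampleBank login account. Spring campaign.",
                "ExampleBank login account transfer."]
PHISH_URL = "https://examp1ebank-secure.net/login"
PHISH_PAGE = "ExampleBank account login. Verify your account transfer."

print(is_phishing(BANK_URL, top_keywords([BANK_PAGE]), INDEX))                         # True (false positive)
print(is_phishing(BANK_URL, domain_keywords(BANK_PAGE, LINKED), INDEX))                 # False
print(is_phishing(BANK_URL, time_invariant_keywords(BANK_PAGE, OLD_VERSIONS), INDEX))  # False
print(is_phishing(PHISH_URL, top_keywords([PHISH_PAGE]), INDEX))                        # True
```

In this toy run, keywords taken from the legitimate front page alone retrieve the wrong domain, which produces the kind of false positive the abstract describes. Pooling linked pages, or keeping only terms that persist across page versions, recovers the bank's own domain, while the copied phishing page is still flagged because its hostname does not match.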
Keywords :
Web sites; computer crime; content-based retrieval; unsolicited e-mail; Web page; content-based phishing detection; domain keyword extraction; false positives prevention; legitimate site; phishing sites; time-invariant keyword extraction; Content based retrieval; Data mining; HTML; Humans; IP networks; Informatics; Information retrieval; National security; Signal processing; Web pages; Internet; network; phishing; security
Conference_Title :
Intelligent Information Hiding and Multimedia Signal Processing, 2009. IIH-MSP '09. Fifth International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-4717-6
Electronic_ISBN :
978-0-7695-3762-7
DOI :
10.1109/IIH-MSP.2009.147