Analyzing tweets to identify malicious messages

Author

Beck, Kristofer

Author_Institution

National Center for the Protection of Financial Infrastructure, Dakota State University, Madison, SD, USA

fYear

2011

fDate

15-17 May 2011

Firstpage

1

Lastpage

5

Abstract

With social networking becoming a popular medium, a new frontier of communication has opened. Sites like Facebook, LinkedIn, and Twitter are changing the way we communicate, often replacing a phone call or an email. In this paper, we look at detecting spam and phishing on the Twitter network. We argue that spammers and phishers use specific keywords to entice a Twitter user to click on a link, which may lead to a malicious web form. A phishing or spam message therefore contains both words and a URL. Because Twitter limits each message to 140 characters, the words used in the message matter all the more. Bayesian filtering is a popular approach to email spam that uses the presence or absence of individual words to decide how to label the message as a whole. We rule out Bayesian filtering as a viable option and instead propose a logistic regression model. Current studies place emphasis on the follower/followee ratio; we aim to show that relying on this ratio is wrong. Our goal is to effectively detect the presence of spam and to minimize its influence.
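As a rough illustration of the kind of model the abstract proposes, the sketch below fits a logistic regression over binary word-presence features plus a has-URL flag. It is not the author's implementation: the keyword list (`KEYWORDS`), the toy training tweets, and the helper names (`features`, `train`, `predict`) are hypothetical, and plain batch gradient descent is used only so the example needs nothing beyond the Python standard library.

```python
import math
import re

# Hypothetical keyword vocabulary; the paper's actual keyword list is not given here.
KEYWORDS = ["free", "win", "click", "verify", "account", "followers"]
URL_RE = re.compile(r"https?://\S+")


def features(tweet: str) -> list[float]:
    """Bias term, binary keyword-presence features, and a has-URL flag."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    x = [1.0]  # bias
    x += [1.0 if k in words else 0.0 for k in KEYWORDS]
    x.append(1.0 if URL_RE.search(tweet) else 0.0)
    return x


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs: int = 500, lr: float = 0.5) -> list[float]:
    """Fit weights by batch gradient descent on the logistic log-loss."""
    n = len(features(data[0][0]))
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for tweet, label in data:
            x = features(tweet)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(n):
                grad[j] += (p - label) * x[j]
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def predict(w: list[float], tweet: str) -> float:
    """Estimated probability that a tweet is malicious (spam or phishing)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(tweet))))


# Illustrative toy data only (1 = malicious, 0 = benign); not from the paper.
TRAIN = [
    ("Click here to win free followers http://example.com/a", 1),
    ("Verify your account now http://example.com/b", 1),
    ("Win a free phone, click http://example.com/c", 1),
    ("Great game last night, what a finish", 0),
    ("Reading a new paper on regression http://example.org/paper", 0),
    ("Lunch with the team today", 0),
]

weights = train(TRAIN)
print(round(predict(weights, "Click to win free stuff http://example.com/x"), 3))
print(round(predict(weights, "Lunch with friends today"), 3))
```

Unlike a naive Bayes filter, which multiplies per-word likelihoods under an independence assumption, logistic regression learns one weight per feature jointly, so a URL flag and keyword indicators can be weighed against each other directly.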

Keywords

Bayes methods; regression analysis; security of data; social networking (online); Bayesian; Facebook; Linkedin; Twitter; logistic regression model; malicious Web form; malicious messages identification; phishing detection; social networking; spam detection; tweet analysis; Bayesian methods; Observers; Security; Unsolicited electronic mail; machine learning; phishing; spam

fLanguage

English

Publisher

IEEE

Conference_Title

Electro/Information Technology (EIT), 2011 IEEE International Conference on

Conference_Location

Mankato, MN

ISSN

2154-0357

Print_ISBN

978-1-61284-465-7

Type

conf

DOI

10.1109/EIT.2011.5978594

Filename

5978594