Title :
A Semantic Triplet Based Story Classifier
Author :
Ceran, B. ; Karad, R. ; Mandvekar, A. ; Corman, S.R. ; Davulcu, Hasan
Author_Institution :
Sch. of Comput., Inf. & Decision Syst. Eng., Arizona State Univ., Tempe, AZ, USA
Abstract :
A story is defined as “an actor(s) taking action(s) that culminates in a resolution(s).” In this paper, we investigate the utility of standard keyword based features, statistical features based on shallow-parsing (such as density of POS tags and named entities), and a new set of semantic features to develop a story classifier. This classifier is trained to identify a paragraph as a “story,” if the paragraph contains mostly story(ies). Training data is a collection of expert-coded story and non-story paragraphs from RSS feeds from a list of extremist web sites. Our proposed semantic features are based on suitable aggregation and generalization of <Subject, Verb, Object> triplets that can be extracted using a parser. Experimental results show that a model of statistical features alongside memory-based semantic linguistic features achieves the best accuracy with a Support Vector Machine (SVM) classifier.
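The aggregation and generalization of <Subject, Verb, Object> triplets mentioned in the abstract can be illustrated with a minimal sketch. It assumes the triplets have already been extracted by a parser, and it is not the authors' implementation: the `GENERALIZE` map, its category names, and the `triplet_features` helper are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical generalization map from surface words to coarse categories.
# The categories are illustrative assumptions, not taken from the paper.
GENERALIZE = {
    "soldiers": "PERSON_GROUP",
    "militants": "PERSON_GROUP",
    "attacked": "ATTACK",
    "bombed": "ATTACK",
    "convoy": "TARGET",
    "checkpoint": "TARGET",
}


def generalize(word):
    """Map a word to its coarse category, falling back to the lowercased word."""
    return GENERALIZE.get(word.lower(), word.lower())


def triplet_features(triplets):
    """Aggregate generalized (subject, verb, object) triplets into feature counts."""
    features = Counter()
    for subj, verb, obj in triplets:
        s, v, o = generalize(subj), generalize(verb), generalize(obj)
        features[f"svo={s}|{v}|{o}"] += 1  # whole generalized triplet
        features[f"subj={s}"] += 1         # per-slot counts
        features[f"verb={v}"] += 1
        features[f"obj={o}"] += 1
    return features


# Triplets assumed to come from an upstream parser for one paragraph.
paragraph_triplets = [
    ("Militants", "attacked", "convoy"),
    ("Soldiers", "bombed", "checkpoint"),
]
features = triplet_features(paragraph_triplets)
```

Both example triplets collapse to the same generalized pattern, so their counts add up. A count vector of this kind could then be combined with statistical features and passed to a classifier such as an SVM.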
Keywords :
Web sites; grammars; linguistics; literature; pattern classification; statistical analysis; support vector machines; POS tag; RSS feed; SVM classifier; Web site; expert-coded story; keyword based feature; memory-based semantic linguistic feature; named entities; nonstory paragraph; parser; semantic triplet based story classifier; shallow-parsing; statistical features; support vector machine; Accuracy; Feature extraction; Humans; Organizations; Semantics; Standards organizations; Support vector machines;
Conference_Title :
Advances in Social Networks Analysis and Mining (ASONAM), 2012 IEEE/ACM International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-4673-2497-7
DOI :
10.1109/ASONAM.2012.97