Title :
Structural Parse Tree Features for Text Representation
Author :
Massung, Sean ; Zhai, Chengxiang ; Hockenmaier, Julia
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Abstract :
We propose and study novel text representation features created from parse tree structures. Unlike traditional parse tree features, which include all the attached syntactic categories to capture the linguistic properties of text, the new features are defined solely or primarily by the tree structure and thus better reflect the purely structural properties of parse trees. We hypothesize that these new complex structural features capture a perspective on text that is orthogonal even to that of advanced syntactic features. Evaluation on three text categorization tasks (nationality detection, essay scoring, and sentiment analysis) shows that the proposed tree structure features complement existing ones and enrich text representation. Experimental results further show that combining the proposed structural features with word n-grams can improve F1 score and classification accuracy.
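The abstract does not spell out the exact feature definitions, so the following is only a minimal sketch of the general contrast it draws, under stated assumptions: parses arrive as Penn-Treebank-style bracketed strings; a "traditional" labeled feature is taken to be a rewrite rule such as `NP->DT NN`; and a hypothetical "structure-only" feature is the unlabeled bracket skeleton of each constituent plus the overall tree depth. All function names (`parse_tree`, `skeleton`, `structural_features`, and so on) are illustrative inventions, not the authors' implementation or feature set.

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed string into (label, children) tuples; words are str leaves."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return parse()


def is_preterminal(node):
    """A preterminal (POS tag) node has only words as children."""
    return all(isinstance(c, str) for c in node[1])


def internal_nodes(node):
    """Yield every constituent above the preterminal level, top-down."""
    if isinstance(node, str) or is_preterminal(node):
        return
    yield node
    for child in node[1]:
        yield from internal_nodes(child)


def leaves(node):
    """Return the words of the sentence in order."""
    if isinstance(node, str):
        return [node]
    return [w for c in node[1] for w in leaves(c)]


def depth(node):
    """Number of node levels from the root down to the words."""
    if isinstance(node, str):
        return 0
    return 1 + max((depth(c) for c in node[1]), default=0)


def skeleton(node):
    """Unlabeled bracket string: both node labels and words are dropped."""
    if isinstance(node, str):
        return ""
    return "(" + "".join(skeleton(c) for c in node[1]) + ")"


def production_features(tree):
    """Labeled (syntactic-category) features: one rewrite rule per constituent."""
    return Counter(
        "rule:" + node[0] + "->"
        + " ".join(c if isinstance(c, str) else c[0] for c in node[1])
        for node in internal_nodes(tree)
    )


def structural_features(tree):
    """Hypothetical label-free features: constituent skeletons plus tree depth."""
    feats = Counter("skel:" + skeleton(node) for node in internal_nodes(tree))
    feats["depth:" + str(depth(tree))] += 1
    return feats


def word_ngrams(tokens, n=2):
    """Plain word n-gram counts."""
    return Counter(
        "ngram:" + " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def featurize(tree):
    """Combine word n-grams with the structure-only features in one bag."""
    return word_ngrams(leaves(tree)) + structural_features(tree)
```

For example, `(S (NP (DT the) (NN cat)) (VP (VBD sat)))` and `(S (NP (JJ old) (NNS dogs)) (VP (VBD slept)))` share every structural feature (both have skeleton `((()())(()))`), while their labeled rule features differ at the noun phrase (`NP->DT NN` versus `NP->JJ NNS`). This is the sense in which label-free features can capture shape information separately from syntactic categories.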
Keywords :
classification; computational linguistics; text analysis; trees (mathematics); F1 score; classification accuracy; complex structural features; essay scoring; linguistic properties; nationality detection; orthogonal perspective; parse tree structures; sentiment analysis; structural parse tree features; structural properties; syntactic categories; text categorization tasks; text representation; word n-grams; Accuracy; Feature extraction; Information retrieval; Production; Skeleton; Syntactics; Text categorization;
Conference_Title :
Semantic Computing (ICSC), 2013 IEEE Seventh International Conference on
Conference_Location :
Irvine, CA
DOI :
10.1109/ICSC.2013.13