DocumentCode :
3717183
Title :
SyntacticDiff: Operator-based transformation for comparative text mining
Author :
Sean Massung;ChengXiang Zhai
Author_Institution :
Department of Computer Science, College of Engineering, University of Illinois at Urbana-Champaign
fYear :
2015
Firstpage :
571
Lastpage :
580
Abstract :
We describe SyntacticDiff, a novel, general, and efficient edit-based method for transforming sequences of words given a reference text collection. These transformations can be used directly or employed as features to represent text data in a wide variety of text mining applications. As case studies, we apply SyntacticDiff to three quite different tasks (grammatical error correction, student essay clustering and analysis, and native language identification) and show its benefit in each case. SyntacticDiff is completely general and can therefore potentially be applied to any text data in any natural language. It is highly efficient, customizable, and able to capture syntactic differences from a reference text collection at the sentence, document, and subcollection levels. This enables both a rich translation method and a feature representation for many text mining tasks that deal with word usage and syntax beyond bag-of-words.
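The abstract does not spell out SyntacticDiff's actual operators, so the following is only a rough illustration of the general idea of an edit-based transformation of word sequences whose edits are reused as features. It assumes a generic word-level diff computed with Python's standard difflib module; the helper names word_edits and edit_features are hypothetical, and this is not the authors' algorithm.

```python
from collections import Counter
from difflib import SequenceMatcher


def word_edits(sentence, reference):
    """Return word-level edits that turn `sentence` into `reference`.

    Hypothetical helper: a generic diff via difflib opcodes,
    NOT the SyntacticDiff operator set described in the paper.
    """
    src, ref = sentence.split(), reference.split()
    matcher = SequenceMatcher(a=src, b=ref, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only spans that actually change
            edits.append((tag, " ".join(src[i1:i2]), " ".join(ref[j1:j2])))
    return edits


def edit_features(edits):
    """Hypothetical bag-of-edits features: count each (operation, old, new) triple."""
    return Counter(edits)


if __name__ == "__main__":
    s = "he go to the school yesterday"
    r = "he went to school yesterday"
    for edit in word_edits(s, r):
        print(edit)
    print(edit_features(word_edits(s, r)))
```

Here each edit is a (tag, old words, new words) triple, so a document or subcollection could be summarized by summing such counters; the real method's transformation model and reference-collection handling may differ substantially.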
Keywords :
"Text mining","Transforms","Syntactics","Natural language processing","Robustness","Writing"
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363801
Filename :
7363801