Title :
QuIET: A Text Classification Technique Using Automatically Generated Span Queries
Author :
Polychronopoulos, Vassilis ; Pendar, Nick ; Jeffery, Shawn R.
Author_Institution :
Univ. of California, Santa Cruz, Santa Cruz, CA, USA
Abstract :
We propose a novel algorithm, QuIET, for binary classification of texts. The method automatically generates a set of span queries from a set of annotated documents and uses the query set to categorize unlabeled texts. The models QuIET generates are human-understandable. We describe the method and evaluate it empirically against Support Vector Machines, demonstrating comparable performance on a known curated dataset and superior performance on some categories of noisy local businesses data. We also describe an active learning approach that is applicable to QuIET and can boost its performance.
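To make the idea of classifying with span queries concrete, the following is a minimal sketch and not the QuIET algorithm itself. It assumes a span query is a set of terms that must all occur within a fixed token window, and that a text is labeled positive when any query in the set matches. The names `SpanQuery`, `window`, `matches`, and `classify` are hypothetical, and the abstract does not say how QuIET generates or combines its queries.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SpanQuery:
    """Hypothetical span query: every term must occur inside one
    window of `window` consecutive tokens (order is ignored)."""
    terms: Tuple[str, ...]
    window: int


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer; an assumption for illustration.
    return text.lower().split()


def matches(query: SpanQuery, tokens: Sequence[str]) -> bool:
    # Collect the positions of each query term in the token sequence.
    positions = []
    for term in query.terms:
        found = [i for i, tok in enumerate(tokens) if tok == term]
        if not found:
            return False  # a missing term can never satisfy the query
        positions.append(found)
    # The query matches if some choice of one position per term
    # fits inside a window of the requested size.
    return any(max(combo) - min(combo) < query.window
               for combo in product(*positions))


def classify(queries: Sequence[SpanQuery], text: str) -> bool:
    # Assumed decision rule: positive if any query matches the text.
    tokens = tokenize(text)
    return any(matches(q, tokens) for q in queries)


queries = [SpanQuery(terms=("pizza", "delivery"), window=3)]
near = classify(queries, "we offer pizza home delivery")
far = classify(queries, "pizza is great but we have no delivery")
```

Because each query is a short list of terms plus a window size, the resulting model can be read directly, which is the sense in which a query-based classifier is human-understandable.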
Keywords :
learning (artificial intelligence); pattern classification; query processing; support vector machines; text analysis; QuIET technique; active learning approach; annotated documents; automatically generated span queries; noisy local businesses data; text binary classification; text categorization; text classification technique; Arrays; Business; Feature extraction; Measurement; Training; automatically generated; human understandable; span queries; text classification; text classifier; text tagging
Conference_Title :
Semantic Computing (ICSC), 2014 IEEE International Conference on
Conference_Location :
Newport Beach, CA
Print_ISBN :
978-1-4799-4002-8
DOI :
10.1109/ICSC.2014.18