Title :
Parameterized Contrast in Second Order Soft Co-occurrences: A Novel Text Representation Technique in Text Mining and Knowledge Extraction
Author :
Razavi, Amir H. ; Matwin, Stan ; Inkpen, Diana ; Kouznetsov, Alexandre
Author_Institution :
Sch. of Inf. Technol. & Eng. (SITE), Univ. of Ottawa, Ottawa, ON, Canada
Abstract :
In this article, we present a novel statistical representation method for knowledge extraction from a corpus of short texts. We also introduce a contrast parameter that can be adjusted to target different conceptual levels in text mining and knowledge extraction. The method is based on second-order co-occurrence vectors, whose effectiveness for representing meaning has been established in many applications, especially for representing word senses in different contexts and for word sense disambiguation. We evaluate our method on two tasks: classification of textual descriptions of dreams, and classification of medical abstracts for systematic reviews.
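The abstract rests on second-order co-occurrence vectors. As a rough illustration of the general idea only (not the authors' method, and without the contrast parameter, whose definition is not given here), the sketch below builds first-order word-by-word co-occurrence counts and then represents a text as the average of the first-order vectors of its words. It assumes each short text is treated as one co-occurrence window. The function names `first_order_vectors`, `second_order_vector` and `cosine` are hypothetical helpers written for this example.

```python
import math
from itertools import combinations


def first_order_vectors(docs):
    """Count word pairs that co-occur in the same text.

    Assumption (not from the paper): the whole short text is the
    co-occurrence window, and each pair is counted once per text.
    """
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(vocab) for _ in vocab]
    for d in docs:
        for a, b in combinations(sorted(set(d)), 2):
            counts[index[a]][index[b]] += 1
            counts[index[b]][index[a]] += 1
    return vocab, index, counts


def second_order_vector(doc, index, counts):
    """Represent a text as the mean of its words' first-order vectors."""
    dim = len(index)
    vec = [0.0] * dim
    known = [w for w in doc if w in index]  # ignore out-of-vocabulary words
    if not known:
        return vec
    for w in known:
        row = counts[index[w]]
        for j in range(dim):
            vec[j] += row[j]
    return [v / len(known) for v in vec]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return dot / norm if norm else 0.0


docs = [["dog", "bark"], ["dog", "leash"], ["puppy", "bark"], ["puppy", "leash"]]
vocab, index, counts = first_order_vectors(docs)
a = second_order_vector(["dog"], index, counts)
b = second_order_vector(["puppy"], index, counts)
print(cosine(a, b))
```

In this toy corpus, "dog" and "puppy" never appear together, but both co-occur with "bark" and "leash". Their second-order vectors therefore coincide and their cosine similarity is 1.0. This "soft" matching of texts that share no words is the property the abstract attributes to second-order representations.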
Keywords :
data mining; statistical analysis; text analysis; vectors; knowledge extraction; parameterized contrast; second order co-occurrence vectors; second order soft co-occurrences; statistical representation method; text mining; text representation technique; Abstracts; Conferences; Data engineering; Data mining; Frequency; Information technology; Knowledge engineering; Machine learning; Social network services; Text mining;
Conference_Title :
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5384-9
Electronic_ISBN :
978-0-7695-3902-7
DOI :
10.1109/ICDMW.2009.49