DocumentCode
130348
Title
Extracting semantic prototypes and factual information from a large scale corpus using variable size window topic modelling
Author
Korzycki, Michal ; Korczynski, Wojciech
Author_Institution
AGH Univ. of Sci. & Technol. in Krakow, Kraków, Poland
fYear
2014
fDate
7-10 Sept. 2014
Firstpage
261
Lastpage
268
Abstract
This paper proposes a model of textual events composed of a mixture of semantic stereotypes and factual information. A method is introduced that automatically distinguishes semantic prototypes, which are general in nature and describe general categories of events, from factual elements specific to a given event. The paper then presents the results of an unsupervised topic extraction experiment performed on documents from a large-scale corpus with an additional temporal structure. The experiment compares the nature of the information provided by Latent Dirichlet Allocation and by vector space modelling based on log-entropy weights. The impact of using different time windows of the corpus on the topic modelling results is presented. Finally, the paper discusses whether unsupervised topic modelling can reflect deeper semantic information, such as elements describing a given event or its causes and results, and distinguish it from purely factual data.
Keywords
entropy; information retrieval; text analysis; vectors; factual elements; factual information extraction; large scale corpus; latent Dirichlet allocation; log-entropy weights; semantic prototype extraction; semantic prototypes; semantic stereotypes; temporal structure; textual events; time windows; unsupervised topic extraction; unsupervised topic modelling; variable size window topic modelling; vector space modelling; Accidents; Analytical models; Inductors; Prototypes; Semantics; Underwater vehicles;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Systems (FedCSIS), 2014 Federated Conference on
Conference_Location
Warsaw
Type
conf
DOI
10.15439/2014F253
Filename
6933023