  • DocumentCode
    130348
  • Title
    Extracting semantic prototypes and factual information from a large scale corpus using variable size window topic modelling
  • Author
    Korzycki, Michal ; Korczynski, Wojciech
  • Author_Institution
    AGH University of Science and Technology in Kraków, Kraków, Poland
  • fYear
    2014
  • fDate
    7-10 Sept. 2014
  • Firstpage
    261
  • Lastpage
    268
  • Abstract
    This paper proposes a model of textual events composed of a mixture of semantic stereotypes and factual information. A method is introduced that automatically distinguishes semantic prototypes of a general nature, which describe general categories of events, from factual elements specific to a given event. The paper then presents the results of an unsupervised topic extraction experiment performed on documents from a large-scale corpus with an additional temporal structure. The experiment compares the nature of the information provided by Latent Dirichlet Allocation and by Vector Space modelling based on Log-Entropy weights. The impact of using different time windows of the corpus on the topic modelling results is presented. Finally, the paper discusses whether unsupervised topic modelling can reflect deeper semantic information, such as elements describing a given event or its causes and results, and distinguish it from purely factual data.
  • Keywords
    entropy; information retrieval; text analysis; vectors; factual elements; factual information extraction; large scale corpus; latent Dirichlet allocation; log-entropy weights; semantic prototype extraction; semantic prototypes; semantic stereotypes; temporal structure; textual events; time windows; unsupervised topic extraction; unsupervised topic modelling; variable size window topic modelling; vector space modelling; Accidents; Analytical models; Inductors; Prototypes; Semantics; Underwater vehicles
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Systems (FedCSIS), 2014 Federated Conference on
  • Conference_Location
    Warsaw
  • Type
    conf
  • DOI
    10.15439/2014F253
  • Filename
    6933023