Title :
Detecting and Describing Historical Periods in a Large Corpora
Author :
Popa, Tiberiu ; Rebedea, Traian ; Chiru, Costin
Author_Institution :
Fac. of Autom. Control & Comput., Univ. Politeh. of Bucharest, Bucharest, Romania
Abstract :
Many historical periods (or events) are remembered through slogans, expressions, or words that are strongly linked to them, and educated people can often tell whether a particular word or expression is related to a specific period in human history. This paper aims to establish correlations between significant historical periods (or events) and the texts written during them. To achieve this, we have developed a system that automatically links words (and topics discovered using Latent Dirichlet Allocation) to periods of recent history. For this analysis to be relevant and conclusive, it must be undertaken on a representative set of texts written throughout history. To this end, instead of relying on manually selected texts, we chose the Google Books Ngram corpus as the basis for the analysis. Although it provides only word n-gram statistics for the texts written in a given year, the resulting time series can provide insights into the most important periods and events in recent history by automatically linking them with specific keywords or even LDA topics.
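The abstract does not say how a word's yearly time series is mapped to a period, so the following is only a minimal sketch of one simple possibility, not the authors' method: flag the longest run of consecutive years in which a word's relative frequency exceeds the series mean by k standard deviations. The function name peak_period, the parameter k, and the toy data are all hypothetical; real input would be per-year relative frequencies derived from the Google Books Ngram corpus.

```python
from statistics import mean, pstdev


def peak_period(series, k=1.0):
    """Return (start_year, end_year) of the longest run of consecutive
    years whose value exceeds mean + k * std, or None if no year does.

    `series` maps year -> relative frequency of one word (assumed input).
    """
    years = sorted(series)
    values = [series[y] for y in years]
    threshold = mean(values) + k * pstdev(values)

    best = None    # longest run found so far, as (start, end)
    start = None   # first year of the run currently being extended
    prev = None    # last year that was above the threshold
    for year in years:
        if series[year] > threshold:
            # Open a new run after a dip or a gap in the years.
            if start is None or year != prev + 1:
                start = year
            if best is None or year - start > best[1] - best[0]:
                best = (start, year)
            prev = year
        else:
            start = None
    return best


# Synthetic toy series (not real corpus data): a hypothetical word that
# is five times more frequent in 1939-1945 than in neighbouring years.
toy = {y: (5.0 if 1939 <= y <= 1945 else 1.0) for y in range(1935, 1951)}
flat = {y: 1.0 for y in range(1935, 1951)}

print(peak_period(toy))   # (1939, 1945)
print(peak_period(flat))  # None
```

A fixed threshold of this kind is sensitive to the choice of k and ignores long-term trends in the corpus, so it should be read as an illustration of linking a keyword to a time span rather than as a robust detector.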
Keywords :
history; statistical analysis; text analysis; time series; Google Books Ngram corpus; LDA topics; historical period detection; latent Dirichlet allocation; specific keywords; word n-gram statistics; Analytical models; Books; Equations; Google; History; Mathematical model; Time series analysis; Historical Events Identification; Historical Periods Summarization; Latent Dirichlet Allocation; Time Series Analysis; Topic Models
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2014 IEEE 26th International Conference on
Conference_Location :
Limassol
DOI :
10.1109/ICTAI.2014.118