Title :
Culturomics on a Bengali Newspaper Corpus
Author :
Phani, S. ; Lahiri, S. ; Biswas, Arijit
Author_Institution :
Dept. of IT, BESU, Howrah, India
Abstract :
We introduce culturomic studies on a corpus drawn from a leading Bengali newspaper, Ananda Bazar Patrika, in the same spirit as [15]. Based on 11 years' worth of Bengali newswire text, we extract trajectories of salient words that are of importance in contemporary West Bengal. To the best of our knowledge, this is the first time a culturomic trend analysis has been performed on an Indic language. Our analysis yields interesting insights into word usage and cultural shift in contemporary West Bengal. Moreover, we model culturomic trajectories using ARIMA and obtain word usage predictions that closely follow actual usage patterns.
Keywords :
autoregressive moving average processes; cultural aspects; humanities; natural language processing; publishing; text analysis; word processing; ARIMA process; Ananda Bazar Patrika; Bengali newspaper corpus; Bengali newswire text; Indic language; West Bengal; cultural shift; culturomic trajectory model; culturomic trend analysis; salient word trajectory extraction; word usage predictions; Google; Market research; Nominations and elections; Predictive models; Smoothing methods; Time series analysis; Trajectory; ARIMA; Bengali; culture shift; culturomics; time series; trend analysis
Conference_Title :
Asian Language Processing (IALP), 2012 International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4673-6113-2
Electronic_ISBN :
978-0-7695-4886-9
DOI :
10.1109/IALP.2012.68