DocumentCode :
3490413
Title :
Culturomics on a Bengali Newspaper Corpus
Author :
Phani, S. ; Lahiri, S. ; Biswas, Arijit
Author_Institution :
Dept. of IT, BESU, Howrah, India
fYear :
2012
fDate :
13-15 Nov. 2012
Firstpage :
237
Lastpage :
240
Abstract :
We introduce culturomic studies on a leading Bengali newspaper corpus - Ananda Bazar Patrika, in the same spirit as [15]. Based on 11 years' worth of Bengali newswire text, we are able to extract trajectories of salient words that are of importance in contemporary West Bengal. To the best of our knowledge, this is the first time a culturomic trend analysis is being performed on an Indic language. As a result of our analysis, we obtain interesting insights into word usage and cultural shift in contemporary West Bengal. Moreover, we model culturomic trajectories using ARIMA and obtain word usage predictions that closely follow actual usage patterns.
Keywords :
autoregressive moving average processes; cultural aspects; humanities; natural language processing; publishing; text analysis; word processing; ARIMA process; Ananda Bazar Patrika; Bengali newspaper corpus; Bengali newswire text; Indic language; West Bengal; cultural shift; culturomic trajectory model; culturomic trend analysis; salient word trajectory extraction; word usage predictions; Google; Market research; Nominations and elections; Predictive models; Smoothing methods; Time series analysis; Trajectory; ARIMA; Ananda Bazar Patrika; Bengali; Indic language; culture shift; culturomics; time series; trend analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2012 International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4673-6113-2
Electronic_ISBN :
978-0-7695-4886-9
Type :
conf
DOI :
10.1109/IALP.2012.68
Filename :
6473740