DocumentCode :
3641896
Title :
A study on the statistical structure of words and of word digrams in a literary Romanian corpus
Author :
Adriana Vlad;Adrian Mitrea;Stefan Ciuca;Adrian Luca
Author_Institution :
Faculty of Electronics, Telecommunications and Information Technology, POLITEHNICA University of Bucharest, Romania
fYear :
2011
fDate :
5/1/2011
Firstpage :
1
Lastpage :
8
Abstract :
By resuming and extending an original method for verifying natural language stationarity, the paper presents a study on the statistical structure of words and word digrams (groups of two successive words) in printed Romanian. The paper also contains an evaluation of natural language redundancy based on word digrams. The experimental study was carried out on a literary linguistic corpus of novels and short stories totaling over 12.5 million words, with orthography and punctuation marks included.
Keywords :
"Probability","Pragmatics","Natural languages","Books","Entropy","Redundancy","Distributed databases"
Publisher :
IEEE
Conference_Titel :
Speech Technology and Human-Computer Dialogue (SpeD), 2011 6th Conference on
Print_ISBN :
978-1-4577-0440-6
Type :
conf
DOI :
10.1109/SPED.2011.5940743
Filename :
5940743