Title :
A study on the statistical structure of words and of word digrams in a literary Romanian corpus
Author :
Adriana Vlad;Adrian Mitrea;Stefan Ciuca;Adrian Luca
Author_Institution :
Faculty of Electronics, Telecommunications and Information Technology, POLITEHNICA University of Bucharest, Romania
fDate :
5/1/2011 12:00:00 AM
Abstract :
By revisiting and extending an original method for verifying natural language stationarity, the paper presents a study of the statistical structure of words and word digrams (groups of two successive words) in printed Romanian. The paper also evaluates natural language redundancy based on word digrams. The experimental study was carried out on a literary linguistic corpus of novels and short stories totaling over 12.5 million words, with orthography and punctuation marks.
Keywords :
"Probability","Pragmatics","Natural languages","Books","Entropy","Redundancy","Distributed databases"
Conference_Titel :
Speech Technology and Human-Computer Dialogue (SpeD), 2011 6th Conference on
Print_ISBN :
978-1-4577-0440-6
DOI :
10.1109/SPED.2011.5940743