• DocumentCode
    3641896
  • Title

    A study on the statistical structure of words and of word digrams in a literary Romanian corpus

  • Author

    Adriana Vlad;Adrian Mitrea;Stefan Ciuca;Adrian Luca

  • Author_Institution
    Faculty of Electronics, Telecommunications and Information Technology, POLITEHNICA University of Bucharest, Romania
  • fYear
    2011
  • fDate
    5/1/2011 12:00:00 AM
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    By resuming and extending an original method for verifying natural language stationarity, the paper presents a study on the statistical structure of words and word digrams (groups of two successive words) in printed Romanian. The paper also contains an evaluation of natural language redundancy based on word digrams. The experimental study was carried out on a literary linguistic corpus of novels and short stories totaling over 12.5 million words, with orthography and punctuation marks included.
  • Keywords
    "Probability","Pragmatics","Natural languages","Books","Entropy","Redundancy","Distributed databases"
  • Publisher
    IEEE
  • Conference_Titel
    Speech Technology and Human-Computer Dialogue (SpeD), 2011 6th Conference on
  • Print_ISBN
    978-1-4577-0440-6
  • Type

    conf

  • DOI
    10.1109/SPED.2011.5940743
  • Filename
    5940743