• DocumentCode
    189203
  • Title
    Adaptive Distribution of Vocabulary Frequencies: A Novel Estimation Suitable for Social Media Corpus
  • Author
    Igawa, Rodrigo Augusto ; Sakaji Kido, Guilherme ; Seixas, Jose Luis ; Barbon, Sylvio
  • Author_Institution
    Dept. of Comput., State Univ. of Londrina, Londrina, Brazil
  • fYear
    2014
  • fDate
    18-22 Oct. 2014
  • Firstpage
    282
  • Lastpage
    287
  • Abstract
    This paper proposes a mathematical model that evaluates the distribution of vocabulary term frequencies relative to a probabilistic ideal. The main objective of this work is to use that evaluation to examine text denoising. We propose this new metric based on the classic Zipf's law statistical method. The experimental set used to test both classic Zipf's law and our model consists of several books of classic literature and several sets of tweets from Twitter. Our main result is that the proposed model is more sensitive to the presence of text noise than Zipf's law and is asymptotically faster, making it suitable for corpora from social media networks.
  • Keywords
    mathematical analysis; social networking (online); text analysis; Twitter; Zipf law statistic method; adaptive distribution; mathematical model; social media corpus; social media networks; text denoising; text noises; tweets sets; vocabulary frequency terms; Mathematical model; Media; Noise; Noise measurement; Noise reduction; Twitter; Vocabulary; Information Retrieval; Social Media Networks; Text preprocessing; Zipf's Law
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems (BRACIS), 2014 Brazilian Conference on
  • Conference_Location
    Sao Paulo
  • Type
    conf
  • DOI
    10.1109/BRACIS.2014.58
  • Filename
    6984844