  • DocumentCode
    4579
  • Title
    An HMM-Based Algorithm for Content Ranking and Coherence-Feature Extraction
  • Author
    Chien-Liang Liu ; Wen-Hoar Hsaio ; Chia-Hoang Lee ; Hsiao-Cheng Chi
  • Author_Institution
    Dept. of Comput. Sci., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • Volume
    43
  • Issue
    2
  • fYear
    2013
  • fDate
    March 2013
  • Firstpage
    440
  • Lastpage
    450
  • Abstract
    In this paper, we propose an algorithm called coherence hidden Markov model (HMM) to extract coherence features and rank content. Coherence HMM is a variant of HMM and is used to model the stochastic process of essay writing and identify topics as hidden states, given sequenced clauses as observations. This study uses probabilistic latent semantic analysis for parameter estimation of coherence HMM. In coherence-feature extraction, support vector regression (SVR) with surface features and coherence features is used for essay grading. The experimental results indicate that SVR can benefit from coherence features. The adjacent agreement rate and the exact agreement rate are 95.24% and 59.80%, respectively. Moreover, this study submits high-scoring essays to the same experiment and finds that the adjacent agreement rate and exact agreement rate are 98.33% and 64.50%, respectively. In content ranking, we design and implement an intelligent assisted blog writing system based on the coherence-HMM ranking model. Several corpora are employed to help users efficiently compose blog articles. When users finish composing a clause or sentence, the system provides candidate texts for their reference based on the current clause or sentence content. The experimental results demonstrate that all participants can benefit from the system and save considerable time on writing articles.
  • Keywords
    Web sites; content management; feature extraction; hidden Markov models; parameter estimation; probability; regression analysis; support vector machines; HMM-based algorithm; SVR; adjacent agreement rate; article writing; coherence hidden Markov model; coherence-feature extraction; content ranking; essay grading; essay writing; exact agreement rate; high-scoring essays; intelligent assisted blog writing system; parameter estimation; probabilistic latent semantic analysis; sequenced clause; stochastic process; support vector regression; surface features; topic identification; Blogs; Coherence; Feature extraction; Hidden Markov models; Indexes; Parameter estimation; Writing; Coherence-feature extraction; hidden Markov model (HMM); input devices and strategies; natural language processing (NLP); predictive content;
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics: Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-2216
  • Type
    jour
  • DOI
    10.1109/TSMCA.2012.2207104
  • Filename
    6408207