• DocumentCode
    686514
  • Title
    Auto-clustering of conversation corpus based on syntactic, semantic and pragmatic features
  • Author
    Baojian Chen ; Minghu Jiang
  • Author_Institution
    Sch. of Humanities, Tsinghua Univ., Beijing, China
  • fYear
    2013
  • fDate
    22-25 Nov. 2013
  • Firstpage
    295
  • Lastpage
    300
  • Abstract
    To understand natural language accurately, morphological and syntactic analysis alone is not enough; semantic knowledge and pragmatic information must also be combined with the specific context. Because conversation corpora lack much of the background knowledge and pragmatic information involved, computers are still far from fully understanding natural language. In this paper, pragmatic features were added to the text vector space model of spoken conversation, and hierarchical clustering was performed. Experimental results show that clustering with pragmatic features outperforms clustering without them: precision, recall and F-value increased by 6.67%, 6.34% and 6.6%, respectively. This indicates that pragmatic information plays an important role in improving text clustering.
  • Keywords
    natural language processing; pattern clustering; programming language semantics; text analysis; conversation corpus auto-clustering; hierarchical clustering; language spoken conversation; natural language morphology; pragmatic feature; pragmatic information; semantic feature; semantic knowledge; syntactic analysis; syntactic feature; text clustering effect enhancement; text vector space model
  • fLanguage
    English
  • Publisher
    IET
  • Conference_Titel
    Wireless, Mobile and Multimedia Networks (ICWMMN 2013), 5th IET International Conference on
  • Conference_Location
    Beijing
  • Electronic_ISBN
    978-1-84919-726-7
  • Type
    conf
  • DOI
    10.1049/cp.2013.2428
  • Filename
    6827845
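The abstract describes appending pragmatic features to a text vector space model, running hierarchical clustering, and scoring the result by precision, recall and F-value. The record does not say which pragmatic features, linkage or evaluation formula the authors used, so the following is only a minimal sketch under stated assumptions: term-frequency vectors, a hypothetical one-hot dialogue-act label standing in for the pragmatic features, average-linkage agglomerative clustering on cosine similarity, and the class-weighted cluster precision/recall/F measure common in text clustering. The function names, the act labels and the toy utterances are all illustrative, not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(tokens, vocab):
    """Term-frequency vector of an utterance over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def add_pragmatic(vec, act, acts, weight=1.0):
    """Append a one-hot encoding of a (hypothetical) dialogue-act label."""
    return vec + [weight if a == act else 0.0 for a in acts]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def average_link_cluster(vectors, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest average pairwise cosine similarity until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sim(clusters[a], clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def evaluate(clusters, gold):
    """Class-weighted precision, recall and F: each gold class is matched to
    the cluster with the highest F, weighted by the class's share of items."""
    n = len(gold)
    P = R = F = 0.0
    for c in sorted(set(gold)):
        n_c = sum(1 for g in gold if g == c)
        best = (0.0, 0.0, 0.0)
        for cl in clusters:
            n_cj = sum(1 for i in cl if gold[i] == c)
            if n_cj == 0:
                continue
            p, r = n_cj / len(cl), n_cj / n_c
            f = 2 * p * r / (p + r)
            if f > best[2]:
                best = (p, r, f)
        w = n_c / n
        P, R, F = P + w * best[0], R + w * best[1], F + w * best[2]
    return P, R, F


if __name__ == "__main__":
    # Toy utterances: (tokens, hypothetical dialogue act, gold topic).
    utts = [
        (["can", "you", "book", "a", "flight"], "request", "travel"),
        (["book", "me", "a", "flight"], "request", "travel"),
        (["the", "weather", "is", "sunny"], "statement", "weather"),
        (["is", "it", "sunny", "today"], "question", "weather"),
    ]
    vocab = sorted({w for toks, _, _ in utts for w in toks})
    acts = sorted({a for _, a, _ in utts})
    vecs = [add_pragmatic(tf_vector(t, vocab), a, acts) for t, a, _ in utts]
    clusters = average_link_cluster(vecs, k=2)
    print(clusters, evaluate(clusters, [g for _, _, g in utts]))
```

Comparing the scores from `evaluate` with and without the `add_pragmatic` step on the same corpus mirrors, in spirit, the with/without-pragmatic-features comparison reported in the abstract; the `weight` parameter is a hypothetical knob for how strongly the pragmatic dimensions count relative to the lexical ones.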