• DocumentCode
    3071055
  • Title
    Clustering sentence level-text using fuzzy hierarchical algorithm
  • Author
    Priya, G. Krishna ; Anupriya, G.
  • Author_Institution
    Dr. Mahalingam Coll. of Eng. & Technol., Pollachi, India
  • fYear
    2013
  • fDate
    23-24 Aug. 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Clustering is a popular technique for unsupervised text analysis, often used to explore the content of large collections of sentences. It groups sentences according to their similarity. A sentence may contain several interrelated concepts, yet flat clustering algorithms allow each sentence to belong to only one cluster. Moreover, sentences are semantically related to one another, so word co-occurrence is not a valid measure for sentence-level flat clustering. A WordNet-based semantic similarity measure combined with a fuzzy sentence clustering algorithm is therefore proposed. The existing system uses the Fuzzy C-Means algorithm, which requires the number of clusters to be specified as an input, and its rigorous convergence criteria make its time complexity much higher. Most NLP documents are hierarchical in nature, so a fuzzy hierarchical sentence clustering algorithm is used here. Each cluster is labeled according to the hierarchy formed. Instead of considering all the terms in a sentence, only the verbs and nouns are used for the similarity computation: agglomerative clustering based on verbs and divisive clustering based on nouns are proposed. The methodology is validated using performance measures such as purity, entropy, and time. Comparing the results across various datasets, the overall improvement is 36.6% in purity and 31% in entropy. The time complexity of the hierarchical algorithm is much lower than that of the EM algorithm. Thus, better-quality clusters are formed in comparatively less time by using the Fuzzy Hierarchical Sentence Clustering Algorithm.
  • Keywords
    computational complexity; fuzzy set theory; natural language processing; pattern clustering; text analysis; NLP documents; WordNet based semantic similarity measure; fuzzy c-means algorithm; fuzzy hierarchical sentence clustering algorithm; sentence level-text clustering; time complexity; unsupervised text analysis; Algorithm design and analysis; Clustering algorithms; Convergence; Natural languages; Semantics; Speech; Agglomerative and Divisive Clustering; Fuzzy C-Means (FCM) Clustering; Natural Language Processing (NLP); WordNet Similarity
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Human Computer Interactions (ICHCI), 2013 International Conference on
  • Conference_Location
    Chennai
  • Type
    conf
  • DOI
    10.1109/ICHCI-IEEE.2013.6887778
  • Filename
    6887778
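
The abstract names purity and entropy as its quality measures but does not define them. As a rough illustration only, the sketch below computes the commonly used textbook definitions of both measures. It is not taken from the paper: the function names are hypothetical, and it assumes each sentence has already been reduced to a single cluster id (for fuzzy output, for example, the cluster with the highest membership) and has a known reference class.

```python
import math
from collections import Counter


def _class_counts(clusters, labels):
    """Count reference classes inside each cluster (hypothetical helper)."""
    counts = {}
    for cluster_id, label in zip(clusters, labels):
        counts.setdefault(cluster_id, Counter())[label] += 1
    return counts


def purity(clusters, labels):
    """Fraction of items that fall in the majority class of their cluster.

    Higher is better; 1.0 means every cluster holds a single class.
    """
    counts = _class_counts(clusters, labels)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the class entropy (base 2) in each cluster.

    Lower is better; 0.0 means every cluster holds a single class.
    """
    counts = _class_counts(clusters, labels)
    total = len(labels)
    result = 0.0
    for c in counts.values():
        size = sum(c.values())
        cluster_entropy = -sum(
            (n / size) * math.log2(n / size) for n in c.values()
        )
        result += (size / total) * cluster_entropy
    return result


if __name__ == "__main__":
    # Assumed toy data: six sentences, two clusters, two reference classes.
    cluster_ids = [0, 0, 0, 1, 1, 1]
    true_labels = ["a", "a", "b", "b", "b", "b"]
    print(purity(cluster_ids, true_labels))
    print(entropy(cluster_ids, true_labels))
```

Under these definitions an improvement in purity is an increase and an improvement in entropy is a decrease. The abstract does not say how its percentage improvements were computed.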