  • DocumentCode
    2907783
  • Title
    Fuzzy named entity-based document clustering
  • Author
    Cao, Tru H. ; Do, Hai T. ; Hong, Dung T. ; Quan, Tho T.
  • Author_Institution
    Fac. of Comput. Sci. & Eng., Ho Chi Minh City Univ. of Technol., Ho Chi Minh City
  • fYear
    2008
  • fDate
    1-6 June 2008
  • Firstpage
    2028
  • Lastpage
    2034
  • Abstract
    Traditional keyword-based document clustering techniques have limitations due to simple treatment of words and hard separation of clusters. In this paper, we introduce named entities as objectives into fuzzy document clustering, which are the key elements defining document semantics and in many cases are of user concerns. First, the traditional keyword-based vector space model is adapted with vectors defined over spaces of entity names, types, name-type pairs, and identifiers, instead of keywords. Then, hierarchical fuzzy document clustering can be performed using a similarity measure of the vectors representing documents. For evaluating fuzzy clustering quality, we propose a fuzzy information variation measure to compare two fuzzy partitions. Experimental results are presented and discussed.
  • Keywords
    document handling; fuzzy set theory; pattern clustering; vectors; document semantics; entity-based document clustering; fuzzy document clustering; fuzzy information variation; keyword-based document clustering technique; keyword-based vector space model; Fuzzy systems
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems, 2008. FUZZ-IEEE 2008. (IEEE World Congress on Computational Intelligence). IEEE International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1098-7584
  • Print_ISBN
    978-1-4244-1818-3
  • Electronic_ISBN
    1098-7584
  • Type
    conf
  • DOI
    10.1109/FUZZY.2008.4630648
  • Filename
    4630648