• DocumentCode
    475322
  • Title

    Constructing term thesaurus using text association rule mining

  • Author

    Kongthon, Alisa ; Haruechaiyasak, Choochart ; Thaiprayoon, Santipong

  • Author_Institution
    Nat. Electron. & Comput. Technol. Center, Human Language Technol. (HLT) Lab., Pathumthani
  • Volume
    1
  • fYear
    2008
  • fDate
    14-17 May 2008
  • Firstpage
    137
  • Lastpage
    140
  • Abstract
    This paper presents a new algorithm called "concept-grouping" that adapts an association rule mining technique to construct a term thesaurus for data preprocessing. Similar terms that are written differently can be grouped into the same concept, based on their associations, before they are used in subsequent analysis. This preprocessing step is important because it affects the quality of other data mining techniques such as data clustering. The algorithm is applied to bibliographic databases such as INSPEC and EI Compendex, with the objective of enhancing traditional bibliometrics and content analysis. In experiments on a set of publication abstracts, applying the proposed algorithm to combine similar terms into a pertinent concept before the clustering process yields better cluster quality.
  • Keywords
    bibliographic systems; data mining; pattern clustering; text analysis; thesauri; bibliographic databases; bibliometrics; concept-grouping algorithm; content analysis; data clustering; data mining; data preprocessing; term thesaurus; text association rule mining; Algorithm design and analysis; Association rules; Bibliometrics; Clustering algorithms; Data analysis; Data mining; Data preprocessing; Databases; Information retrieval; Thesauri;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2008. ECTI-CON 2008. 5th International Conference on
  • Conference_Location
    Krabi
  • Print_ISBN
    978-1-4244-2101-5
  • Electronic_ISBN
    978-1-4244-2102-2
  • Type

    conf

  • DOI
    10.1109/ECTICON.2008.4600391
  • Filename
    4600391