• DocumentCode
    2831022
  • Title
    Automatic Labeling of Topics
  • Author
    Magatti, Davide ; Calegari, Silvia ; Ciucci, Davide ; Stella, Fabio
  • Author_Institution
    Dept. of Inf., Syst. & Commun., Univ. degli Studi di Milano-Bicocca, Milan, Italy
  • fYear
    2009
  • fDate
    Nov. 30 2009-Dec. 2 2009
  • Firstpage
    1227
  • Lastpage
    1232
  • Abstract
    An algorithm for the automatic labeling of topics according to a hierarchy is presented. Its main ingredients are a set of similarity measures and a set of topic labeling rules. The labeling rules are specifically designed to find the labels on which the given topic and the hierarchy agree most. The hierarchy is obtained from the Google Directory service, extracted via a purpose-built software procedure and expanded using the OpenOffice English Thesaurus. The performance of the proposed algorithm is investigated using a document corpus of 33,801 documents and a dictionary of 111,795 words. The results are encouraging, and some particularly interesting and significant labeling cases emerged.
  • Keywords
    dictionaries; information analysis; thesauri; Google directory service; OpenOffice English Thesaurus; automatic labeling; dictionary; document corpus; similarity measures; topic labeling rules; Clustering algorithms; Data mining; Dictionaries; Informatics; Intelligent systems; Labeling; Ontologies; Probability distribution; Sampling methods; Thesauri; Automatic Topic Labeling; Latent Dirichlet Allocation; Topics Tree
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications, 2009. ISDA '09. Ninth International Conference on
  • Conference_Location
    Pisa
  • Print_ISBN
    978-1-4244-4735-0
  • Electronic_ISBN
    978-0-7695-3872-3
  • Type
    conf
  • DOI
    10.1109/ISDA.2009.165
  • Filename
    5364126