• DocumentCode
    2192199
  • Title
    Semantic Content Filtering with Wikipedia and Ontologies
  • Author
    Malo, Pekka ; Siitari, Pyry ; Ahlgren, Oskar ; Wallenius, Jyrki ; Korhonen, Pekka
  • Author_Institution
    Sch. of Econ., Dept. of Bus. Technol., Aalto Univ., Helsinki, Finland
  • fYear
    2010
  • fDate
    13 Dec. 2010
  • Firstpage
    518
  • Lastpage
    526
  • Abstract
    The use of domain knowledge is generally found to improve query efficiency in content filtering applications. In particular, tangible benefits have been achieved when using knowledge-based approaches within more specialized fields, such as medical free texts or legal documents. However, sources of domain knowledge are time-consuming to build and equally costly to maintain. As a potential remedy, recent studies on Wikipedia suggest that this large body of socially constructed knowledge can be effectively harnessed to provide not only facts but also accurate information about semantic concept-similarities. This paper describes a framework for document filtering, where Wikipedia's concept-relatedness information is combined with a domain ontology to produce semantic content classifiers. The approach is evaluated using the Reuters RCV1 corpus and the TREC-11 filtering task definitions. In a comparative study, the approach shows robust performance and appears to outperform content classifiers based on Support Vector Machines (SVM) and the C4.5 algorithm.
  • Keywords
    Web sites; information filtering; knowledge based systems; ontologies (artificial intelligence); support vector machines; C4.5 algorithm; SVM; TREC-11 filtering task definitions; Wikipedia concept-relatedness information; document filtering; domain knowledge; domain ontology; knowledge-based approaches; legal documents; medical free texts; query efficiency improvement; Reuters RCV1 corpus; semantic concept-similarities; semantic content classifiers; semantic content filtering; support vector machines; Concept-relatedness; Named-entity recognition; Ontology; SVM; Semantic; Wikipedia
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4244-9244-2
  • Electronic_ISBN
    978-0-7695-4257-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2010.74
  • Filename
    5693341