• DocumentCode
    2215270
  • Title
    Assessing documents' credibility with genetic programming
  • Author
    Palotti, João ; Salles, Thiago ; Pappa, Gisele L. ; Gonçalves, Marcos A. ; Meira, Wagner, Jr.
  • Author_Institution
    Dept. of Comput. Sci., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
  • fYear
    2011
  • fDate
    5-8 June 2011
  • Firstpage
    200
  • Lastpage
    207
  • Abstract
    The concept of example credibility measures how much a classifier can trust an example when building a classification model. Credibility is given by a credibility function, which is application dependent and is estimated from a set of factors that influence the credibility of the examples. Here we address automatic document classification and study the credibility of a document according to three factors: content, authorship, and citations. We propose a genetic programming (GP) algorithm to estimate the credibility of training examples, and then feed this estimate into a credibility-aware classifier. To do so, we model the authorship and citation data as a complex network and select a set of structural metrics that can be used to estimate credibility. These metrics are then merged with content-related metrics and used as terminals for the GP. The GP was tested on a subset of the ACM Digital Library (ACM-DL), and the credibility-aware classifier achieved micro- and macro-F1 scores 5% to 8% higher than those of traditional classifiers.
  • Keywords
    citation analysis; document handling; genetic algorithms; pattern classification; authorship; automatic document classification; citations; classifier; document credibility assessing; genetic programming; structural metrics; Complex networks; Computer science; Feature extraction; Genetic programming; Measurement; Support vector machines; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Evolutionary Computation (CEC), 2011 IEEE Congress on
  • Conference_Location
    New Orleans, LA
  • ISSN
    Pending
  • Print_ISBN
    978-1-4244-7834-7
  • Type
    conf
  • DOI
    10.1109/CEC.2011.5949619
  • Filename
    5949619
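The abstract describes two stages: a GP individual combines network and content metrics (its terminals) into a credibility score per training document, and a credibility-aware classifier then uses those scores. The sketch below illustrates only that data flow under stated assumptions; it is not the authors' implementation. The terminal names (`citations`, `content_sim`), the fixed "evolved" individual, the sigmoid squashing, and the choice of a credibility-weighted multinomial naive Bayes as the credibility-aware classifier are all hypothetical. The evolutionary search itself (selection, crossover, mutation, fitness evaluation) is omitted.

```python
import math
from collections import defaultdict

# Protected arithmetic primitives, a common GP convention (assumed here).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 1.0,  # protected division
}


def evaluate(tree, terminals):
    """Evaluate a GP tree: a terminal name, a numeric constant, or (op, left, right)."""
    if isinstance(tree, str):
        return float(terminals[tree])
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, terminals), evaluate(right, terminals))


def credibility(tree, terminals):
    """Squash the raw GP output into (0, 1) so it can act as an example weight (assumption)."""
    x = max(-60.0, min(60.0, evaluate(tree, terminals)))  # clamp to avoid exp overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_weighted_nb(docs, labels, weights, alpha=1.0):
    """Multinomial naive Bayes where each document's counts are scaled by its credibility."""
    class_weight = defaultdict(float)
    term_weight = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for tokens, label, w in zip(docs, labels, weights):
        class_weight[label] += w
        for t in tokens:
            term_weight[label][t] += w
            vocab.add(t)
    return class_weight, term_weight, vocab, alpha


def predict(model, tokens):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_weight, term_weight, vocab, alpha = model
    total = sum(class_weight.values())
    best, best_score = None, -math.inf
    for label, cw in class_weight.items():
        denom = sum(term_weight[label].values()) + alpha * len(vocab)
        score = math.log(cw / total)
        for t in tokens:
            if t in vocab:  # ignore out-of-vocabulary terms
                score += math.log((term_weight[label].get(t, 0.0) + alpha) / denom)
        if score > best_score:
            best, best_score = label, score
    return best


# Hypothetical training set: (tokens, label, terminal values for the document).
train = [
    (["genetic", "programming", "fitness"], "ai", {"citations": 4, "content_sim": 0.8}),
    (["query", "index", "retrieval"], "ir", {"citations": 3, "content_sim": 0.7}),
    (["genetic", "query"], "ir", {"citations": 0, "content_sim": 0.1}),
]
# Hypothetical evolved individual: 0.5 * citations + (content_sim - 0.5)
individual = ("+", ("*", 0.5, "citations"), ("-", "content_sim", 0.5))

weights = [credibility(individual, feats) for _, _, feats in train]
model = train_weighted_nb([t for t, _, _ in train], [y for _, y, _ in train], weights)
print(predict(model, ["genetic", "fitness"]))  # prints ai
```

Here the third document, which has no citations and low content similarity, receives a low credibility weight, so its "genetic" token contributes less to the "ir" class. Weighting examples is only one way a classifier could consume credibility; the paper's credibility-aware classifier may integrate it differently.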