• DocumentCode
    174892
  • Title
    On the Use of Reliable-Negatives Selection Strategies in the PU Learning Approach for Quality Flaws Prediction in Wikipedia
  • Author
    Ferretti, Edgardo ; Errecalde, Marcelo L. ; Anderka, Maik ; Stein, Benno
  • Author_Institution
    Dept. de Inf., Univ. Nac. de San Luis, San Luis, Argentina
  • fYear
    2014
  • fDate
    1-5 Sept. 2014
  • Firstpage
    211
  • Lastpage
    215
  • Abstract
    Learning from positive and unlabeled examples (PU learning) has proven to be an effective method in several Web mining applications. In particular, in the 1st International Competition on Quality Flaw Prediction in Wikipedia in 2012, a tailored PU learning approach performed best among the competitors. A key feature of that approach is the introduction of sampling strategies within the original PU learning procedure. The paper at hand revisits the winning approach of 2012 and elaborates on neglected aspects in order to provide evidence for the usefulness of sampling in PU learning. In this regard, we propose a modification to this PU learning approach and show how the different sampling strategies affect flaw prediction effectiveness. Our analysis is based on the original evaluation corpus of the 2012 competition on quality flaw prediction. A main outcome is that, under the best sampling strategy, our modified version of PU learning increases flaw prediction effectiveness by 18.31% on average compared with the winning approach of the competition.
  • Keywords
    Web sites; learning (artificial intelligence); sampling methods; PU learning approach; Web mining applications; Wikipedia; flaw prediction effectiveness; positive and unlabeled examples; quality flaws prediction; reliable-negative selection strategies; sampling strategies; Electronic publishing; Encyclopedias; Internet; Reliability; Support vector machines; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on
  • Conference_Location
    Munich
  • ISSN
    1529-4188
  • Print_ISBN
    978-1-4799-5721-7
  • Type
    conf
  • DOI
    10.1109/DEXA.2014.52
  • Filename
    6974851