DocumentCode
1829043
Title
A New Approach to Detecting Content Anomalies in Wikipedia
Author
Sinanc, Duygu ; Yavanoglu, Uraz
Author_Institution
Dept. of Comput., Gazi Univ., Ankara, Turkey
Volume
2
fYear
2013
fDate
4-7 Dec. 2013
Firstpage
288
Lastpage
293
Abstract
The rapid growth of the web has made data widely available, but that data is useful only if its content is well organized. Although Wikipedia is the biggest encyclopedia on the web, its quality is suspect because of its Open Editing Schemas (OES). In this study, zoology and botany pages are selected from English Wikipedia, their HTML contents are converted to text, and an Artificial Neural Network (ANN) is used for classification to prevent disinformation or misinformation. After the training phase, irrelevant words about politics or terrorism are added to the content in proportion to the size of the text. In the interval between unsuitable content being added to a page and the moderators' intervention, the proposed system detects the error through wrong categorization. The results show that when the added words amount to 2% of the content's word count, the anomaly rate begins to exceed 50%.
Keywords
Internet; Web sites; botany; data mining; hypermedia markup languages; neural nets; pattern classification; text analysis; text editing; zoology; ANN; English Wikipedia; HTML contents; OES; Web mining techniques; anomaly rate; artificial neural network; botany pages; content anomaly detection; encyclopedia; open editing schemas; politics; terrorism; text classification; train phase; wrong categorization; zoology pages; Artificial neural networks; Electronic publishing; Encyclopedias; Internet; Web pages; artificial neural networks; class mapping; data mining; open editing schemas; web classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications (ICMLA), 2013 12th International Conference on
Conference_Location
Miami, FL
Type
conf
DOI
10.1109/ICMLA.2013.137
Filename
6786122