DocumentCode
1799948
Title
Mining YouTube metadata for detecting privacy invading harassment and misdemeanor videos
Author
Aggarwal, Nitish ; Agrawal, Sanjay ; Sureka, A.
Author_Institution
Indraprastha Inst. of Inf. Technol., New Delhi, India
fYear
2014
fDate
23-24 July 2014
Firstpage
84
Lastpage
93
Abstract
YouTube is one of the largest and most popular video sharing websites (with social networking features) on the Internet. A significant percentage of videos uploaded to YouTube contain objectionable content and violate YouTube's community guidelines. YouTube hosts copyright-violating videos, commercial spam, videos promoting hate and extremism, vulgar and pornographic material, and privacy-invading content. This is primarily due to the low publication barrier and anonymity. We present an approach to identify privacy-invading harassment and misdemeanor videos by mining video metadata. We divide the problem into three sub-problems: vulgar video detection, detection of abuse and violence in public places, and detection of ragging videos in schools and colleges. We conduct a characterization study on a training dataset built by downloading videos through the YouTube API and annotating them manually. We define several discriminatory features for recognizing target-class objects. We employ a one-class classifier approach to detect objectionable videos, framing the problem as a recognition problem. Our empirical analysis on the test dataset reveals that linguistic features (the presence of certain terms and people in the titles and descriptions of the main and related videos), popularity-based features, and video duration and category can be used to predict the video type. We validate our hypothesis by conducting a series of experiments on an evaluation dataset acquired from YouTube. Empirical results show that the accuracy of the proposed approach exceeds 80%, demonstrating its effectiveness.
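The abstract names the feature families (linguistic terms in the main and related videos, popularity, duration, category) and the one-class framing, but not the classifier internals. The following is a minimal sketch under stated assumptions, not the authors' implementation. The term list, category set, feature definitions and the names `VideoMeta`, `term_hits`, `features` and `CentroidOneClass` are all hypothetical. A simple centroid-distance rule stands in for whatever one-class classifier the paper actually uses. Like any one-class recognizer, it is fitted on target-class examples only.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev

# HYPOTHETICAL lexicon and category set chosen for illustration only; the
# paper's actual term lists and category choices are not reproduced here.
TARGET_TERMS = {"ragging", "fight", "slap", "hidden", "caught"}
TARGET_CATEGORIES = {"People & Blogs", "Comedy", "Entertainment"}


@dataclass
class VideoMeta:
    """Hypothetical container for metadata fields a crawler might collect."""
    title: str
    description: str
    related_titles: list
    view_count: int
    duration_sec: int
    category: str


def term_hits(text: str) -> int:
    """Count tokens of `text` that appear in TARGET_TERMS."""
    return sum(1 for tok in text.lower().split() if tok.strip(".,!?") in TARGET_TERMS)


def features(v: VideoMeta) -> list:
    """Map metadata to one number per feature family named in the abstract."""
    related = v.related_titles or [""]
    return [
        float(term_hits(v.title + " " + v.description)),        # linguistic: main video
        sum(term_hits(t) > 0 for t in related) / len(related),  # linguistic: related videos
        math.log1p(v.view_count),                               # popularity (log-scaled)
        v.duration_sec / 60.0,                                  # duration in minutes
        1.0 if v.category in TARGET_CATEGORIES else 0.0,        # category indicator
    ]


class CentroidOneClass:
    """One-class recognizer fitted on positive examples only: a sample is
    accepted if its standardized Euclidean distance to the training centroid
    is at most `slack` times the largest training distance."""

    def __init__(self, slack: float = 1.5):
        self.slack = slack

    def fit(self, rows):
        cols = list(zip(*rows))
        self.mu = [mean(c) for c in cols]
        self.sd = [pstdev(c) or 1.0 for c in cols]  # constant column -> unit scale
        self.radius = self.slack * max(self._dist(r) for r in rows)
        return self

    def _dist(self, row) -> float:
        return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(row, self.mu, self.sd)))

    def predict(self, row) -> bool:
        return self._dist(row) <= self.radius


# Usage with made-up, target-class-only training examples.
train = [
    VideoMeta("ragging in hostel caught on camera", "seniors ragging juniors",
              ["college ragging fight", "hostel life"], 120000, 180, "People & Blogs"),
    VideoMeta("students fight in class", "hidden camera fight",
              ["school fight", "ragging video"], 80000, 240, "Entertainment"),
    VideoMeta("caught slap in public", "slap fight on street",
              ["street fight", "funny cats"], 200000, 150, "Comedy"),
]
model = CentroidOneClass().fit([features(v) for v in train])

in_class = VideoMeta("ragging fight in college", "juniors caught",
                     ["ragging prank", "fight club"], 150000, 200, "People & Blogs")
out_class = VideoMeta("how to bake bread", "easy recipe",
                      ["sourdough tips", "kitchen tour"], 5000000, 900, "Howto & Style")
print(model.predict(features(in_class)), model.predict(features(out_class)))  # prints: True False
```

Because only target-class examples are used for fitting, anything outside the learned region is treated as non-target, which mirrors the recognition framing described in the abstract. Taking the maximum training distance as the radius is a deliberately crude choice. A percentile threshold or an established one-class model would likely be more robust on real, noisy metadata.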
Keywords
Internet; application program interfaces; computational linguistics; copyright; data mining; data privacy; feature extraction; image classification; meta data; object detection; object recognition; social networking (online); Internet; YouTube API; YouTube metadata mining; abuse; classifier approach; commercial spam; copyright violated videos; dataset annotation; discriminatory features; extremism promoting videos; harassment; hate promoting videos; linguistic features; misdemeanor videos; objectionable video detection; popularity based video; pornographic material; privacy invading content; privacy invading harassment detection; ragging video detection; social networking features; target class object recognition; test dataset; training dataset; video category; video duration; video metadata mining; video sharing Websites; violence; vulgar material; vulgar video detection; Educational institutions; Feature extraction; Pragmatics; Privacy; Videos; YouTube; Information Retrieval; Mining User Generated Content; One Class Classifier; Privacy Invading Video Detection; Social Media Analytics; YouTube;
fLanguage
English
Publisher
IEEE
Conference_Titel
Privacy, Security and Trust (PST), 2014 Twelfth Annual International Conference on
Conference_Location
Toronto, ON
Print_ISBN
978-1-4799-3502-4
Type
conf
DOI
10.1109/PST.2014.6890927
Filename
6890927