Title :
Mining YouTube metadata for detecting privacy invading harassment and misdemeanor videos
Author :
Aggarwal, Nitish ; Agrawal, Sanjay ; Sureka, A.
Author_Institution :
Indraprastha Inst. of Inf. Technol., New Delhi, India
Abstract :
YouTube is one of the largest and most popular video-sharing websites (with social networking features) on the Internet. A significant percentage of the videos uploaded to YouTube contain objectionable content and violate the YouTube community guidelines. Such content includes copyright-violating videos, commercial spam, videos promoting hate and extremism, vulgar and pornographic material, and privacy-invading content. This is primarily due to the low publication barrier and anonymity. We present an approach to identify privacy-invading harassment and misdemeanor videos by mining video metadata. We divide the problem into three sub-problems: vulgar video detection, detection of abuse and violence in public places, and detection of ragging videos in schools and colleges. We conduct a characterization study on a training dataset by downloading several videos using the YouTube API and manually annotating the dataset. We define several discriminatory features for recognizing the target class objects. We employ a one-class classifier approach to detect objectionable videos and frame the problem as a recognition problem. Our empirical analysis on the test dataset reveals that linguistic features (the presence of certain terms and people in the title and description of the main and related videos), popularity-based features, duration, and video category can be used to predict the video type. We validate our hypothesis by conducting a series of experiments on an evaluation dataset acquired from YouTube. Empirical results show that the accuracy of the proposed approach is more than 80%, demonstrating its effectiveness.
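The one-class idea in the abstract (train only on examples of the target class, then decide whether a new video resembles them) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the term list, the metadata field names (`title`, `description`, `views`, `duration_s`), the toy training records, and the centroid-distance rule, which stands in for whatever one-class classifier the authors actually used.

```python
import math

# Hypothetical term list; the abstract does not give the paper's lexicon.
TARGET_TERMS = {"ragging", "harassment", "fight", "abuse"}


def extract_features(video):
    """Map a metadata dict (hypothetical schema) to a numeric feature vector."""
    text = (video["title"] + " " + video["description"]).lower()
    term_hits = sum(1 for token in text.split() if token in TARGET_TERMS)
    return [
        float(term_hits),                # linguistic feature
        math.log10(1 + video["views"]),  # popularity feature
        video["duration_s"] / 60.0,      # duration in minutes
    ]


class CentroidOneClass:
    """Toy one-class recognizer fitted on target-class vectors only."""

    def fit(self, vectors, slack=1.5):
        n, dims = len(vectors), len(vectors[0])
        self.mean = [sum(v[d] for v in vectors) / n for d in range(dims)]
        # Per-dimension standard deviation, floored to avoid division by zero.
        self.std = [
            max(math.sqrt(sum((v[d] - self.mean[d]) ** 2 for v in vectors) / n), 1e-6)
            for d in range(dims)
        ]
        # Accept anything within `slack` times the farthest training example.
        self.threshold = slack * max(self._distance(v) for v in vectors)
        return self

    def _distance(self, v):
        return math.sqrt(
            sum(((x - m) / s) ** 2 for x, m, s in zip(v, self.mean, self.std))
        )

    def predict(self, v):
        """Return True if the vector is recognized as the target class."""
        return self._distance(v) <= self.threshold


# Made-up training records, all from the (assumed) target class.
training = [
    {"title": "college ragging video", "description": "seniors abuse juniors",
     "views": 1200, "duration_s": 95},
    {"title": "street fight caught on camera", "description": "harassment in public",
     "views": 5400, "duration_s": 140},
    {"title": "ragging in hostel", "description": "fight and abuse",
     "views": 800, "duration_s": 60},
]
model = CentroidOneClass().fit([extract_features(v) for v in training])
```

A new video is then scored with `model.predict(extract_features(video))`. The sketch omits the category and related-video features mentioned in the abstract; in this design they would be added as further entries in the feature vector.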
Keywords :
Internet; application program interfaces; computational linguistics; copyright; data mining; data privacy; feature extraction; image classification; meta data; object detection; object recognition; social networking (online); YouTube API; YouTube metadata mining; abuse; classifier approach; commercial spam; copyright violated videos; dataset annotation; discriminatory features; extremism promoting videos; harassment; hate promoting videos; linguistic features; misdemeanor videos; objectionable video detection; popularity based video; pornographic material; privacy invading content; privacy invading harassment detection; ragging video detection; social networking features; target class object recognition; test dataset; training dataset; video category; video duration; video metadata mining; video sharing Websites; violence; vulgar material; vulgar video detection; Educational institutions; Pragmatics; Privacy; Videos; YouTube; Information Retrieval; Mining User Generated Content; One Class Classifier; Privacy Invading Video Detection; Social Media Analytics
Conference_Title :
Privacy, Security and Trust (PST), 2014 Twelfth Annual International Conference on
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4799-3502-4
DOI :
10.1109/PST.2014.6890927