• DocumentCode
    1799948
  • Title
    Mining YouTube metadata for detecting privacy invading harassment and misdemeanor videos
  • Author
    Aggarwal, Nitish ; Agrawal, Sanjay ; Sureka, A.
  • Author_Institution
    Indraprastha Inst. of Inf. Technol., New Delhi, India
  • fYear
    2014
  • fDate
    23-24 July 2014
  • Firstpage
    84
  • Lastpage
    93
  • Abstract
    YouTube is one of the largest and most popular video-sharing websites (with social networking features) on the Internet. A significant percentage of the videos uploaded to YouTube contain objectionable content and violate the YouTube community guidelines. Such content includes copyright-violating videos, commercial spam, videos promoting hate and extremism, vulgar and pornographic material, and privacy-invading content. This is primarily due to the low publication barrier and anonymity. We present an approach to identify privacy-invading harassment and misdemeanor videos by mining video metadata. We divide the problem into three sub-problems: vulgar video detection, detection of abuse and violence in public places, and detection of ragging videos in schools and colleges. We conduct a characterization study on a training dataset, built by downloading several videos using the YouTube API and manually annotating them. We define several discriminatory features for recognizing target-class objects. We employ a one-class classifier approach to detect objectionable videos and frame the problem as a recognition problem. Our empirical analysis on the test dataset reveals that linguistic features (the presence of certain terms and people in the title and description of the main and related videos), popularity-based features, duration, and video category can be used to predict the video type. We validate our hypothesis by conducting a series of experiments on an evaluation dataset acquired from YouTube. Empirical results reveal that the accuracy of the proposed approach is more than 80%, demonstrating its effectiveness.
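    The one-class recognition idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cue terms, the duration and view-count cutoffs, the category set, the scoring rule, and the threshold are all hypothetical choices made for illustration, since the abstract does not specify them. The sketch learns a feature profile from target-class (positive) examples only and flags a new video when its metadata agrees closely enough with that profile.

```python
from dataclasses import dataclass


@dataclass
class VideoMeta:
    title: str
    description: str
    duration_s: int
    category: str
    view_count: int


# Hypothetical cue terms; the paper's actual lexicon is not given in the abstract.
CUE_TERMS = {"ragging", "harassment", "slapped", "beaten"}
# Hypothetical categories assumed to be typical of the target class.
TARGET_CATEGORIES = {"People & Blogs", "Entertainment"}


def features(v: VideoMeta) -> list[float]:
    """Map video metadata to binary features (all cutoffs are assumptions)."""
    tokens = set(f"{v.title} {v.description}".lower().split())
    return [
        1.0 if tokens & CUE_TERMS else 0.0,              # linguistic cue present
        1.0 if v.duration_s <= 300 else 0.0,             # short clip
        1.0 if v.category in TARGET_CATEGORIES else 0.0, # video category
        1.0 if v.view_count >= 1000 else 0.0,            # popularity
    ]


def fit_profile(positives: list[VideoMeta]) -> list[float]:
    """Per-feature frequency over target-class examples only (one-class training)."""
    rows = [features(v) for v in positives]
    return [sum(col) / len(rows) for col in zip(*rows)]


def score(profile: list[float], v: VideoMeta) -> float:
    """Profile-weighted agreement between a video and the target class, in [0, 1]."""
    total = sum(profile)
    if total == 0:
        return 0.0
    return sum(p * x for p, x in zip(profile, features(v))) / total


def is_target(profile: list[float], v: VideoMeta, threshold: float = 0.75) -> bool:
    """Recognize a video as target class when its score reaches the threshold."""
    return score(profile, v) >= threshold


if __name__ == "__main__":
    train = [
        VideoMeta("College ragging caught on camera", "students beaten", 120,
                  "People & Blogs", 5000),
        VideoMeta("Harassment in public place", "", 200, "News & Politics", 20000),
    ]
    profile = fit_profile(train)
    candidate = VideoMeta("Ragging incident at hostel", "", 90, "People & Blogs", 1500)
    print(is_target(profile, candidate))
```

    Only positive examples are needed to build the profile, which is the property that distinguishes a one-class (recognition) formulation from ordinary two-class classification. A real system would replace the whitespace tokenizer and hand-picked cutoffs with features derived from the annotated training data.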
  • Keywords
    Internet; application program interfaces; computational linguistics; copyright; data mining; data privacy; feature extraction; image classification; meta data; object detection; object recognition; social networking (online); Internet; YouTube API; YouTube metadata mining; abuse; classifier approach; commercial spam; copyright violated videos; dataset annotation; discriminatory features; extremism promoting videos; harassment; hate promoting videos; linguistic features; misdemeanor videos; objectionable video detection; popularity based video; pornographic material; privacy invading content; privacy invading harassment detection; ragging video detection; social networking features; target class object recognition; test dataset; training dataset; video category; video duration; video metadata mining; video sharing Websites; violence; vulgar material; vulgar video detection; Educational institutions; Feature extraction; Pragmatics; Privacy; Videos; YouTube; Information Retrieval; Mining User Generated Content; One Class Classifier; Privacy Invading Video Detection; Social Media Analytics; YouTube;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Privacy, Security and Trust (PST), 2014 Twelfth Annual International Conference on
  • Conference_Location
    Toronto, ON
  • Print_ISBN
    978-1-4799-3502-4
  • Type
    conf
  • DOI
    10.1109/PST.2014.6890927
  • Filename
    6890927