• DocumentCode
    3740501
  • Title
    Effective 20 Newsgroups Dataset Cleaning
  • Author
    Khaled Albishre;Mubarak Albathan;Yuefeng Li
  • Author_Institution
    Sci. &
  • Volume
    3
  • fYear
    2015
  • Firstpage
    98
  • Lastpage
    101
  • Abstract
    The rapid increase in the number of text documents available on the Internet has created pressure to use effective cleaning techniques. Cleaning techniques are needed for converting these documents to structured documents. Text cleaning techniques are one of the key mechanisms in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 newsgroups dataset, and report on experimental results.
  • Keywords
    "Cleaning","Text mining","Feature extraction","Electronic mail","Natural language processing","Noise measurement","Testing"
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2015.90
  • Filename
    7397431