DocumentCode
3740501
Title
Effective 20 Newsgroups Dataset Cleaning
Author
Khaled Albishre;Mubarak Albathan;Yuefeng Li
Author_Institution
Sci. &
Volume
3
fYear
2015
Firstpage
98
Lastpage
101
Abstract
The rapid increase in the number of text documents available on the Internet has created pressure to use effective cleaning techniques, which are needed to convert these documents into structured documents. Text cleaning is one of the key mechanisms in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 Newsgroups dataset and report experimental results.
Keywords
"Cleaning","Text mining","Feature extraction","Electronic mail","Natural language processing","Noise measurement","Testing"
Publisher
IEEE
Conference_Titel
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on
Type
conf
DOI
10.1109/WI-IAT.2015.90
Filename
7397431