DocumentCode :
2294988
Title :
Automated Data Augmentation Services Using Text Mining, Data Cleansing and Web Crawling Techniques
Author :
Jacob, Matthias ; Kuscher, Alexander ; Plauth, Max ; Thiele, Christoph
Author_Institution :
Hasso Plattner Inst. of Software Syst. Eng., Potsdam
fYear :
2008
fDate :
6-11 July 2008
Firstpage :
136
Lastpage :
143
Abstract :
A large amount of information about celebrities is spread across the Web, hidden in innumerable news articles and blogs, pictures on Flickr, and videos on YouTube. Combining and aggregating this information would be of great benefit to many customers. In this paper we describe the architecture and the (business) value of a system that not only collates information pre-formatted by other Web services but also provides a self-developed named entity recognition algorithm, which extracts the names of celebrities from different data sources so that our mash-up application can process and enrich them.
Keywords :
Web services; data mining; software architecture; Web crawling techniques; automated data augmentation services; blogs; data cleansing; text mining; Feeds; Internet; Publishing; TV; Videos; YouTube; Named Entity Recognition; REST; celebrity; mash-up; vipster; web 2.0; web service
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Services - Part I, 2008. IEEE Congress on
Conference_Location :
Honolulu, HI
Print_ISBN :
978-0-7695-3286-8
Type :
conf
DOI :
10.1109/SERVICES-1.2008.67
Filename :
4578316