Regional Information Center for Science and Technology (RICeST) - Automated Data Augmentation Services Using Text Mining, Data Cleansing and Web Crawling Techniques

DocumentCode :

2294988

Title :

Automated Data Augmentation Services Using Text Mining, Data Cleansing and Web Crawling Techniques

Author :

Jacob, Matthias ; Kuscher, Alexander ; Plauth, Max ; Thiele, Christoph

Author_Institution :

Hasso Plattner Institute for Software Systems Engineering, Potsdam

fYear :

2008

fDate :

6-11 July 2008

Firstpage :

136

Lastpage :

143

Abstract :

A large amount of information about celebrities is spread across the Web, hidden in innumerable news articles and blogs, pictures on Flickr, and videos on YouTube. Combining and aggregating this information would be of great benefit to many customers. In this document we describe the architecture and the (business) value of a system that not only collates information pre-formatted by other Web services but also provides a self-developed named entity recognition algorithm for extracting the names of celebrities from different data sources, and then processes and enriches them in our mash-up application.

Keywords :

Web services; data mining; software architecture; Web crawling techniques; automated data augmentation services; blogs; data cleansing; text mining; Feeds; Internet; Publishing; TV; Videos; YouTube; Named Entity Recognition; REST; celebrity; mash-up; vipster; web 2.0

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2008 IEEE Congress on Services - Part I

Conference_Location :

Honolulu, HI

Print_ISBN :

978-0-7695-3286-8

Type :

conf

DOI :

10.1109/SERVICES-1.2008.67

Filename :

4578316

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2294988