DocumentCode :
1785275
Title :
InfoSuggest: A System for Automated Information Gathering: With a Real-World Case Study
Author :
Kate, Kiran ; Prapanca, Andy ; Kalagnanam, Jayant
Author_Institution :
IBM Res. Collaboratory, Singapore, Singapore
fYear :
2014
fDate :
23-25 April 2014
Firstpage :
203
Lastpage :
212
Abstract :
Departments in many organizations treat the World Wide Web as an important information source and need to keep up to date with current information in their domain. Such information gathering is time-consuming because of information overload, and many organizations have dedicated teams for this task. In this paper, we present InfoSuggest, a system for end-to-end information gathering from the web. InfoSuggest uses machine learning to make this focused information gathering process more efficient. We employ a semi-supervised document classification method, Transductive Support Vector Machines (TSVMs), to learn user preferences from example articles that users provide. We also devise TSVM-Meta, a strategy for selecting unlabeled data that is suited to an information gathering setting. We discuss the system architecture and present a case study of information gathering for food safety in an environmental health department of a government agency. Our experiments demonstrate that the system improves efficiency by as much as 35% by making relevant content easier to find.
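The abstract names Transductive Support Vector Machines as the way InfoSuggest learns user preferences from a few example articles plus many unlabeled ones. The block below is a minimal sketch of the generic Joachims-style TSVM loop, not the authors' implementation. It assumes dense toy feature vectors in place of real document vectors, uses a simple full-batch subgradient solver (the hypothetical helper `train_svm`) instead of a QP solver, uses a single cost `c_star` for all unlabeled points rather than separate per-class costs, and does not model the paper's TSVM-Meta unlabeled-data selection. All function and parameter names are illustrative.

```python
# Hypothetical sketch of a Joachims-style transductive SVM (linear kernel).
# Not the InfoSuggest code; toy 2-D vectors stand in for document features.


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_svm(points, epochs=500, lr=0.01):
    """Full-batch subgradient descent on
    0.5*||w||^2 + sum_i c_i * max(0, 1 - y_i*(w.x_i + b)).
    points is a list of (x, y, c) with y in {-1, +1} and per-point cost c."""
    dim = len(points[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of the regularizer 0.5*||w||^2
        for x, y, c in points:
            if y * (dot(w, x) + b) < 1:  # hinge loss is active for this point
                for j in range(dim):
                    gw[j] -= c * y * x[j]
                gb -= c * y
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def tsvm(labeled, unlabeled, C=1.0, C_star=1.0, max_swaps=100):
    """labeled: list of (x, y); unlabeled: list of x. Returns (w, b, labels)."""
    # 1. Inductive SVM on the labeled example articles only.
    w, b = train_svm([(x, y, C) for x, y in labeled])
    # 2. Label the top-scoring unlabeled docs positive, keeping the
    #    positive ratio observed in the labeled set.
    ratio = sum(1 for _, y in labeled if y > 0) / len(labeled)
    num_pos = round(ratio * len(unlabeled))
    order = sorted(range(len(unlabeled)),
                   key=lambda i: dot(w, unlabeled[i]) + b, reverse=True)
    yu = [-1] * len(unlabeled)
    for i in order[:num_pos]:
        yu[i] = 1
    # 3. Raise the unlabeled cost c_star step by step, swapping a +/- label
    #    pair whose slacks sum to more than 2. With an exact QP solver each
    #    swap lowers the objective; max_swaps guards this approximate solver.
    c_star, swaps = 1e-5, 0
    while True:
        while True:
            pts = [(x, y, C) for x, y in labeled]
            pts += [(x, yu[i], c_star) for i, x in enumerate(unlabeled)]
            w, b = train_svm(pts)
            slack = [max(0.0, 1 - yu[i] * (dot(w, x) + b))
                     for i, x in enumerate(unlabeled)]
            pair = next(((i, j) for i in range(len(yu)) for j in range(len(yu))
                         if yu[i] > 0 > yu[j] and slack[i] > 0 and slack[j] > 0
                         and slack[i] + slack[j] > 2), None)
            if pair is None or swaps >= max_swaps:
                break
            i, j = pair
            yu[i], yu[j] = -1, 1
            swaps += 1
        if c_star >= C_star:
            break
        c_star = min(2 * c_star, C_star)
    return w, b, yu


labeled = [((2.0, 0.0), 1), ((-2.0, 0.0), -1)]
unlabeled = [(3.0, 1.0), (2.5, -1.0), (-3.0, 1.0), (-2.5, -1.0)]
w, b, labels = tsvm(labeled, unlabeled)
print(labels)  # [1, 1, -1, -1]
```

Starting `c_star` near zero lets the unlabeled documents pull on the decision boundary only gradually, so early provisional labels can be corrected by pair swaps before they dominate the objective.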
Keywords :
Internet; Web sites; information retrieval systems; learning (artificial intelligence); support vector machines; InfoSuggest; TSVM-Meta; World Wide Web; automated information gathering system; end-to-end information gathering process; environmental health department; food safety; government agency; information source; machine learning; semisupervised document classification method; transductive support vector machines; unlabeled data selection; user preference learning; Crawlers; Food technology; Labeling; Safety; Search engines; Support vector machines; Training; Document classification; Information gathering; Semi-supervised learning; TSVMs; Transductive support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Global Conference (SRII), 2014 Annual SRII
Conference_Location :
San Jose, CA
Type :
conf
DOI :
10.1109/SRII.2014.36
Filename :
6879683