DocumentCode :
2024133
Title :
Text Mining-Supported Information Extraction: An Extended Methodology for Developing Information Extraction Systems
Author :
Feilmayr, Christina
Author_Institution :
Inst. of Applic. Oriented Knowledge Process. (FAW), Johannes Kepler Univ., Linz, Austria
fYear :
2011
fDate :
Aug. 29 2011-Sept. 2 2011
Firstpage :
217
Lastpage :
221
Abstract :
Information extraction (IE) and knowledge discovery in databases (KDD) are both useful approaches for discovering information in textual corpora, but they have some deficiencies. Information extraction can identify relevant sub-sequences of text, but is usually unaware of emerging, previously unknown knowledge and regularities in a text and thus cannot form new facts or new hypotheses. Complementary to information extraction, emerging data mining methods and techniques promise to overcome the deficiencies of information extraction. This research work combines the benefits of both approaches by integrating data mining and information extraction methods. The aim is to provide a new high-quality information extraction methodology and, at the same time, to improve the performance of the underlying extraction system. Consequently, the new methodology should shorten the life cycle of information extraction engineering because information predicted in early extraction phases can be used in further extraction steps, and the extraction rules developed require fewer arduous test-and-debug iterations. Effectiveness and applicability are validated by processing online documents from the areas of eHealth and eRecruitment.
Keywords :
data mining; information retrieval; information retrieval systems; text analysis; data mining methods; eHealth; eRecruitment; extraction rules; extraction steps; information extraction engineering; information extraction systems; knowledge discovery-in-databases; online document processing; performance improvement; test-and-debug iterations; text mining-supported information extraction; Data models; Feature extraction; Learning systems; Machine learning; Semantics; Text mining; (Web-) Information Extraction; Data Mining; Information Extraction Methodology; Machine Learning Algorithms; Text Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database and Expert Systems Applications (DEXA), 2011 22nd International Workshop on
Conference_Location :
Toulouse
ISSN :
1529-4188
Print_ISBN :
978-1-4577-0982-1
Type :
conf
DOI :
10.1109/DEXA.2011.79
Filename :
6059820