DocumentCode
2335943
Title
Mining the Web with active hidden Markov models
Author
Scheffer, Tobias ; Decomain, Christian ; Wrobel, Stefan
Author_Institution
Univ. of Magdeburg, Germany
fYear
2001
fDate
2001
Firstpage
645
Lastpage
646
Abstract
Given the enormous amount of information available only in unstructured or semi-structured textual documents, tools for information extraction (IE) have become enormously important. IE tools identify the relevant information in such documents and convert it into a structured format such as a database or an XML document. While the first IE algorithms were hand-crafted sets of rules, researchers soon turned to learning extraction rules from hand-labeled documents. Unfortunately, rule-based approaches sometimes fail to provide the necessary robustness against the inherent variability of document structure, which has led to the recent interest in using hidden Markov models (HMMs). By using additional unlabeled documents, which are usually readily available in most applications, we can perform active learning of HMMs. The idea of active learning algorithms is to identify unlabeled observations that would be most useful when labeled by the user. Such algorithms are known for classification, clustering, and regression; we present the first algorithm for active learning of hidden Markov models.
Keywords
data mining; hidden Markov models; information resources; information retrieval; learning (artificial intelligence); Web mining; active hidden Markov models; active learning; information extraction; semi-structured textual documents; unlabeled documents; unstructured textual documents; Clustering algorithms; Databases; Probability; Robustness; Sequences; Speech recognition; XML
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location
San Jose, CA
Print_ISBN
0-7695-1119-8
Type
conf
DOI
10.1109/ICDM.2001.989591
Filename
989591