DocumentCode :
3259087
Title :
Unsupervised Learning of Tree Alignment Models for Information Extraction
Author :
Zigoris, Philip ; Eads, Damian ; Zhang, Yi
Author_Institution :
Dept. of Comput. Sci., California Univ., Santa Cruz, CA
fYear :
2006
fDate :
Dec. 2006
Firstpage :
45
Lastpage :
49
Abstract :
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level data mining and information exploitation. Our algorithm combines tree and string alignment algorithms with domain-specific feature extraction to match semantically related data across search results. Applications of our approach include hidden Web crawling, semantic tagging, and federated search. We build on earlier research on the use of tree alignment for information extraction. In contrast to previous approaches that rely on hand-tuned parameters, our algorithm uses a variant of support vector machines (SVMs) to learn a parameterized, site-independent tree alignment model. This model can then be used to deduce common structural and textual elements of a set of HTML parse trees. We report some preliminary results of our system's performance on data from Web sites with a variety of different layouts.
Keywords :
Web sites; data mining; feature extraction; information retrieval; support vector machines; tree data structures; unsupervised learning; HTML parse trees; HTML search results; data structure; database table; domain-specific feature extraction; federated search; hidden Web crawling; high-level data mining; information exploitation; information extraction; semantic tagging; string alignment algorithms; structural elements; textual elements; tree alignment models; Data structures; HTML; Metasearch; Spatial databases; System performance; Tagging
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2702-7
Type :
conf
DOI :
10.1109/ICDMW.2006.166
Filename :
4063596