DocumentCode
2369909
Title
On precision and recall of multi-attribute data extraction from semistructured sources
Author
Yang, Guizhen ; Mukherjee, Saikat ; Ramakrishnan, I.V.
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of Buffalo, NY, USA
fYear
2003
fDate
19-22 Nov. 2003
Firstpage
395
Lastpage
402
Abstract
Machine learning techniques for data extraction from semistructured sources exhibit different precision and recall characteristics. However, to date, the formal relationship between learning algorithms and their impact on these two metrics remains unexplored. We propose a formalization of precision and recall of extraction and investigate the complexity-theoretic aspects of learning algorithms for multiattribute data extraction based on this formalism. We show that there is a tradeoff between precision/recall of extraction and computational efficiency, and present experimental results to demonstrate the practical utility of these concepts in designing scalable data extraction algorithms for improving recall without compromising on precision.
Keywords
Internet; computational complexity; data mining; learning (artificial intelligence); Internet; complexity-theoretic aspects; machine learning algorithms; multiattribute data extraction; semistructured sources; Animals; Computational efficiency; Computer science; Data engineering; Data mining; Hospitals; Labeling; Machine learning; Machine learning algorithms; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN
0-7695-1978-4
Type
conf
DOI
10.1109/ICDM.2003.1250945
Filename
1250945