DocumentCode :
3427702
Title :
HTML Pattern Generator--Automatic Data Extraction from Web Pages
Author :
Cosulschi, Mirel ; Giurca, Adrian ; Udrescu, Bogdan ; Constantinescu, Nicolae ; Gabroveanu, Mihai
Author_Institution :
Dept. of Comput. Sci., Craiova Univ.
fYear :
2006
fDate :
Sept. 2006
Firstpage :
75
Lastpage :
78
Abstract :
Existing methods of information extraction from HTML documents include manual approaches, supervised learning, and automatic techniques. The manual method achieves high precision and recall but is difficult to apply to a large number of pages. Supervised learning requires human interaction to create positive and negative samples. Automatic techniques require less human effort, but the information they retrieve is less reliable.
Keywords :
Web sites; hypermedia markup languages; information retrieval; knowledge acquisition; learning (artificial intelligence); HTML documents; HTML pattern generator; Web pages; automatic data extraction; information extraction; information retrieval; supervised learning; Computer science; Costs; Data mining; Databases; HTML; Humans; Internet; Manuals; Supervised learning; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Symbolic and Numeric Algorithms for Scientific Computing, 2006. SYNASC '06. Eighth International Symposium on
Conference_Location :
Timisoara
Print_ISBN :
0-7695-2740-X
Type :
conf
DOI :
10.1109/SYNASC.2006.43
Filename :
4090300