Title :
A basic study on attribute name extraction from the web
Author :
Nakane, Fumitaka ; Otsubo, Masanori ; Hijikata, Yoshinori ; Nishida, Shogo
Author_Institution :
Grad. Sch. of Eng. Sci., Osaka Univ., Toyonaka
Abstract :
A large number of semistructured documents exist on the Web. A search engine can find pages that contain given keywords. However, to obtain information about an object such as a notebook computer with 1 GB of memory, a method is needed that automatically extracts the attribute name (in this example, "memory") and the attribute value (in this example, "1 GB"). Many researchers have previously examined extracting the attribute values that correspond to each attribute name. This paper describes a method that extracts schemas (sets of attribute names) using a bootstrapping algorithm.
Keywords :
information retrieval; search engines; text analysis; attribute name extraction; bootstrapping algorithm; information extraction; search engine; semistructured Web document; text substring; Data mining; Dictionaries; Hard disks; Personal communication networks; Relational databases; Web pages; attribute name; bootstrapping
Conference_Title :
Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-2383-5
Electronic_ISBN :
1062-922X
DOI :
10.1109/ICSMC.2008.4811612