Title :
Thai herb information extraction from multiple websites
Author :
Chainapaporn, Phakphoom ; Netisopakul, Ponrudee
Author_Institution :
Fac. of Inf. Technol., King Mongkut's Inst. of Technol. Ladkrabang, Bangkok, Thailand
Abstract :
Thai herbs have gained increasing public attention, and a number of Thai herb websites have recently appeared. These websites provide similar information but differ considerably in detail. For example, some webpages do not indicate which part of a Thai herb can treat a specified symptom. To collect more complete Thai herb information, we have developed an information extraction process that extracts Thai herb information from multiple websites. The process employs an HTML parser and file templates to recognize useful information in various webpage formats. Preliminary experiments gave satisfactory results, with precision and recall over 85 percent.
Keywords :
Web sites; grammars; hypermedia markup languages; information retrieval; medical information systems; HTML parser; Thai herb information extraction; Webpage formats; file templates; information extraction process; multiple Websites; Argon; HTML; Ontologies; Optimized production technology; Thai herbs; Web information extraction
Conference_Title :
Knowledge and Smart Technology (KST), 2012 4th International Conference on
Conference_Location :
Chonburi
Print_ISBN :
978-1-4673-2166-2
DOI :
10.1109/KST.2012.6287734