Title :
Automated extraction of health resource URLs from biomedical abstracts
Author :
Young, Jodi-Ann ; Frenz, Christopher M.
Author_Institution :
Dept. of Comput. Eng. Technol., New York City Coll. of Technol. (CUNY), Brooklyn, NY, USA
Abstract :
The World Wide Web has grown into one of the most pervasive and comprehensive information repositories available today, and many compare the knowledge it contains to a modern-day Library of Alexandria. Yet, despite its vastness, one downside of Web-based information sources is that the information in most Web pages has never been reviewed for accuracy or quality, and it is therefore often considered unsuitable for applications where accuracy is of critical importance. While sites of questionable quality clearly should be avoided, one cannot deny the utility and advantages of Web-based resources, and thus a methodology has been developed to identify expert-vetted health-related Web resources. The resulting software searches PubMed for biomedical abstracts that pertain to a health topic of interest and uses regular expression based pattern matching to extract all of the URLs that appear in those abstracts. Given that articles indexed in PubMed have typically undergone peer review, the resources they reference can be assumed to have a reasonable level of quality, since they are being cited by experts. The program is thus able to compile a list of quality Web resources pertaining to a health topic. The system has been used successfully to create a list of influenza-related Web resources as an illustration of its utility.
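The URL extraction step described in the abstract can be illustrated with a minimal sketch. This record does not give the authors' actual regular expression or implementation language, so the pattern, the function names (`extract_urls`, `collect_resources`), and the sample abstracts below are all assumptions made for illustration. The PubMed search step is left out; the sketch assumes the abstract texts have already been retrieved as plain strings.

```python
import re

# Assumed pattern (not the authors'): http/https/ftp URLs, or bare hosts
# that start with "www.".
URL_PATTERN = re.compile(r"(?:(?:https?|ftp)://|www\.)[^\s<>\"']+", re.IGNORECASE)

# Punctuation that often trails a URL in running prose but is not part of it.
TRAILING = ".,;:!?)]}'\""


def extract_urls(abstract):
    """Return the URLs found in one abstract, in order of appearance."""
    urls = []
    for match in URL_PATTERN.finditer(abstract):
        url = match.group(0).rstrip(TRAILING)
        if url:
            urls.append(url)
    return urls


def collect_resources(abstracts):
    """Merge URLs from many abstracts, dropping duplicates but keeping order."""
    seen = set()
    resources = []
    for abstract in abstracts:
        for url in extract_urls(abstract):
            if url not in seen:
                seen.add(url)
                resources.append(url)
    return resources


# Hypothetical abstract snippets, made up for illustration only.
SAMPLE_ABSTRACTS = [
    "The database is freely available at http://example.org/flu.",
    "Data can be browsed (www.example.com/influenza) or downloaded "
    "from http://example.org/flu",
]

if __name__ == "__main__":
    for url in collect_resources(SAMPLE_ABSTRACTS):
        print(url)
```

Stripping trailing punctuation is a design choice of this sketch: URLs in abstracts often end a sentence or sit inside parentheses, and a simple pattern would otherwise capture the closing period or bracket. A URL that legitimately ends in one of those characters would be truncated, so a production pattern would likely need more care.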
Keywords :
bioinformatics; information retrieval; pattern matching; PubMed; Web resources; Web-based information sources; biomedical abstracts; health resource URL extraction; peer-review process; regular expression based pattern matching; Abstracts; Application software; Biomedical computing; Data mining; Influenza; Libraries; Uniform resource locators; Web pages; Web sites; healthcare; medical informatics; text mining
Conference_Title :
2010 Long Island Systems, Applications and Technology Conference (LISAT)
Conference_Location :
Farmingdale, NY
Print_ISBN :
978-1-4244-5548-5
Electronic_ISBN :
978-1-4244-5550-8
DOI :
10.1109/LISAT.2010.5478290