Title :
Unsupervised structure discovery for biodiversity information
Author :
Cui, Hong ; McCourt, Richard M. ; Feist, Monique
Author_Institution :
Univ. of Western Ontario, London, Ont.
Abstract :
The project presented here concerns improving access to biodiversity information in legacy formats. The majority of biodiversity information, for example floras and faunas, is still in legacy formats. To mobilize these information resources, a variety of techniques have been used either to 1) fit the biodiversity information into a predefined structure, typically that of a relational database, using techniques such as information extraction, or 2) make explicit the inherent yet implicit semantic structure of the documents, using techniques such as XML tagging. In either case, the expected outcome is to structure the originally less structured information prepared by taxonomists. In current research, domain experts and existing literature seem to play a crucial role in defining the target structure, whether the templates for information extraction tasks or the XML schema for markup tasks.
Keywords :
XML; biology computing; botany; information retrieval; relational databases; unsupervised learning; XML tagging; biodiversity information; information extraction; legacy formats; relational database; semantic structure; unsupervised structure discovery; Biodiversity; Buildings; Data mining; Information resources; Machine learning; North America; Prototypes; Relational databases; Tagging; document structure; unsupervised machine learning
Conference_Title :
Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
Conference_Location :
Chapel Hill, NC
Print_ISBN :
1-59593-354-9
DOI :
10.1145/1141753.1141878