Regional Information Center for Science and Technology

DocumentCode :

3049164

Title :

Tag insertion complexity

Author :

Yeates, Stuart ; Witten, Ian H. ; Bainbridge, David

Author_Institution :

Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand

fYear :

2001

fDate :

2001

Firstpage :

243

Lastpage :

252

Abstract :

This paper is about inferring markup information, a generalization of part-of-speech tagging. We use compression models based on a marked-up training corpus and apply them to fresh, unmarked text. In effect, this technique builds filters that extract information from text in a way that is generalized because it depends on training text rather than preprogrammed heuristics. As illustrated, we use SGML tags to represent the extracted information. However, we work in a more controlled textual environment: we use bibliographic text rather than plain English and mark up entities such as author, date, and title rather than syntactic parts of speech. Such entities are generically called "metadata" (data about data) and form an important component of the information present in a bibliography. The aim of our work is to automatically enhance bibliographies with metadata tags, based on a training corpus of annotated bibliography entries.

Keywords :

data compression; meta data; page description languages; search problems; SGML tags; Viterbi search; bibliographic text; bibliography; compression models; extracted information; filters; marked-up training corpus; markup information; metadata; part-of-speech tagging; tag insertion complexity; training text; Bibliographies; Computer science; Data mining; Dictionaries; Information filtering; Information filters; SGML; Tagging; Technical Activities Guide -TAG; Testing

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Compression Conference, 2001. Proceedings. DCC 2001.

Conference_Location :

Snowbird, UT

ISSN :

1068-0314

Print_ISBN :

0-7695-1031-0

Type :

conf

DOI :

10.1109/DCC.2001.917155

Filename :

917155

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3049164