Title of article :
Generating grammars for SGML tagged texts lacking DTD
Author/Authors :
Ahonen, H. and Mannila, H. and Nikunen, E.
Issue Information :
Journal with serial issue number, year 1997
Abstract :
We describe a technique for forming a context-free grammar for a document that has some kind of tagging, structural or typographical, but for which no concise description of the structure is available. The technique is based on ideas from machine learning. It first forms a set of finite-state automata that describe the document completely. These automata are then modified by considering certain context conditions; the modifications correspond to generalizing the underlying languages. Finally, the automata are converted into regular expressions, which are then used to construct the grammar. An alternative representation, characteristic k-grams, is also introduced. Additionally, the paper describes some interactive operations necessary for generating a grammar for a large and complicated document.
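As a rough illustration of the k-gram idea mentioned in the abstract, the following minimal Python sketch collects the k-grams seen in sample child-element sequences of one element and accepts any sequence built only from those k-grams. It is an assumption-laden toy, not the authors' algorithm: the function names (`kgrams`, `learn`, `accepts`), the boundary markers, and the element names (`headword`, `pron`, `sense`) are hypothetical, and the paper's own definition of characteristic k-grams and its context conditions may differ.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical boundary markers; assumed not to clash with real element names.
START, END = "^", "$"


def kgrams(seq: Sequence[str], k: int) -> Set[Tuple[str, ...]]:
    """Return the set of k-grams of seq, padded with start and end markers."""
    padded: List[str] = [START] * (k - 1) + list(seq) + [END]
    return {tuple(padded[i:i + k]) for i in range(len(padded) - k + 1)}


def learn(samples: Iterable[Sequence[str]], k: int) -> Set[Tuple[str, ...]]:
    """Collect every k-gram observed in the sample sequences."""
    grams: Set[Tuple[str, ...]] = set()
    for sample in samples:
        grams |= kgrams(sample, k)
    return grams


def accepts(grams: Set[Tuple[str, ...]], seq: Sequence[str], k: int) -> bool:
    """A sequence is accepted if all of its k-grams were observed."""
    return kgrams(seq, k) <= grams


# Hypothetical content sequences for one tagged element (e.g. a dictionary entry).
samples = [
    ["headword", "sense"],
    ["headword", "sense", "sense"],
    ["headword", "pron", "sense"],
]
model = learn(samples, k=2)

# An unseen sequence is accepted because every 2-gram in it was observed.
generalized = accepts(model, ["headword", "pron", "sense", "sense", "sense"], k=2)
# A sequence starting with an element never seen first is rejected.
rejected = accepts(model, ["sense", "headword"], k=2)
```

The model accepts more than the three samples it was built from, which is the sense in which such a representation generalizes the language; a larger k would generalize less.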
Keywords :
Structured documents, SGML, Grammatical inference, Inductive inference, Context-free grammars
Journal title :
Mathematical and Computer Modelling