• DocumentCode
    2441802
  • Title

    Constructing parser for industrial software specifications containing formal and natural language description

  • Author

    Iwama, Futoshi ; Nakamura, Taiga ; Takeuchi, Hironori

  • Author_Institution
    IBM Res. - Tokyo, IBM Japan, Yamato, Japan
  • fYear
    2012
  • fDate
    2-9 June 2012
  • Firstpage
    1012
  • Lastpage
    1021
  • Abstract
    This paper describes a novel framework for creating a parser to process and analyze texts written in a “partially structured” natural language. In many projects, the contents of document artifacts tend to be described as a mixture of formal parts (i.e., text constructs that follow specific conventions) and parts written in arbitrary free text. Formal parsers, which are typically defined and used to process descriptions with rigidly defined syntax such as program source code, are very precise and efficient in processing the formal parts, while parsers developed for natural language processing (NLP) are good at robustly interpreting the free-text parts. Therefore, combining parsers with these different characteristics allows more flexible and practical processing of various project documents. Unfortunately, conventional approaches to constructing a parser from multiple small parsers have been studied extensively only for formal language parsers and are not directly applicable to NLP parsers because of differences in the way the input text is extracted and evaluated. We propose a method to configure and generate a combined parser by extending an approach based on parser combinators, the operators for composing multiple formal parsers, to support both NLP and formal parsers. The resulting text parser is based on Parsing Expression Grammars and benefits from the strengths of both parser types. We demonstrate an application of such a combined parser in practical situations and show that the proposed approach can efficiently construct a parser for analyzing project-specific industrial specification documents.
  • Keywords
    formal specification; grammars; natural language processing; NLP parsers; arbitrary free text; document artifacts; formal language description; formal parsers; industrial software specifications; natural language description; parser combinator; parsing expression grammars; program source code; Abstracts; Formal languages; Grammar; Natural language processing; Semantics; Syntactics; Document Analysis; Parser Combinator; Parsing Expression Grammars; Requirement Engineering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering (ICSE), 2012 34th International Conference on
  • Conference_Location
    Zurich
  • ISSN
    0270-5257
  • Print_ISBN
    978-1-4673-1066-6
  • Electronic_ISBN
    0270-5257
  • Type

    conf

  • DOI
    10.1109/ICSE.2012.6227119
  • Filename
    6227119
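
The abstract's central idea, composing small formal parsers and a free-text parser with PEG-style combinators, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `regex`, `seq`, `choice`, `free_text`, and `parse_spec_line` are hypothetical, the `IF ... THEN ...` line format is an invented example, and `free_text` is only a whitespace tokenizer standing in for a real NLP parser.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position),
# or None on failure. This convention is an assumption of this sketch.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def regex(pattern: str) -> Parser:
    """Formal parser: match a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text: str, pos: int):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None

    return parse


def seq(*parsers: Parser) -> Parser:
    """PEG sequence: every parser must succeed, one after another."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: the first parser that succeeds wins."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None

    return parse


def free_text(text: str, pos: int):
    """Stand-in for an NLP parser: tokenize the rest of the line."""
    end = text.find("\n", pos)
    if end == -1:
        end = len(text)
    if end == pos:
        return None  # nothing left to interpret
    return {"free_text": text[pos:end].split()}, end


# Hypothetical formal part: "IF <identifier> <operator> <number> THEN "
condition = seq(
    regex(r"IF\s+"),
    regex(r"[A-Za-z_]\w*"),
    regex(r"\s*(?:>=|<=|>|<|=)\s*"),
    regex(r"\d+"),
    regex(r"\s+THEN\s+"),
)

# A line is either a formal condition followed by free text,
# or free text only (fallback through ordered choice).
line = choice(seq(condition, free_text), free_text)


def parse_spec_line(text: str):
    """Return the parsed value for one line, or None if nothing matches."""
    result = line(text, 0)
    return result[0] if result is not None else None
```

Under these assumptions, `parse_spec_line("IF x > 10 THEN notify the operator")` yields the matched formal tokens followed by the tokenized free-text part, while a line without the formal prefix falls through to `free_text` alone.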