• DocumentCode
    654984
  • Title
    DAG Based Feature Additive XML Schema Generation for Unstructured Text
  • Author
    Rajbabu, K. ; Sudha, S.
  • Author_Institution
    Bharat Heavy Electricals Ltd., Tiruchirappalli, India
  • fYear
    2013
  • fDate
    10-12 Oct. 2013
  • Firstpage
    117
  • Lastpage
    124
  • Abstract
    Recent work on handling unstructured text employs multilevel filtering techniques to identify the key terms in documents and then applies mining techniques to extract the necessary information. Although these techniques are efficient for information retrieval, they cannot be applied directly to information extraction from documents that are critical in context, and high accuracy cannot be expected from them. Further, the loss of hidden but significant information cannot be tolerated in the data-critical applications now emerging around unstructured documents. Hence, a novel approach is proposed that reorganizes the unstructured textual model into a feature-enriched, structured graphical model (a directed acyclic graph, DAG) by adding spatial, logical, lexical, syntactic and semantic features. The generated graph depicts relationships across the document at all levels, from the micro-level token to the macro-level document. In addition, a structural pattern identification algorithm is proposed for generating an XML schema from the generated graph. Experimental results on a real-time dataset are presented.
  • Keywords
    XML; data mining; directed graphs; information filtering; text analysis; DAG based feature additive XML schema generation; data critical applications; directed acyclic graph; information extraction; information retrieval; lexical features; logical features; macro level document; microlevel token; mining techniques; multilevel filtering techniques; semantic features; spatial features; structural pattern identification algorithm; structured graphical model; syntactical features; unstructured documents; unstructured textual model; Context; Data mining; Feature extraction; Information retrieval; Object oriented modeling; Semantics; XML; DAG; Feature Categorization; Feature Enrichment; XML Schema; unstructured text;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2013 International Conference on
  • Conference_Location
    Beijing
  • Type
    conf
  • DOI
    10.1109/CyberC.2013.27
  • Filename
    6685668
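The abstract describes two steps: reorganizing text into a DAG that spans token-level to document-level nodes, then identifying structural patterns in that graph to emit an XML schema. The sketch below is a minimal illustration of that idea under simplifying assumptions; it is not the authors' algorithm. The four levels (document, paragraph, sentence, term), the sharing of term nodes across sentences (which is what turns the hierarchy into a DAG rather than a tree), the "set of child levels" used as the structural pattern, and the functions `build_dag`, `is_acyclic`, `structural_patterns` and `to_xsd` are all hypothetical. Only one lexical feature (the lowercased word) is modeled; the spatial, logical, syntactic and semantic features from the abstract are omitted.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from graphlib import CycleError, TopologicalSorter

XS = "http://www.w3.org/2001/XMLSchema"


def build_dag(text):
    """Build a document -> paragraph -> sentence -> term graph (hypothetical levels).

    Returns (kind, children): kind maps a node id to its level, children maps a
    node id to its child ids. A term node is shared by every sentence using the
    same lowercased word, so it can have several parents (a DAG, not a tree).
    """
    kind = {"doc": "document"}
    children = defaultdict(list)
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    for i, para in enumerate(paragraphs):
        p_id = f"p{i}"
        kind[p_id] = "paragraph"
        children["doc"].append(p_id)
        # Naive sentence split: terminal punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", para.strip()) if s]
        for j, sent in enumerate(sentences):
            s_id = f"p{i}s{j}"
            kind[s_id] = "sentence"
            children[p_id].append(s_id)
            for word in re.findall(r"[A-Za-z0-9]+", sent):
                t_id = "t:" + word.lower()  # shared lexical node
                kind[t_id] = "term"
                if t_id not in children[s_id]:
                    children[s_id].append(t_id)
    return kind, children


def is_acyclic(children):
    """Check acyclicity; edge direction does not matter for cycle detection."""
    try:
        tuple(TopologicalSorter(children).static_order())
        return True
    except CycleError:
        return False


def structural_patterns(kind, children):
    """Map each level to the set of child levels observed beneath it."""
    patterns = {}
    for node, level in kind.items():
        child_levels = (kind[c] for c in children.get(node, []))
        patterns.setdefault(level, set()).update(child_levels)
    return patterns


def to_xsd(patterns):
    """Emit one global xs:element per level; leaf levels become xs:string."""
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    for level in sorted(patterns):
        el = ET.SubElement(schema, f"{{{XS}}}element", name=level)
        if not patterns[level]:
            el.set("type", "xs:string")
            continue
        complex_type = ET.SubElement(el, f"{{{XS}}}complexType")
        seq = ET.SubElement(complex_type, f"{{{XS}}}sequence")
        for child in sorted(patterns[level]):
            ET.SubElement(seq, f"{{{XS}}}element", ref=child,
                          minOccurs="0", maxOccurs="unbounded")
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    sample = "Pump P1 failed. Pump P2 ran.\n\nValve V1 closed."
    kind, children = build_dag(sample)
    print(is_acyclic(children))
    print(to_xsd(structural_patterns(kind, children)))
```

In the sample, the word "Pump" occurs in two sentences and therefore becomes a single `term` node with two parents. Collapsing nodes into per-level child sets keeps the generated schema small, at the cost of discarding ordering and cardinality information that a fuller structural pattern identification step would likely need to retain.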