Title :
DAG Based Feature Additive XML Schema Generation for Unstructured Text
Author :
Rajbabu, K. ; Sudha, S.
Author_Institution :
Bharat Heavy Electricals Ltd., Tiruchirappalli, India
Abstract :
Recent work on handling unstructured text employs multilevel filtering techniques to identify the key terms in documents and then applies mining techniques to extract the necessary information. Although these techniques are efficient for information retrieval, they cannot be applied directly to information extraction from documents that are critical in context, nor can accuracy be expected from them. Furthermore, the loss of hidden and significant information cannot be tolerated in the data-critical applications now emerging around unstructured documents. Hence, a novel idea is proposed: reorganizing the unstructured textual model into a feature-enriched structured graphical model by adding spatial, logical, lexical, syntactical, and semantic features. The generated graph depicts relationships across the document at all levels, from the micro-level token to the macro-level document. A structural pattern identification algorithm for generating an XML schema from the generated graph is also proposed. Experimental results on a real-time dataset are presented.
Keywords :
XML; data mining; directed graphs; information filtering; text analysis; DAG based feature additive XML schema generation; data critical applications; directed acyclic graph; information extraction; information retrieval; lexical features; logical features; macro level document; microlevel token; mining techniques; multilevel filtering techniques; semantic features; spatial features; structural pattern identification algorithm; structured graphical model; syntactical features; unstructured documents; unstructured textual model; Context; Data mining; Feature extraction; Information retrieval; Object oriented modeling; Semantics; XML; DAG; Feature Categorization; Feature Enrichment; XML Schema; unstructured text;
Conference_Title :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2013 International Conference on
Conference_Location :
Beijing
DOI :
10.1109/CyberC.2013.27