DocumentCode
654984
Title
DAG Based Feature Additive XML Schema Generation for Unstructured Text
Author
Rajbabu, K. ; Sudha, S.
Author_Institution
Bharat Heavy Electricals Ltd., Tiruchirappalli, India
fYear
2013
fDate
10-12 Oct. 2013
Firstpage
117
Lastpage
124
Abstract
Recent work on handling unstructured text employs multilevel filtering techniques to identify the key terms in documents and then applies mining techniques to extract the necessary information. Though these techniques are efficient for information retrieval, they cannot be applied directly to information extraction from documents whose context is critical, and accurate results cannot be expected. Further, loss of hidden and significant information cannot be tolerated in the emerging data-critical applications based on unstructured documents. Hence, a novel idea is proposed: re-organizing the unstructured textual model into a feature-enriched structured graphical model by adding spatial, logical, lexical, syntactical and semantic features. The generated graph depicts relationships across the document at all levels, from the micro-level token to the macro-level document. Moreover, a structural pattern identification algorithm for generating an XML schema from the generated graph is also proposed. The experimental outcome for a real-time dataset is presented.
Keywords
XML; data mining; directed graphs; information filtering; text analysis; DAG based feature additive XML schema generation; data critical applications; directed acyclic graph; information extraction; information retrieval; lexical features; logical features; macro level document; microlevel token; mining techniques; multilevel filtering techniques; semantic features; spatial features; structural pattern identification algorithm; structured graphical model; syntactical features; unstructured documents; unstructured textual model; Context; Data mining; Feature extraction; Information retrieval; Object oriented modeling; Semantics; XML; DAG; Feature Categorization; Feature Enrichment; XML Schema; unstructured text;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2013 International Conference on
Conference_Location
Beijing
Type
conf
DOI
10.1109/CyberC.2013.27
Filename
6685668