Title :
MDLDA: A new discriminant semantic representation for in-depth document analysis
Author :
Xiaoli Zhang;Lu Lu
Author_Institution :
Department of Industrial Design, Huangshan College, Tunxi District, Huangshan, Anhui, P.R. China
fDate :
7/1/2015 12:00:00 AM
Abstract :
This study considers the problem of in-depth document analysis. We propose a new document analysis method, named Multi-Dimensional Linear Discriminant Analysis (MDL-DA), which formulates an efficient, class-specific semantic representation of the local information in a document with respect to term associations and spatial distributions. MDL-DA works by first partitioning each document into paragraphs and building a term affinity graph, which represents the frequency of term co-occurrence within a paragraph. We then apply Two-Dimensional Linear Discriminant Analysis (2DLDA) to obtain an optimal discriminating mapping. A hybrid document similarity measure that combines the global and local semantics of a document is designed to boost the performance of the framework. The algorithm is evaluated on web document classification. Experimental results demonstrate that the proposed technique outperforms current algorithms with respect to accuracy and computational efficiency.
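The paragraph-level term affinity graph mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `term_affinity_matrix`, the blank-line paragraph splitting, the whitespace tokenization, and the fixed vocabulary are all assumptions made for illustration. The graph is stored as a symmetric matrix whose entry (i, j) counts the paragraphs in which terms i and j both occur.

```python
from itertools import combinations

import numpy as np


def term_affinity_matrix(document, vocabulary):
    """Count, for each pair of vocabulary terms, the paragraphs containing both.

    Hypothetical helper: paragraph splitting and tokenization are assumptions,
    not details taken from the paper.
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    size = len(vocabulary)
    affinity = np.zeros((size, size), dtype=int)

    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p for p in document.split("\n\n") if p.strip()]

    for paragraph in paragraphs:
        # Assumption: lowercase whitespace tokenization; each term counts
        # at most once per paragraph.
        present = sorted({index[t] for t in paragraph.lower().split() if t in index})
        for i, j in combinations(present, 2):
            affinity[i, j] += 1
            affinity[j, i] += 1  # keep the matrix symmetric

    return affinity


vocabulary = ["web", "page", "classification", "accuracy"]
document = "web page classification\n\nweb page layout\n\nclassification accuracy"
affinity = term_affinity_matrix(document, vocabulary)
print(affinity)
```

Under these assumptions, each document yields one such matrix, which is the kind of two-dimensional input a 2DLDA step would operate on; the discriminant projection itself and the hybrid similarity measure are not shown here.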
Keywords :
"Semantics","Principal component analysis","Accuracy","Large scale integration","Feature extraction","Linear discriminant analysis","Training"
Conference_Title :
Industrial Informatics (INDIN), 2015 IEEE 13th International Conference on
Electronic_ISBN :
2378-363X
DOI :
10.1109/INDIN.2015.7281891