DocumentCode
3662452
Title
MDLDA: A new discriminant semantic representation for in-depth document analysis
Author
Xiaoli Zhang;Lu Lu
Author_Institution
Department of Industrial Design, Huangshan College, Tunxi District, Huangshan, Anhui, P.R. China
fYear
2015
fDate
7/1/2015 12:00:00 AM
Firstpage
1112
Lastpage
1117
Abstract
This study considers the problem of in-depth document analysis. We propose a new document analysis method, Multi-Dimensional Linear Discriminant Analysis (MDL-DA), which builds an efficient class-specific semantic representation of a document's local information in terms of term associations and spatial distributions. MDL-DA first partitions each document into paragraphs and builds a term affinity graph that represents the frequency of term co-occurrence within a paragraph. We then apply Two-Dimensional Linear Discriminant Analysis (2DLDA) to obtain an optimal discriminating mapping. To boost the performance of the framework, a hybrid document similarity measure combines the global and local semantics of a document. The algorithm is evaluated on web document classification. Experimental results show that the proposed technique outperforms current algorithms in accuracy and computational efficiency.
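The first step named in the abstract, partitioning a document into paragraphs and recording term co-occurrence per paragraph, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `term_affinity`, the blank-line paragraph splitting, the lowercase whitespace tokenization, and the choice to count each co-occurring term pair once per paragraph are all assumptions made for illustration.

```python
from itertools import combinations


def term_affinity(document, vocabulary):
    """Return a symmetric matrix of paragraph-level term co-occurrence counts.

    Entry [i][j] is the number of paragraphs containing both term i and term j;
    the diagonal holds the number of paragraphs containing each term.
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    size = len(vocabulary)
    matrix = [[0] * size for _ in range(size)]
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    for paragraph in paragraphs:
        # Assumption: lowercase whitespace tokenization, no stemming or stop words.
        present = sorted({index[t] for t in paragraph.lower().split() if t in index})
        for i in present:
            matrix[i][i] += 1
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


# Hypothetical three-paragraph document and vocabulary.
doc = "linear discriminant analysis\n\nsemantic analysis of web document\n\nlinear semantic mapping"
vocab = ["linear", "discriminant", "analysis", "semantic"]
affinity = term_affinity(doc, vocab)
```

Each document then yields one such matrix, which is the kind of two-dimensional input that a 2DLDA step operates on directly, without first flattening it into a vector.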
Keywords
"Semantics","Principal component analysis","Accuracy","Large scale integration","Feature extraction","Linear discriminant analysis","Training"
Publisher
IEEE
Conference_Titel
Industrial Informatics (INDIN), 2015 IEEE 13th International Conference on
ISSN
1935-4576
Electronic_ISBN
2378-363X
Type
conf
DOI
10.1109/INDIN.2015.7281891
Filename
7281891