DocumentCode :
3662452
Title :
MDLDA: A new discriminant semantic representation for in-depth document analysis
Author :
Xiaoli Zhang;Lu Lu
Author_Institution :
Department of Industrial Design, Huangshan College, Tunxi District, Huangshan, Anhui, P.R. China
fYear :
2015
fDate :
7/1/2015 12:00:00 AM
Firstpage :
1112
Lastpage :
1117
Abstract :
This study considers the problem of in-depth document analysis. We propose a new document analysis method, Multi-Dimensional Linear Discriminant Analysis (MDL-DA), which builds an efficient, class-specific semantic representation of the local information in a document, capturing both term associations and their spatial distribution. MDL-DA first partitions each document into paragraphs and builds a term affinity graph that records how frequently terms co-occur within a paragraph. It then applies Two-Dimensional Linear Discriminant Analysis (2DLDA) to obtain an optimal discriminating mapping. To boost the performance of the framework, a hybrid document similarity measure combines the global and local semantics of a document. We evaluate the algorithm on web document classification. Experimental results demonstrate that the proposed technique outperforms current algorithms in both accuracy and computational efficiency.
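The first step described in the abstract, building a term affinity graph from paragraph-level co-occurrence, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `term_affinity`, the blank-line paragraph split, the lowercase whitespace tokenization, and the choice to count each co-occurring pair once per paragraph are all assumptions made for illustration. The 2DLDA projection and the hybrid similarity measure are not shown.

```python
from itertools import combinations


def term_affinity(document, vocabulary):
    """Return a symmetric matrix whose (i, j) entry counts the paragraphs
    in which vocabulary terms i and j both appear (hypothetical helper)."""
    index = {term: i for i, term in enumerate(vocabulary)}
    n = len(vocabulary)
    affinity = [[0] * n for _ in range(n)]

    # Assumption: paragraphs are separated by blank lines.
    for paragraph in document.split("\n\n"):
        # Assumption: lowercase whitespace tokenization; out-of-vocabulary
        # tokens are ignored, and each term counts once per paragraph.
        present = sorted({index[t] for t in paragraph.lower().split() if t in index})
        for i, j in combinations(present, 2):
            affinity[i][j] += 1
            affinity[j][i] += 1
    return affinity


# Illustrative input only; not data from the paper.
doc = (
    "web page classification\n\n"
    "web document analysis\n\n"
    "semantic analysis of a web page"
)
vocab = ["web", "page", "analysis"]
matrix = term_affinity(doc, vocab)
print(matrix)  # [[0, 2, 2], [2, 0, 1], [2, 1, 0]]
```

In this sketch each document yields one matrix over a fixed vocabulary, which is the kind of two-dimensional input that a 2DLDA step operates on directly, without flattening it into a vector.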
Keywords :
"Semantics","Principal component analysis","Accuracy","Large scale integration","Feature extraction","Linear discriminant analysis","Training"
Publisher :
IEEE
Conference_Titel :
Industrial Informatics (INDIN), 2015 IEEE 13th International Conference on
ISSN :
1935-4576
Electronic_ISBN :
2378-363X
Type :
conf
DOI :
10.1109/INDIN.2015.7281891
Filename :
7281891