• DocumentCode
    3662452
  • Title
    MDLDA: A new discriminant semantic representation for in-depth document analysis
  • Author
    Xiaoli Zhang;Lu Lu
  • Author_Institution
    Department of Industrial Design, Huangshan College, Tunxi District, Huangshan, Anhui, P.R. China
  • fYear
    2015
  • fDate
    7/1/2015 12:00:00 AM
  • Firstpage
    1112
  • Lastpage
    1117
  • Abstract
    This study considers the problem of in-depth document analysis. We propose a new document analysis method, named Multi-Dimensional Linear Discriminant Analysis (MDL-DA), which builds an efficient class-specific semantic representation of a document's local information with respect to term associations and spatial distributions. MDL-DA first partitions each document into paragraphs and builds a term affinity graph that represents the frequency of term co-occurrence within a paragraph. We then apply Two-Dimensional Linear Discriminant Analysis (2DLDA) to obtain an optimal discriminating mapping. A hybrid document similarity measure that combines the global and local semantics of a document is designed to boost the performance of the framework. The algorithm is evaluated on web document classification. Experimental results demonstrate that the proposed technique outperforms current algorithms with respect to accuracy and computational efficiency.
  • Keywords
    "Semantics","Principal component analysis","Accuracy","Large scale integration","Feature extraction","Linear discriminant analysis","Training"
  • Publisher
    IEEE
  • Conference_Titel
    Industrial Informatics (INDIN), 2015 IEEE 13th International Conference on
  • ISSN
    1935-4576
  • Electronic_ISBN
    2378-363X
  • Type
    conf
  • DOI
    10.1109/INDIN.2015.7281891
  • Filename
    7281891
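
The first stage described in the abstract (partitioning a document into paragraphs and building a term affinity graph from term co-occurrence) can be illustrated with a minimal Python sketch. The abstract does not define the co-occurrence count, so everything here is an assumption made for illustration: paragraphs are taken to be separated by blank lines, two terms are counted as co-occurring when they appear in the same sentence, and the graph is stored as a symmetric matrix over a fixed vocabulary. The function names `split_paragraphs` and `term_affinity_matrix` are hypothetical and do not come from the paper. The 2DLDA projection and the hybrid similarity measure are not sketched.

```python
import re
from itertools import combinations


def split_paragraphs(document):
    """Split a document into paragraphs (assumption: blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def term_affinity_matrix(paragraph, vocabulary):
    """Build a symmetric term affinity matrix for one paragraph.

    Assumption: entry [i][j] counts the sentences in which vocabulary
    terms i and j both appear; the diagonal counts the sentences that
    contain term i. The paper may define the weights differently.
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    n = len(vocabulary)
    matrix = [[0] * n for _ in range(n)]
    for sentence in re.split(r"[.!?]+", paragraph.lower()):
        # Indices of the vocabulary terms present in this sentence.
        present = sorted(
            {index[t] for t in re.findall(r"[a-z]+", sentence) if t in index}
        )
        for i in present:
            matrix[i][i] += 1
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


doc = "Graphs model terms. Terms link to graphs.\n\nDiscriminant analysis maps terms."
vocab = ["terms", "graphs", "analysis"]
matrices = [term_affinity_matrix(p, vocab) for p in split_paragraphs(doc)]
```

Each paragraph yields one matrix, so a document becomes a stack of two-dimensional inputs. That shape is what makes a two-dimensional discriminant method such as 2DLDA applicable without first flattening the matrices into vectors.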