DocumentCode
2272533
Title
Applying LSI and data reduction to XML for counter terrorism
Author
Demurjian, S. ; Rajasekaran, Sanguthevar ; Ammar, R. ; Greenshields, I. ; Doan, T. ; He, L.
Author_Institution
Dept. of Comput. Sci. & Eng., Connecticut Univ., Storrs, CT
fYear
0
fDate
0-0 0
Abstract
Data reduction is a critical problem for counter-terrorism: large collections of documents must be analyzed and processed, raising issues related to performance, lossless reduction, polysemy (the meaning of an individual word being influenced by its surrounding words), and synonymy (the same concept being expressed by different terms). In this paper, we begin by presenting a survey of latent semantic indexing (LSI) techniques and strategies. Next, we highlight a subset of the available LSI software packages, both commercial and academic. Then, we explore approaches that apply LSI to eXtensible Markup Language (XML) data. Using this as a basis, the paper proposes an approach that applies LSI and data reduction to XML documents by transitioning from support vector machines (SVM) to random projections to LSI, and also considers how to exploit the semantics of Web-based documents that are captured via XML tags.
Keywords
XML; data reduction; indexing; semantic Web; support vector machines; terrorism; Web-based documents; XML data; XML documents; XML tags; counterterrorism; eXtensible Markup Language; latent semantic indexing; lossless reduction; polysemy; synonymy; Application software; Computer science; Counting circuits; Data engineering; Helium; Large scale integration
fLanguage
English
Publisher
IEEE
Conference_Titel
Aerospace Conference, 2006 IEEE
Conference_Location
Big Sky, MT
Print_ISBN
0-7803-9545-X
Type
conf
DOI
10.1109/AERO.2006.1656047
Filename
1656047