DocumentCode :
2190575
Title :
Inter-document reference detection as an alternative to full text semantic analysis in document clustering
Author :
De Maziere, Patrick A. ; Van Hulle, Marc M.
Author_Institution :
Dept. Healthcare & Technol., KHLeuven, Leuven, Belgium
fYear :
2013
fDate :
22-25 Sept. 2013
Firstpage :
1
Lastpage :
6
Abstract :
We discuss the search for inter-document references as an alternative to grouping document inventories by full-text semantic analysis. The document inventory we used, which is not publicly available, was provided to us by the European Union (EU) in the framework of an EU project whose aim was to analyse, classify, and visualise EU-funded research in social sciences and humanities in EU framework programmes FP5 and FP6. This project, called the SSH project for short, was aimed at evaluating the contributions of research to the development of EU policies. For the semantics-based grouping, we start from a Multi-Dimensional Scaling analysis of the document vectors, which result from a prior semantic analysis. As an alternative to a semantic analysis, we searched for inter-document references, or direct references. Direct references are defined as terms that explicitly refer to other documents present in the inventory. We show that the grouping based on references is largely similar to the one based on semantics, but requires considerably less computational effort. In addition, non-experts can make better use of the results, since the references are displayed as graphical webpages with hyperlinks pointing to both the referenced and the referencing document(s), together with the reason for the linkage. Finally, we show that the combination of a database, to store the data and the (intermediate) results, and a webserver, to visualise the results, offers a powerful platform to analyse the document inventory and to share the results with all participants and collaborators involved in a data- and computation-intensive EU project, thereby guaranteeing both data and result consistency.
Keywords :
data visualisation; database management systems; document handling; file servers; pattern clustering; EU funded research; EU project; European Union; FP5; FP6; Webserver; data-consistency; database; document clustering; document inventories; document vectors; full text semantic analysis; graphical webpages; humanities; hyperlinks; interdocument reference detection; multidimensional scaling analysis; result-consistency; semantic based grouping; social sciences; Databases; Europe; Semantics; Servers; Terminology; Text analysis; Vectors; HPC; Semantic Analysis; Text Mining; client-server infrastructure;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on
Conference_Location :
Southampton
ISSN :
1551-2541
Type :
conf
DOI :
10.1109/MLSP.2013.6661952
Filename :
6661952