Title :
A linguistic and statistical approach for extracting knowledge from documents
Author :
Sado, Wilfried Njomgue ; Fontaine, Dominique ; Fontaine, Philippe
Author_Institution :
Technol. Univ. of Compiegne, France
Date :
30 Aug.-3 Sept. 2004
Abstract :
We present and evaluate an innovative method for automatic indexing. It combines a linguistic analysis of the document to be indexed with a statistical analysis based on the singular value decomposition of the words in the document. The weighting of words combines the advantages of their local and global contexts with their position relative to other terms (co-occurrence). An application was developed to propose topic assignments for documents within a hierarchical reference framework. Finally, we present experimental results and an evaluation carried out on documents from Suez-Environment.
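The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, assuming a standard log-entropy scheme (local weight log(1 + tf), global weight derived from the term's entropy across documents) and a simple windowed co-occurrence count. The function names `log_entropy_weights` and `cooccurrence_counts`, the window size, and the toy documents are all hypothetical and are not taken from the paper; the singular value decomposition step is omitted here.

```python
import math
from collections import Counter


def log_entropy_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed scheme (not the paper's): weight = local * global, where
    local = log(1 + tf) and global = 1 + sum_j p_j * log(p_j) / log(n_docs).
    """
    n_docs = len(docs)
    if n_docs < 2:
        raise ValueError("need at least two documents")
    counts = [Counter(doc) for doc in docs]
    totals = Counter()
    for c in counts:
        totals.update(c)
    # Global weight: close to 0 for terms spread evenly over all documents,
    # 1 for terms concentrated in a single document.
    global_w = {}
    for term, total in totals.items():
        entropy = sum(
            (c[term] / total) * math.log(c[term] / total)
            for c in counts
            if c[term] > 0
        )
        global_w[term] = 1.0 + entropy / math.log(n_docs)
    # Local weight: dampened within-document frequency.
    return [
        {term: math.log1p(tf) * global_w[term] for term, tf in c.items()}
        for c in counts
    ]


def cooccurrence_counts(tokens, window=2):
    """Count unordered pairs of distinct terms at most `window` positions apart."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


if __name__ == "__main__":
    docs = [["water", "filter"], ["water", "pump"], ["water", "valve"]]
    for weights in log_entropy_weights(docs):
        print(weights)
    print(cooccurrence_counts(["water", "filter", "pump"], window=1))
```

In this sketch a term that occurs equally often in every document receives a weight near zero, while a term confined to one document keeps its full local weight. The weighted term-document values could then be passed to a singular value decomposition routine.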
Keywords :
computational linguistics; indexing; information retrieval; knowledge acquisition; singular value decomposition; statistical analysis; word processing; Suez-Environment documents; automatic document indexing; knowledge extraction; linguistic analysis; Content based retrieval; Data mining; Documentation; Information technology; Machine assisted indexing; Proposals; Text analysis; Tree data structures
Conference_Title :
Database and Expert Systems Applications, 2004. Proceedings. 15th International Workshop on
Print_ISBN :
0-7695-2195-9
DOI :
10.1109/DEXA.2004.1333516