DocumentCode :
3423592
Title :
A linguistic and statistical approach for extracting knowledge from documents
Author :
Sado, Wilfried Njomgue ; Fontaine, Dominique ; Fontaine, Philippe
Author_Institution :
Technol. Univ. of Compiegne, France
fYear :
2004
fDate :
30 Aug.-3 Sept. 2004
Firstpage :
454
Lastpage :
458
Abstract :
We present and evaluate an innovative method for automatic indexing. It combines a linguistic analysis of the document to be indexed with a statistical analysis based on the singular value decomposition of the words in the document. The word weighting combines the advantages of local and global context with each word's position relative to other terms (co-occurrence). An application was developed to propose topic assignments for documents within a hierarchical reference structure. Finally, we present experimental results and an evaluation carried out on Suez-Environment documents.
Keywords :
computational linguistics; indexing; information retrieval; knowledge acquisition; singular value decomposition; statistical analysis; word processing; Suez-Environment documents; automatic document indexing; knowledge extraction; linguistic analysis; Content based retrieval; Data mining; Documentation; Information technology; Machine assisted indexing; Proposals; Text analysis; Tree data structures
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Applications, 2004. Proceedings. 15th International Workshop on
ISSN :
1529-4188
Print_ISBN :
0-7695-2195-9
Type :
conf
DOI :
10.1109/DEXA.2004.1333516
Filename :
1333516