DocumentCode :
2295994
Title :
Knowledge acquisition from documents with both fixed and free formats
Author :
Hirasawa, Shigeichi ; Chu, Wesley W.
Author_Institution :
Dept. of Industrial & Manage. Syst. Eng., Waseda Univ., Japan
Volume :
5
fYear :
2003
fDate :
5-8 Oct. 2003
Firstpage :
4694
Abstract :
Based on techniques from information retrieval, we discuss methods for acquiring knowledge from documents composed of both fixed and free formats. A fixed-format document consists of items whose values are chosen from sentences, words, symbols, or numbers, while a free-format document consists of ordinary text. Starting from the item-document matrix and the term-document matrix used to represent a document set, we propose a new knowledge acquisition method that takes both the fixed and free formats into account simultaneously. A method based on the probabilistic latent semantic indexing (PLSI) model is used to cluster the document set. The proposed method is applied to a document set consisting of student questionnaires collected for the purpose of faculty development, and we show its effectiveness in comparison with the conventional method.
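The combination of an item-document matrix with a term-document matrix, followed by PLSI clustering, can be illustrated with a minimal sketch. It assumes that the fixed-format items are simply stacked on top of the term rows as extra features, scaled by a hypothetical `item_weight` parameter, and it fits the asymmetric PLSI form P(w|d) = sum_z P(z|d) P(w|z) with EM, then assigns each document to argmax_z P(z|d). The helper names (`combine_matrices`, `plsi`, `cluster_documents`) are illustrative only; the paper's actual combination scheme, weighting, and clustering rule may differ.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def combine_matrices(item_doc: Matrix, term_doc: Matrix,
                     item_weight: float = 1.0) -> Matrix:
    """Stack an item-document matrix on top of a term-document matrix.

    Rows are features (items first, then terms); columns are documents.
    item_weight is a hypothetical knob scaling the fixed-format counts.
    """
    if len(item_doc[0]) != len(term_doc[0]):
        raise ValueError("both matrices need one column per document")
    weighted_items = [[item_weight * x for x in row] for row in item_doc]
    return weighted_items + [list(row) for row in term_doc]


def _normalize(v: List[float]) -> List[float]:
    """Scale a nonnegative vector to sum to 1 (uniform if it is all zeros)."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)


def plsi(counts: Matrix, n_topics: int, n_iter: int = 200,
         seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit P(w|d) = sum_z P(z|d) P(w|z) by EM on a feature-by-document matrix.

    Returns (p_w_z, p_z_d), indexed as p_w_z[z][w] and p_z_d[d][z].
    """
    rng = random.Random(seed)
    n_w, n_d = len(counts), len(counts[0])
    # Random initialization of both conditional distributions
    p_w_z = [_normalize([rng.random() for _ in range(n_w)]) for _ in range(n_topics)]
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_d)]

    for _ in range(n_iter):
        acc_w_z = [[0.0] * n_w for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_d)]
        for d in range(n_d):
            for w in range(n_w):
                n = counts[w][d]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w), proportional to P(z|d) P(w|z)
                post = _normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # Accumulate expected counts n(d,w) * P(z|d,w)
                for z in range(n_topics):
                    acc_w_z[z][w] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: renormalize expected counts into conditional distributions
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d


def cluster_documents(p_z_d: Matrix) -> List[int]:
    """Assign each document to its most probable latent class argmax_z P(z|d)."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]


if __name__ == "__main__":
    # Toy data (made up for illustration): 2 items, 3 terms, 4 documents
    item_doc = [[1, 2, 0, 0], [0, 0, 1, 2]]
    term_doc = [[2, 4, 0, 0], [1, 2, 0, 0], [0, 0, 3, 6]]
    counts = combine_matrices(item_doc, term_doc)
    _, p_z_d = plsi(counts, n_topics=2)
    print(cluster_documents(p_z_d))
```

Stacking the two matrices lets a single latent class explain both the chosen items and the free-text terms of a document, which is one simple way to take both formats into account at once; weighting or separate per-format models are equally plausible alternatives.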
Keywords :
information retrieval; knowledge acquisition; probability; programming language semantics; clustering; fixed format; free format; item-document matrix; probabilistic latent semantic indexing model; students questionnaires; term-document matrix; Computer industry; Computer science; Engineering management; Indexing; Information retrieval; Knowledge acquisition; Knowledge engineering; Knowledge management; Matrix decomposition; Systems engineering and theory
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems, Man and Cybernetics, 2003. IEEE International Conference on
ISSN :
1062-922X
Print_ISBN :
0-7803-7952-7
Type :
conf
DOI :
10.1109/ICSMC.2003.1245725
Filename :
1245725