Title :
Knowledge acquisition from documents with both fixed and free formats
Author :
Hirasawa, Shigeichi ; Chu, Wesley W.
Author_Institution :
Dept. of Industrial & Manage. Syst. Eng., Waseda Univ., Japan
Abstract :
Using techniques from information retrieval, we discuss methods for knowledge acquisition from documents that contain both fixed-format and free-format parts. The fixed-format part consists of items whose answers are selected from given sentences, words, symbols, or numbers, while the free-format part consists of ordinary text. Starting from the item-document matrix and the term-document matrix used to represent a document set, we propose a new method for knowledge acquisition that takes both formats into account simultaneously. A method based on the probabilistic latent semantic indexing (PLSI) model is used to cluster the set of documents. The proposed method is applied to a document set obtained from student questionnaires collected for the purpose of faculty development. We show the effectiveness of the proposed method in comparison with the conventional method.
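Illustrative note (not part of the original record): the abstract does not say how the item-document and term-document matrices are combined, so the sketch below simply assumes they are stacked row-wise into one feature-document count matrix before a standard PLSI model is fitted by EM. The function names `plsi` and `assign_clusters`, the toy matrices, and the choice of two latent classes are all hypothetical and serve only to show the general idea of PLSI-based document clustering.

```python
import random

def plsi(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSI by EM. counts[w][d] is the count of feature w in document d."""
    rng = random.Random(seed)
    n_feat, n_docs = len(counts), len(counts[0])

    def random_columns(n_rows):
        # One randomly initialised probability distribution per latent class.
        cols = []
        for _ in range(n_topics):
            col = [rng.random() + 1e-3 for _ in range(n_rows)]
            total = sum(col)
            cols.append([v / total for v in col])
        return cols

    p_z = [1.0 / n_topics] * n_topics   # P(z)
    p_w_z = random_columns(n_feat)      # p_w_z[z][w] = P(w|z)
    p_d_z = random_columns(n_docs)      # p_d_z[z][d] = P(d|z)

    for _ in range(n_iter):
        new_w = [[0.0] * n_feat for _ in range(n_topics)]
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        for w in range(n_feat):
            for d in range(n_docs):
                n = counts[w][d]
                if n == 0:
                    continue
                # E-step: P(z|w,d) is proportional to P(z) P(w|z) P(d|z).
                joint = [p_z[z] * p_w_z[z][w] * p_d_z[z][d]
                         for z in range(n_topics)]
                norm = sum(joint) or 1.0
                for z in range(n_topics):
                    r = n * joint[z] / norm
                    new_w[z][w] += r
                    new_d[z][d] += r
        # M-step: renormalise the expected counts.
        mass = [sum(new_w[z]) for z in range(n_topics)]
        total = sum(mass)
        for z in range(n_topics):
            m = mass[z] or 1.0
            p_w_z[z] = [v / m for v in new_w[z]]
            p_d_z[z] = [v / m for v in new_d[z]]
            p_z[z] = mass[z] / total
    return p_z, p_w_z, p_d_z

def assign_clusters(p_z, p_d_z):
    """Assign each document to the latent class z maximising P(z) P(d|z)."""
    n_docs = len(p_d_z[0])
    return [max(range(len(p_z)), key=lambda z: p_z[z] * p_d_z[z][d])
            for d in range(n_docs)]

# Toy data (hypothetical): 6 questionnaires, 4 fixed-format item choices,
# 4 free-text terms. Rows are features, columns are documents.
item_doc = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 0],
]
term_doc = [
    [2, 1, 3, 0, 0, 0],
    [1, 2, 1, 0, 1, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 1, 0, 1, 2, 2],
]
# Assumption: both formats are handled jointly by stacking the matrices.
combined = item_doc + term_doc

p_z, p_w_z, p_d_z = plsi(combined, n_topics=2)
labels = assign_clusters(p_z, p_d_z)
print(labels)
```

Stacking the matrices lets a single set of latent classes explain the selected items and the free text together. The method in the paper may weight or combine the two parts differently.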
Keywords :
information retrieval; knowledge acquisition; probability; programming language semantics; clustering; fixed format; free format; item-document matrix; probabilistic latent semantic indexing model; students questionnaires; term-document matrix; Computer industry; Computer science; Engineering management; Indexing; Knowledge engineering; Knowledge management; Matrix decomposition; Systems engineering and theory
Conference_Title :
Systems, Man and Cybernetics, 2003. IEEE International Conference on
Print_ISBN :
0-7803-7952-7
DOI :
10.1109/ICSMC.2003.1245725