DocumentCode :
2227315
Title :
An automated management tool for unstructured data
Author :
Ceglowski, Maciej ; Coburn, Aaron ; Cuadrado, John L.
fYear :
2003
fDate :
13-17 Oct. 2003
Firstpage :
554
Lastpage :
557
Abstract :
The rapidly growing quantity of online data has created a need for automated, content-based categorization and search tools. We describe an open-source, Web-based archive management tool that uses latent semantic indexing (LSI), coupled with vector clustering techniques, to provide users with a fully searchable and automatically categorized interface to a data collection. The default English document parser included in the project uses part-of-speech tagging and recursive maximal noun phrase extraction to create a more effective term list for LSI than traditional stop list techniques. The archive interface supports multiple user views of the data collection. Advanced search features are implemented through relevance feedback and do not require users to learn a query syntax.
Keywords :
Internet; content management; grammars; relevance feedback; search engines; English document parser; Web-based archive management; archive interface; automated content-based categorization; data collection; latent semantic indexing; online data; part-of-speech tagging; recursive maximal noun phrase extraction; search tools; vector clustering techniques; Data mining; Educational technology; Feedback; Indexing; Information retrieval; Large scale integration; Open source software; Organizing; Speech; Tagging
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN :
0-7695-1932-6
Type :
conf
DOI :
10.1109/WI.2003.1241266
Filename :
1241266