Abstract :
Many researchers have proposed classification systems that automatically classify email in order to reduce information overload. However, none of these systems are in use today. This paper examines some of the problems with classification technologies and proposes Relevance Categories as a method to avoid some of these problems. In particular, the dynamic nature of email categories, the cognitive overhead, the need for training categories, and the high cost of classification errors are hurdles for many classification algorithms. Relevance Categories avoid some of these problems through their simplicity: they are merely relevance-ranked lists of email messages that are similar to a set of query messages. Because messages are displayed as the result of a dynamic query rather than as members of fixed categories, we hypothesize that users will be less sensitive to errors under the Relevance Categories scheme than to errors under a fixed categorization scheme. To study the effectiveness of the Relevance Categories concept, we devised a performance metric for relevance ranking and used it to test an inverted index implementation on the Reuters-21578 test collection. The promising test results indicate the need for further work.
Keywords :
classification; electronic mail; relevance feedback; classification systems; cognitive overhead; email organization; inverted index; performance metric; relevance categories; relevance ranking; training categories; Classification algorithms; Costs; Electronic switching systems; Frequency; Integrated circuit testing
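
The abstract describes a Relevance Category as a relevance-ranked list of messages similar to a set of query messages, served from an inverted index. The sketch below is one minimal way such a ranking could look. It is not the paper's implementation: the abstract does not specify a scoring formula, so the TF-IDF weighting, the tokenizer, and the names `tokenize`, `build_index`, and `relevance_category` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Build an inverted index: term -> {message_id: term frequency}."""
    index = defaultdict(dict)
    for msg_id, text in messages.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][msg_id] = tf
    return index


def relevance_category(index, messages, query_ids):
    """Rank non-query messages by similarity to the set of query messages.

    Scoring is an assumed TF-IDF dot product without length normalization;
    the source abstract does not state which weighting the authors used.
    """
    n = len(messages)
    # Pool the terms of all query messages into a single query vector.
    query_terms = Counter()
    for qid in query_ids:
        query_terms.update(tokenize(messages[qid]))

    scores = defaultdict(float)
    for term, qtf in query_terms.items():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n / len(postings))
        if idf == 0.0:
            continue  # term occurs in every message and carries no signal
        for msg_id, tf in postings.items():
            if msg_id not in query_ids:
                scores[msg_id] += (qtf * idf) * (tf * idf)

    # Highest score first; ties broken by message id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical mailbox used only to exercise the sketch.
messages = {
    "m1": "Budget meeting moved to Friday",
    "m2": "Friday budget review agenda",
    "m3": "Lunch menu for the cafeteria",
    "m4": "Quarterly budget numbers attached",
}
index = build_index(messages)
ranked = relevance_category(index, messages, {"m1"})
# m2 shares both "budget" and "friday" with m1, so it ranks above m4;
# m3 shares no terms with m1 and does not appear in the list.
```

Because the list is recomputed from the query messages on each request, adding or removing a query message changes the ranking without any retraining step, which is the property the abstract contrasts with fixed categories.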