Abstract:
Today, most document categorization in organizations is
done manually. At work, we save hundreds of files and
e-mail messages in folders every day. While automatic
document categorization has been widely studied, much
challenging research remains to support user-subjective
categorization. This study evaluates and compares
the application of self-organizing maps (SOMs)
and learning vector quantization (LVQ) to automatic
document classification, using a set of documents from
an organization, in a specific domain, manually classified
by a domain expert. After running the SOM and LVQ,
we asked the user to reclassify documents that were
misclassified by the system. Results show that despite
the subjective nature of human categorization, automatic
document categorization methods correlate well
with subjective, personal categorization, and the LVQ
method outperforms the SOM. The reclassification
process revealed an interesting pattern: About 40% of
the documents were classified according to their original
categorization, about 35% according to the system’s
categorization (the users changed the original categorization),
and the remainder received a different (new)
categorization. Based on these results, we conclude that
automatic support for subjective categorization is feasible;
however, an exact match is probably impossible due
to the users’ changing categorization behavior.