Title of article

Using scatterplots to understand and improve probabilistic models for text categorization and retrieval (Original Research Article)

Author/Authors

Giorgio Maria Di Nunzio, Author

Issue Information

Journal with serial number, year 2009

Pages

12

From page

945

To page

956

Abstract

The two-dimensional representation of documents, which places each document in a two-dimensional Cartesian plane, has proved to be a valid visualization tool for Automated Text Categorization (ATC): it helps to understand the relationships between categories of textual documents and lets users visually audit the classifier and identify suspicious training data. This paper analyzes a specific use of this visualization approach for the Naive Bayes (NB) model for text classification and the Binary Independence Model (BIM) for text retrieval. For text categorization, the equation for the classification decision is reformulated so that each coordinate of a document is the sum of two addends: a variable component and a constant component (the prior of the category). When plotted in the Cartesian plane according to this formulation, the documents appear shifted by a constant amount along the x-axis and the y-axis. This shift is more or less evident depending on which NB model, Bernoulli or multinomial, is chosen. For text retrieval, the same reformulation can be applied to the BIM. The visualization helps to understand the decisions taken to rank the documents, in particular in the case of relevance feedback.
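
The reformulation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a multinomial NB model in log space with Laplace smoothing, assumes that the x-coordinate refers to the category and the y-coordinate to its complement, and uses hypothetical names (train_multinomial_nb, coordinates) and toy data chosen only for illustration.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, category):
    """Estimate log-priors and smoothed log-likelihoods for `category`
    (key True) and for its complement (key False).
    Assumes both sides have at least one training document."""
    vocab = {t for tokens in docs for t in tokens}
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for tokens, label in zip(docs, labels):
        side = label == category
        counts[side].update(tokens)
        n_docs[side] += 1
    model = {}
    for side in (True, False):
        n_tokens = sum(counts[side].values())
        model[side] = {
            # Constant component: log prior of the (complement) category.
            "log_prior": math.log(n_docs[side] / len(docs)),
            # Laplace smoothing is an assumption of this sketch.
            "log_lik": {
                t: math.log((counts[side][t] + 1) / (n_tokens + len(vocab)))
                for t in vocab
            },
        }
    return model


def coordinates(tokens, model):
    """Return (x, y); each coordinate is variable component + constant component."""
    point = []
    for side in (True, False):
        log_lik = model[side]["log_lik"]
        # Variable component: depends on the terms of the document.
        variable = sum(log_lik[t] for t in tokens if t in log_lik)
        # Constant component: the same shift for every document.
        constant = model[side]["log_prior"]
        point.append(variable + constant)
    return tuple(point)


# Toy data, invented for illustration only.
docs = [["ball", "goal"], ["goal", "match"], ["vote", "law"], ["law", "court"]]
labels = ["sport", "sport", "politics", "politics"]
model = train_multinomial_nb(docs, labels, "sport")
x, y = coordinates(["goal", "ball"], model)
print(x, y)
```

Under these assumptions, a document is assigned to the category when x > y. Changing the prior moves every document by the same amount along the corresponding axis, which is the constant shift the abstract refers to.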

Keywords

Naive Bayes models, Information retrieval, Text categorization

Journal title

International Journal of Approximate Reasoning

Serial Year

2009

Record number

1182724