• Title of article

    Using scatterplots to understand and improve probabilistic models for text categorization and retrieval (Original Research Article)

  • Author/Authors

    Giorgio Maria Di Nunzio, author

  • Issue Information
    Journal with serial issue number, year 2009
  • Pages
    12
  • From page
    945
  • To page
    956
  • Abstract
    Representing documents as points in a two-dimensional Cartesian plane has proved to be a valid visualization tool for Automated Text Categorization (ATC): it helps users understand the relationships between categories of textual documents, visually audit the classifier, and identify suspicious training data. This paper analyzes a specific use of this visualization approach for the Naive Bayes (NB) model for text classification and the Binary Independence Model (BIM) for text retrieval. For text categorization, the equation for the classification decision is reformulated so that each coordinate of a document is the sum of two addends: a variable component and a constant component, the prior of the category. When documents are plotted in the Cartesian plane according to this formulation, they appear shifted by a constant amount along the x-axis and the y-axis. This shift is more or less evident depending on which NB model, Bernoulli or multinomial, is chosen. For text retrieval, the same reformulation can be applied to the BIM. The visualization helps to understand the decisions taken to rank the documents, in particular in the case of relevance feedback.
  • Keywords
    Naive Bayes models, Information retrieval, Text categorization
  • Journal title
    International Journal of Approximate Reasoning
  • Serial Year
    2009
  • Record number

    1182724
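
  • Illustrative sketch (not from the article)

    The abstract describes each coordinate of a document as the sum of a variable component and a constant component, the prior of the category. The following minimal Python sketch shows one way such coordinates could be computed for a multinomial NB model. It is an assumption-laden illustration, not the author's formulation: the function names `train` and `coordinates`, the add-one smoothing, the toy data, and the choice of the x-axis for the category and the y-axis for its complement are all hypothetical.

```python
import math
from collections import Counter

def train(docs, labels, category):
    """Estimate add-one smoothed multinomial term log-probabilities and log-priors
    for `category` (key True) and its complement (key False).
    Assumes both groups contain at least one training document."""
    vocab = {t for tokens in docs for t in tokens}
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for tokens, label in zip(docs, labels):
        key = (label == category)
        counts[key].update(tokens)
        n_docs[key] += 1
    model = {}
    for key in (True, False):
        total = sum(counts[key].values())
        model[key] = {
            # Constant component: log-prior of the group.
            "log_prior": math.log(n_docs[key] / len(docs)),
            # Ingredients of the variable component: smoothed term log-likelihoods.
            "log_term": {
                t: math.log((counts[key][t] + 1) / (total + len(vocab)))
                for t in vocab
            },
        }
    return model

def coordinates(model, tokens):
    """Return (x, y): x for the category, y for its complement (assumed axis choice).
    Each coordinate = variable component (depends on the document) + log-prior.
    Terms outside the training vocabulary are ignored."""
    point = []
    for key in (True, False):
        variable = sum(
            model[key]["log_term"][t] for t in tokens if t in model[key]["log_term"]
        )
        point.append(variable + model[key]["log_prior"])
    return tuple(point)

# Hypothetical toy data, for illustration only.
docs = [["ball", "goal"], ["goal", "team"], ["vote", "law"], ["law", "court"]]
labels = ["sport", "sport", "politics", "politics"]
model = train(docs, labels, "sport")

x, y = coordinates(model, ["goal", "ball"])
# In this sketch a document is assigned to the category when x > y,
# i.e. when its point lies below the line y = x.
print("sport" if x > y else "not sport")
```

    Under these assumptions, changing a prior moves every document by the same amount along the corresponding axis, which is the constant shift the abstract refers to; an empty document lands exactly at the point given by the two log-priors.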