• DocumentCode
    780762
  • Title

    An active learning framework for content-based information retrieval

  • Author

    Zhang, Cha ; Chen, Tsuhan

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • Volume
    4
  • Issue
    2
  • fYear
    2002
  • fDate
    6/1/2002
  • Firstpage
    260
  • Lastpage
    268
  • Abstract
    We propose a general active learning framework for content-based information retrieval. We use this framework to guide hidden annotations in order to improve retrieval performance. For each object in the database, we maintain a list of probabilities, each indicating the probability that the object has one of the attributes. During training, the learning algorithm samples objects in the database and presents them to the annotator, who assigns attributes. For each sampled object, each probability is set to one or zero depending on whether or not the annotator assigned the corresponding attribute. For objects that have not been annotated, the learning algorithm estimates the probabilities with biased kernel regression. Knowledge gain is then defined to determine which of the unannotated objects the system is most uncertain about. The system presents that object to the annotator as the next sample to be assigned attributes. During retrieval, the list of probabilities serves as a feature vector for calculating the semantic distance between two objects, or between the user query and an object in the database. The overall distance between two objects is a weighted sum of the semantic distance and the low-level feature distance. The algorithm is tested on both synthetic databases and real databases of 3D models. In both cases, the retrieval performance of the system improves rapidly with the number of annotated samples. Furthermore, we show that active learning outperforms learning based on random sampling.
  • Keywords
    content-based retrieval; knowledge based systems; learning (artificial intelligence); object-oriented databases; query processing; relevance feedback; active learning; content-based information retrieval; database; feature distance; knowledge gain; probability; random sampling; semantic distance; Content based retrieval; Feature extraction; Feedback; Image databases; Image retrieval; Information retrieval; Kernel; Probability; Spatial databases; Testing;
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type

    jour

  • DOI
    10.1109/TMM.2002.1017738
  • Filename
    1017738
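
The loop described in the abstract (estimate attribute probabilities, pick the most uncertain unannotated object, combine semantic and low-level distances) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption: a Gaussian-kernel weighted average stands in for "biased kernel regression" (with the prior entering at weight 1 as the bias), summed binary entropy stands in for "knowledge gain", Euclidean distance is used for both distances, and the two weights are assumed to sum to one. All function and parameter names are hypothetical.

```python
import math


def estimate_probabilities(features, annotations, prior, bandwidth=1.0):
    """Estimate attribute probabilities for every object.

    features:    list of low-level feature vectors, one per object
    annotations: dict mapping object index -> list of 0/1 attribute labels
    prior:       list of prior attribute probabilities (assumed bias term)
    """
    probs = []
    for i, x in enumerate(features):
        if i in annotations:
            # Annotated objects get probabilities of exactly one or zero.
            probs.append([float(v) for v in annotations[i]])
            continue
        # Assumption: the prior acts as one pseudo-sample of weight 1, so
        # objects far from every annotation fall back to the prior.
        num = [float(p) for p in prior]
        den = 1.0
        for j, labels in annotations.items():
            d2 = sum((a - b) ** 2 for a, b in zip(x, features[j]))
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))
            den += w
            for a in range(len(num)):
                num[a] += w * labels[a]
        probs.append([n / den for n in num])
    return probs


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def next_sample(probs, annotations):
    """Return the unannotated object with the largest summed entropy.

    Entropy is only a stand-in for the paper's knowledge gain.
    Raises ValueError if every object is already annotated.
    """
    candidates = [i for i in range(len(probs)) if i not in annotations]
    return max(candidates,
               key=lambda i: sum(binary_entropy(p) for p in probs[i]))


def overall_distance(feat_a, feat_b, prob_a, prob_b, w_semantic=0.5):
    """Weighted sum of semantic distance and low-level feature distance."""
    semantic = math.dist(prob_a, prob_b)
    low_level = math.dist(feat_a, feat_b)
    return w_semantic * semantic + (1.0 - w_semantic) * low_level
```

In this sketch, an object equidistant from a positive and a negative annotation ends up at the prior of 0.5 and is therefore chosen next, which matches the intuition of querying the annotator where the system is least certain.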