• DocumentCode
    913045
  • Title
    An application of cluster detection to text and picture processing
  • Author
    Rosenfeld, Avi ; Huang, Han K. ; Schneider, Victor B.
  • Volume
    15
  • Issue
    6
  • fYear
    1969
  • fDate
    11/1/1969 12:00:00 AM
  • Firstpage
    672
  • Lastpage
    681
  • Abstract
    Syntactic information about a corpus of linguistic or pictorial data can be discovered by analyzing the statistics of the data. Given a corpus of text, one can measure the tendencies of pairs of words to occur in common contexts, and use these measurements to define clusters of words. Applied to basic English text, this procedure yields clusters which correspond very closely to the traditional parts of speech (nouns, verbs, articles, etc.). For FORTRAN text, the clusters obtained correspond to integers, operations, etc.; for English text regarded as a sequence of letters (or of phonemes) rather than words, the vowels and the consonants are obtained as clusters. Finally, applied to the gray shades in a digitized picture, the procedure yields slice levels which appear to be useful for figure extraction.
  • Keywords
    Image analysis; Languages; Pattern clustering methods; Text processing; Biomedical measurements; Computer science; Gaussian noise; Image processing; Information theory; Integral equations; Linear systems; Noise generators; Riccati equations; Speech
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/TIT.1969.1054378
  • Filename
    1054378