  • DocumentCode
    2948755
  • Title
    Automatic Extraction of Meaning from the Web
  • Author
    Cilibrasi, Rudi; Vitanyi, Paul
  • Author_Institution
    CWI, Amsterdam
  • fYear
    2006
  • fDate
    9-14 July 2006
  • Firstpage
    2309
  • Lastpage
    2313
  • Abstract
    We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects. The latter may have literal embodiments like the first type, but may also be abstract like "red" or "Christianity". For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features between pairs of literal objects. For the second type we consider similarity distances generated by Web users corresponding to particular semantic relations between the (names for the) designated objects. For both families we give universal similarity distance measures, incorporating all particular distance measures in the family. In the first case the universal distance is based on compression, and in the second case it is based on Google page counts related to search terms. In both cases experiments on a massive scale give evidence of the viability of the approaches.
  • Keywords
    Internet; data compression; feature extraction; Google; Web; automatic extraction; compression; semantic relations; universal similarity distance measures; Bioinformatics; Books; Data mining; Fourier transforms; Genomics; Histograms; Mice; Particle measurements; Rhythm; Robustness
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Theory, 2006 IEEE International Symposium on
  • Conference_Location
    Seattle, WA
  • Print_ISBN
    1-4244-0505-X
  • Electronic_ISBN
    1-4244-0504-1
  • Type
    conf
  • DOI
    10.1109/ISIT.2006.261979
  • Filename
    4036382