• DocumentCode
    3628500
  • Title

    A generic method for multi word extraction from Wikipedia

  • Author

    Božo Bekavac; Marko Tadić

  • Author_Institution
    Faculty of Humanities and Social Sciences, University of Zagreb, Ivana Lučića 3, 10000, Croatia
  • fYear
    2008
  • fDate
    6/1/2008 12:00:00 AM
  • Firstpage
    663
  • Lastpage
    668
  • Abstract
    This paper presents a generic method for multiword expression extraction from Wikipedia. The method uses the properties of this specific encyclopedic genre in its HTML format and relies on the intention of article authors to link to other articles. The relevant links were processed by applying local regular grammars within the NooJ development environment. We tested the method on the Croatian version of Wikipedia and present the results obtained.
  • Keywords
    "Internet","Encyclopedias","Information services","Electronic publishing","HTML","Filtering","Artificial neural networks"
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Interfaces, 2008. ITI 2008. 30th International Conference on
  • ISSN
    1330-1012
  • Print_ISBN
    978-953-7138-12-7
  • Type

    conf

  • DOI
    10.1109/ITI.2008.4588490
  • Filename
    4588490