  • DocumentCode
    424376
  • Title
    Grid-based indexing of a newswire corpus
  • Author
    Hughes, Baden; Venugopal, Srikumar; Buyya, Rajkumar
  • Author_Institution
    Department of Computer Science and Software Engineering, University of Melbourne, Parkville, Victoria, Australia
  • fYear
    2004
  • fDate
    8 Nov. 2004
  • Firstpage
    320
  • Lastpage
    327
  • Abstract
    In this paper we report our experience using computational grids in natural language processing, specifically in information extraction, to create query indices for information retrieval tasks. Because large corpora are prevalent in natural language processing, computational grids offer significant utility to researchers in the field who are reaching the limits of what a single machine can process efficiently. We exploit the natural fit between the segmented data sources common in natural language processing and the parallelisation model of grid computing. The experiment reported here is a large-scale newswire corpus indexing task, with the goal of efficiently creating a queryable index of the entire corpus. By parallelising the indexing task and executing it on an Australian computational grid, we observe an overall speedup of 2.26x compared with the same experiment on a single computational node. In addition to reporting the raw performance impact, we reflect on a number of interesting observations made during the experiments and propose several new requirements for grid middleware.
  • Keywords
    grid computing; information resources; information retrieval; middleware; natural languages; Newswire corpus; computational; grid-based indexing; information extraction; information retrieval task; middleware; natural language processing; parallelisation model; query indices; segmented data source; Australia; Computational efficiency; Concurrent computing; Data mining; Grid computing; Indexing; Information retrieval; Large-scale systems; Middleware; Natural language processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid Computing, 2004. Proceedings. Fifth IEEE/ACM International Workshop on
  • ISSN
    1550-5510
  • Print_ISBN
    0-7695-2256-4
  • Type
    conf
  • DOI
    10.1109/GRID.2004.34
  • Filename
    1382847