• DocumentCode
    2424624
  • Title

    An unsupervised language model adaptation based on keyword clustering and query availability estimation

  • Author

    Ito, Akinori ; Kajiura, Yasutomo ; Makino, Shozo ; Suzuki, Motoyuki

  • Author_Institution
    Grad. Sch. of Eng., Tohoku Univ., Sendai
  • fYear
    2008
  • fDate
    7-9 July 2008
  • Firstpage
    1412
  • Lastpage
    1418
  • Abstract
    Language model (LM) adaptation using text data downloaded from the WWW is an efficient way to train a topic-specific LM. We are developing an unsupervised LM adaptation method using data on the Web. One key point of unsupervised Web-based LM adaptation is how to select keywords to compose the search query. In this paper, we propose a new method of selecting keywords from keyword candidates, which uses a keyword clustering technique based on word similarities. The other key point is how to determine the number of pages to download for each query. We propose a method to estimate "query availability" from a small number of downloaded Web pages. The experimental results showed that determining the number of downloaded pages using the query availability was more effective than conventional methods that determine the number of pages empirically.
  • Keywords
    Internet; pattern clustering; query processing; World Wide Web; downloaded Web pages; keyword clustering; query availability estimation; unsupervised language model adaptation; word similarities; Adaptation model; Data engineering; Frequency; Information retrieval; Natural languages; Sampling methods; Speech recognition; Web pages; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Audio, Language and Image Processing, 2008. ICALIP 2008. International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-1723-0
  • Electronic_ISBN
    978-1-4244-1724-7
  • Type

    conf

  • DOI
    10.1109/ICALIP.2008.4590103
  • Filename
    4590103