• DocumentCode
    2501560
  • Title
    Aliases discovered in Thai sports news articles
  • Author
    Suwanapong, T. ; Theeramunkong, T.
  • Author_Institution
    Sch. of Inf., Comput. & Commun. Technol., Thammasat Univ., Pathum Thani, Thailand
  • fYear
    2009
  • fDate
    20-22 Oct. 2009
  • Firstpage
    63
  • Lastpage
    66
  • Abstract
    Discovering aliases in Thai articles is challenging. We apply a standard vector space model to explore aliases and match them with formal names or with each other. We first construct a term-by-document matrix (TDM) that contains the frequency of each term in the document collection, assuming that all terms exist in the typed named entity dictionary. Normalization techniques are used instead of standard weighting functions to reduce the gap among related terms and, conversely, to increase the gap among unrelated terms. A matrix decomposition algorithm then decomposes the term-by-document matrix to form the left singular vectors, which project term properties. Finally, we create a correlation matrix to represent term relations. The empirical results show that this technique is appropriate for discovering aliases in a highly sparse matrix.
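The pipeline named in the abstract (term-by-document matrix, normalization, matrix decomposition, correlation matrix) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which normalization, which rank, or which correlation measure was used, so the unit-length row normalization, the rank `k`, the cosine similarity, and the function name `alias_similarity` are all assumptions made for illustration. The toy counts and English team names are invented, not data from the paper.

```python
import numpy as np


def alias_similarity(tdm, k=2):
    """Return a term-by-term similarity matrix from a term-by-document matrix.

    tdm: 2-D array-like of raw term counts, one row per term, one column per
         document.
    k:   number of leading singular vectors to keep (hypothetical parameter).
    """
    a = np.asarray(tdm, dtype=float)

    # Assumed normalization: scale every term row to unit Euclidean length,
    # so frequent and rare names become comparable.
    row_norms = np.linalg.norm(a, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1.0  # leave all-zero rows untouched
    a = a / row_norms

    # Singular value decomposition; the columns of u are the left singular
    # vectors, which describe the terms.
    u, s, _ = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))
    term_vectors = u[:, :k] * s[:k]

    # Assumed relation measure: cosine similarity between reduced term vectors.
    lengths = np.linalg.norm(term_vectors, axis=1, keepdims=True)
    lengths[lengths == 0] = 1.0
    unit = term_vectors / lengths
    return unit @ unit.T


# Invented toy data: two formal names, each followed by a nickname.
terms = ["Manchester United", "Red Devils", "Liverpool", "The Reds"]
tdm = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
]
sim = alias_similarity(tdm, k=2)
for i, name in enumerate(terms):
    # For each term, report the most similar other term as its alias candidate.
    scores = sim[i].copy()
    scores[i] = -np.inf
    print(name, "->", terms[int(np.argmax(scores))])
```

In this toy matrix each name shares documents only with its own nickname, so each term's best candidate is its partner. Real news data would be far sparser and would need a threshold on the similarity score rather than a plain arg-max.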
  • Keywords
    dictionaries; information resources; matrix decomposition; natural languages; pattern matching; sparse matrices; sport; vectors; Thai sports news article; alias match; correlation matrix; document collection; normalization technique; sparse matrix; standard weighting function; term-by-document matrix decomposition algorithm; typed named entity dictionary; vector space model; Automatic testing; Dictionaries; Frequency; Matrix decomposition; Natural language processing; Search engines; Space exploration; Sparse matrices; Time division multiplexing; Uniform resource locators;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing, 2009. SNLP '09. Eighth International Symposium on
  • Conference_Location
    Bangkok
  • Print_ISBN
    978-1-4244-4138-9
  • Electronic_ISBN
    978-1-4244-4139-6
  • Type
    conf
  • DOI
    10.1109/SNLP.2009.5340945
  • Filename
    5340945