Title :
Aliases discovered in Thai sports news articles
Author :
Suwanapong, T. ; Theeramunkong, T.
Author_Institution :
Sch. of Inf., Comput. & Commun. Technol., Thammasat Univ., Pathum Thani, Thailand
Abstract :
Discovering aliases in Thai articles is challenging. We apply a standard vector space model to explore and match aliases with formal names or with each other. We first construct a term-by-document matrix (TDM), which contains the frequency of each term occurring in the document collection, assuming that all terms exist in the typed named entity dictionary. Normalization techniques are used instead of standard weighting functions to reduce the gap among related terms and, conversely, to increase the gap between unrelated terms. A matrix decomposition algorithm decomposes the term-by-document matrix to form the left singular vectors, which project term properties. Finally, we create a correlation matrix to represent term relations. The empirical results show that this technique is appropriate for discovering aliases in a highly sparse matrix.
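The pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which normalization, decomposition rank, or correlation measure was used, so L2 row normalization, a truncated SVD of rank k, and cosine similarity between term vectors are stand-ins chosen here. The function name alias_similarity and the toy counts are hypothetical.

```python
import numpy as np

def alias_similarity(tdm, k):
    """Term-by-term similarity matrix from a term-by-document matrix (rows = terms)."""
    tdm = np.asarray(tdm, dtype=float)

    # Assumption: L2 row normalization stands in for the unspecified
    # normalization technique mentioned in the abstract.
    row_norms = np.linalg.norm(tdm, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1.0          # leave all-zero rows unchanged
    normalized = tdm / row_norms

    # Decompose the matrix; the columns of u are the left singular vectors.
    u, s, _ = np.linalg.svd(normalized, full_matrices=False)

    # Keep the first k dimensions, scaled by their singular values.
    term_vectors = u[:, :k] * s[:k]

    # Assumption: cosine similarity stands in for the correlation matrix.
    vec_norms = np.linalg.norm(term_vectors, axis=1, keepdims=True)
    vec_norms[vec_norms == 0] = 1.0
    unit = term_vectors / vec_norms
    return unit @ unit.T

# Hypothetical toy counts: 3 terms (rows) over 4 documents (columns).
tdm = [
    [2, 0, 1, 0],   # formal name
    [4, 0, 2, 0],   # alias appearing in the same documents
    [0, 3, 0, 1],   # unrelated term
]
sim = alias_similarity(tdm, k=2)
```

In this toy case the formal name and its alias occur in the same documents with proportional counts, so their similarity is close to 1, while the unrelated term shares no documents with them and scores close to 0. Real news data would be far sparser and noisier, and the choice of k would need tuning.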
Keywords :
dictionaries; information resources; matrix decomposition; natural languages; pattern matching; sparse matrices; sport; vectors; Thai sports news article; alias match; correlation matrix; document collection; normalization technique; sparse matrix; standard weighting function; term-by-document matrix decomposition algorithm; typed named entity dictionary; vector space model; Automatic testing; Dictionaries; Frequency; Matrix decomposition; Natural language processing; Search engines; Space exploration; Sparse matrices; Time division multiplexing; Uniform resource locators;
Conference_Titel :
Natural Language Processing, 2009. SNLP '09. Eighth International Symposium on
Conference_Location :
Bangkok
Print_ISBN :
978-1-4244-4138-9
Electronic_ISBN :
978-1-4244-4139-6
DOI :
10.1109/SNLP.2009.5340945