DocumentCode
2501560
Title
Aliases discovered in Thai sports news articles
Author
Suwanapong, T. ; Theeramunkong, T.
Author_Institution
Sch. of Inf., Comput. & Commun. Technol., Thammasat Univ., Pathum Thani, Thailand
fYear
2009
fDate
20-22 Oct. 2009
Firstpage
63
Lastpage
66
Abstract
Discovering aliases in Thai articles is challenging. We apply a standard vector space model to explore and match aliases with formal names or with each other. We first construct a term-by-document matrix (TDM), which contains the frequency of each term occurring in the document collection, assuming that all terms exist in the typed named entity dictionary. Normalization techniques are used instead of standard weighting functions to reduce the gap among related terms and, conversely, to increase the gap among unrelated terms. The matrix decomposition algorithm decomposes the term-by-document matrix to form the left singular vectors, which project term properties. We finally create a correlation matrix to represent term relations. The empirical results show that this technique is appropriate for discovering aliases in a highly sparse matrix.
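The pipeline outlined in the abstract (term-by-document matrix, normalization, decomposition into left singular vectors, term correlation matrix) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify the normalization, the number of retained singular vectors, or the correlation measure, so unit-length row scaling, the parameter `k`, scaling by singular values, and cosine similarity are all assumptions here. The function name `term_correlations` and the toy counts are hypothetical.

```python
import numpy as np


def term_correlations(tdm, k):
    """Return a term-by-term similarity matrix from a term-by-document matrix."""
    tdm = np.asarray(tdm, dtype=float)

    # Assumed normalization: scale each term row to unit Euclidean length.
    row_norms = np.linalg.norm(tdm, axis=1, keepdims=True)
    row_norms[row_norms == 0.0] = 1.0  # leave all-zero rows unchanged
    normalized = tdm / row_norms

    # Thin SVD: each row of u describes one term.
    u, s, _ = np.linalg.svd(normalized, full_matrices=False)

    # Assumption: keep the top-k left singular vectors, scaled by their singular values.
    term_vectors = u[:, :k] * s[:k]

    # Assumption: cosine similarity stands in for the correlation measure.
    lengths = np.linalg.norm(term_vectors, axis=1, keepdims=True)
    lengths[lengths == 0.0] = 1.0
    unit = term_vectors / lengths
    return unit @ unit.T


# Hypothetical toy counts: rows are terms, columns are documents.
# Rows 0 and 1 co-occur in the same documents, as do rows 2 and 3.
tdm = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
]
corr = term_correlations(tdm, k=2)
```

In this toy setting, terms that share documents end up with high similarity in `corr`, while terms that never co-occur end up near zero, which is the kind of signal an alias matcher would threshold on.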
Keywords
dictionaries; information resources; matrix decomposition; natural languages; pattern matching; sparse matrices; sport; vectors; Thai sports news article; alias match; correlation matrix; document collection; normalization technique; sparse matrix; standard weighting function; term-by-document matrix decomposition algorithm; typed named entity dictionary; vector space model; Automatic testing; Dictionaries; Frequency; Matrix decomposition; Natural language processing; Search engines; Space exploration; Sparse matrices; Time division multiplexing; Uniform resource locators;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing, 2009. SNLP '09. Eighth International Symposium on
Conference_Location
Bangkok
Print_ISBN
978-1-4244-4138-9
Electronic_ISBN
978-1-4244-4139-6
Type
conf
DOI
10.1109/SNLP.2009.5340945
Filename
5340945