DocumentCode :
1866199
Title :
Classification of text to subject using LDA
Author :
Smith, Douglas A. ; McManis, Charles
Author_Institution :
Blekko Inc., Redwood City, CA, USA
fYear :
2015
fDate :
7-9 Feb. 2015
Firstpage :
131
Lastpage :
135
Abstract :
Blekko Inc., an Internet search company, has divided web sites into subjects we call slash tags. Text from these web sites can be processed using Latent Dirichlet Allocation (LDA) to determine a set of topics for each subject. These topics can then be used to classify any text and determine its subject. We discuss the methods used to do this, the details of the corpus used for training and testing, and results showing how well the system classifies text whose subject is known a priori.
Keywords :
Internet; Web sites; classification; text analysis; Blekko Inc; Internet search company; LDA; latent Dirichlet allocation; slash tag; text classification; Histograms; Resource management; Training
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Semantic Computing (ICSC), 2015 IEEE International Conference on
Conference_Location :
Anaheim, CA
Type :
conf
DOI :
10.1109/ICOSC.2015.7050791
Filename :
7050791