Title :
Classification of text to subject using LDA
Author :
Smith, Douglas A. ; McManis, Charles
Author_Institution :
Blekko Inc., Redwood City, CA, USA
Abstract :
Blekko Inc., an Internet search company, has divided web sites into subjects that we call slash tags. Text from these web sites can be processed using Latent Dirichlet Allocation (LDA) to determine a set of topics for each subject. These topics can then be used to classify arbitrary text by subject. We discuss the methods used to do this, the details of the corpus used for training and testing, and results showing how well the system classifies text whose subject is known a priori.
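The pipeline summarized in the abstract (fit LDA topics per subject, then score new text against each subject's topics) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two slash tags and their toy training texts, the function names train_lda, score and classify, the hyperparameters, the use of collapsed Gibbs sampling, and the scoring rule (log-likelihood of the text under an equal-weight mixture of a subject's topics).

```python
import math
import random

def train_lda(docs, vocab, num_topics=2, alpha=0.1, beta=0.01, iters=30, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one word distribution per topic."""
    rng = random.Random(seed)
    smooth = beta * len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]            # topic counts per document
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Start from a random topic assignment for every token.
    assign = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, assign[d]):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment, then resample it.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + smooth)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed topic-word probabilities; each topic sums to 1 over the vocabulary.
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + smooth) for w in vocab}
        for k in range(num_topics)
    ]

# Hypothetical slash tags and toy training texts (not the paper's corpus).
TRAINING = {
    "/cooking": [
        "bake bread flour oven yeast",
        "simmer soup onion garlic broth",
        "roast chicken oven garlic butter",
    ],
    "/astronomy": [
        "telescope star galaxy orbit",
        "planet orbit moon star",
        "galaxy nebula telescope comet",
    ],
}

tokenized = {s: [t.lower().split() for t in texts] for s, texts in TRAINING.items()}
# One shared vocabulary so that every subject scores the same set of words.
vocab = sorted({w for docs in tokenized.values() for doc in docs for w in doc})
models = {s: train_lda(docs, vocab) for s, docs in tokenized.items()}

def score(tokens, phi):
    """Log-likelihood of the in-vocabulary tokens under an equal-weight topic mixture."""
    total = 0.0
    for w in tokens:
        if w in phi[0]:  # words never seen in training are skipped
            total += math.log(sum(topic[w] for topic in phi) / len(phi))
    return total

def classify(text):
    """Return the subject whose topics give the text the highest score."""
    tokens = text.lower().split()
    return max(models, key=lambda s: score(tokens, models[s]))

print(classify("roast garlic in the oven"))
print(classify("a star seen through a telescope"))
```

In this toy setup a word that never occurs in a subject's training text receives only the smoothing mass beta from that subject's topics, so text that shares vocabulary with one subject is pulled toward it. A production system would need a far larger corpus, stop-word handling, and a principled way to weight topics.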
Keywords :
Internet; Web sites; classification; text analysis; Blekko Inc; Internet search company; LDA; latent Dirichlet allocation; slash tag; text classification; Histograms; Resource management; Training
Conference_Title :
Semantic Computing (ICSC), 2015 IEEE International Conference on
Conference_Location :
Anaheim, CA
DOI :
10.1109/ICOSC.2015.7050791