DocumentCode :
3762312
Title :
Deriving labeled training data for topic link detection by alternating words
Author :
Marc W. Abel;Soon M. Chung
Author_Institution :
Dept. of Computer Science and Engineering, Wright State University, Dayton, Ohio 45435, USA
fYear :
2015
Firstpage :
83
Lastpage :
88
Abstract :
Although classifiers can be trained to estimate whether two short text segments relate to a common topic, obtaining training data for supervised learning presents a hurdle. The natural approach would be to train with topic-aligned pairs of text segments from a large corpus, but no resource is available for locating such alignments. We propose that simply partitioning the words of a large document according to their odd and even positions yields training data suitable for certain applications and feature sets. The reason is that the two partitioned texts remain topic-aligned along their respective lengths despite sharing no original word instances. We further show that parametrically introducing a small amount of overlap into the partitioned texts can greatly improve the precision of a classifier.
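The odd/even partitioning described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `alternate_words` and `aligned_pairs`, the `overlap` probability, and the fixed window `size` are all assumptions made for illustration, and the abstract does not say how the paper actually parameterizes overlap.

```python
import random


def alternate_words(text, overlap=0.0, seed=0):
    """Split text into two word lists by alternating word positions.

    `overlap` is a hypothetical parameter: the probability that a word is
    also copied into the other half. With overlap=0.0 the two halves share
    no original word instances.
    """
    rng = random.Random(seed)
    first, second = [], []
    for i, word in enumerate(text.split()):
        # Words at index 0, 2, 4, ... go to `first`; the rest go to `second`.
        own, other = (first, second) if i % 2 == 0 else (second, first)
        own.append(word)
        if rng.random() < overlap:
            other.append(word)
    return first, second


def aligned_pairs(first, second, size):
    """Pair same-offset windows of the two halves as positive examples."""
    n = min(len(first), len(second))
    return [
        (first[i:i + size], second[i:i + size])
        for i in range(0, n - size + 1, size)
    ]


if __name__ == "__main__":
    a, b = alternate_words("the quick brown fox jumps over the lazy dog today")
    for left, right in aligned_pairs(a, b, size=2):
        print(left, right)
```

With no overlap, windows at the same offset in the two halves are drawn from the same region of the source text, which is why they can serve as topic-aligned positive pairs. Negative pairs would have to come from elsewhere, for example windows taken from different documents; that step is left out here.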
Keywords :
"Training data","Supervised learning","Software engineering","Backpropagation","Magnetic resonance","Image segmentation","Data visualization"
Publisher :
IEEE
Conference_Title :
Data and Software Engineering (ICoDSE), 2015 International Conference on
Print_ISBN :
978-1-4673-8428-5
Type :
conf
DOI :
10.1109/ICODSE.2015.7436976
Filename :
7436976