Title :
Judging whether a document changes in subject
Author :
Nicholson, Colin
Author_Institution :
Inst. for Artificial Intell., Univ. of Georgia, Athens, GA, USA
Abstract :
This paper describes a method for determining whether a document stays on a single subject or changes subject. The algorithm divides the document into five equal parts and measures the text similarity of each part against the others. Documents that drift in subject are shown to have a higher standard deviation of similarity values than documents that remain on one subject. To work properly, the method requires a threshold value that is specific to the domain.
Keywords :
classification; fuzzy logic; text analysis; coherence; document; semantic unity; standard deviation; subject; text similarity; threshold value; Artificial intelligence; Broadcasting; Computer errors; Distortion measurement; Fasteners; Size measurement; Software measurement; Speech; Testing; Writing
Conference_Title :
Southeastcon, 2009. SOUTHEASTCON '09. IEEE
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-3976-8
Electronic_ISBN :
978-1-4244-3978-2
DOI :
10.1109/SECON.2009.5174074
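Illustration :
The procedure summarized in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not name the similarity measure, so cosine similarity over word counts is swapped in here, and the function names, the tokenizer, and the threshold used in the usage note are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations
from statistics import pstdev


def split_into_parts(text, n_parts=5):
    """Split text into n_parts word lists of near-equal length."""
    # Assumption: a simple lowercase word tokenizer; the abstract names none.
    words = re.findall(r"[a-z0-9']+", text.lower())
    size, extra = divmod(len(words), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` parts take one additional word each.
        end = start + size + (1 if i < extra else 0)
        parts.append(words[start:end])
        start = end
    return parts


def cosine_similarity(words_a, words_b):
    """Cosine similarity of two bag-of-words vectors (assumed measure)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty part is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_spread(text, n_parts=5):
    """Standard deviation of all pairwise similarities between the parts."""
    parts = split_into_parts(text, n_parts)
    sims = [cosine_similarity(p, q) for p, q in combinations(parts, 2)]
    return pstdev(sims)


def changes_subject(text, threshold, n_parts=5):
    """Flag a document whose similarity spread exceeds the threshold.

    The threshold is domain-specific and has to be tuned on sample
    documents; no value is suggested by the abstract.
    """
    return similarity_spread(text, n_parts) > threshold
```

As a usage example, `changes_subject(text, threshold=0.1)` would flag a document whose pairwise similarities vary widely; the value 0.1 is only a placeholder and would need calibration on documents from the target domain.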