Title :
Toward Unification of Source Attribution Processes and Techniques
Author :
Khosmood, Foaad ; Levinson, Robert
Author_Institution :
Dept. of Comput. Sci., California Univ., Santa Cruz, CA
Abstract :
Automatic source attribution refers to the ability of an autonomous process to determine the source of a previously unexamined piece of writing. Statistical methods for source attribution have been the subject of scholarly research for well over a century. The field, however, still lacks an established or agreed-upon set of feature classes, methods, techniques and nomenclature. This paper represents a continuation of research into the basic attribution problem, as well as work toward an eventual source attribution standard. We augment previous work, which used in-common, non-trivial word frequencies with neural networks, on a more standardized data set. We also use two other techniques: phrase-based feature sets evaluated with naive Bayes classifiers and bi-gram feature sets evaluated with the nearest neighbor algorithm. We compare the three and explore methods of combining the techniques in order to achieve better results.
Keywords :
Bayes methods; learning (artificial intelligence); natural languages; neural nets; Bayesian feature sets; automatic source attribution process; bi-gram feature sets; nearest neighbor algorithm; neural networks; phrase-based feature sets; statistical methods; word frequencies; Artificial neural networks; Bayesian methods; Computer science; Cybernetics; Educational institutions; Feature extraction; Frequency; HTML; Humans; Machine learning; Neural networks; Search engines; Statistical analysis; Testing; Vocabulary; Writing; Meta predictors; Source attribution; authorship attribution; n-grams; naïve Bayesian
Conference_Title :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
DOI :
10.1109/ICMLC.2006.258376