DocumentCode :
3570876
Title :
Toward inferring the age of Twitter users with their use of nonstandard abbreviations and lexicon
Author :
Moseley, Nathaniel ; Alm, Cecilia Ovesdotter ; Rege, Manjeet
Author_Institution :
Dept. of Comput. Sci., Rochester Inst. of Technol., Rochester, NY, USA
fYear :
2014
Firstpage :
219
Lastpage :
226
Abstract :
Automatically determining demographic profile attributes of writers with high accuracy, based on their texts, can be useful in a range of application domains, including smart ad placement, security, the discovery of predator behaviors, and the automatic enhancement of participants' profiles for extended analysis. Attributes such as author gender can be determined with some success from many sources, using methods such as analysis of shallow linguistic patterns or topic. Author age is more difficult to determine, but previous research has been somewhat successful at classifying age as a binary (e.g., over or under 30), ternary, or even continuous variable. In this work, we show that word and phrase abbreviation patterns can be used toward determining user age using novel binning. Notable results come from classifying user ages into 10 equally sized age bins with a support vector machine classifier: accuracy reached up to 82.8% (67.0% above the relative majority-class baseline) using PCA-extracted features, including n-grams, and 50.8% (33.7% above baseline) using only abbreviation features. Also presented is an analysis of the evident change in abbreviation use over time on Twitter.
Keywords :
feature extraction; pattern classification; principal component analysis; social networking (online); support vector machines; text analysis; Lexicon; PCA extracted features; Twitter users; author gender; nonstandard abbreviations; shallow linguistic pattern analysis; smart ad placement; support vector machine classifier; word and phrase abbreviation pattern; Accuracy; Pragmatics; Standards; Twitter; Vectors
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2014 IEEE 15th International Conference on
Type :
conf
DOI :
10.1109/IRI.2014.7051893
Filename :
7051893