DocumentCode :
134209
Title :
Linear model incorporating feature ranking for Chinese documents readability
Author :
Gang Sun ; Zhiwei Jiang ; Qing Gu ; Daoxu Chen
Author_Institution :
State Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
fYear :
2014
fDate :
12-14 Sept. 2014
Firstpage :
29
Lastpage :
33
Abstract :
Assessing the readability of documents is a valuable task. In this paper, we apply linear regression models to readability assessment of Chinese documents and propose LiFR (Linear model incorporating Feature Ranking), which uses feature ranking to select the most appropriate text features for building the linear model. Text features specialized for Chinese are developed, including surface, part-of-speech, parse tree, and entropy features. The experimental results demonstrate that both linear and log-linear regression models are reliable for readability assessment and achieve performance competitive with other machine learning methods, such as SVR (Support Vector Machine for Regression). The designed features also prove valuable, and feature ranking is essential for building useful linear functions.
Keywords :
learning (artificial intelligence); natural language processing; regression analysis; speech processing; support vector machines; Chinese document readability assessment; LiFR; SVR; entropy feature; feature ranking; log-linear regression model; machine learning; parse tree; part of speech; support vector machine; text features; Abstracts; History; Learning systems; Manganese; Measurement; Software; Training; Chinese; Feature Ranking; Linear Regression Models; Readability Assessment;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
Type :
conf
DOI :
10.1109/ISCSLP.2014.6936601
Filename :
6936601