DocumentCode :
2118162
Title :
Ranking Text Documents Based on Conceptual Difficulty Using Term Embedding and Sequential Discourse Cohesion
Author :
Jameel, Sakar ; Wai Lam ; Xiaojun Qian
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Hong Kong, China
Volume :
1
fYear :
2012
fDate :
4-7 Dec. 2012
Firstpage :
145
Lastpage :
152
Abstract :
We propose a novel framework for determining the conceptual difficulty of a domain-specific text document without using any external lexicon. Conceptual difficulty relates to finding the reading difficulty of domain-specific documents. Previous approaches to the domain-specific readability problem have relied heavily on an external lexicon, which limits their scalability to other domains. Our model can be readily applied in domain-specific vertical search engines to re-rank documents according to their conceptual difficulty. We develop an unsupervised and principled approach for computing a term's conceptual difficulty in the latent space. Our approach also considers transitions between the segments generated in sequence. It performs better than the current state-of-the-art comparative methods.
Keywords :
text analysis; domain specific readability problem; domain specific text document; domain specific vertical search engines; external lexicon; sequential discourse cohesion; term conceptual difficulty; term embedding; text document ranking; Conceptual Difficulty; K-means; LSI; Term Embedding;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on
Conference_Location :
Macau
Print_ISBN :
978-1-4673-6057-9
Type :
conf
DOI :
10.1109/WI-IAT.2012.235
Filename :
6511877