Title :
Text Comprehensiveness Ranking
Author :
Ghaluh Indah P. S;Junaidillah Fadlil;Rudy Cahyadi H. P;Hsing-Kuo Pao
Author_Institution :
Dept. of Comput. Sci. &
Abstract :
When we use a search engine to find interesting texts to read, we often find material that is either too difficult to follow or too easy for us to learn anything interesting. In this work, we propose an algorithm for ranking texts by their comprehensiveness, so that texts can be ordered from the most difficult to the easiest. Given the ranking result, a high school student and a researcher may find texts of different comprehensiveness levels to read even if their queries are identical. Specifically, given a set of articles with different comprehensiveness levels, the proposed ranking method can recursively separate the articles into groups according to their comprehensiveness levels. The comprehensiveness measure is based on the observation that, given two groups of articles on the same subject but at different comprehensiveness levels, easy articles may not use the terms that are frequently used in difficult articles, while difficult articles may still use the terms found in easy articles. We tested the measure on an article database consisting of articles of different comprehensiveness levels and different subjects. The results show that the proposed ranking method can recursively separate texts of different comprehensiveness levels with very high accuracy. The algorithm can also separate two article groups that each have a mixed comprehensiveness level. Based on an EM-like procedure, we gradually refine the result so that, when the procedure converges, it filters out the set of articles considered more difficult than the rest. We also tested the proposed method on various databases, including the CCSS corpus and an article database consisting of research articles from journals and magazines.
Keywords :
"Frequency measurement","Search engines","Indexing","Education","Collaboration","Heuristic algorithms"
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on
DOI :
10.1109/WI-IAT.2015.195
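
The asymmetry described in the abstract (difficult articles still use easy terms, while easy articles rarely use difficult terms) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the `min_df` cutoff, the coverage score, and the toy articles are all assumptions made for illustration, and the recursive separation and the EM-like refinement are not shown.

```python
import re
from collections import Counter

# Toy article groups, invented for illustration only.
EASY = [
    "the cell has a nucleus",
    "the cell makes energy",
    "the plant cell has a wall",
]
HARD = [
    "the cell nucleus has chromatin that regulates transcription",
    "chromatin remodeling alters transcription in a cell",
    "the cell makes energy by oxidative phosphorylation",
]

def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def frequent_terms(articles, min_df=2):
    """Terms occurring in at least min_df articles of a group (hypothetical cutoff)."""
    doc_freq = Counter()
    for article in articles:
        doc_freq.update(set(tokenize(article)))  # count each term once per article
    return {term for term, df in doc_freq.items() if df >= min_df}

def vocabulary(articles):
    """All distinct terms used anywhere in a group."""
    vocab = set()
    for article in articles:
        vocab.update(tokenize(article))
    return vocab

def coverage(source, target, min_df=2):
    """Fraction of source's frequent terms that also appear somewhere in target."""
    frequent = frequent_terms(source, min_df)
    if not frequent:
        return 0.0
    return len(frequent & vocabulary(target)) / len(frequent)

def harder_group(group_a, group_b, min_df=2):
    """Guess which group is harder: the one whose frequent terms the other uses less."""
    a_in_b = coverage(group_a, group_b, min_df)
    b_in_a = coverage(group_b, group_a, min_df)
    if a_in_b < b_in_a:
        return "a"
    if b_in_a < a_in_b:
        return "b"
    return "tie"
```

On the toy data, `coverage(EASY, HARD)` is 1.0 and `coverage(HARD, EASY)` is 0.5, so `harder_group(EASY, HARD)` returns `"b"`. Real text would need stop-word handling and a less crude notion of "frequently used" than a fixed document-frequency cutoff.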