Title :
Finding Rare Web Pages by Relevancy and Atypicality in a Category
Author :
Yumoto, Takayuki ; Tada, Ryohei ; Nii, Manabu ; Sato, Kiminori
Author_Institution :
Grad. Sch. of Eng., Univ. of Hyogo, Himeji, Japan
fDate :
Aug. 31 2013-Sept. 4 2013
Abstract :
In this paper, we propose the rarity of a Web page in a user-specified category as a measure for finding useful information that few people know. A rare Web page is a page that belongs to the given category and is atypical within it. We define the rarity score as the probability that a page is a rare Web page in the given category. The rarity score is the product of a relevancy score and an atypicality score. Relevancy is the probability that a Web page belongs to the category given by the user. Atypicality is the conditional probability that a page is atypical in the category, given that it belongs to the category. Both probabilities are calculated using tags from social bookmark services and words in the Web pages. We evaluated the proposed relevancy score on the task of classifying whether Web pages belong to a certain category. We also evaluated the proposed rarity score as a metric for ranking Web pages and compared it with rankings by relevancy and by atypicality. The results confirmed the usefulness of the rarity score for finding relevant and atypical pages.
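The abstract defines the rarity score as the product of two probabilities. The following is a minimal sketch of that product only, assuming both probabilities have already been estimated for each page. The function name `rarity_score`, the page names, and all numeric values are hypothetical and are not taken from the paper; the paper's estimation from social bookmark tags and page words is not reproduced here.

```python
def rarity_score(relevancy: float, atypicality: float) -> float:
    """Rarity as relevancy * atypicality.

    relevancy   : P(page belongs to the category)
    atypicality : P(page is atypical | page belongs to the category)
    """
    for name, p in (("relevancy", relevancy), ("atypicality", atypicality)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    return relevancy * atypicality


# Hypothetical (relevancy, atypicality) pairs, made up for illustration.
pages = {
    "page_a": (0.9, 0.1),  # relevant but typical
    "page_b": (0.8, 0.7),  # relevant and atypical
    "page_c": (0.2, 0.9),  # atypical but barely relevant
}

# Rank pages from most to least rare by the product of the two scores.
ranking = sorted(pages, key=lambda name: rarity_score(*pages[name]), reverse=True)
```

Ranking by the product favors pages that score well on both factors; a page that is only relevant or only atypical falls lower in the list.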
Keywords :
pattern classification; probability; relevance feedback; social networking (online); atypical pages; atypicality score; classification task; conditional probability; rare Web page ranking; rarity metric; rarity score; relevancy score; relevant pages; social bookmark service tags; word tags; Databases; Equations; Games; Mathematical model; Probability; TV; Web pages; atypicality; probabilistic model; ranking; rarity; relevancy; social bookmark;
Conference_Title :
Advanced Applied Informatics (IIAI-AAI), 2013 IIAI International Conference on
Conference_Location :
Los Alamitos, CA
Print_ISBN :
978-1-4799-2134-8
DOI :
10.1109/IIAI-AAI.2013.27