Title :
Fast Information Retrieval and Social Network Mining via Cosine Similarity Upper Bound
Author :
Weizhong Zhao ; Martha, V.S. ; Gang Chen ; Xiaowei Xu
Author_Institution :
Coll. of Inf. Eng., Xiangtan Univ., Xiangtan, China
Abstract :
Similarity search is a key function in many applications, including databases, pattern recognition, and recommendation systems. In this paper, we first propose ε-query, a similarity search based on the popular cosine similarity for information retrieval and social network analysis. In contrast to traditional similarity search, ε-query returns the results whose cosine similarity with the query is larger than a threshold ε. The major contribution of this paper is an efficient ε-query processing algorithm that uses an upper bound on cosine similarity for binary data. Our evaluation on two of the largest publicly available real datasets, ClueWeb09 and Twitter, demonstrated that the proposed method can achieve a speedup of several orders of magnitude over the traditional approach. Finally, we applied the proposed method to information retrieval from ClueWeb and to finding community structures in Twitter. The outcomes further demonstrated the effectiveness of the proposed method.
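To illustrate the idea of an ε-query over binary data, the following is a minimal sketch and not the algorithm from the paper. It assumes binary vectors are represented as Python sets of their nonzero dimensions, and it uses a standard size-based bound, cos(A, B) ≤ sqrt(min(|A|, |B|) / max(|A|, |B|)), which may differ from the upper bound the authors derive. The names `cosine_binary`, `size_upper_bound`, and `epsilon_query` are hypothetical.

```python
import math


def cosine_binary(a: set, b: set) -> float:
    """Cosine similarity of two binary vectors given as sets of nonzero indices."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def size_upper_bound(size_a: int, size_b: int) -> float:
    """Assumed bound (not necessarily the paper's): since |A & B| <= min(|A|, |B|),
    the cosine similarity is at most sqrt(min / max)."""
    if size_a == 0 or size_b == 0:
        return 0.0
    return math.sqrt(min(size_a, size_b) / max(size_a, size_b))


def epsilon_query(query: set, items: dict, eps: float) -> dict:
    """Return every item whose cosine similarity with the query is larger than eps."""
    results = {}
    for key, item in items.items():
        # Cheap filter: if the bound cannot exceed eps, the exact value cannot either.
        if size_upper_bound(len(query), len(item)) <= eps:
            continue
        sim = cosine_binary(query, item)
        if sim > eps:
            results[key] = sim
    return results


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    items = {
        "a": {1, 2, 3},
        "b": {1},
        "c": set(range(1, 13)),
        "d": {4, 5, 6},
    }
    print(epsilon_query({1, 2, 3}, items, eps=0.6))
```

In this sketch the bound depends only on set sizes, so candidates can be discarded without computing any intersection; only items that pass the filter pay for the exact similarity.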
Keywords :
data mining; query processing; social networking (online); ε-query processing algorithm; ClueWeb09; Twitter; cosine similarity upper bound; databases; fast information retrieval; pattern recognition; recommendation systems; similarity search; social network mining; Communities; Complexity theory; Image edge detection; Twitter; Upper bound;
Conference_Title :
Social Computing (SocialCom), 2013 International Conference on
Conference_Location :
Alexandria, VA
DOI :
10.1109/SocialCom.2013.147