DocumentCode :
3740493
Title :
f-Fractional Bit Minwise Hashing for Large-Scale Learning
Author :
Jingjing Tang; Yingjie Tian
Author_Institution :
Sch. of Math. Sci., Univ. of Chinese Acad. of Sci., Beijing, China
Volume :
3
fYear :
2015
Firstpage :
60
Lastpage :
63
Abstract :
Document similarity detection has recently attracted considerable research attention. In this paper, we propose integrating linear SVM with f-fractional bit minwise hashing to offer a wide range of trade-offs between accuracy and storage space. Using the derived properties of f-fractional bit minwise hashing, we obtain the optimal combination of fractional bits, the one that minimizes the variance of the estimator, and apply it in the integration. The innovation of this algorithm is that the number of bits can be chosen continuously rather than only from discrete integer values, which both extends the theory of the b-bit minwise hashing SVM algorithm and meets the varied accuracy and storage requirements of practical systems. Because the resemblance matrix, viewed as a kernel matrix, is nonlinear, it cannot be used directly in linear SVM training. In the theoretical analysis, however, we prove that the resemblance matrix generated by the f-fractional bit minwise hashing scheme is positive definite, which provides a sound and feasible basis for the integration. Experimental results on publicly available large-scale datasets validate the effectiveness of the algorithm.
Keywords :
"Support vector machines","Algorithm design and analysis","Optimized production technology","Radio frequency","Matrix decomposition","Matrix converters","Kernel"
Publisher :
IEEE
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on
Type :
conf
DOI :
10.1109/WI-IAT.2015.104
Filename :
7397423