Title :
SVM initiative learning algorithm research for meta-information obtaining
Author :
Ding, Junping ; Cai, Wandong
Author_Institution :
Sch. of Comput. Sci. & Eng., Northwestern Polytech. Univ., Xi'an, China
Abstract :
When crawling seed files, traditional crawlers fetch every URL with the same technique and do not distinguish webpage URLs from seed-file URLs, so they cannot meet the efficiency and accuracy requirements of an active monitoring model. The traditional SVM classification algorithm was therefore improved to meet those requirements: URLs are classified by a powered least squares support vector machine, and seed-file URLs are then obtained directly with a file-download technique according to the classification result. The improved classification algorithm proceeds as follows: collect sample information; discretize and normalize the sample feature vectors; obtain the key parameters of the algorithm by learning; and finally classify URLs. Tests indicate that the improved algorithm meets the requirements of the active monitoring model.
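The steps summarized in the abstract (feature extraction, normalization, learning the parameters, classification) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain, unweighted least squares SVM with an RBF kernel, and the URL features, the example URLs, the `.torrent` suffix check, and the values of `gamma` and `sigma` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def url_features(url):
    """Hypothetical features; the paper's actual feature set is not given in the abstract."""
    lower = url.lower()
    return [
        1.0 if lower.endswith(".torrent") else 0.0,  # assumed seed-file suffix
        1.0 if "download" in lower else 0.0,         # assumed keyword cue
        float(lower.count("/")),                     # rough path depth
        float(len(lower)),                           # URL length
    ]

def fit_min_max(X):
    """Return per-column minimum and span for min-max normalization."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant columns
    return lo, span

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def train_lssvm(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM linear system [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def predict_lssvm(X_train, y_train, b, alpha, X_new, sigma=1.0):
    """Sign of the decision function sum_i alpha_i y_i K(x, x_i) + b."""
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

if __name__ == "__main__":
    # Made-up training URLs: +1 = seed file, -1 = ordinary web page.
    urls = [
        "http://example.com/files/ubuntu.torrent",
        "http://example.com/download/movie.torrent",
        "http://example.com/index.html",
        "http://example.com/forum/thread/42",
    ]
    y = np.array([1.0, 1.0, -1.0, -1.0])
    X_raw = np.array([url_features(u) for u in urls])
    lo, span = fit_min_max(X_raw)
    X = (X_raw - lo) / span
    b, alpha = train_lssvm(X, y)
    new = (np.array([url_features("http://example.com/get/file.torrent")]) - lo) / span
    print(predict_lssvm(X, y, b, alpha, new))
```

In a crawler built along these lines, URLs predicted as +1 would be handed to a file downloader rather than parsed as pages; how the paper wires this step is not described in the abstract.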
Keywords :
Internet; Web sites; information retrieval; support vector machines; SVM classification algorithm; SVM initiative learning algorithm research; Web page URL; active monitoring model; crawler technique; crawling; file-download technique; meta-information obtaining; seed file URL; support vector machine; Algorithm design and analysis; Classification algorithms; Crawlers; Heuristic algorithms; Joining processes; Monitoring; Support vector machines; Crawler algorithm; Feature vector representation; Least squares support vector machine; Meta-information; P2P monitoring;
Conference_Title :
Signal Processing, Communications and Computing (ICSPCC), 2011 IEEE International Conference on
Conference_Location :
Xi'an
Print_ISBN :
978-1-4577-0893-0
DOI :
10.1109/ICSPCC.2011.6061660