Abstract:
The density and accuracy of network entity landmarks are an important foundation of IP geolocation. Current landmark mining methods yield landmarks that are limited in quantity and low in reliability. To address this, this paper proposes a city-level landmark mining algorithm based on Internet forums. First, the basic principle of Web-based landmark mining methods and their flaws are analyzed. Then, because Internet forums contain a huge number of individual users' IP addresses, a technical framework for forum-based network entity landmark mining is presented. The framework has two major parts, a landmark extraction algorithm and a landmark evaluation algorithm, and their main processing steps are described in turn: forum selection strategy, IP address extraction, IP address screening, and others. GeoTrack, a classic network entity geolocation algorithm, is improved and used to evaluate the candidate landmarks. Finally, the feasibility of the framework and algorithm is studied from two aspects: the forum selection strategy and the forum-based landmark mining algorithm. Experimental results on 27 Internet forums of three types in three cities show that, compared with classic Web-based landmark mining methods, the proposed algorithm not only mines a large number of city-level landmarks but also noticeably improves city-level network entity geolocation accuracy.
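The abstract names IP address extraction and screening as core steps of landmark extraction but does not give their rules. The following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that forum posts expose user IPv4 addresses in plain text and that each forum is tied to one city. It also assumes that screening keeps only globally routable addresses seen at least `min_occurrences` times, and that an address appearing in forums of more than one city is discarded as ambiguous. All function names and parameters (`extract_candidate_ips`, `screen_ips`, `mine_city_landmarks`, `min_occurrences`) are hypothetical. The improved GeoTrack evaluation step is not sketched.

```python
import ipaddress
import re
from collections import Counter

# Dotted-quad candidates not embedded in a longer dotted/numeric run;
# ipaddress performs the real validation (e.g. rejects octets > 255).
IPV4_PATTERN = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")


def extract_candidate_ips(post_texts):
    """Hypothetical extraction step: count IPv4 addresses found in post text."""
    counts = Counter()
    for text in post_texts:
        for match in IPV4_PATTERN.findall(text):
            try:
                addr = ipaddress.IPv4Address(match)
            except ipaddress.AddressValueError:
                continue  # not a valid IPv4 address, e.g. "999.1.1.1"
            counts[addr] += 1
    return counts


def screen_ips(counts, min_occurrences=1):
    """Hypothetical screening step: drop private/reserved and rarely seen addresses."""
    return {
        addr
        for addr, n in counts.items()
        if addr.is_global and n >= min_occurrences
    }


def mine_city_landmarks(forums, min_occurrences=2):
    """forums: iterable of (city, post_texts) pairs (assumed input format).

    Returns a dict mapping each candidate landmark IP to its single city.
    """
    cities_by_ip = {}
    for city, posts in forums:
        for addr in screen_ips(extract_candidate_ips(posts), min_occurrences):
            cities_by_ip.setdefault(addr, set()).add(city)
    # Assumption: an IP seen in forums of several cities is ambiguous, so drop it.
    return {
        addr: next(iter(cities))
        for addr, cities in cities_by_ip.items()
        if len(cities) == 1
    }
```

For example, if `8.8.8.8` appears twice in a CityA forum and `1.1.1.1` appears in both a CityA and a CityB forum, only `8.8.8.8` is kept as a CityA landmark. In the paper's framework, candidates like these would then go to the evaluation stage.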
Keywords:
IP networks, Geology, Web servers, Reliability, Data mining, HTML