Title :
Sensitive Information Acquisition Based on Machine Learning
Author :
Shang, Wenqian ; Liu, Hongjia ; Lv, Rui
Author_Institution :
Sch. of Comput., Commun. Univ. of China, Beijing, China
Abstract :
With the rapid development of the Internet, online information has been greatly enriched. The Internet has become a vast treasury of information, but it is also flooded with trash information such as viruses, Trojans, violence, pornography, and gambling. Hostile foreign forces and criminal elements use the Internet to engage in illegal activities that endanger national security. How to recognize such information, locate the corresponding websites, and supervise them effectively has therefore become an urgent problem. To address it, this paper designs a new web information extraction system that calculates the structural similarity among pages and calls the extraction rule of the matching template. In addition, a new method for constructing a decision tree based on the STU-DOM tree is proposed. This method uses decision tree classification to determine which nodes contain sensitive information.
Keywords :
Internet; Web sites; computer crime; computer viruses; decision trees; information dissemination; information retrieval; information retrieval systems; learning (artificial intelligence); national security; pattern classification; STU-DOM tree; Trojans; Web information extraction system; Website; criminal elements; decision tree classification; extraction rule; gambling; illegal activities; machine learning; online information; pornography; sensitive information acquisition; structural similarity; trash information; violence; viruses; Accuracy; Algorithm design and analysis; Data mining; Databases; Feature extraction; DOM; information extraction; sensitive information
Conference_Title :
Industrial Control and Electronics Engineering (ICICEE), 2012 International Conference on
Conference_Location :
Xi'an
Print_ISBN :
978-1-4673-1450-3
DOI :
10.1109/ICICEE.2012.296