DocumentCode :
2970583
Title :
Research of machine learning method for specific information recognition on the Internet
Author :
Zheng, Dequan ; Hu, Yi ; Zhao, Tiejun ; Yu, Hao ; Li, Sheng
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., China
fYear :
2002
fDate :
2002
Firstpage :
229
Lastpage :
234
Abstract :
As resources on the Internet become plentiful, a large amount of harmful information is spreading and seriously affecting people's normal work and lives. Harmful data streams must therefore be recognized and filtered out effectively. After analyzing harmful content in Internet information streams, we present a new method that recognizes specific information by machine learning (ML). Using the ML method, we extract key information from a number of corpora and learn its part-of-speech (POS) transfer form, based on matching key information with the same pronunciation. A testing value for the key information is then obtained on a real corpus, using the average POS transfer probability of the key information to measure how closely the matching rules from information streams agree with those learnt from the corpora. From this, the testing value for the whole real data stream is obtained. The experiment showed that the method was efficient for recognizing certain harmful Internet information.
Keywords :
Internet; information retrieval; learning (artificial intelligence); ML method; POS transfer probability; harmful data stream recognition; harmful information; machine learning; machine learning method; part of speech transfer-form; pronunciation matching; real data stream; specific information recognition; testing value; Character recognition; Humans; IP networks; Information analysis; Information filtering; Information filters; Information resources; Learning systems; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimodal Interfaces, 2002. Proceedings. Fourth IEEE International Conference on
Print_ISBN :
0-7695-1834-6
Type :
conf
DOI :
10.1109/ICMI.2002.1166998
Filename :
1166998