DocumentCode
2127475
Title
A Web entity activity recognition approach based on k-nearest neighbors classifier
Author
Liu, Hui ; Zhang, Chuanyan
Author_Institution
Sch. of Mech. Eng., Shandong Univ., Jinan, China
fYear
2012
fDate
21-23 April 2012
Firstpage
848
Lastpage
852
Abstract
Building on traditional information extraction, this paper puts forward an approach to recognizing entity activities from the Web. An entity activity, which describes the behaviour or actions of an entity, is valuable for search engines, business intelligence, data integration, and similar applications. Because Web entity activities, as multi-relationships, are contained in unstructured text on the Web, recognizing them is a difficult problem in current information extraction. In this paper, we first rigorously define the recognition task and then propose a novel framework for extracting entity activities from the Web. First, a k-nearest neighbors classifier based on sentence similarity discovers the sentences that contain Web entity activities. Then, using dependency parsing features, a set of heuristic rules extracts the activity information from those sentences. Experimental results demonstrate the feasibility and effectiveness of the proposed approach and show that it can adapt to multiple domains.
Keywords
Internet; grammars; information retrieval; pattern classification; text analysis; Web entity activity recognition; action information; business intelligence; data integration; dependency parsing feature; heuristic rule; information extraction; k-nearest neighbors classifier; recognition task; search engine; sentence similarity; unstructured text; Bayesian methods; Companies; Data mining; Educational institutions; Feature extraction; Text recognition; Web pages; information extraction; k-nearest neighbors classifier; web entity activity;
fLanguage
English
Publisher
IEEE
Conference_Titel
Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on
Conference_Location
Yichang
Print_ISBN
978-1-4577-1414-6
Type
conf
DOI
10.1109/CECNet.2012.6202019
Filename
6202019