DocumentCode :
3447927
Title :
Extracting traffic information from web texts with a D-S evidence theory based approach
Author :
Peiyuan Qiu ; Feng Lu ; Hengcai Zhang
Author_Institution :
State Key Lab. of Resources & Environ. Inf. Syst., Inst. of Geographic Sci. & Natural Resources Res., Beijing, China
fYear :
2013
fDate :
20-22 June 2013
Firstpage :
1
Lastpage :
5
Abstract :
Web texts such as web pages, BBS posts, and microblogs often contain a great deal of real-time traffic information and can be expected to become an important data source for city traffic collection. However, natural-language descriptions of traffic conditions are ambiguous and uncertain, and description quality varies across publishers and text types, so reports covering similar spatial-temporal contexts may be inconsistent or even contradictory. An efficient information fusion process is therefore crucial for exploiting massive web sources in real-time traffic collection. In this paper, we propose an approach based on D-S evidence theory for extracting traffic states from massive web texts. First, an evaluation index system for the traffic state information collected from web texts is built, using Wikipedia-based semantic similarity to eliminate ambiguity. Then, D-S evidence theory is used to judge and fuse the extracted traffic state information through evidence combination and decision, which addresses the uncertainty and the differences among sources. An experiment shows that the approach can effectively judge the traffic state information contained in massive web texts and can make full use of data from different websites. The proposed approach is also arguably more accurate than the traditional text clustering algorithm.
Keywords :
Web sites; information theory; natural language processing; pattern clustering; text analysis; traffic information systems; BBS; D-S evidence theory based approach; Web pages; Web sources; Web texts; Websites; Wikipedia; city traffic collection; data source; evaluation index system; information fusion process; microblogs; natural language; real-time traffic collection; real-time traffic information extraction; semantic similarity; spatial-temporal contexts; text clustering algorithm; traffic condition description; traffic state extraction approach; Data mining; Electronic publishing; Encyclopedias; Internet; Semantics; Uncertainty; D-S evidence theory; Wikipedia; text clustering; traffic state; web texts;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Geoinformatics (GEOINFORMATICS), 2013 21st International Conference on
Conference_Location :
Kaifeng
ISSN :
2161-024X
Type :
conf
DOI :
10.1109/Geoinformatics.2013.6626207
Filename :
6626207