DocumentCode :
3147858
Title :
Extraction approach of hypertext information based on regular expression
Author :
Liu, Ya-Shu ; Li, Ming-Zhuo
Author_Institution :
Dept. of Comput. Sci., Beijing Univ. of Civil Eng. & Archit., Beijing, China
fYear :
2011
fDate :
16-18 April 2011
Firstpage :
3181
Lastpage :
3184
Abstract :
Hypertext is the most popular file format on the Internet; it has a simple formal standard and is non-continuous in nature. This paper studies how to extract information from hypertext using regular expressions and presents results of extracting weather information from the weather forecast pages of Sohu, Sina and Tencent. It gives a feasible extraction approach for hypertext, which is very useful in Chinese Information Processing and as a research base for the research engine.
Keywords :
formal languages; hypermedia; Chinese information processing; Internet; hypertext file format; hypertext information extraction approach; regular expression; research engine; Civil engineering; Computer architecture; Data mining; Feature extraction; Meteorology; Pattern matching; Web pages; Chinese Information Processing; Hypertext; Information Extraction; Regular Expression;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Consumer Electronics, Communications and Networks (CECNet), 2011 International Conference on
Conference_Location :
XianNing
Print_ISBN :
978-1-61284-458-9
Type :
conf
DOI :
10.1109/CECNET.2011.5768216
Filename :
5768216