Title :
Extraction approach of hypertext information based on regular expression
Author :
Liu, Ya-Shu ; Li, Ming-Zhuo
Author_Institution :
Dept. of Comput. Sci., Beijing Univ. of Civil Eng. & Archit., Beijing, China
Abstract :
Hypertext is the most popular file format on the Internet; it has a simple formal standard and the property of non-continuity. This paper studies how to extract information from hypertext using regular expressions and gives results of extracting weather information from the weather forecast pages of Sohu, Sina, and Tencent. The paper presents a feasible extraction approach for hypertext, which is useful in Chinese information processing and as a research basis for search engines.
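The extraction idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the HTML fragment, the class names (city, weather, high, low), the pattern FORECAST_RE, and the helper extract_forecasts are all hypothetical, chosen only to show how a regular expression with named groups might pull weather fields out of a forecast page. Real portal markup will differ and would need its own pattern.

```python
import re

# Hypothetical forecast fragment (assumption); not taken from any real portal page.
SAMPLE_HTML = """
<div class="forecast">
  <span class="city">Beijing</span>
  <span class="weather">Sunny</span>
  <span class="high">12</span>
  <span class="low">3</span>
</div>
"""

# Illustrative pattern: one named group per field, with optional
# whitespace allowed between consecutive tags.
FORECAST_RE = re.compile(
    r'<span class="city">(?P<city>[^<]+)</span>\s*'
    r'<span class="weather">(?P<weather>[^<]+)</span>\s*'
    r'<span class="high">(?P<high>-?\d+)</span>\s*'
    r'<span class="low">(?P<low>-?\d+)</span>'
)


def extract_forecasts(html):
    """Return one dict per forecast block matched in the given HTML text."""
    results = []
    for match in FORECAST_RE.finditer(html):
        results.append({
            "city": match.group("city").strip(),
            "weather": match.group("weather").strip(),
            "high": int(match.group("high")),
            "low": int(match.group("low")),
        })
    return results


if __name__ == "__main__":
    for forecast in extract_forecasts(SAMPLE_HTML):
        print(forecast)
```

A pattern like this is tied to one page layout, so each source site would likely need its own expression. Text that does not match the pattern simply yields an empty list.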
Keywords :
formal languages; hypermedia; Chinese information processing; Internet; hypertext file format; hypertext information extraction approach; regular expression; research engine; Civil engineering; Computer architecture; Data mining; Feature extraction; Meteorology; Pattern matching; Web pages; Chinese Information Processing; Hypertext; Information Extraction; Regular Expression;
Conference_Title :
Consumer Electronics, Communications and Networks (CECNet), 2011 International Conference on
Conference_Location :
XianNing
Print_ISBN :
978-1-61284-458-9
DOI :
10.1109/CECNET.2011.5768216