DocumentCode
1843677
Title
Data extraction and cleansing of semi-structured Chinese texts
Author
Zhu, Wei-heng ; Long, Shun
Author_Institution
Dept. of Comput. Sci., Jinan Univ., Guangzhou, China
Volume
1
fYear
2011
fDate
13-15 May 2011
Firstpage
726
Lastpage
729
Abstract
The rapid growth of data mining generates an ever-increasing demand for automatic information extraction from Chinese texts. However, existing approaches in this domain focus on well-structured Chinese texts and therefore have difficulty handling semi-structured Chinese texts, which do not conform to strict syntactic structures. In this paper, we propose a semi-automatic approach to data extraction and cleansing for such texts. Preliminary experimental results show that, with modest manual intervention, the approach can effectively extract information from raw semi-structured Chinese texts collected from e-business applications.
Keywords
business data processing; data mining; information retrieval; natural language processing; text analysis; automatic information extraction; data mining; e-business application; semiautomatic data extraction; semistructured Chinese text; text cleansing; Data mining; Data warehouses; Manuals; Merchandise; Semantics; Syntactics; Terminology; Chinese; data cleansing; data extraction; manual intervention; semi-structured text;
fLanguage
English
Publisher
IEEE
Conference_Titel
Business Management and Electronic Information (BMEI), 2011 International Conference on
Conference_Location
Guangzhou
Print_ISBN
978-1-61284-108-3
Type
conf
DOI
10.1109/ICBMEI.2011.5917038
Filename
5917038