  • DocumentCode
    1843677
  • Title
    Data extraction and cleansing of semi-structured Chinese texts
  • Author
    Zhu, Wei-heng ; Long, Shun
  • Author_Institution
    Dept. of Comput. Sci., Jinan Univ., Guangzhou, China
  • Volume
    1
  • fYear
    2011
  • fDate
    13-15 May 2011
  • Firstpage
    726
  • Lastpage
    729
  • Abstract
    The rapid growth of data mining generates an ever-increasing demand for automatic information extraction from Chinese texts. However, existing approaches in this domain focus on well-structured Chinese texts and therefore have difficulty with semi-structured Chinese texts, which do not conform to strict syntactic structures. In this paper we propose a semi-automatic approach to data extraction and cleansing for such texts. Preliminary experimental results show that, with modest manual intervention, it can effectively extract information from raw semi-structured Chinese texts collected from e-business applications.
  • Keywords
    business data processing; data mining; information retrieval; natural language processing; text analysis; automatic information extraction; e-business application; semiautomatic data extraction; semistructured Chinese text; text cleansing; Data warehouses; Manuals; Merchandise; Semantics; Syntactics; Terminology; Chinese; data cleansing; data extraction; manual intervention; semi-structured text
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Business Management and Electronic Information (BMEI), 2011 International Conference on
  • Conference_Location
    Guangzhou
  • Print_ISBN
    978-1-61284-108-3
  • Type
    conf
  • DOI
    10.1109/ICBMEI.2011.5917038
  • Filename
    5917038