• DocumentCode
    2183506
  • Title
    The Partition Heuristic Information Extraction Algorithm of Unstructured Data
  • Author
    Cong Li ; Chengming Zou ; Luo Zhong ; Jinyang Zhu
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan, China
  • fYear
    2013
  • fDate
    16-19 Dec. 2013
  • Firstpage
    570
  • Lastpage
    576
  • Abstract
    In this paper, we propose a method that extracts the attributes of a given entity from unstructured data in the field of logistics, applying a divide-and-conquer approach suited to the characteristics of logistics information. After a thorough study of logistics information, we perform a statistical analysis of textual logistics information and summarize the common attributes of text information entities. We then partition each text information entity according to its different attributes and attribute values, following the divide-and-conquer idea. Each entity obtained from this partitioning step is processed internally using a segmentation method based on tagging and graphs, which extracts the valuable attributes and attribute values from the unstructured data. Experimental results show that the method is effective on logistics information obtained from a well-known logistics system.
  • Keywords
    divide and conquer methods; information retrieval; logistics data processing; statistical analysis; text analysis; attribute extraction; divide and conquer method; graph; partition heuristic information extraction algorithm; tagging segmentation method; text information entity; text logistics information; unstructured data; Cities and towns; Data mining; Educational institutions; Logistics; Vehicles; divide-and-conquer method; extraction; information; logistics information; words segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing and Big Data (CloudCom-Asia), 2013 International Conference on
  • Conference_Location
    Fuzhou
  • Print_ISBN
    978-1-4799-2829-3
  • Type
    conf
  • DOI
    10.1109/CLOUDCOM-ASIA.2013.104
  • Filename
    6821051