• DocumentCode
    531386
  • Title

    MapMarker: Extraction of Postal Addresses and Associated Information for General Web Pages

  • Author

    Chang, Chia-Hui ; Li, Shu-Ying

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Central Univ., Jhongli, Taiwan
  • Volume
    1
  • fYear
    2010
  • fDate
    Aug. 31 2010-Sept. 3 2010
  • Firstpage
    105
  • Lastpage
    111
  • Abstract
    Address information is essential for people´s daily life. People often need to query addresses of unfamiliar location through Web and then use map services to mark down the location for direction purpose. Although both address information and map services are available online, they are not well combined. Users usually need to copy individual address from a Web site and paste it to another Web site with map services to locate its direction. Such copy and paste operations have to be repeated if multiple addresses are listed on a single page such as public school list or apartment list. Furthermore, associated information with individual address has to be copied and included on each marker for better comprehension. Our research is devoted to automate the above process and make the combination an easier task for users. The main techniques applied here include postal address extraction and associated information extraction. We apply sequence labeling algorithm based on Conditional Random Fields (CRFs) to train models for address extraction. Meanwhile, using the extracted addresses as landmarks, we apply pattern mining to identify the boundaries of address blocks and extract associated information with each individual address. The experimental result shows high F-score at 91% for postal address extraction and 87% accuracy for associated information extraction.
  • Keywords
    Internet; cartography; geographic information systems; MapMarker; Web site; associated information extraction; conditional random fields; general Web pages; map services; postal address extraction; sequence labeling algorithm; address extraction; associated information extraction; conditional random fields;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
  • Conference_Location
    Toronto, ON
  • Print_ISBN
    978-1-4244-8482-9
  • Electronic_ISBN
    978-0-7695-4191-4
  • Type

    conf

  • DOI
    10.1109/WI-IAT.2010.64
  • Filename
    5616209