• DocumentCode
    3301133
  • Title

    Extraction and visualization of numerical and named entity information from a large number of documents

  • Author

    Murata, Masaki ; Ma, Qing ; Torisawa, Kentaro ; Iwatate, Masakazu ; Shirado, Tamotsu ; Ichii, Koji ; Kanamaru, Toshiyuki

  • Author_Institution
    NICT, Kyoto
  • fYear
    2008
  • fDate
    19-22 Oct. 2008
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    We have developed a system that can semiautomatically extract numerical and named entity sets from a large number of Japanese documents and create various kinds of tables and graphs. In our experiments, the system semiautomatically created approximately 300 kinds of graphs and tables at precisions of 0.2-0.8 with only two hours of manual preparation from a two-year stack of newspaper articles. Note that these newspaper articles contained a large quantity of data, and not all of them could be read or checked manually in such a short amount of time. From this perspective, we concluded that our system is useful and convenient for extracting information from a large number of documents.
  • Keywords
    data mining; data visualisation; document handling; information analysis; Japanese documents; named entity information extraction; named entity sets; newspaper articles; numerical extraction; numerical visualization; Data mining; Databases; Humans; Humidity; Scattering; Temperature; Typhoons; Visualization; Wind speed; graph; named entity; numerical information; table
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-4515-8
  • Electronic_ISBN
    978-1-4244-2780-2
  • Type

    conf

  • DOI
    10.1109/NLPKE.2008.4906795
  • Filename
    4906795