• DocumentCode
    3714479
  • Title
    A method for imputation of semantic class in diagnostic radiology text
  • Author
    Eamon Johnson;W. Christopher Baughman;Gultekin Ozsoyoglu
  • Author_Institution
    Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA
  • fYear
    2015
  • Firstpage
    750
  • Lastpage
    755
  • Abstract
    Diagnostic medicine produces large volumes of free-text reports used primarily for communication between medical professionals. Secondary use of these reports requires extraction of structured information from the free text. State-of-the-art computational natural language processing techniques can partially identify semantics in text, but the diverse terminology used in medical settings makes training classifiers for every lexicon a laborious task. We present statistics of semantics from a large-scale machine-annotated corpus of 83,452 chest X-ray reports. We show that the distribution of semantics is consistent with Zipfian distributions observed in other natural language corpora, and we quantify the semantic focus imparted by limiting a study by body area and modality. We demonstrate that within our semantically focused corpus, pairwise co-occurrence statistics can be used to accurately impute the semantic class for frequently occurring unknown entities, thereby reducing the number of semantically unclassified phrases by up to 25%. Finally, we show that our imputation approach is consistent across multiple reconstructions of the underlying text data.
  • Keywords
    "Diseases","Semantics","Syntactics","Medical diagnostic imaging","Head","Image resolution"
  • Publisher
    IEEE
  • Conference_Title
    Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/BIBM.2015.7359780
  • Filename
    7359780