  • DocumentCode
    1992228
  • Title
    Automatically Extracting Acronyms from Biomedical Text
  • Author
    Fox, Jared; Brown, Nik
  • Author_Institution
    California Univ., Los Angeles
  • fYear
    2007
  • fDate
    14-17 Oct. 2007
  • Firstpage
    1245
  • Lastpage
    1248
  • Abstract
    Acronyms are widely used in biomedical literature. Knowledge of acronyms and their corresponding long forms is a fundamental pre-processing step in the mining of information from biomedical text. In this paper, we present a set of heuristics for finding acronyms and their long forms in biomedical texts. Our accurate and scalable heuristics allowed us to determine over 14,000 acronym/long-form pairs in 6 million English-language PubMed abstracts. The overall precision of our term-finding heuristics was 97.9%. Our heuristics found about 12,000 novel acronyms that were not in the SPECIALIST lexicon, potentially increasing the SPECIALIST lexicon acronym list by 125%. Our stemmed-down acronym list is at http://www.computationalbiology.info/.
  • Keywords
    heuristic programming; medical computing; text analysis; English language PubMed abstracts; SPECIALIST lexicon; acronyms; automatic extraction; biomedical text; heuristics; Abstracts; Colon; Computer science; Data mining; Diseases; Drugs; Natural languages; Proteins; Testing; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
  • Conference_Location
    Boston, MA
  • Print_ISBN
    978-1-4244-1509-0
  • Type
    conf
  • DOI
    10.1109/BIBE.2007.4375724
  • Filename
    4375724