• DocumentCode
    2152612
  • Title
    Automatic Recognition of Text Difficulty from Consumers Health Information
  • Author
    Wang, Yunli
  • Author_Institution
    Inst. for Inf. Technol., Nat. Res. Council of Canada, Fredericton, NB
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    131
  • Lastpage
    136
  • Abstract
    The Internet is one of the major sources of health information. However, some studies show that health information presented on health Web sites is difficult for many consumers to read. Readability formulas usually measure the difficulty of writing style rather than the difficulty of content. In order to recommend health information at an appropriate reading level to consumers, we investigate the feasibility of identifying the text difficulty of health information using machine learning methods. A support vector machine is used to classify consumer health information into two reading levels: easy to read, and written for the general public. Three feature sets (surface linguistic features, word difficulty features, and unigrams) and their combinations are compared in terms of classification accuracy. Unigram features alone reach an accuracy of 80.71%, and the combination of all three feature sets is the most effective, with an accuracy of 84.06%. Both are significantly better than surface linguistic features, word difficulty features, and their combination.
  • Keywords
    Internet; learning (artificial intelligence); medical information systems; natural languages; support vector machines; text analysis; consumers health information; health Web sites; machine learning methods; readability formulas; support vector machine; surface linguistic features; text difficulty; unigrams; word difficulty features; Councils; Information technology; Learning systems; Readability metrics; Support vector machine classification; Text categorization; Text recognition; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer-Based Medical Systems, 2006. CBMS 2006. 19th IEEE International Symposium on
  • Conference_Location
    Salt Lake City, UT
  • ISSN
    1063-7125
  • Print_ISBN
    0-7695-2517-1
  • Type
    conf
  • DOI
    10.1109/CBMS.2006.58
  • Filename
    1647558