DocumentCode
2924876
Title
Automatic approaches to clustering occupational description data for prediction of probability of workplace exposure to beryllium
Author
Slutsky, A. ; Yuan An ; Hu, T. ; Burstyn, I.
Author_Institution
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA, USA
fYear
2011
fDate
8-10 Nov. 2011
Firstpage
596
Lastpage
601
Abstract
We investigated automatic approaches for clustering data that describe occupations related to hazardous airborne exposure (beryllium). The regulatory compliance data from the Occupational Safety and Health Administration include records containing short free-text job descriptions and associated numerical exposure levels. Researchers in the public health domain need to map job descriptions to the Standard Occupational Classification (SOC) nomenclature to estimate occupational health risks. The previous manual process was time-consuming and did not progress as far as linkage to SOC. We investigated alternative automatic approaches for clustering job descriptions. The clustering results are an essential first step towards discovering the corresponding SOC terms. Our study indicated that the Tolerance Rough Set with Jaccard similarity was a better combination overall. The utility of the algorithm was further verified by applying logistic regression and validating that the predictive power of the automatically generated classifications, in terms of the association of “job” with the probability of exposure to beryllium above a certain threshold, closely approached that of the manually assembled classification of the same 12,148 records.
Keywords
air pollution; beryllium; occupational health; occupational safety; pattern classification; pattern clustering; probability; regression analysis; rough set theory; Jaccard similarity; automatic occupational description data clustering approaches; beryllium; hazardous airborne exposure; logistic regression; occupational health; occupational health risk estimation; occupational safety; regulatory compliance data; short free text job descriptions; standard occupational classification; tolerance rough set; workplace exposure probability prediction; Classification algorithms; Clustering algorithms; Educational institutions; Manuals; Occupational safety; Partitioning algorithms; Welding;
fLanguage
English
Publisher
IEEE
Conference_Titel
Granular Computing (GrC), 2011 IEEE International Conference on
Conference_Location
Kaohsiung
Print_ISBN
978-1-4577-0372-0
Type
conf
DOI
10.1109/GRC.2011.6122664
Filename
6122664