Title :
Automatic approaches to clustering occupational description data for prediction of probability of workplace exposure to beryllium
Author :
Slutsky, A.; An, Y.; Hu, T.; Burstyn, I.
Author_Institution :
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA, USA
Abstract :
We investigated automatic approaches for clustering data describing occupations associated with hazardous airborne exposure to beryllium. The regulatory compliance data from the Occupational Safety and Health Administration (OSHA) include records containing short free-text job descriptions and associated numerical exposure levels. Public health researchers need to map these job descriptions to the Standard Occupational Classification (SOC) nomenclature to estimate occupational health risks. The previous manual process was time-consuming and had not yet progressed to linkage with SOC. We therefore investigated alternative automatic approaches for clustering the job descriptions; the resulting clusters are an essential first step toward discovering the corresponding SOC terms. Our study indicated that the Tolerance Rough Set model combined with Jaccard similarity performed better overall than the other combinations evaluated. The utility of the algorithm was further verified with logistic regression: the predictive power of the automatically generated classifications, measured as the association of “job” with the probability of beryllium exposure above a certain threshold, closely approached that of the manually assembled classification of the same 12,148 records.
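The abstract names two computational steps (Tolerance Rough Set clustering with Jaccard similarity, then a logistic-regression check of predictive power) without giving details. The following Python sketch is a simplified illustration under stated assumptions, not the authors' implementation. Assumptions: job descriptions are tokenized on whitespace; a term's tolerance class is the term plus every term co-occurring with it in at least `theta` records (the usual TRS co-occurrence rule); a simple single-pass leader clustering on Jaccard similarity of upper approximations is swapped in for the paper's actual clustering procedure; and McFadden's pseudo-R² is used as one possible measure of predictive power, since the abstract does not state which measure was used. All function names and the toy data are hypothetical.

```python
"""Sketch: TRS-style term expansion + Jaccard clustering of job descriptions,
then a logistic-regression fit statistic for the resulting job groups.
Simplified illustration with hypothetical names; not the authors' code."""
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case a free-text job description and return its set of terms."""
    return frozenset(text.lower().split())


def tolerance_classes(docs, theta):
    """Tolerance class of term t: t plus every term that co-occurs with t
    in at least `theta` documents (assumed TRS co-occurrence rule)."""
    cooc = Counter()
    for d in docs:
        for a, b in combinations(sorted(d), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for d in docs for t in d}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Upper approximation of a document: union of its terms' tolerance classes."""
    upper = set()
    for t in doc:
        upper |= classes[t]
    return frozenset(upper)


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leader_cluster(term_sets, threshold):
    """Single-pass leader clustering (a stand-in for the paper's clustering step):
    assign each set to the first cluster whose leader has Jaccard similarity
    >= threshold, otherwise open a new cluster. Returns one label per set."""
    leaders, labels = [], []
    for s in term_sets:
        for k, leader in enumerate(leaders):
            if jaccard(s, leader) >= threshold:
                labels.append(k)
                break
        else:
            leaders.append(s)
            labels.append(len(leaders) - 1)
    return labels


def grouped_log_likelihood(groups, y):
    """Maximised log-likelihood of a logistic regression of binary y on one
    categorical predictor. The MLE fitted probability for each group is the
    group's observed proportion, so no iterative fitting is needed.
    Groups that are all 0 or all 1 contribute 0 (the limiting value)."""
    n, k = Counter(), Counter()
    for g, yi in zip(groups, y):
        n[g] += 1
        k[g] += yi
    ll = 0.0
    for g in n:
        p = k[g] / n[g]
        if 0 < p < 1:
            ll += k[g] * math.log(p) + (n[g] - k[g]) * math.log(1 - p)
    return ll


def mcfadden_r2(groups, y):
    """McFadden pseudo-R^2 of the job-group model versus an intercept-only model."""
    ll_null = grouped_log_likelihood([0] * len(y), y)
    if ll_null == 0:
        raise ValueError("outcome is constant; pseudo-R^2 is undefined")
    return 1 - grouped_log_likelihood(groups, y) / ll_null


if __name__ == "__main__":
    jobs = ["beryllium welder", "welder helper", "beryllium welder helper",
            "machine operator", "cnc machine operator"]
    exceeded = [1, 1, 0, 0, 0]  # hypothetical: 1 = exposure above threshold
    docs = [tokenize(j) for j in jobs]
    classes = tolerance_classes(docs, theta=2)
    labels = leader_cluster([upper_approximation(d, classes) for d in docs], 0.5)
    print(labels)  # [0, 0, 0, 1, 1]
    print(round(mcfadden_r2(labels, exceeded), 3))  # 0.433
```

In this toy run, "welder helper" is expanded to include "beryllium" because "welder" and "beryllium" co-occur in two records; this is the purpose of the tolerance-class step, letting short descriptions that share few literal words still fall into the same cluster. Comparing `mcfadden_r2` for automatically generated labels against manually assigned labels on the same records would be one way to mirror the comparison the abstract describes.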
Keywords :
air pollution; beryllium; occupational health; occupational safety; pattern classification; pattern clustering; probability; regression analysis; rough set theory; Jaccard similarity; automatic occupational description data clustering approaches; hazardous airborne exposure; logistic regression; occupational health risk estimation; regulatory compliance data; short free text job descriptions; standard occupational classification; tolerance rough set; workplace exposure probability prediction; Classification algorithms; Clustering algorithms; Educational institutions; Manuals; Partitioning algorithms; Welding
Conference_Title :
Granular Computing (GrC), 2011 IEEE International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4577-0372-0
DOI :
10.1109/GRC.2011.6122664