Author/Authors :
Tian, Xiaolu Department of Medical Statistics and Epidemiology & Health Information Research Center & Guangdong Key Laboratory of Medicine - School of Public Health - Sun Yat-sen University - Guangzhou, China , Chong, Yutian Department of Infectious Diseases - The Third Affiliated Hospital - Sun Yat-sen University - Guangzhou, China , Huang, Yutao School of Data and Computer Science - Sun Yat-sen University - Guangzhou, China , Guo, Pi Department of Public Health - Medical College of Shantou University - Shantou, China , Li, Mengjie Department of Medical Statistics and Epidemiology & Health Information Research Center & Guangdong Key Laboratory of Medicine - School of Public Health - Sun Yat-sen University - Guangzhou, China , Zhang, Wangjian Department of Environmental Health Sciences - School of Public Health - University at Albany - State University of New York - Rensselaer, USA , Du, Zhicheng Department of Medical Statistics and Epidemiology & Health Information Research Center & Guangdong Key Laboratory of Medicine - School of Public Health - Sun Yat-sen University - Guangzhou, China , Li, Xiangyong Department of Infectious Diseases - The Third Affiliated Hospital - Sun Yat-sen University - Guangzhou, China , Hao, Yuantao Sun Yat-sen University - Guangzhou, China
Abstract :
Hepatitis B surface antigen (HBsAg) seroclearance during treatment is associated with a better prognosis among patients with
chronic hepatitis B (CHB). Significant gaps remain in our understanding of how to predict HBsAg seroclearance accurately and
efficiently based on obtainable clinical information. This study aimed to identify the optimal model to predict HBsAg seroclearance. We obtained the laboratory and demographic information for 2,235 patients with CHB from the South China Hepatitis
Monitoring and Administration (SCHEMA) cohort. HBsAg seroclearance occurred in 106 patients in total. We developed models
based on four algorithms: extreme gradient boosting (XGBoost), random forest (RF), decision tree (DCT), and
logistic regression (LR). The optimal model was identified by the area under the receiver operating characteristic curve (AUC). The
AUCs for XGBoost, RF, DCT, and LR models were 0.891, 0.829, 0.619, and 0.680, respectively, with XGBoost showing the best
predictive performance. The variable importance plot of the XGBoost model indicated that the level of HBsAg was of high
importance, followed by age and the level of hepatitis B virus (HBV) DNA. Machine learning algorithms, especially XGBoost, can
perform well in predicting HBsAg seroclearance. The results showed the potential of machine learning algorithms for
predicting HBsAg seroclearance utilizing obtainable clinical data.
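
The AUC-based model selection described in the abstract can be illustrated with a minimal sketch. It assumes each candidate model outputs one predicted probability of seroclearance per patient. The function names `auc` and `best_model`, the model labels, and the toy labels and scores are hypothetical and are not taken from the SCHEMA cohort or the study's code. The AUC is computed here as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, which is equivalent to the area under the ROC curve.

```python
def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive case gets the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model(labels, scores_by_model):
    """Return the name of the model with the highest AUC, plus all AUCs."""
    aucs = {name: auc(labels, s) for name, s in scores_by_model.items()}
    return max(aucs, key=aucs.get), aucs


# Toy data (hypothetical): 1 = seroclearance occurred, 0 = it did not.
labels = [1, 0, 0, 1, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.2, 0.4, 0.7, 0.1, 0.3],
    "model_b": [0.6, 0.5, 0.7, 0.4, 0.2, 0.3],
}

name, aucs = best_model(labels, toy_scores)
print(name, aucs)
```

In practice the scores would come from models evaluated on held-out data, and a library routine would normally replace the pairwise loop, which is quadratic in the number of patients.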
Keywords :
Algorithms , Machine , CHB , DNA