Applied research of the impact of air pollution on absenteeism in students with respiratory issues through machine learning analysis
Abstract:
Objective To evaluate how well machine learning prediction models perform on short-term series of student absenteeism due to respiratory symptoms associated with air pollution, and to provide a methodological reference for early warning of disease occurrence in schools.
Methods Short-term series data on student absenteeism due to respiratory symptoms in Jiangsu Province from September 2019 to October 2022 were combined with average air pollutant concentration data. Univariate distributed lag nonlinear models (DLNM) were used to select the optimal lag variable for each pollutant. An extreme gradient boosting (XGBoost) model was then built to predict the frequency of absenteeism due to respiratory symptoms and was compared with a seasonal autoregressive integrated moving average with exogenous factors (SARIMAX) model.
Results From 2019 to 2022, a daily average of 9 709 students in Jiangsu Province were absent due to respiratory symptoms. The daily average air quality index (AQI) was 76.96, and the daily average mass concentrations of PM2.5, PM10, NO2, and O3 were 35.75, 61.13, 28.89, and 104.81 μg/m3, respectively. Granger causality tests indicated that AQI, PM2.5, PM10, NO2, and O3 were all predictors of the series of absenteeism frequency due to respiratory symptoms (F=1.46, 1.79, 1.67, 3.41, and 2.18, respectively; all P < 0.01). The single-day lag effects of PM2.5, PM10, NO2, and O3 reached their peak relative risk (RR) at lag4, lag0, lag0, and lag4, respectively. Compared with the SARIMAX model, the XGBoost model with the optimal pollutant lag variables reduced the mean absolute error (MAE) from 2.251 to 0.475, the mean absolute percentage error (MAPE) from 0.429 to 0.080, and the root mean square error (RMSE) from 2.582 to 0.713. With the warning threshold set at the 75th percentile (P75), sensitivity rose from 0.086 to 0.694, specificity from 0.979 to 0.988, and the Youden index from 0.065 to 0.682.
Conclusions The XGBoost model shows good predictive performance and early warning capability for short-term series of student absenteeism due to respiratory symptoms associated with air pollution. Schools could adopt this model where appropriate to detect disease outbreaks early, issue warnings, support prevention and control, and strengthen school health work.
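The evaluation measures named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names, the toy data, the linear-interpolation percentile, the choice to derive P75 from the observed series, and the rule "alert when the value exceeds the threshold" are all assumptions made for illustration. MAPE is returned as a fraction, which matches the scale of the reported values (0.429, 0.080). The Youden index is sensitivity + specificity - 1, and the reported figures agree with that definition (0.694 + 0.988 - 1 = 0.682 and 0.086 + 0.979 - 1 = 0.065).

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (assumes no zero in actual)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def percentile(values, q):
    """q-th percentile with linear interpolation (an assumed convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def warning_performance(actual, predicted, threshold):
    """Sensitivity, specificity and Youden index for alerts above a threshold.

    Assumes both alert and non-alert days occur in the observed series.
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        observed, alerted = a > threshold, p > threshold
        if observed and alerted:
            tp += 1
        elif observed:
            fn += 1
        elif alerted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity - 1


# Hypothetical toy series, not data from the study.
actual = [2, 3, 5, 8, 4, 9, 3, 2]
predicted = [2.5, 3, 4, 5.5, 4.5, 7.5, 3.5, 6]

threshold = percentile(actual, 75)  # P75 of the observed series (assumption)
sensitivity, specificity, youden = warning_performance(actual, predicted, threshold)

print("MAE:", mae(actual, predicted))
print("MAPE:", mape(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("P75 threshold:", threshold)
print("Sensitivity, specificity, Youden:", sensitivity, specificity, youden)
```

Under these assumptions, a model can have low error overall and still miss alert days, which is why the abstract reports sensitivity and the Youden index alongside MAE, MAPE, and RMSE.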
Key words:
- Air pollution
- Respiratory system
- Absenteeism
- Models, statistical
- Students
1) Conflict of interest statement: All authors declare no conflict of interest.
Table 1. Maximum single-day lag effect of the DLNM model with varying degrees of freedom [RR (95%CI)]

| df for other air pollutants | df for time factor | PM2.5 | PM10 | NO2 | O3 |
| --- | --- | --- | --- | --- | --- |
| 2 | 6 | 1.02(0.99~1.05) | 1.04(0.97~1.12) | 1.30(1.18~1.43) | 1.12(1.05~1.21) |
| 2 | 7 | 1.03(1.00~1.06) | 1.08(1.02~1.16) | 1.26(1.15~1.37) | 1.08(1.01~1.15) |
| 2 | 8 | 1.03(1.01~1.04) | 1.07(1.04~1.11) | 1.22(1.17~1.27) | 1.09(1.05~1.12) |
| 3 | 6 | 1.02(0.99~1.05) | 1.05(0.97~1.13) | 1.30(1.18~1.43) | 1.13(1.05~1.21) |
| 3 | 7 | 1.03(1.00~1.06) | 1.09(1.02~1.16) | 1.25(1.14~1.37) | 1.08(1.01~1.15) |
| 3 | 8 | 1.03(1.01~1.04) | 1.08(1.05~1.11) | 1.21(1.16~1.27) | 1.09(1.05~1.12) |
| 4 | 6 | 1.02(0.99~1.05) | 1.05(0.97~1.13) | 1.28(1.16~1.41) | 1.12(1.04~1.21) |
| 4 | 7 | 1.03(1.00~1.06) | 1.09(1.01~1.16) | 1.24(1.13~1.36) | 1.08(1.01~1.15) |
| 4 | 8 | 1.03(1.01~1.04) | 1.07(1.04~1.11) | 1.20(1.16~1.26) | 1.08(1.05~1.12) |
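One way to read Table 1 is to check, for each degrees-of-freedom setting, whether the 95% confidence interval of the RR excludes 1. The sketch below parses cells in the table's `RR(lower~upper)` notation and applies that check. The helper names are hypothetical, and only four cells (copied from the rows where the degrees of freedom for other pollutants is 2) are included. Because the published bounds are rounded to two decimals, a bound printed as 1.00 is treated here as not excluding 1, which is an assumption rather than a result from the paper.

```python
import re

# Cells copied from Table 1 (df for other air pollutants = 2),
# keyed by (pollutant, df for time factor).
CELLS = {
    ("PM2.5", 6): "1.02(0.99~1.05)",
    ("PM2.5", 8): "1.03(1.01~1.04)",
    ("NO2", 6): "1.30(1.18~1.43)",
    ("NO2", 8): "1.22(1.17~1.27)",
}

CELL_PATTERN = re.compile(r"^\s*([\d.]+)\(([\d.]+)~([\d.]+)\)\s*$")


def parse_cell(cell):
    """Split 'RR(lower~upper)' into three floats."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unrecognised cell format: {cell!r}")
    rr, lower, upper = (float(group) for group in match.groups())
    return rr, lower, upper


def excludes_null(cell):
    """True when the printed 95% CI lies entirely above or below RR = 1."""
    _, lower, upper = parse_cell(cell)
    return lower > 1.0 or upper < 1.0


for (pollutant, time_df), cell in CELLS.items():
    rr, lower, upper = parse_cell(cell)
    print(pollutant, "time df", time_df, "RR", rr, "CI excludes 1:", excludes_null(cell))
```

Applied to the full table, the same check gives a quick view of how sensitive each pollutant's estimate is to the chosen degrees of freedom.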