Article
Performance Comparison of Statistical Models in PM2.5 Forecasting: A Case Study of Almaty
DOI:
https://doi.org/10.47344/b3exq459
Keywords:
PM2.5, air quality forecasting, statistical models, hybrid models, missing data imputation, Almaty
Abstract
Air pollution, particularly fine particulate matter (PM2.5), poses a significant threat to public health in urban areas. In Almaty, Kazakhstan, high PM2.5 concentrations call for effective forecasting methods to support timely intervention and policy planning. This study evaluates and compares the performance of traditional statistical models and their hybrid counterparts for PM2.5 prediction. Multiple Linear Regression (MLR), Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA (SARIMA), Generalized Additive Models (GAM), and several hybrid combinations (e.g., MLR + GAM) were applied to daily air quality and meteorological data from February 2020 to May 2024. Missing values were imputed using Multiple Imputation by Chained Equations (MICE), and model performance was assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). The results show that MLR provided the best explanatory power (R² = 0.7160), while SARIMA achieved the lowest RMSE (0.2719), indicating strong short-term predictive accuracy. Among hybrid models, MLR + GAM delivered the most promising results (R² = 0.6124), although improvements over standalone models were limited. These findings demonstrate the robustness of traditional statistical approaches for air quality forecasting and provide a benchmark for future studies incorporating machine learning techniques. The study offers practical value for environmental monitoring and air quality management in Almaty and similar urban regions.
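For readers less familiar with the three evaluation metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions. The function names and the toy values are illustrative assumptions only; they are not taken from the study, which may have relied on a library implementation.

```python
import math

# Hypothetical helper names; standard textbook definitions of the metrics.

def mae(observed, predicted):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root Mean Squared Error: square root of the mean squared error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(mae(observed, predicted))
print(rmse(observed, predicted))
print(r2(observed, predicted))
```

Because RMSE squares each error before averaging, a single large miss raises it more than it raises MAE, while R² measures error relative to the variance of the observations. This is one reason the same set of models can rank differently depending on the metric.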