Published In

Environment International

Ground-level ozone, Random forest, Trend analysis, Cross-validation, Spatiotemporal dynamic, Data-driven model


Surface ozone (O3), a harmful air pollutant, has significant negative effects on human health and plants. Existing O3 datasets have coarse spatiotemporal resolution and limited coverage, and uncertainty about the factors that influence O3 seriously constrains related epidemiological and air pollution studies. To address these issues, we propose a novel scheme that (1) estimates daily O3 concentrations on a fine grid (1 km × 1 km) across China from 2018 to 2020 using machine learning methods, and (2) identifies how the influential factors of O3 differ under diverse urbanization and topography conditions. The inputs are hourly ground-level pollutant concentration observations, meteorological data, satellite data, and auxiliary data including a digital elevation model (DEM), land use data (LUD), normalized difference vegetation index (NDVI), population (POP), and nighttime light images (NTL). The main findings are as follows. The correlation coefficients (R2) between O3 concentrations and surface net solar radiation (SNSR), boundary layer height (BLH), 2 m temperature (T2M), 10 m v-component of wind (MVW), and NDVI were 0.80, 0.40, 0.35, 0.30, and 0.20, respectively. Random forest (RF) achieved the highest validation R2 (0.86) and the lowest validation RMSE (13.74 μg/m3) in estimating O3 concentrations, followed by support vector machine (SVM) (R2 = 0.75, RMSE = 18.39 μg/m3), backpropagation neural network (BP) (R2 = 0.74, RMSE = 19.26 μg/m3), and multiple linear regression (MLR) (R2 = 0.52, RMSE = 25.99 μg/m3). Our China High-Resolution O3 Dataset (CHROD) exhibited acceptable accuracy at different spatial–temporal scales. Additionally, O3 concentrations showed a decreasing trend and exhibited obvious spatiotemporal heterogeneity across China from 2018 to 2020. Overall, O3 was mainly affected by human activities in more urbanized regions, whereas in less urbanized regions it was mainly controlled by meteorological factors, vegetation coverage, and elevation. The scheme is useful for understanding the mechanism of O3 formation and for improving the quality of O3 datasets.
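The validation R2 and RMSE values quoted in the abstract are standard goodness-of-fit metrics computed on held-out samples. The following is a minimal sketch of how such cross-validated metrics can be computed; it is not the authors' code. The number of folds (k = 10), the helper names (`rmse`, `r2`, `fit_line`, `cross_validate`), the one-variable least-squares model standing in for RF, SVM, BP, or MLR, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observations (e.g. ug/m3)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (hypothetical stand-in for any regressor)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def cross_validate(x, y, k=10, seed=0):
    """Sample-based k-fold CV: each sample is predicted by a model that never saw it."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    predicted = [0.0] * len(x)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predicted[i] = a + b * x[i]
    return r2(y, predicted), rmse(y, predicted)

# Synthetic illustration only: a noisy linear relation, not real O3 data.
rng = random.Random(42)
x_demo = [rng.uniform(0, 300) for _ in range(200)]
y_demo = [40 + 0.3 * xi + rng.gauss(0, 10) for xi in x_demo]
cv_r2, cv_rmse = cross_validate(x_demo, y_demo)
print(f"CV R2 = {cv_r2:.2f}, CV RMSE = {cv_rmse:.2f}")
```

Pooling the held-out predictions from all folds before computing the metrics gives a single R2 and RMSE per model, which is the form in which the four models are compared above.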


© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license ( nd/4.0/).


