Improved Support Vector Machine Models for Work Zone Crash Injury Severity Prediction and Analysis

Published In

Transportation Research Record: Journal of the Transportation Research Board


Work zones are a high-priority issue in road transportation because of their impact on traffic safety. A better understanding of work zone crashes can help to identify their contributing factors and the countermeasures needed to enhance roadway safety. This study investigates the prediction of work zone crash severity and its contributing factors by employing a parametric approach using the mixed logit modeling framework and a non-parametric machine learning approach using the support vector machine (SVM). The mixed logit model belongs to the class of random parameter models, in which the effects of variables are allowed to vary across observations; that is, data heterogeneity is taken into account. The performance of the SVM model is enhanced by applying three metaheuristic algorithms: particle swarm optimization (PSO), harmony search (HS), and the whale optimization algorithm (WOA). Empirical findings indicate that the SVM provides higher prediction accuracy than the mixed logit model. Estimation results reveal key factors that increase the likelihood of severe work zone crashes. Furthermore, the analysis illustrates the ability of all three metaheuristics to enhance the SVM, with the harmony search algorithm yielding the greatest improvement in SVM performance.

The occurrence of fatal road crashes has followed an increasing trend in recent years, illustrated by a 32% growth from 2,228 fatal road crashes in 2013 to 2,933 in 2017 in the State of Florida. Florida is among the three states in the United States with the highest rates of fatal road crashes (1). Miami-Dade County had the highest number of road crashes in Florida, with a total of 33,694 fatal and injury crashes out of 64,070 crashes in 2016 (2). The number of work zones in Florida has also increased because of the growth of highway renovation and construction projects. As such, the number of crashes associated with work zones has also increased, from 1,153 in 2013 to 1,315 in 2017. Thus, safety should be an important consideration for decision makers as they plan, design, and operate work zones. Geometric characteristics, traffic control, and smart work zones have significant impacts on the occurrence of work zone crashes. As a result, a better understanding of the contributing factors of work zone crashes can help to identify appropriate countermeasures to improve roadway safety.

Work zone crashes constitute approximately 1% of the total crashes in Miami-Dade County. Fatalities occur in just 0.5% of work zone crashes, but this is more than twice the fatality rate of road crashes not involving work zones (0.2%). Although the low share of this type of crash may not seem alarming at first glance, the disproportionate loss of life suggests an urgent need for comprehensive and in-depth investigation. Another aspect of work zone crashes is workers' safety. Approximately 3,400 workers were injured in work zone crashes between 2013 and 2017 in Miami-Dade County. Moreover, considering that around 38% of work zones involve lane closure, the economic impact of travel delay associated with additional lane closures because of incidents can be substantial (3).

In light of this, this study investigates the factors that affect the severity levels of work zone crashes using a disaggregate-level analytical approach, in which individual crash records and their associated potential contributing factors are studied. First, a mixed logit modeling framework, a parametric approach, is applied to identify the significant contributing factors affecting driver and passenger injury severity at work zones. Next, the authors propose a support vector machine (SVM) modeling framework, a machine learning approach, with multilayer perceptron and Gaussian radial basis function kernels to classify crash records. Three different metaheuristic algorithms (particle swarm optimization, harmony search, and the recently introduced whale optimization algorithm) are then applied to improve the performance of the SVM. The results from the two models are then compared in relation to the contributing factors identified and the prediction performance.
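To make the idea of metaheuristic tuning concrete, the following is a minimal sketch of particle swarm optimization searching over two SVM hyperparameters. It is not the authors' implementation. The search variables (the base-10 logarithms of a penalty parameter C and a Gaussian kernel width gamma), their bounds, the swarm settings, and the function `validation_error` are all assumptions made for illustration. In particular, `validation_error` is a hypothetical stand-in for the cross-validated SVM error, so that the sketch runs without any crash data or external library.

```python
import random


def validation_error(log_c, log_gamma):
    """Hypothetical stand-in for the cross-validated SVM error.

    In a real study this would train an SVM with C = 10**log_c and
    gamma = 10**log_gamma and return 1 - accuracy. A smooth bowl with its
    minimum at (1.0, -2.0) is assumed here so the sketch is self-contained.
    """
    return 0.15 + 0.02 * (log_c - 1.0) ** 2 + 0.03 * (log_gamma + 2.0) ** 2


def pso(objective, bounds, n_particles=20, n_iters=60,
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one global best.
    best_pos = [p[:] for p in pos]
    best_val = [objective(*p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the personal best,
                # and pull toward the global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(*pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Assumed search box: log10(C) in [-3, 3], log10(gamma) in [-5, 1].
best_params, best_error = pso(validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print("best (log10 C, log10 gamma):", best_params)
print("stand-in validation error:", best_error)
```

Harmony search and the whale optimization algorithm would slot into the same role: each proposes candidate hyperparameter settings and keeps those that lower the validation error, differing only in how new candidates are generated.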

The remainder of this paper is organized as follows. The next section summarizes the most recent and relevant studies, in which analytical (i.e., parametric) modeling approaches and machine learning techniques are used to study injury severity. The methodology is then presented, which includes brief descriptions of the methods used. This is followed by brief descriptions of the metaheuristic algorithms employed to improve the prediction performance of the SVM and corresponding performance measurement metrics. Next, the data description and processing procedure are introduced. Finally, the last section recaps the research outcomes based on the results obtained and provides concluding remarks.


Copyright © 2020 by National Academy of Sciences. All rights reserved.


