Published In

Accident Analysis & Prevention


Transportation safety, Real-time information


In this paper, a framework is outlined to generate realistic artificial data (RAD) as a tool for comparing different models developed for safety analysis. The primary focus of transportation safety analysis is on identifying and quantifying the influence of factors contributing to traffic crash occurrence and its consequences. The current practice of comparing model structures using only observed data has limitations. With observed data, it is not possible to know how well the models mimic the true relationship between the dependent and independent variables. Further, real datasets do not allow researchers to evaluate model performance at different levels of dataset complexity. RAD offers an innovative framework to address these limitations. Hence, we propose a RAD generation framework embedded with heterogeneous causal structures that generates crash data by treating crash occurrence as a trip-level event affected by trip-level factors, demographics, and roadway and vehicle attributes. Within our RAD generator we employ three specific modules: (a) disaggregate trip information generation, (b) crash data generation, and (c) crash data aggregation. For disaggregate trip information generation, we employ a daily activity-travel realization for an urban region generated from an established activity-based model for the Chicago region. We use these data, covering more than 2 million daily trips, to generate a subset of trips with crash data. For trips with crashes, we generate crash location, crash type, driver/vehicle characteristics, and crash severity. The daily RAD generation process is repeated to generate crash records at yearly or multi-year resolution. The crash databases generated can be employed to compare frequency models, severity models, crash type models, and various other dimensions by facility type, possibly establishing a universal benchmarking system for alternative model frameworks in the safety literature.
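The three-module structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trip attributes, the facility types, the per-kilometre base crash rate, the night-time multiplier, and the severity weights are all hypothetical placeholders chosen only to show how trips, crash generation, and aggregation might fit together, and the random trips stand in for the activity-based model output.

```python
import random
from collections import Counter

FACILITY_TYPES = ["arterial", "freeway", "local"]  # hypothetical categories


def generate_trips(n_trips, rng):
    """Module (a) stand-in: random trips in place of activity-based model output."""
    return [
        {
            "trip_id": i,
            "facility": rng.choice(FACILITY_TYPES),
            "distance_km": rng.uniform(1.0, 40.0),
            "night": rng.random() < 0.2,
        }
        for i in range(n_trips)
    ]


def generate_crashes(trips, rng, base_rate_per_km=1e-4):
    """Module (b): treat a crash as a trip-level event (assumed risk function)."""
    crashes = []
    for trip in trips:
        # Assumed form: risk grows with distance and is higher at night.
        p = base_rate_per_km * trip["distance_km"] * (1.5 if trip["night"] else 1.0)
        if rng.random() < min(p, 1.0):
            severity = rng.choices(
                ["property_damage", "injury", "fatal"],
                weights=[0.70, 0.28, 0.02],  # hypothetical severity shares
            )[0]
            crashes.append(
                {
                    "trip_id": trip["trip_id"],
                    "facility": trip["facility"],
                    "severity": severity,
                }
            )
    return crashes


def aggregate_crashes(crashes):
    """Module (c): count crashes by facility type and severity."""
    return Counter((c["facility"], c["severity"]) for c in crashes)


def simulate_days(n_days, n_trips_per_day, seed=0):
    """Repeat the daily process to build a multi-day crash database."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_days):
        trips = generate_trips(n_trips_per_day, rng)
        totals.update(aggregate_crashes(generate_crashes(trips, rng)))
    return totals
```

Because the crash-generating process is specified by the analyst, the true parameters are known, which is what would allow candidate frequency or severity models fitted to the aggregated counts to be judged against them.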


© Copyright the author(s) 2024


This is the author’s version of a work that was accepted for publication. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published as: Implementation of a realistic artificial data generator for crash data generation. Accident Analysis & Prevention, 200, 107566.


