Transportation -- United States -- Planning, Transportation and state -- Decision making -- Analysis


This extensive database project provides demographic data for workers, both where they live and where they work, for 39 U.S. Metropolitan Statistical Areas (MSAs). Planners and researchers can use this database to assess the extent to which transit stations and station areas are associated with economic and demographic change, and to forecast changes of similar magnitude for proposed new or expanded transit systems. Specific data elements include job counts by job sector, earnings, race, education, and sex, as well as the distance from each census block to transit, separated by mode type. The data were extracted as part of a larger project that examines the impact of transit on a large number of societal factors, including economic development and demographic dynamics (


The data project contains a collection of tables that provide census block data from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES) for 2002 to 2015. The LEHD LODES files include Residential Area Characteristics (RAC) and Work Area Characteristics (WAC) data. The provided data also include the distance from each census block centroid or edge to the nearest transit station, separated by mode type. These tables were exported from GIS geodatabases built by combining census blocks with the LODES tables. For each city, all years of both the RAC and the WAC data were merged into a single file; these files are combined into 35 separate data sets and made available in csv and txt formats.
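Merging all years of RAC and WAC data into one file per city amounts to stacking tables and tagging each row with its origin. The Python sketch below illustrates the idea under stated assumptions and is not the authors' ArcGIS workflow: each source table is assumed to be a list of dictionaries, and the tag values written to the LEHD_Year, WAC_RAC, and City fields (for example "RAC" or "WAC") are assumed formats. It also shows one way null values arise: when the stacked tables do not share exactly the same columns, the missing cells are filled with None.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Optional[str]]
# Assumed input shape: (city, year, "RAC" or "WAC", rows of one joined table)
SourceTable = Tuple[str, int, str, List[Row]]


def stack_tables(tables: Iterable[SourceTable]) -> List[Row]:
    """Stack several per-year RAC/WAC tables into one long table.

    The output uses the union of all column names; a column that a source
    table lacks is filled with None (a null). Three tag fields record the
    year, file type, and city each row came from.
    """
    tables = list(tables)
    columns: List[str] = []
    for _, _, _, rows in tables:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    stacked: List[Row] = []
    for city, year, file_type, rows in tables:
        for row in rows:
            out: Row = {name: row.get(name) for name in columns}
            out["LEHD_Year"] = str(year)  # assumed to be stored as text
            out["WAC_RAC"] = file_type
            out["City"] = city
            stacked.append(out)
    return stacked
```

For instance, stacking a RAC table keyed by h_geocode with a WAC table keyed by w_geocode leaves w_geocode null on the RAC rows and h_geocode null on the WAC rows.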

The data for the project were downloaded from multiple sites. The GIS census block shapefiles (a GIS file format) contain 2010 census blocks and come from the Integrated Public Use Microdata Series (IPUMS) National Historical GIS (IPUMS NHGIS) website. The NHGIS census block data sets were used because these data are not readily available in a ready-made GIS format from the U.S. Census Bureau. NHGIS requests that all research using its data sets include the citation it provides; please refer to the required citation on the NHGIS website. The LEHD LODES data were downloaded as csv tables from the U.S. Census Bureau's download site and joined to the census block data, which were converted from shapefiles to geodatabase format for a more robust product. The Origin-Destination (OD) files are not included because they are outside the scope of this project.
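The table join between the LODES csv files and the census blocks can be sketched in plain Python as a left join that keeps every block. This is an illustration, not the ArcGIS join the authors used. LODES WAC tables identify blocks with w_geocode and RAC tables with h_geocode, both 15-digit block codes; the block ID field name GEOID10 is an assumption about the NHGIS layer and should be checked against the actual attribute table. The key design point is to keep geocodes as strings so that leading zeros survive.

```python
import csv
import io
from typing import Dict, List, Optional


def read_lodes(text: str, key: str) -> Dict[str, Dict[str, str]]:
    """Index LODES rows by their block geocode.

    csv.DictReader keeps every value as a string, so 15-digit geocodes keep
    their leading zeros (for example, Alabama blocks start with "01").
    """
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def left_join(
    blocks: List[Dict[str, str]],
    lodes: Dict[str, Dict[str, str]],
    block_key: str = "GEOID10",  # assumed name of the block ID field
) -> List[Dict[str, Optional[str]]]:
    """Attach LODES columns to every block; unmatched blocks get None."""
    lodes_columns = list(next(iter(lodes.values()), {}))
    joined: List[Dict[str, Optional[str]]] = []
    for block in blocks:
        match = lodes.get(block[block_key])
        out: Dict[str, Optional[str]] = dict(block)
        for name in lodes_columns:
            out[name] = match[name] if match is not None else None
        joined.append(out)
    return joined
```

Blocks with no LODES record for a given year keep their geometry attributes but receive nulls in every LODES column.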

Each geodatabase includes both the Residential Area Characteristics (RAC) and Work Area Characteristics (WAC) files for each of the years 2002-2015, where available. Each file has a concatenated name that combines the census block layer name and the LODES table name. For example, for Austin, TX, the 2002 RAC file is named "TX_block_2010_tx_rac_S000_JT00_2002_Austin." The first section, "TX_block_2010," comes from the 2010 census block GIS data. The second section, "tx_rac_S000_JT00_2002," comes from the LODES RAC table, and the name ends with the name of the MSA. The remaining portions of the LODES table name are described in the LODES technical documentation. The concatenated name reflects that the LEHD LODES tables were joined to the census block data through a table join in ArcGIS.
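Because the file names follow a fixed pattern, they can be split back into their parts programmatically. The regular expression below is a hypothetical pattern inferred only from the Austin example, so names for combined or multi-state MSAs may need a looser pattern. Per the LODES technical documentation, S000 denotes the all-workers segment and JT00 denotes all jobs.

```python
import re
from typing import Dict

# Hypothetical pattern inferred from the single Austin example in the text.
NAME_PATTERN = re.compile(
    r"^(?P<block_layer>[A-Z]{2}_block_2010)_"            # NHGIS 2010 block layer
    r"(?P<state>[a-z]{2})_(?P<file_type>rac|wac)_"       # LODES state, file type
    r"(?P<segment>S[A-Z0-9]{3})_(?P<job_type>JT\d{2})_"  # LODES segment, job type
    r"(?P<year>\d{4})_(?P<msa>.+)$"                      # LODES year, then MSA name
)


def parse_feature_class_name(name: str) -> Dict[str, str]:
    """Split a concatenated feature class name into its components."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected name: {name!r}")
    parts = match.groupdict()
    # Rebuild the original LODES table name (without the .csv extension).
    parts["lodes_table"] = "_".join(
        parts[key] for key in ("state", "file_type", "segment", "job_type", "year")
    )
    return parts


print(parse_feature_class_name("TX_block_2010_tx_rac_S000_JT00_2002_Austin"))
```

Raising an error on unexpected names, rather than guessing, makes files that break the convention easy to spot.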

Each file also provides the distance from the census block edge to the nearest transit station point, by transit mode (e.g., CRT or SCT). Distances were measured as Euclidean distances using a Near analysis in ArcGIS. To calculate these distances, files were projected to the North America Albers Equal Area Conic projection, a secant projection whose two standard parallels limit scale-factor distortion across the projected area. A number of the files were instead projected to the State Plane Coordinate System (SPCS) zone for the local area; these projections are frequently used as a standard by government and industry analysts. All GIS census blocks included in the data set fall within the Metropolitan Statistical Area boundary of each city. Care was taken to include all census blocks in the counties that make up each MSA, which are enumerated below. Because some MSAs lie close to one another, some files include data for combined MSAs, such as Seattle-Tacoma and San Diego-Oceanside-Escondido.
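The edge-to-station measurement can be illustrated with a small pure-Python sketch: for each mode, take the minimum Euclidean distance from any station of that mode to the block polygon's boundary. This is an assumption-laden illustration, not ArcGIS's Near implementation. Coordinates are assumed to be already projected (for example, in meters), the block is assumed to be a single simple polygon ring, a station inside the block is treated as distance 0, and the mode codes are just labels.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to segment ab (projected units)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment, e.g. a repeated closing vertex
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting test for a simple polygon ring."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def edge_distance(p: Point, ring: Sequence[Point]) -> float:
    """Distance from a station point to the block boundary (0 if inside)."""
    if point_in_polygon(p, ring):
        return 0.0
    n = len(ring)
    return min(point_segment_distance(p, ring[i], ring[(i + 1) % n]) for i in range(n))


def nearest_by_mode(ring: Sequence[Point], stations: List[Tuple[str, Point]]) -> Dict[str, float]:
    """Minimum block-to-station distance for each transit mode."""
    result: Dict[str, float] = {}
    for mode, point in stations:
        d = edge_distance(point, ring)
        if mode not in result or d < result[mode]:
            result[mode] = d
    return result
```

A brute-force loop over all stations is fine for a sketch; for whole MSAs, a spatial index would avoid comparing every block with every station.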

Content of Files

The data for each MSA are provided as a separate feature class (a logical file structure for GIS, similar to a shapefile) housed inside a geodatabase. The feature classes are divided among three geodatabases to reduce the size of each database and make it easier to download over the internet. In addition to providing the geodatabases for use in GIS software, we have provided their contents as exported tables in csv and text format for ready use in statistics software.

Three additional fields, found in both the geodatabases and the exported csv and text tables, organize the data according to their original file structure: LEHD_Year, WAC_RAC, and City. These fields keep the data organized by year, by source file type (WAC or RAC), and by city.

Although combining the data in this way produces a sizeable number of null values in the provided tables, it makes the data much more manageable by reducing the number of files. This format also greatly increases processing efficiency.
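When working with the exported csv tables, the three organizing fields make it straightforward to select one city, file type, and year, while null cells should be skipped rather than counted as zero. The sketch below sums a job-count column by year. Several details are assumptions to verify against the LEHD Data Dictionary: that nulls are exported as empty cells, that WAC_RAC holds the values "WAC" and "RAC", and that the LODES total-jobs column C000 keeps that name after the join and export.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Optional

# Assumption: nulls appear as empty cells in the exported csv.
NULL_MARKERS = {""}


def parse_count(value: Optional[str]) -> Optional[int]:
    """Return None for a null cell, otherwise the integer job count."""
    if value is None or value.strip() in NULL_MARKERS:
        return None
    return int(float(value))


def total_jobs_by_year(
    csv_text: str, city: str, file_type: str, column: str = "C000"
) -> Dict[int, int]:
    """Sum one job-count column per year for one city and file type."""
    totals: Dict[int, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["City"] != city or row["WAC_RAC"] != file_type:
            continue
        count = parse_count(row.get(column))
        if count is None:
            continue  # skip null cells instead of treating them as zero
        totals[int(row["LEHD_Year"])] += count
    return dict(totals)
```

Reading a real export would replace the in-memory string with the contents of one of the compiled csv files; for the largest files, streaming rows from an open file handle avoids loading everything into memory at once.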

*The data zip files are large and can take up to 4 hours to download.




LEHD Data Dictionary.xlsx (15 kB)
Data Dictionary (5428498 kB)
LODES Compiled Tables (8584441 kB) (8818463 kB) (9072556 kB) (17756 kB)
Transit Types Table.xlsx (43 kB)