Data structures (Statistics), Autocorrelation (Statistics), Raster data, System analysis
Raster data are digital representations of spatial phenomena organized into rows and columns of cells that typically have the same dimensions in each direction. They can represent image data at any scale. Common examples include medical images, satellite data, and photos taken with modern smartphones.
Satellites capture reflectance data in specific wavelength bands that correspond to red, green, and blue, and often in some infrared and thermal bands. These composite vectors can then be classified into land use categories such as forest or water using automated techniques. These classifications are verified on the ground using hand-held sensors.
Reconstructability analysis (RA) is a methodology for analyzing categorical data. There is an entire field of geostatistics for analyzing spatial data that are continuous and numeric, but tools for spatial analysis of categorical (non-numeric) data are limited. RA can bring new insight into such data. This study applies RA to a set of satellite data classified by the National Land Cover Database into 15 land use classes. The analysis groups these classes into four types: Forest, Developed, Water, and Grasses.
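The grouping step can be sketched as a simple lookup. The abstract does not say which of the 15 classes fall into which type, so the mapping below is a hypothetical illustration that covers only a few familiar NLCD legend codes; `HYPOTHETICAL_GROUPS` and `group_classes` are names invented for this sketch, and codes outside the mapping are left as `None` rather than guessed.

```python
from typing import Dict, List, Optional

# HYPOTHETICAL mapping for illustration only: the study's actual
# assignment of its 15 classes to four types is not given in the abstract.
HYPOTHETICAL_GROUPS: Dict[int, str] = {
    11: "Water",
    21: "Developed", 22: "Developed", 23: "Developed", 24: "Developed",
    41: "Forest", 42: "Forest", 43: "Forest",
    71: "Grasses",
}


def group_classes(
    grid: List[List[int]],
    groups: Dict[int, str] = HYPOTHETICAL_GROUPS,
) -> List[List[Optional[str]]]:
    """Replace each land cover code with its group label (None if unmapped)."""
    return [[groups.get(code) for code in row] for row in grid]


codes = [
    [11, 21],
    [42, 71],
    [95, 24],  # 95 is deliberately absent from the hypothetical mapping
]
grouped = group_classes(codes)
```

Leaving unmapped codes as `None` makes any gap in the mapping visible instead of silently assigning a type.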
A Von Neumann Neighborhood (VNN) kernel is passed over the data, coding the values in the North, South, East, and West directions into columns. Each resulting tuple forms a row in which the first column is the center cell of the VNN, the dependent variable (DV) we are trying to predict, and the remaining four columns are the neighboring values of the VNN, the independent variable (IV) predictors. The VNN was chosen over the Moore neighborhood, which consists of eight neighbors, because the NW, NE, SW, and SE cells are farther from the center cell than the N, S, E, and W cells. An even better reason to prefer the VNN, in this particular data analysis, is that RA on the Moore neighborhood indicates that a model with all IVs in the VNN predicts the center cell with a fidelity as high as 84%. Further analysis shows that just the North and South cells together predict the center with 64% probability. We analyze this three-cell relationship for most of the remaining results.
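As a minimal sketch of the kernel pass described above, the function below turns a grid of class labels into rows of (center, North, South, East, West). The function name `vnn_rows`, the single-letter labels, and the choice to skip border cells (which lack a full neighborhood) are assumptions made for illustration; the abstract does not state how edges were handled.

```python
from typing import List, Tuple

Grid = List[List[str]]
# Column order: center (DV), then North, South, East, West (IVs).
Row = Tuple[str, str, str, str, str]


def vnn_rows(grid: Grid) -> List[Row]:
    """Slide a Von Neumann kernel over the grid, one output row per interior cell."""
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    rows: List[Row] = []
    # Assumption: border cells are skipped because they lack four neighbors.
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            rows.append((
                grid[r][c],      # center
                grid[r - 1][c],  # North (row above)
                grid[r + 1][c],  # South (row below)
                grid[r][c + 1],  # East
                grid[r][c - 1],  # West
            ))
    return rows


# Toy 3x3 grid: F = Forest, D = Developed, W = Water, G = Grasses.
toy = [
    ["F", "F", "W"],
    ["G", "D", "W"],
    ["G", "G", "F"],
]
rows = vnn_rows(toy)
```

On the toy grid only the middle cell has a complete neighborhood, so a single row is produced.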
We remove data rows for the trivial case in which all five cells are the same. Another trivial case is when the North and South cells have the same value. In this case the center cell will likely be the same, with a probability ranging from 88% to 98%, depending on whether the neighbors are Developed, Grasses, Forest, or Water. When the North and South cells are different from each other, RA pulls out relations that are beyond classical autocorrelation. If either N or S is Grasses, there is a preference for the center cell being Grasses, regardless of what the other cell is. Similarly, if the S cell is Developed (and not Grasses), then the center cell has a higher probability of being Developed regardless of the N value. If both N and S are neither Grasses nor Developed, then we get the intriguing result that the preferred value for the center cell is whatever the S value is, whether it is Water or Forest. There are more subtle results when the East cell is added back into the analysis.
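The conditional probabilities quoted above can be read off a frequency table of the center cell given the (North, South) pair. The sketch below is a plain tabulation, not the RA model search itself. The helper names `drop_uniform` and `center_given_ns` and the sample rows are made up for illustration, with rows assumed to be ordered (center, North, South, East, West).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, str, str]  # (center, North, South, East, West)


def drop_uniform(rows: List[Row]) -> List[Row]:
    """Remove the trivial rows in which all five cells share one value."""
    return [row for row in rows if len(set(row)) > 1]


def center_given_ns(rows: List[Row]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Estimate P(center | North, South) from observed frequencies."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for center, north, south, _east, _west in rows:
        counts[(north, south)][center] += 1
    return {
        key: {value: n / sum(ctr.values()) for value, n in ctr.items()}
        for key, ctr in counts.items()
    }


# Made-up rows, not data from the study.
sample_rows = [
    ("G", "G", "F", "W", "W"),
    ("G", "G", "F", "F", "D"),
    ("F", "G", "F", "F", "F"),
    ("D", "D", "D", "D", "D"),  # uniform row, removed before tabulation
]
table = center_given_ns(drop_uniform(sample_rows))
```

Here `table[("G", "F")]` holds the estimated distribution of the center cell when North is Grasses and South is Forest.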
This initial foray into analysis of raster data using RA shows a great deal of promise compared to other textural analysis techniques, such as the gray-level co-occurrence matrix (GLCM), or autocorrelation analyses, such as Moran’s I or hotspot analysis.
Percy, David and Zwick, Martin, "Beyond Spatial Autocorrelation: A Novel Approach Using Reconstructability Analysis" (2018). Systems Science Faculty Publications and Presentations. 122.