Spatial Autocorrelation & Spatial Dependency

Published

November 24, 2022

#PERSONAL NOTES

SPATIAL RELATIONSHIPS

Spatial Relationships Types

Adjacency, contiguity, overlap, and proximity are the four ways of describing the relationship between two or more entities.

A spatial relationship concerns either :

  • variable values, for themes defined over space (e.g. rainfall);

  • locations, for themes defined as objects (e.g. cities).

  • >> It is measured as the existence of statistical dependence in a collection of random variables, each of which is associated with a different geographical location.

SPATIAL DEPENDENCY

Tools for exploring spatial dependence include :

Methods for spatial interpolation include Kriging, which is a form of best linear unbiased prediction.


SPATIAL AUTOCORRELATION

Spatial dependency is the co-variation of properties within geographic space : characteristics at proximal locations appear to be correlated, either positively or negatively.

Spatial dependency leads to the spatial autocorrelation problem in statistics as it violates standard statistical techniques that assume independence among observations.

Spatial autocorrelation statistics measure and analyse the degree of dependency among observations in a geographic space.

  • Global spatial autocorrelation statistics estimate the overall degree of spatial autocorrelation for a dataset. Because of possible spatial heterogeneity, the degree of autocorrelation may vary significantly across geographic space, which a single global estimate cannot show.

  • Local spatial autocorrelation statistics provide estimates disaggregated to the level of the spatial analysis units, allowing assessment of the dependency relationships across space.

VARIABLE VALUES ASSUMPTION

The variable can assume values either :

  • At any point on a continuous surface (e.g. land use type or annual precipitation levels in a region);

  • At a set of fixed sites located within a region (e.g. prices at a set of retail outlets);

  • Across a set of areas that subdivide a region (e.g. the count or proportion of households with two or more cars in a set of Census tracts that divide an urban region).


interpret spatial distribution

  • clustered spatial distribution = strong positive spatial autocorrelation

  • checkerboard spatial distribution = strong negative spatial autocorrelation

Table 1 : Summary of spatial autocorrelation relationship.

| Trait | Positive spatial autocorrelation | Negative spatial autocorrelation |
|---|---|---|
| Clustering | Similar values tend to be in similar locations (clustered pattern). | Dispersed, regular pattern (** != random pattern). |
| Neighbours’ similarity | Higher similarity than under spatial randomness. | Lower similarity than under spatial randomness. |
| Compatibility | Compatible with diffusion, but not necessarily caused by diffusion. | Compatible with competition, but not necessarily caused by competition. |


GLOBAL SPATIAL AUTOCORRELATION (GSA) MEASUREMENT

  • Moran’s I = describes how features differ from the values in the study area as a whole.

  • Geary’s C = describes how features differ from their immediate neighbours.
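For reference, the two statistics are commonly written as below. The notation is an assumption of these notes, not taken from a particular source: n is the number of features, x_i the value at feature i, x-bar the overall mean, w_ij the spatial weight between features i and j, and S_0 the sum of all weights.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}
         {\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
C = \frac{n-1}{2\,S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i-x_j)^2}
         {\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}.
```

The numerator of I uses deviations from the global mean, while the numerator of C uses differences between neighbouring pairs, which matches the two descriptions above.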

Relationship of Moran’s I & Geary’s C :

  • C approaches 0 and I approaches 1 when similar values are clustered.

  • C approaches 2 (its usual upper range) and I approaches -1 when dissimilar values tend to cluster.

  • High values of C correspond to low values of I = both measures are inversely related.

  • I is a measure of global spatial autocorrelation, while C is more sensitive to local spatial autocorrelation.

Moran’s I

Moran’s I (Z value) is :

  • positive (I>0): Clustered, observations tend to be similar;

  • negative(I<0): Dispersed, observations tend to be dissimilar;

  • approximately zero: observations are arranged randomly over space
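A minimal sketch of Moran’s I in plain Python, to check the sign rules above on two toy patterns. The 4 x 4 grid, the rook (shared edge) neighbour definition with binary weights, and the helper names `rook_neighbours` and `morans_i` are all assumptions made for illustration.

```python
def rook_neighbours(rows, cols):
    """Map each grid cell index to the cells sharing an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def morans_i(values, nbrs):
    """Global Moran's I with binary weights (w_ij = 1 for neighbours)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(js) for js in nbrs.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, js in nbrs.items() for j in js)
    return (n / s0) * cross / sum(d * d for d in dev)


nbrs = rook_neighbours(4, 4)
# Toy data: left half of the grid is 1, right half is 0 (clustered).
clustered = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
# Toy data: alternating 0 / 1 cells (checkerboard).
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

print(round(morans_i(clustered, nbrs), 3))     # 0.667
print(round(morans_i(checkerboard, nbrs), 3))  # -1.0
```

The clustered pattern gives I > 0 and the checkerboard gives I = -1, in line with the list above.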

Geary’s C

Geary’s C value is :

  • Large c value (>1) : Dispersed, observations tend to be dissimilar;

  • Small c value (<1) : Clustered, observations tend to be similar;

  • c = 1: observations are arranged randomly over space.
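The same kind of sketch for Geary’s C, again assuming a 4 x 4 grid with rook neighbours and binary weights. The helper names are hypothetical.

```python
def rook_neighbours(rows, cols):
    """Map each grid cell index to the cells sharing an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def gearys_c(values, nbrs):
    """Global Geary's C with binary weights (w_ij = 1 for neighbours)."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(len(js) for js in nbrs.values())  # sum of all weights
    # Squared differences between each feature and its neighbours.
    diffs = sum((values[i] - values[j]) ** 2 for i, js in nbrs.items() for j in js)
    spread = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * s0)) * diffs / spread


nbrs = rook_neighbours(4, 4)
clustered = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

print(gearys_c(clustered, nbrs))     # 0.3125 (< 1, clustered)
print(gearys_c(checkerboard, nbrs))  # 1.875  (> 1, dispersed)
```

On these toy patterns C moves in the opposite direction to Moran’s I, which is the inverse relationship noted earlier.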

Getis-Ord Global G

Measures of global high / low clustering.

  • Concerned with the overall concentration or lack of concentration in all pairs that are neighbours given the definition of neighbouring areas.

  • The variable MUST contain only positive values to be used.

    >> e.g. a growth rate that takes both +ve & -ve values is unsuitable. The statistic will be off if any -ve value is involved.

| Interpretation | Cannot reject H0 | May reject H0 | May reject H0 |
|---|---|---|---|
| p-value | Not significant | Statistically significant | Statistically significant |
| z-score | - | Positive | Negative |
| Remarks | Observed spatial pattern of values could be one of many possible versions of complete spatial randomness. | Spatial distribution of high values in the dataset is more spatially clustered than would be expected if underlying spatial processes were truly random. | Spatial distribution of low values in the dataset is more spatially clustered than would be expected if underlying spatial processes were truly random. |
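A rough sketch of the Getis-Ord General G ratio, assuming binary rook weights on a toy 4 x 4 grid. It compares the observed G with its expected value under spatial randomness, S_0 / (n (n - 1)), and leaves out the variance and z-score. The helper names and sample values are hypothetical.

```python
def rook_neighbours(rows, cols):
    """Map each grid cell index to the cells sharing an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def general_g(values, nbrs):
    """Return (observed G, expected G) for binary weights."""
    if any(v < 0 for v in values):
        # The statistic is built from products x_i * x_j, so negative
        # values make it meaningless.
        raise ValueError("General G requires non-negative values")
    n = len(values)
    # Numerator: products over neighbouring pairs only.
    num = sum(values[i] * values[j] for i, js in nbrs.items() for j in js)
    # Denominator: products over all pairs with i != j.
    total = sum(values)
    den = total * total - sum(v * v for v in values)
    s0 = sum(len(js) for js in nbrs.values())
    return num / den, s0 / (n * (n - 1))


nbrs = rook_neighbours(4, 4)
# Toy data: a block of high values (10) in the top-left corner, 1 elsewhere.
values = [10 if r < 2 and c < 2 else 1 for r in range(4) for c in range(4)]
observed, expected = general_g(values, nbrs)
print(round(observed, 3), round(expected, 3))
```

Here the observed G is above its expectation, the direction that goes with a positive z-score in the table. Whether it is significant still needs the variance, which this sketch does not compute.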


** revision corner **

  • select confidence level 95% => alpha value = 0.05

  • reject Null hypothesis (H0) if p-value < alpha value

  • fail to reject H0 if p-value >= alpha value


LOCAL SPATIAL AUTOCORRELATION

  • A collection of geospatial statistical analysis methods for analysing the location related tendency (clusters or outliers) in the attributes of geographically referenced data (points or areas).

  • Can be indices decomposed from their global measures, such as local Moran’s I, local Geary’s c, and Getis-Ord Gi*.

  • These spatial statistics are well suited for :

    • detecting clusters or outliers;

    • identifying hot spot or cold spot areas;

    • assessing the assumptions of stationarity;

    • identifying distances beyond which no discernible association obtains.

local indicator of spatial association (LISA)

A subset of localised geospatial statistics methods.

Any spatial statistics that satisfies the following two requirements :

  • the LISA for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation;

  • the sum of LISAs for all observations is proportional to a global indicator of spatial association.

Local Indicators of Spatial Association or LISA are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable.

For instance, if we are studying cancer rates among census tracts in a given city, local clusters in the rates mean that there are areas with higher or lower rates than would be expected by chance alone; that is, the values occurring are above or below those of a random distribution in space.
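A small sketch of local Moran’s I that also checks the second LISA requirement (the local values sum to a multiple of the global statistic). It assumes binary rook weights on a toy 4 x 4 grid, in which case the factor of proportionality is S_0, the sum of all weights. The helper names and data are hypothetical, and no significance test is included.

```python
def rook_neighbours(rows, cols):
    """Map each grid cell index to the cells sharing an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def local_morans_i(values, nbrs):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, binary weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    return [z[i] * sum(z[j] for j in nbrs[i]) / m2 for i in range(n)]


def morans_i(values, nbrs):
    """Global Moran's I with the same binary weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(js) for js in nbrs.values())
    cross = sum(z[i] * z[j] for i, js in nbrs.items() for j in js)
    return (n / s0) * cross / sum(d * d for d in z)


nbrs = rook_neighbours(4, 4)
s0 = sum(len(js) for js in nbrs.values())

# Left half high (1), right half low (0), with cell 5 flipped to 0 so that
# a low value sits among mostly high neighbours.
values = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
values[5] = 0

local = local_morans_i(values, nbrs)
print(round(local[5], 3))  # negative: cell 5 differs from its neighbours
print(round(local[0], 3))  # positive: cell 0 resembles its neighbours
print(round(sum(local), 6), round(s0 * morans_i(values, nbrs), 6))
```

The last line prints the same number twice, which is the proportionality to the global indicator for this choice of weights.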

interpret Local Moran & scatterplot

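An outlier = significant & negative local Moran’s I: the value at location i is dissimilar to the values at surrounding locations (a high value among low neighbours, or a low value among high neighbours).

A cluster = significant & positive local Moran’s I: the value at location i is similar to the values at surrounding locations (high among high, or low among low).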

  • In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant.

  • The commonly used alpha-values are 0.1, 0.05, 0.01 and 0.001, corresponding to the 90%, 95%, 99% and 99.9% confidence levels respectively.

detect hot & cold spot areas with Getis-Ord Gi statistic

Interpretation of Getis-Ord Gi and Gi* :

  • A hot spot area: significant and positive if location i is associated with relatively high values of the surrounding locations.

  • A cold spot area: significant and negative if location i is associated with relatively low values in surrounding locations.
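A hedged sketch of the Gi* z-score in its commonly published form, assuming binary rook weights on a toy 4 x 4 grid. Gi* includes location i in its own neighbourhood, whereas Gi leaves it out. The helper names and sample values are hypothetical.

```python
import math


def rook_neighbours(rows, cols):
    """Map each grid cell index to the cells sharing an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def gi_star(values, nbrs):
    """Gi* z-scores with binary weights; each window includes i itself."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        window = [i] + nbrs[i]
        w_sum = len(window)  # sum of w_ij (binary weights)
        w_sq = len(window)   # sum of w_ij squared (same for 0/1 weights)
        num = sum(values[j] for j in window) - mean * w_sum
        den = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        scores.append(num / den)
    return scores


nbrs = rook_neighbours(4, 4)
# Toy data: a block of high values (10) in the top-left corner, 1 elsewhere.
values = [10 if r < 2 and c < 2 else 1 for r in range(4) for c in range(4)]
scores = gi_star(values, nbrs)
print(round(scores[0], 2))   # top-left corner: large positive score
print(round(scores[15], 2))  # bottom-right corner: negative score
```

The corner inside the high block gets a large positive score (a hot spot candidate), while the opposite corner gets a negative one. Whether a negative score counts as a cold spot depends on its significance.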


SPATIAL RANDOMNESS

Null hypothesis (H0)

  • Observed spatial pattern of values is equally likely as any other spatial pattern.

  • Values at one location do not depend on values at other (neighbouring) locations.

  • Under spatial randomness, the location of values may be altered without affecting the information content of the data.

assess the violation of assumptions

Use a Monte Carlo simulation when the assumptions behind the significance test for Moran’s I (normality or randomisation) may not hold.

  • Simulate Moran’s I n times under the assumption of no spatial pattern,

  • by randomly reassigning the observed values among the regions,

  • Calculate Moran’s I for each simulated arrangement,

  • Compare the actual value of Moran’s I to the randomly simulated distribution to obtain a p-value (pseudo significance).
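The procedure can be sketched as a permutation test. This is an illustration under assumptions, not a reference implementation: "no spatial pattern" is simulated by randomly shuffling the observed values across locations, the test is one-sided (looking for positive autocorrelation), and the grid, helper names, 999 simulations and seed are arbitrary choices.

```python
import random


def rook_neighbours(rows, cols):
    """Map each grid cell index to the cells sharing an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def morans_i(values, nbrs):
    """Global Moran's I with binary weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(js) for js in nbrs.values())
    cross = sum(z[i] * z[j] for i, js in nbrs.items() for j in js)
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_test(values, nbrs, n_sim=999, seed=0):
    """Return (observed I, pseudo p-value) from random reshuffles."""
    rng = random.Random(seed)
    observed = morans_i(values, nbrs)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # break any link between value and location
        if morans_i(shuffled, nbrs) >= observed:
            at_least_as_large += 1
    # The +1 counts the observed arrangement as one of the possible outcomes.
    return observed, (at_least_as_large + 1) / (n_sim + 1)


nbrs = rook_neighbours(4, 4)
clustered = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
observed, p_value = permutation_test(clustered, nbrs)
print(round(observed, 3), p_value)
```

With 999 simulations the smallest possible pseudo p-value is 0.001, so the number of simulations limits how small a p-value can be reported.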


CONSIDERATIONS FOR WEIGHTING SCHEME

fixed weighting scheme

  • All features should have at least one neighbour.

  • No feature should have all other features as neighbours.

  • Especially when input field values are skewed, aim for each feature to have about eight neighbours.

  • Might produce large estimate variances where data are sparse, while masking subtle local variations where data are dense.

  • In extreme conditions, fixed schemes might not be able to calibrate in local areas where data are too sparse to satisfy the calibration requirements (observations must outnumber parameters).

adaptive weighting scheme

  • Adaptive schemes adjust themselves according to the density of the data.

  • Bandwidths are shorter where data are dense and longer where they are sparse.

  • Finding the nearest neighbours is one of the often used approaches.
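To make the contrast concrete, here is a small sketch comparing the two schemes on made-up points: four tightly packed features and one isolated feature. The function names, the band of 1.5 and k = 2 are assumptions chosen for illustration.

```python
import math


def fixed_band_neighbours(points, band):
    """Neighbours of each point within a fixed distance band."""
    return {
        i: [j for j, q in enumerate(points) if j != i and math.dist(p, q) <= band]
        for i, p in enumerate(points)
    }


def adaptive_bandwidths(points, k):
    """Distance from each point to its k-th nearest neighbour."""
    bandwidths = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        bandwidths.append(dists[k - 1])
    return bandwidths


# Hypothetical sample: a dense group of four points plus one isolated point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]

fixed = fixed_band_neighbours(points, band=1.5)
counts = [len(fixed[i]) for i in range(len(points))]
bandwidths = adaptive_bandwidths(points, k=2)

print(counts)      # [3, 3, 3, 3, 0]: the isolated point has no neighbours
print(bandwidths)  # short in the dense group, long for the isolated point
```

Under the fixed band the isolated feature ends up with no neighbours at all, while the adaptive scheme stretches its bandwidth until k neighbours are found.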


SUITABILITY OF SPATIAL WEIGHTING METHOD

polygon contiguity method

  • Effective when polygons are similar in size and distribution, and

  • When spatial relationships are a function of polygon proximity (if two polygons share a boundary, spatial interaction between them increases).

  • When selecting a polygon contiguity conceptualization, select row standardization for tools that have the Row Standardization parameter.
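Row standardization divides each weight by the sum of its row, so that every feature’s neighbour weights add up to 1 regardless of how many neighbours it has. A minimal sketch, assuming a regular 3 x 3 grid of square polygons with rook (shared edge) contiguity as a stand-in for real polygon data. The function names are hypothetical.

```python
def rook_weights(rows, cols):
    """Binary contiguity weights: 1.0 for cells sharing an edge."""
    weights = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            weights[r * cols + c] = {
                rr * cols + cc: 1.0
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            }
    return weights


def row_standardise(weights):
    """Scale each row of weights so that it sums to 1."""
    standardised = {}
    for i, row in weights.items():
        total = sum(row.values())
        # Features without neighbours keep an empty row.
        standardised[i] = {j: w / total for j, w in row.items()} if total else {}
    return standardised


w = row_standardise(rook_weights(3, 3))
print(w[0])  # corner cell: two neighbours, 0.5 each
print(w[4])  # centre cell: four neighbours, 0.25 each
```

Without this step the centre cell would carry twice the total weight of a corner cell simply because it has more neighbours.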

fixed distance method

  • Works well for point data. It is often a good option for polygon data when there is a large variation in polygon size (e.g. very large polygons at the edge of the study area and very small polygons at the centre of the study area) and you want to ensure a consistent scale of analysis.

inverse distance method

  • Most appropriate with continuous data or to model processes where the closer two features are in space, the more likely they are to interact / influence each other.

  • Every feature is potentially a neighbour of every other feature, and with large datasets, the number of computations involved will be enormous.
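A minimal sketch of inverse distance weights, assuming distinct point locations (coincident points would cause a division by zero) and a hypothetical `power` parameter for the distance decay. It also shows why the computation grows quickly: every ordered pair gets a non-zero weight.

```python
import math


def inverse_distance_weights(points, power=1.0):
    """Dense weight matrix with w_ij = 1 / d_ij ** power and w_ii = 0."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = 1.0 / math.dist(points[i], points[j]) ** power
    return weights


# Hypothetical sample: three points on a line.
points = [(0, 0), (1, 0), (4, 0)]
w = inverse_distance_weights(points)
print(w[0])  # [0.0, 1.0, 0.25]: the nearer point gets the larger weight

non_zero = sum(1 for row in w for value in row if value > 0)
print(non_zero)  # 6, i.e. n * (n - 1) weights for n = 3
```

For n features there are n * (n - 1) weights to compute and store, so a distance cut-off is often combined with this method for large datasets.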

k-nearest neighbours method

Effective when you want to ensure you have a minimum number of neighbors for your analysis.

Works well when the data distribution varies across the study area so that some features are far away from all other features.

  • Especially when the associated feature values are skewed (are not normally distributed), it is important that each feature is evaluated within the context of at least eight or so neighbors (this is a rule of thumb only).

  • Spatial context of the analysis changes depending on variations in the sparsity/density of the features.

  • When fixing the scale of analysis is less important than fixing the number of neighbors, the k-nearest neighbours method is appropriate.


GUIDE TO SELECT FIXED-DISTANCE BAND VALUE

  • Select a distance based on what you know about the geographic extent of the spatial processes promoting clustering for the phenomena you are studying.

  • Use a distance band that is large enough to ensure all features will have at least one neighbor, or results will not be valid.

  • Try not to get stuck on the idea that there is only one correct distance band.

  • Most likely, there are multiple/interacting spatial processes promoting observed clustering.

  • Select an appropriate distance band or threshold distance.

  • All features should have at least one neighbour.

  • No feature should have all other features as a neighbour.

  • Especially if the values for the input field are skewed, each feature should have about eight neighbours.
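The first two checks can be automated. One possible approach, sketched here as an assumption and not as the method of any particular tool: take the largest nearest-neighbour distance as the smallest band that leaves no feature without a neighbour, then count neighbours at that band. The sample points and function names are made up.

```python
import math


def nearest_neighbour_distances(points):
    """Distance from each point to its closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def neighbour_counts(points, band):
    """Number of other points within the distance band of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= band)
        for i, p in enumerate(points)
    ]


# Hypothetical sample: a dense group plus two increasingly remote points.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (4, 1), (8, 1)]

# Smallest band for which every feature has at least one neighbour.
min_band = max(nearest_neighbour_distances(points))
counts = neighbour_counts(points, min_band)

print(min_band)  # 4.0
print(counts)
print(min(counts) >= 1)                # every feature has a neighbour
print(max(counts) < len(points) - 1)   # no feature has all others as neighbours
```

This only gives a lower bound for the band. The final choice should still reflect what is known about the spatial process, as noted at the start of this guide.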