Spatial Autocorrelation & Spatial Dependency
#PERSONAL NOTES
SPATIAL RELATIONSHIPS
Spatial Relationship Types
Adjacency, contiguity, overlap, and proximity are the four ways of describing the relationship between two or more entities.
Spatial relationships concern either:
variable values for themes defined over space (e.g. rainfall), or
locations for themes defined by objects (e.g. cities).
>> measured as the existence of statistical dependence in a collection of random variables, each of which is associated with a different geographical location.
SPATIAL DEPENDENCY
Tools for exploring spatial dependence include spatial correlation, spatial covariance functions and semivariograms.
Methods for spatial interpolation include kriging, a form of best linear unbiased prediction.
SPATIAL AUTOCORRELATION
Spatial dependency is the co-variation of properties within geographic space: characteristics at proximal locations appear to be correlated, either positively or negatively.
Spatial dependency leads to the spatial autocorrelation problem in statistics, as it violates the assumption of independence among observations that underlies standard statistical techniques.
Spatial autocorrelation measures and analyses the degree of dependency among observations in a geographic space.
Spatial autocorrelation statistics are global in the sense that they estimate the overall degree of spatial autocorrelation for a dataset. The possibility of spatial heterogeneity suggests that the estimated degree of autocorrelation may vary significantly across geographic space.
Local spatial autocorrelation statistics provide estimates disaggregated to the level of the spatial analysis units, allowing assessment of the dependency relationships across space.
VARIABLE VALUES ASSUMPTION
The variable can assume values either :
At any point on a continuous surface (e.g. land use type or annual precipitation levels in a region);
At a set of fixed sites located within a region (e.g. prices at a set of retail outlets);
Across a set of areas that subdivide a region (e.g. the count or proportion of households with two or more cars in a set of Census tracts that divide an urban region).
interpret spatial distribution
clustered spatial distribution = high (positive) spatial autocorrelation
checkerboard spatial distribution = negative spatial autocorrelation (low I)
trait | Positive Spatial Autocorrelation | Negative Spatial Autocorrelation |
---|---|---|
clustering | similar values tend to occur in nearby locations (clustered pattern) | dispersed, regular pattern (not the same as a random pattern) |
neighbour similarity | higher similarity than under spatial randomness | lower similarity than under spatial randomness |
compatibility | compatible with diffusion, but not necessarily caused by diffusion | compatible with competition, but not necessarily caused by competition |
GLOBAL SPATIAL AUTOCORRELATION (GSA) MEASUREMENT
Moran’s I = describes how features differ from the values in the study area as a whole.
Geary’s C = describes how features differ from their immediate neighbours.
Relationship between Moran’s I and Geary’s C:
C approaches 0 and I approaches 1 when similar values are clustered.
C approaches 3 and I approaches -1 when dissimilar values tend to cluster.
High values of C correspond to low values of I, i.e. the two measures are inversely related.
I is a measure of global spatial autocorrelation, while C is more sensitive to local spatial autocorrelation.
Moran’s I
Moran’s I (and its z-score) is:
positive (I > 0): clustered; observations tend to be similar;
negative (I < 0): dispersed; observations tend to be dissimilar;
approximately zero: observations are arranged randomly over space (strictly, the expected value under randomness is -1/(n-1), which approaches 0 as n grows).
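As a concrete illustration, here is a minimal numpy sketch of the statistic; the transect values and chain-contiguity weights are hypothetical, and in practice a library such as PySAL's esda computes the same quantity.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x and an n x n spatial weights matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    z = x - x.mean()            # deviations from the mean
    s0 = w.sum()                # sum of all weights
    return (n / s0) * (z @ w @ z) / (z ** 2).sum()

# Hypothetical transect of five locations with chain (left/right) contiguity.
w = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # smoothly trending values -> positive autocorrelation
print(morans_i(x, w))           # 0.5 here: I > 0, similar values lie near one another
```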
Geary’s C
Geary’s C (and its z-score) is:
large C value (> 1): dispersed; observations tend to be dissimilar;
small C value (< 1): clustered; observations tend to be similar;
C = 1: observations are arranged randomly over space.
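For comparison, a minimal numpy sketch of Geary's C with the same kind of inputs (the weights matrix and values are hypothetical):

```python
import numpy as np

def gearys_c(x, w):
    """Global Geary's C for values x and an n x n spatial weights matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    s0 = w.sum()
    diff2 = (x[:, None] - x[None, :]) ** 2          # (x_i - x_j)^2 for every pair
    num = (w * diff2).sum()
    den = ((x - x.mean()) ** 2).sum()
    return ((n - 1) / (2.0 * s0)) * num / den       # C < 1 clustered, C > 1 dispersed
```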
Getis-Ord Global G
A measure of global high/low value clustering.
Concerned with the overall concentration, or lack of concentration, across all pairs of neighbouring values, given the definition of neighbouring areas.
The variable MUST contain only positive values to be used.
>> e.g. a growth rate containing both positive and negative values: the statistic will be distorted if negative values are involved.
interpretation | cannot reject H0 | may reject H0 | may reject H0 |
---|---|---|---|
p-value | not significant | statistically significant | statistically significant |
z-score | - | positive | negative |
remarks | The observed spatial pattern of values could be one of many possible versions of complete spatial randomness. | The spatial distribution of high values in the dataset is more spatially clustered than would be expected if the underlying spatial processes were truly random. | The spatial distribution of low values in the dataset is more spatially clustered than would be expected if the underlying spatial processes were truly random. |
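A minimal sketch of the global Getis-Ord G test using PySAL, assuming geopandas, libpysal and esda are installed; the file name tracts.shp, the column median_income and the 2000 m threshold are hypothetical.

```python
import geopandas as gpd
import esda
from libpysal import weights

gdf = gpd.read_file("tracts.shp")                              # hypothetical dataset
w = weights.DistanceBand.from_dataframe(gdf, threshold=2000)   # binary fixed-distance weights

g = esda.G(gdf["median_income"].values, w, permutations=999)   # values must all be positive
print(g.G, g.z_sim, g.p_sim)   # significant positive z: high values cluster;
                               # significant negative z: low values cluster
```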
** revision corner **
select a 95% confidence level => alpha value = 0.05
reject the null hypothesis (H0) if p-value < alpha value
fail to reject H0 if p-value > alpha value
LOCAL SPATIAL AUTOCORRELATION
A collection of geospatial statistical methods for analysing location-related tendencies (clusters or outliers) in the attributes of geographically referenced data (points or areas).
These can be indices decomposed from their global counterparts, such as local Moran’s I, local Geary’s C, and Getis-Ord Gi*.
These spatial statistics are well suited for :
detecting clusters or outliers;
identifying hot spot or cold spot areas;
assessing the assumptions of stationarity;
identifying distances beyond which no discernible association obtains.
local indicator of spatial association (LISA)
A subset of localised geospatial statistics methods.
Any spatial statistic that satisfies the following two requirements :
the LISA for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation;
the sum of LISAs for all observations is proportional to a global indicator of spatial association.
Local Indicators of Spatial Association or LISA are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable.
For instance, if we are studying cancer rates among census tracts in a given city, local clusters in the rates mean that there are areas with higher or lower rates than would be expected by chance alone; that is, the values occurring are above or below those of a random distribution in space.
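The second requirement can be checked directly. Below is a minimal numpy sketch, assuming the common definition I_i = (z_i / m2) Σ_j w_ij z_j with m2 = Σ_k z_k² / n, under which the local values sum to S0 times the global Moran's I (the weights and values are hypothetical).

```python
import numpy as np

def local_morans_i(x, w):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij * z_j, with m2 = sum(z^2) / n."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    m2 = (z ** 2).sum() / x.size
    return (z / m2) * (w @ z)

# Hypothetical transect with chain contiguity.
w = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

z, n, s0 = x - x.mean(), x.size, w.sum()
global_i = (n / s0) * (z @ w @ z) / (z ** 2).sum()
print(np.isclose(local_morans_i(x, w).sum(), s0 * global_i))   # True: sum of LISAs is proportional to global I
```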
interpret Local Moran & scatterplot
An outlier: the local statistic is significant and negative, i.e. the value at location i is dissimilar to the values in the surrounding locations (a high value among lows, or a low value among highs).
A cluster: the local statistic is significant and positive, i.e. the value at location i is similar to the values in the surrounding locations (high-high or low-low).
In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant.
The commonly used alpha values are 0.1, 0.05, 0.01 and 0.001, corresponding to the 90%, 95%, 99% and 99.9% confidence levels respectively.
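A minimal cluster/outlier sketch with the local Moran statistic in PySAL, assuming geopandas, libpysal and esda are installed; the file tracts.shp and the column rate are hypothetical.

```python
import geopandas as gpd
import esda
from libpysal import weights

gdf = gpd.read_file("tracts.shp")                    # hypothetical dataset
w = weights.Queen.from_dataframe(gdf)
w.transform = "r"                                    # row-standardise the weights

lisa = esda.Moran_Local(gdf["rate"].values, w, permutations=999)

sig = lisa.p_sim < 0.05                              # pseudo p-values from permutations
labels = {1: "high-high", 2: "low-high", 3: "low-low", 4: "high-low"}
gdf["cluster"] = "not significant"
gdf.loc[sig, "cluster"] = [labels[q] for q in lisa.q[sig]]   # clusters: HH/LL, outliers: HL/LH
```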
detect hot & cold spot areas with Getis-Ord Gi statistic
Interpretation of Getis-Ord Gi and Gi* :
A hot spot area: significant and positive if location i is associated with relatively high values of the surrounding locations.
A cold spot area: significant and negative if location i is associated with relatively low values in surrounding locations.
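A minimal hot/cold spot sketch with the local Gi* statistic, again assuming geopandas, libpysal and esda are installed; the file, column and threshold are hypothetical.

```python
import geopandas as gpd
import esda
from libpysal import weights

gdf = gpd.read_file("tracts.shp")                              # hypothetical dataset
w = weights.DistanceBand.from_dataframe(gdf, threshold=2000)   # binary fixed-distance weights

gi_star = esda.G_Local(gdf["rate"].values, w, star=True, permutations=999)
gdf["hot_spot"]  = (gi_star.Zs > 0) & (gi_star.p_sim < 0.05)   # high values clustered around i
gdf["cold_spot"] = (gi_star.Zs < 0) & (gi_star.p_sim < 0.05)   # low values clustered around i
```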
SPATIAL RANDOMNESS
Null hypothesis (H0)
Observed spatial pattern of values is equally likely as any other spatial pattern.
Values at one location do not depend on values at other (neighbouring) locations.
Under spatial randomness, the location of values may be altered without affecting the information content of the data.
assess the violation of assumptions
Use a Monte Carlo simulation to assess whether the assumptions behind Moran’s I (normality and randomisation) hold.
Simulate Moran’s I n times under the assumption of no spatial pattern:
randomly reassign (permute) the observed values across the regions,
calculate Moran’s I for each simulated arrangement,
compare the actual value of Moran’s I to the simulated distribution to obtain a p-value (pseudo significance), as in the sketch below.
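A minimal numpy sketch of this permutation approach; the 999 permutations are a common default rather than something prescribed by these notes.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I (same definition as in the earlier sketch)."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z ** 2).sum()

def moran_permutation_test(x, w, n_perm=999, seed=0):
    """Pseudo p-value for Moran's I obtained by randomly permuting the observed values."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    sims = np.array([morans_i(rng.permutation(np.asarray(x)), w) for _ in range(n_perm)])
    # count simulated values at least as extreme as the observed statistic (one-sided)
    extreme = (sims >= observed).sum() if observed >= sims.mean() else (sims <= observed).sum()
    return observed, (extreme + 1) / (n_perm + 1)   # pseudo significance
```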
CONSIDERATIONS FOR WEIGHTING SCHEME
fixed weighting scheme
All features should have at least one neighbour.
No feature should have all other features as neighbours.
Especially when the input field values are skewed, aim for each feature to have about eight neighbours.
Might produce large estimate variances where data are sparse, while masking subtle local variations where data are dense.
In extreme cases, fixed schemes might not be able to calibrate in local areas where the data are too sparse to satisfy the calibration requirements (the number of observations must exceed the number of parameters).
adaptive weighting scheme
Adaptive schemes adjust themselves according to the density of the data:
shorter bandwidths where data are dense, longer bandwidths where data are sparse.
Using a fixed number of nearest neighbours is one of the often-used approaches (see the sketch below).
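A minimal adaptive-neighbourhood sketch using k-nearest-neighbour weights, assuming geopandas and libpysal are installed; the file sites.shp and k = 8 are hypothetical choices.

```python
import geopandas as gpd
from libpysal import weights

gdf = gpd.read_file("sites.shp")               # hypothetical point features
w = weights.KNN.from_dataframe(gdf, k=8)       # every feature gets exactly 8 neighbours
w.transform = "r"                              # row-standardise: bandwidth adapts to data density
```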
SUITABILITY OF SPATIAL WEIGHTING METHOD
polygon contiguity method
Effective when polygons are similar in size and distribution, and
When spatial relationships are a function of polygon proximity (if two polygons share a boundary, spatial interaction between them increases).
When selecting a polygon contiguity conceptualization, select row standardization for tools that have the Row Standardization parameter.
fixed distance method
Works well for point data. It is often a good option for polygon data.
When there is a large variation in polygon size (e.g. very large polygons at the edge of the study area and very small polygons at the centre), a fixed distance band ensures a consistent scale of analysis.
inverse distance method
Most appropriate with continuous data or to model processes where the closer two features are in space, the more likely they are to interact / influence each other.
Every feature is potentially a neighbour of every other feature, and with large datasets, the number of computations involved will be enormous.
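A minimal numpy sketch of an inverse-distance weights matrix, with an optional cutoff to keep the number of non-zero weights manageable; the coordinates, power and threshold are hypothetical.

```python
import numpy as np

def inverse_distance_weights(coords, power=1.0, threshold=None):
    """Inverse-distance weight matrix for an (n, 2) array of point coordinates."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        w = 1.0 / d ** power                    # closer features receive larger weights
    np.fill_diagonal(w, 0.0)                    # a feature is not its own neighbour
    if threshold is not None:
        w[d > threshold] = 0.0                  # optional cutoff distance
    return w
```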
k-nearest neighbours method
Effective when you want to ensure you have a minimum number of neighbors for your analysis.
Works well when the data distribution varies across the study area so that some features are far away from all other features.
Especially when the associated feature values are skewed (not normally distributed), it is important that each feature is evaluated within the context of at least eight or so neighbours (this is a rule of thumb only).
Spatial context of the analysis changes depending on variations in the sparsity/density of the features.
When fixing the scale of analysis is less important than fixing the number of neighbors, the k-nearest neighbours method is appropriate.
GUIDE TO SELECT FIXED-DISTANCE BAND VALUE
Select a distance based on what you know about the geographic extent of the spatial processes promoting clustering for the phenomena you are studying.
Use a distance band that is large enough to ensure all features will have at least one neighbor, or results will not be valid.
Try not to get stuck on the idea that there is only one correct distance band.
Most likely, there are multiple, interacting spatial processes promoting the observed clustering.
Select an appropriate distance band or threshold distance:
All features should have at least one neighbour.
No feature should have all other features as a neighbour.
Especially if the values for the input field are skewed, each feature should have about eight neighbours.
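A minimal numpy sketch for a starting value: the smallest band that guarantees every feature at least one neighbour is the maximum nearest-neighbour distance (coordinates are hypothetical; the final choice should still reflect the spatial process being studied).

```python
import numpy as np

def min_distance_band(coords):
    """Smallest fixed-distance band that gives every feature at least one neighbour."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # ignore self-distances
    return d.min(axis=1).max()            # maximum nearest-neighbour distance
```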