
Geo-spatial Data and Information
The first benefit of investigation is not increased confidence, but removal of false confidence.
Geo-spatial data is collected and transformed into information...
Geo-spatial information is attributed, organised and interrogated...
This information, properly presented, allows the visualisation and comprehension of complex 3D and temporal situations.
For Geo-spatial data to provide meaningful information or insights it must be:
- Observed
- Preserved
- Processed
- Analysed
- Communicated
Not all geo-spatial data is created equal. The reliability of geo-spatial data therefore depends upon:
- The quality of the sampling
- The quality of the positioning
- The quality of the data attribution
- The relevance of the data
Note: To visualise and measure your geo-spatial data within the wider context of the system domain, it must be accurately positioned in the correct coordinate reference system.
Assumptions About Your Data
Everything is related to everything, but near things are more related. (Tobler 1970)
But...
Your data is rarely collected with the aim of providing good spatial statistics for the geo-statistician:
- Budgetary constraints
- Minimum required (for a representative sample)
- Scale of observations/measurements
- Targeted with bias
Errors WILL propagate throughout the computational and analytical process unless your data:
- Have been collected and analysed/measured in a fit and proper manner.
- Have been pre-screened to identify blunders and outliers. When screening, consider:
  - Different sampling campaigns
  - Consistent sampling procedures
  - Warning signs: differences in reporting precision, sample numbering, missing samples
- Have had their positions verified (plot them on a map).

Site Characterisation and its Variability
If the area has near-uniform data values, then you can expect accurate estimates/predictions.
If the area has highly variable data values, the chances of locally accurate estimates and predictions are poor.
This will affect all estimation methods used.

Site Characterisation:
How many samples are needed?

Each sample will have a cost impact:
- Collection, analysis complexity, time, errors
- Avoid unnecessary samples: a waste of resources
Population size:
- Smaller populations allow uncertainty (outliers) to migrate into results
- Larger populations increase complexity, time and confounding factors
What is the aim?
- A larger margin of error (MoE) will require fewer samples
- Higher confidence will require more samples
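
The MoE/confidence trade-off above can be sketched with the standard normal-approximation formula n = (z * sigma / MoE)^2. The sigma, MoE and z figures below are illustrative assumptions, not project values:

```python
import math

def required_sample_size(sigma, moe, z=1.96):
    """Minimum n for the sample mean's margin of error to be <= moe.

    Normal approximation n = (z * sigma / moe)**2, where sigma is an
    assumed population standard deviation and z is the critical value
    (1.96 ~ 95% confidence, 2.576 ~ 99% confidence).
    """
    return math.ceil((z * sigma / moe) ** 2)

# Illustrative figures only: sigma = 10 units, varying MoE and confidence.
print(required_sample_size(sigma=10.0, moe=5.0))           # larger MoE -> fewer samples
print(required_sample_size(sigma=10.0, moe=2.0))           # tighter MoE -> more samples
print(required_sample_size(sigma=10.0, moe=5.0, z=2.576))  # higher confidence -> more samples
```

Halving the acceptable MoE roughly quadruples the required sample count, which is why the target MoE should be agreed before the sampling budget is set.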
Site Characterisation:
Displaying and Predicting the Results

How good are your map/results?
- 'A helpful qualitative display with questionable quantitative significance' (Isaaks 1989)
- What contouring method was used?
- Calibration points (but consider sample costs)
Distribution and number of sample points:
- Type of sampling: targeted or random
- Where is the seed point for sampling?
- Understanding sampling bias
Do the results fit the expectations of geological and environmental understanding?
- Limits of the data: better to interpolate than to extrapolate the results
- Are the anomalies real or processing artefacts?
Contouring results using the same data (red dots) but different methods.
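
To illustrate how the choice of gridding method shapes a contour map, here is a minimal inverse-distance-weighting (IDW) sketch, one of many possible interpolators (alongside splines, triangulation and kriging). The sample coordinates and values are invented for illustration:

```python
def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, vi) tuples."""
    num = den = 0.0
    for xi, yi, vi in samples:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return vi  # exact hit: honour the measured value
        w = d2 ** (-power / 2.0)  # weight = 1 / distance**power
        num += w * vi
        den += w
    return num / den

# Invented sample points (the "red dots"): (easting, northing, value)
samples = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0)]
print(idw(5, 0, samples))  # estimate pulled towards the two nearest values
```

Changing the `power` parameter alone changes every contour, which is why the method (and its settings) should always be reported with the map.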

Basic Analysis
Univariate, Bivariate and Multivariate



Univariate Analysis
- Provides simple statistical information
- Visual representation
- No spatial understanding (depiction)
Bivariate Analysis
- Simple linear regression and more complex relationships
- Are two variables correlated?
  - Correlation is not causation!
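
A quick bivariate check is Pearson's r; remember that even r near 1 demonstrates association, not a causal mechanism. A self-contained sketch with invented rainfall/runoff readings:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented readings: near-linear relationship, so r is close to +1.
rain = [1, 2, 3, 4, 5]
runoff = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(pearson_r(rain, runoff), 3))
```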
Multivariate Analysis
- Natural systems are expected to have more than one independent input
  - Rain + geology + time + topography + ...
- Use residual plots
- In a regression model we should not be able to predict the error in any given observation
  - By analysing the residuals we can determine whether they are consistent with random error or reveal a systematic bias
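
The residual check above can be sketched in a few lines: fit a straight line by ordinary least squares, then look for structure in the residuals. The data below are invented with a curved trend, so the residual signs form a systematic pattern rather than random noise:

```python
def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

# Invented observations following a roughly quadratic trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.2, 3.9, 9.1, 15.8, 25.2, 36.1]
m, b = fit_line(xs, ys)
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
# The sign pattern (+, -, -, -, -, +) reveals curvature the straight line
# cannot capture: the residuals are not consistent with random error.
print([round(r, 1) for r in residuals])
```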

Autocorrelation
Autocorrelation in spatial data refers to the correlation of a variable with itself through space. It describes how similar data values are based on the distance and direction between them.
Why it is important:
- It identifies clustering, gradients, or randomness in spatial distributions.
- Strong positive autocorrelation implies nearby data points are good predictors for unknown values, a core idea behind kriging and contouring.
- If autocorrelation is unexpectedly high or low, it may point to issues such as:
  - Measurement bias
  - Systematic environmental variation
  - Over-sampling (duplicate or near-duplicate points)
- Model assumptions: many statistical models assume observations are independent. Spatial autocorrelation violates this assumption, meaning traditional statistics (such as regression) may give misleading results.
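
Spatial autocorrelation is commonly summarised with Moran's I: positive for smooth gradients or clustering, near zero for spatial randomness, negative for checkerboard-like alternation. A minimal sketch with an invented four-point transect and a simple chain-adjacency weight matrix:

```python
def morans_i(values, weights):
    """Moran's I for values at n locations with an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Invented four-point transect; weights encode neighbours along a line.
weights = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
print(morans_i([1.0, 2.0, 3.0, 4.0], weights))  # smooth gradient: positive I
print(morans_i([1.0, 2.0, 1.0, 2.0], weights))  # alternating values: negative I
```

Real analyses use richer weight schemes (distance bands, k-nearest neighbours) via libraries such as PySAL, but the statistic itself is this simple.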


Geo-spatial Data
A Project Schema & Geospatial Uncertainty Curve
Geospatial projects often exist in complex 3D environments both at, and beneath, the ground surface.
From project conception through to execution and eventual completion, geo-spatial data and its transformation into credible information is poorly understood and often a side-lined facet of the project.
Because many projects begin with unjustified certainty, the first benefit of investigation is not increased confidence but the removal of false confidence.
For successful outcomes, good-quality geo-spatial data will reduce uncertainty, dispel incorrect preconceptions and add value to projects that require this data as a key foundation element:
- Knowledge gaps are identified and rectified through a single data measurement or a series of data measurements.
- These data are organised, attributed and transformed into information.
- This information yields insights from which informed decisions based on spatial knowledge can be made.
- This reduces overruns and costly changes to plans that were originally based on false preconceptions.

Acknowledgement is made to Pyrcz, Isaaks, Deutsch, Smith and others, on whose work many of the above themes are based.

A conceptual look at knowledge gained in an investigation
The idea is to treat the curve as a knowledge-gain function, where:
- x = investigative input: effort, time, cost, or integrated investigation intensity
- y = usable spatial knowledge, or defensible confidence in site understanding
Then dy/dx is the marginal knowledge gain per unit of investigative effort.
Interpretation by project stage
Early stage: low dy/dx
- Desk study, initial assumptions, rough conceptual model.
- At this stage, effort may not immediately produce much reliable spatial knowledge. Some effort is spent identifying what is not known. There may even be confusion reduction rather than true knowledge expansion.
- This is an important point: early effort is still valuable, but the apparent rate of gain in defensible knowledge may be small.
Transitional stage: increasing dy/dx
- Knowledge gaps are identified, investigation becomes targeted, and measurements begin to answer the right questions.
- Here each added unit of effort may yield large reductions in uncertainty; this is often the most efficient part of the investigation.
Mature stage: peak dy/dx
- The investigation is well designed, the main controls on variability are understood, and data are being transformed into robust spatial interpretation.
- This is the zone of maximum return on investigative effort.
Late stage: declining dy/dx
- Additional sampling, modelling refinement and monitoring still help, but each new increment contributes less than before.
- This is the diminishing-returns zone.
A useful alternative perspective is that dy/dx does not depend merely on the quantity of effort but on its quality. More rigorously, dy/dx is high when investigation is:
- Correctly targeted
- Spatially representative
- Well-attributed
- Properly analysed
- Linked to the governing uncertainties
and low when effort is:
- Ad hoc
- Biased
- Redundant
- Poorly located
- Not tied to the key conceptual uncertainties
This is probably the most important insight: the derivative is not just "how much work is being done" but "how much useful understanding is being extracted from that work."
The Second Derivative
A further useful idea is the second derivative, d2y/dx2, which indicates whether the rate of knowledge gain is accelerating or decelerating.
Conceptually:
- d2y/dx2 > 0: investigation is becoming more effective, often because the conceptual model is improving and sampling is becoming better targeted
- d2y/dx2 = 0: the point of maximum rate of knowledge gain, at the inflection region
- d2y/dx2 < 0: diminishing returns have begun
This can be very helpful if the aim is to argue for targeted investigation design rather than simply more investigation.
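
Taking an illustrative logistic curve as a stand-in for the knowledge-gain function (an assumption, since the real curve is qualitative), the sign of the second derivative separates the stages described above:

```python
import math

def knowledge(x, k=1.0, x0=5.0):
    """Illustrative logistic stand-in for the certainty curve (rises 0 -> 1)."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

def dydx(x, h=1e-5):
    """Central-difference estimate of the marginal knowledge gain."""
    return (knowledge(x + h) - knowledge(x - h)) / (2 * h)

def d2ydx2(x, h=1e-4):
    """Central-difference estimate of the second derivative."""
    return (knowledge(x + h) - 2 * knowledge(x) + knowledge(x - h)) / h ** 2

print(d2ydx2(2.0) > 0)   # early/transitional stage: gain accelerating
print(d2ydx2(8.0) < 0)   # late stage: diminishing returns
print(dydx(5.0))         # marginal gain peaks at the inflection point x0
```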
Area under the curve
The Spatial Certainty Curve can be viewed as a cumulative expression of spatial knowledge developed through investigation. Its gradient, dy/dx, represents the rate at which useful knowledge is gained as investigative effort increases.
The area under this rate curve represents the cumulative increase in spatial knowledge over a given range of effort, rather than the total possible knowledge.
Where spatial knowledge is expressed qualitatively rather than as a measured index, this relationship should be understood as a conceptual guide rather than a strict mathematical quantity.
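
Under the same illustrative logistic assumption, the area under the rate curve dy/dx between two effort levels equals the knowledge gained over that range, which a simple trapezoidal integration confirms:

```python
import math

def rate(x, k=1.0, x0=5.0):
    """dy/dx of an illustrative logistic curve y = 1/(1 + exp(-k*(x - x0)))."""
    s = 1.0 / (1.0 + math.exp(-k * (x - x0)))
    return k * s * (1.0 - s)

def trapz(f, a, b, n=1000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

# Area under the rate curve over an effort range = knowledge gained in that
# range, i.e. the difference in curve height y(10) - y(0).
gain = trapz(rate, 0.0, 10.0)
print(round(gain, 4))
```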
Important caution
A common assumption is that knowledge always increases smoothly with effort. In reality, that is not always true.
Sometimes there may be temporary negative effects on perceived confidence:
- Early investigation may reveal that prior assumptions were wrong
- Confidence may drop before reliable knowledge rises
- The curve may therefore include a "false confidence collapse" before the main rise
This is a very valuable idea for site investigation, because many projects begin with unjustified certainty. In those cases the first benefit of investigation is not increased confidence but the removal of false confidence.
So in a more realistic conceptual model:
- Apparent confidence may fall first
- Defensible knowledge then rises
- Later gains diminish
