#separator:tab
#html:true
classification: Natural Breaks When natural clusters in data are present
Minimizes variance within classes, maximizes variance between classes classification: Equal Interval "You specify the # of classes
-: distribution of values not spread equally
+: evenly distributed data (on a histogram)
+: comparing data sets" classification: Defined interval You specify the width of classes
-: distribution of values also not spread equally across all classes
+: comparing data sets classification: Quantile Equal number of observations per class

+: symmetrical (normally distributed) data or mild/moderate skew (on histogram)
+: top/bottom percentile of values
-: doesn't consider natural gaps classification: Standard deviation divides data, categorizes into intervals of standard deviations above/below mean

+: to highlight how far values deviate from average
-: shows as z-scores
-: hard to understand classification: Geometric class Widths of classes vary multiplicatively (geometric series)

when data has a highly skewed distribution central tendency: mode - categorical data
- highest frequency value central tendency: Qualitative ordinal data, use Median central tendency: Mean "" central tendency: Weighted mean center pulls mean centre towards higher value weights central tendency: Median center location representing the shortest total distance to all other features

more robust to outliers central tendency: Central feature Chooses an *existing feature* in dataset that has shortest total distance to ALL other features dispersion: normally distributed data ~68% within 1 std deviation from mean,
~95% within 2,
~99.7% within 3 dispersion: Standard distance average distance of each feature to mean center
then use that distance as a radius centered on mean center

(Spatial equiv of std deviation) Standard deviational ellipse Like standard distance,

but calculates the standard distance separately in the x and y directions from the mean center (shows directional trend) Dispersion How spread out/compact a dataset is around its location of central tendency Central tendency "Single location to summarize a set of locations, ""typical""/ ""average""/ most representative" Defining neighbours: Number of neighbours nbh defined by a specified number of features closest to the focal feature

distances can vary depending on density of features Defining neighbours: Fixed distance all features that fall within specified distance of focal features

num of neighbours depends on density of features in area Defining neighbours: Network distance Travel routes around focal feature

Fixed distance (no bridge) vs realistic Contiguity: Raster/Vector polygons vs points Only to polygons, because points have no edges Delaunay triangulation Contiguity but for points

Generates Thiessen polygons on points, then uses the contiguity edges corners method Defining neighbours: Contiguity edges (Rook) feature sharing a border w/ focal feature considered a neighbour

(directly next to each other) Defining neighbours: Contiguity edges corners (Queen) feature sharing a border or a corner w/ focal feature considered a neighbour Spatial analysis issues - MAUP, neighbourhood definition
- Boundary problem
- Spatial sampling
- Tobler Cluster analysis finding areas with unexpectedly high values, or finding groups of features with similar characteristics/values/locations, or finding point patterns in the landscape GIS ML tool types: - prediction
- classification
- clustering density-based clustering grouping of observations based on feature locations Tobler's first law of geog Everything is related to everything else,
but near things are more related than distant things Modifiable Areal Unit Problem (MAUP) Combined effects of scale and aggregation

how things are zoned AND the scale of geographic unit (then aggregated) can change outcomes

explore alternate zoning effects and hierarchical models Spatial weights matrix File that quantifies spatial relationships (neighbourhood/s) among a set of features Traditional statistical tests can often be applied to spatial data, BUT... they don't account for Tobler's first law
and other spatial relationships

also, datasets w/ very different distributions can produce the same summary statistics central tendency: Median """Middle"" value, often better choice for skewed data
" central tendency: What do outliers do? Dispersion: range difference between min and max values in a distribution
Heavily impacted by outliers (frequency of values not considered) Dispersion: Standard deviation sqrt(variance)

variance: sum of (observation - mean)^2, all / number of observations z-scores (Standard score) number of standard deviations above/below mean

( observation - mean ) / stddev central tendency: mean centre spatial average (add up) AI ability of a machine to perform tasks traditionally requiring human intelligence Machine learning set of tools, algos, and techniques to allow computers to learn patterns in data and acquire info w/o human explicitly programming the process Deep learning Using trainable algos in the form of artificial neural networks (inspired by how human brain works) DBSCAN "# of features to be considered a cluster, max search distance" HDBSCAN uses series of nested clusters and chooses levels that create stable clusters having as many members as possible
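Sketch for the DBSCAN card above: a minimal pure-Python version, assuming plain Euclidean distance, to show how the two parameters (# of features to form a cluster, max search distance) drive the result. The names `dbscan`, `eps`, and `min_pts` are illustrative only, not the parameters of any particular GIS tool.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)            # None = not visited yet

    def neighbours(i):
        # All points (including i itself) within the max search distance.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                   # provisional noise
            continue
        cluster += 1                         # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster          # noise next to a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:         # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))    # [0, 0, 0, 0, 1, 1, 1, -1]
```

Raising `min_pts` to 4 leaves only the first group as a cluster; the three-point group becomes noise.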

OPTICS Uses reachability plot for distances between neighbours, peaks = big spatial jump, separates clusters

DBSCAN vs HDBSCAN DBSCAN struggles w/ different densities unlike HDBSCAN, where search distances can vary multivariate clustering GROUPING of observations based on feature attributes trad vs. spatially constrained multivariate clustering traditional:
SC:
MC: k-means Finds k groups in data based on feature attributes
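Sketch for the k-means card above: a minimal pure-Python version that groups features by attribute values only (location is ignored). Starting the centres at the first k rows and running a fixed number of iterations are simplifying assumptions; `kmeans` and `rows` are illustrative names.

```python
from math import dist

def kmeans(rows, k, iterations=20):
    """Group attribute vectors into k clusters (simple Lloyd-style loop)."""
    centres = [tuple(r) for r in rows[:k]]   # naive start: first k rows (assumption)
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centre in attribute space.
        labels = [min(range(k), key=lambda c: dist(r, centres[c])) for r in rows]
        # Update step: move each centre to the mean of its current members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres

# Two attributes per feature (values made up for illustration).
rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (8.0, 9.0), (8.2, 9.1), (7.9, 8.8)]
labels, centres = kmeans(rows, k=2)
print(labels)   # first three rows share one label, last three share the other
```

Cluster ids are arbitrary; only which rows end up together matters.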

SC-MvC: Minimum spanning tree features laid out in data / 'spatial' space, connected based on how far they are based on location and attribute values

links are THEN broken in ways that keep clusters as distinct as possible complete spatial randomness reference spatial distribution, simulates random pattern

you can compare observations to CSR negative spatial autocorrelation similar values scattered across space, things closer together likely to have diff values

underlying process causes REGULAR DISPERSION random pattern mix of clustering and dispersion positive spatial autocorrelation similar values clustered in space
things closer together likely to have similar values

underlying process leads to clustering p-value probability that the pattern seen is the result of a random process

(low is high certainty that it's not) Spatial autocorrelation: TEST TYPES Moran's I Clustering, dispersion, both?

Clustered +ve
Dispersal -ve
z-score: high/low suggests pattern unlikely to be random Getis-Ord G Clustering?

G: Spatial density, high values indicate high value clustering, low ... Getis-Ord Local Gi* (Scan statistic) ID Hot and cold spots (clusters) relative to mean ACROSS study area at different confidence levels (p-value)

is local nbh average (z-score) significantly different from global average?
low-value features can be included in a hot spot Anselin Local Moran's I (LISA) (Scan statistic) Clusters: high-high, low-low
Outliers: high-low, low-high (a feature doesn't match w/ rest of nbh)

Is nbh average significantly different from global average, and
is feature value significantly different from nbh average? Bivariate Moran's I Clusters across 2 variables

high-high, low-low, high-low, low-high clusters

better than comparing visually linear mean: orientation vs direction orientation: angle only
direction: average length and angle Spatiotemporal pattern mining Finding patterns in space and through time by analyzing snapshots of data Problem w/ thematic/ hotspot map for each period of time Effectiveness of visual analysis decreases as volume of data increases

- difficult to visually quantify trends/change
- assumes each time period is completely independent (Jan doesn't affect Feb) Time-series analysis looking at how spatial patterns have changed over time Space-time cube 3D cube
x,y: location grid or polygon aggregation
z: time Space-time cube: Bin individual unit: unique spatiotemporal extent Space-time cube: location column of bins: same location, different temporal extents Space-time cube: Time Slice row of bins that share the same temporal extent (imagine off the top: most recent)

standalone analysis: 1 slice Space-time cube: Aggregation Polygons: probably should standardize

Bins (same size: no need to standardize):
- fishnet / square grid: quicker than hexagon
- hexagon: more edges = more neighbours, closest approximation to circle that fits nicely Space-time cube: Modifiable temporal unit problem Similar to MAUP but over time: How we aggregate time can affect our analysis

Results from data aggregated to months != results from data aggregated to years
Consider: seasonality of data, but also more broad patterns

Also: Feb (fewer days = fewer car crashes)
Ensure aggregation divides evenly among data or chop off the oldest bit Emerging hot spot analysis Getis-Ord Gi*: Spatial + temporal (+-1) neighbours, compares to whole study area for hot/cold spots
Difference: z-score, Significance: p-value

Results:
3D: clustering/significance of clustering at each bin over time for entire space-time cube
2D: Top is summarized

17 categories (8+8+1): consider combining/removing categories depending on audience
can also examine a category (why is there oscillation?) Local Outlier Analysis "Extension of Anselin Local Moran's I: IDs clusters and local outliers in space + time (not as detailed as emerging hotspot analysis)

3D entire output: bin value -- neighbourhood value
2D summary output: no indication of changes to significance of clusters/outliers
- ""only"": has only ever been that
- multiple types, never significant"
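Sketch for the Anselin Local Moran's I idea that Local Outlier Analysis extends: label each feature by comparing its own z-score with its neighbourhood's average z-score. This assumes row-standardised weights and a hand-written neighbour list, and it skips both the significance test and the time dimension, so it only shows where the high-high / low-low / high-low / low-high labels come from. `moran_quadrants` is an illustrative name.

```python
from statistics import mean, pstdev

def moran_quadrants(values, neighbours):
    """Return (local I, quadrant label) per feature; no significance test."""
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]          # z-score of each feature
    out = []
    for i, nbrs in enumerate(neighbours):
        lag = mean([z[j] for j in nbrs])         # neighbourhood average (row-standardised)
        local_i = z[i] * lag                     # positive = cluster, negative = outlier
        own = "high" if z[i] > 0 else "low"
        nbh = "high" if lag > 0 else "low"
        out.append((local_i, f"{own}-{nbh}"))
    return out

# Six features in a row; each feature's neighbours are the adjacent ones.
values = [9, 10, 11, 1, 2, 3]
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
for local_i, label in moran_quadrants(values, neighbours):
    print(f"{local_i:+.2f}  {label}")
```

Without a significance test every feature gets a label, including the two at the boundary between the high and low runs; a real analysis would only report those that pass the chosen confidence level.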