#separator:tab
#html:true
classification:&nbsp;Natural Breaks	When natural clusters in data are present<br>Minimizes variance within classes, maximizes variance between classes
classification:&nbsp;Equal Interval	"You tell # of classes<br>-: distribution of values not spread equally<br>+: evenly distributed data (on a histogram)<br>+: comparing data sets"
classification: Defined interval	You tell the width of classes<br>-: distribution of values also not spread equally across all classes<br>+: comparing data sets
classification:&nbsp;Quantile	Equal number of observations per class<br><br>+: symmetrical (normally distributed) data or mild/moderate skew (on histogram)<br>+: top/bottom percentile of values<br>-: doesn't consider natural gaps
classification: Standard deviation	divides data, categorizes into intervals of standard deviations above/below mean<br><br>+: to highlight how far values deviate from average<br>-: shows as z-scores<br>-: hard to understand
classification: Geometric class	Multiplicatively vary class widths<br><br>when data has a highly skewed distribution
central tendency: mode	- categorical data<br>- highest frequency value
central tendency: Qualitative ordinal data, use	Median
central tendency: Mean	"<ul><li>""typical"" score, best for normally distributed data</li><li>outliers have strong influence</li><li>ex bimodal data /\_......_/\, mean not useful</li><li>ex location, point data only for LA, NYC, mean not useful</li><li>ex calculated value may not be in dataset</li></ul>"
central tendency: Weighted mean center	pulls the mean centre toward features with higher weights
central tendency: Median center	location representing the shortest total distance to all other features<br><br>more robust to outliers
central tendency: Central feature	Chooses an *existing feature* in dataset that has shortest total distance to ALL other features
dispersion: normally distributed data	~68% within 1 std deviation from mean,<br>~95% within 2,<br>~99.7% within 3
dispersion: Standard distance	average distance of each feature to mean center<br>then use that distance as a radius centered on mean center<br><br>(Spatial equiv of std deviation)
Standard deviational ellipse	Like standard distance,<br><br>but calculates the standard deviation of the x and y coordinates separately from the mean center (these define the ellipse axes, showing directional trend)
Dispersion	How spread out/ compact a dataset is around its location of central tendency
Central tendency	"Single location to summarize a set of locations, ""typical""/ ""average""/ most representative"
Defining neighbours: Number of neighbours	nbh defined by a specified number of features closest to the focal feature<br><br>distances can vary depending on density of features
Defining neighbours:&nbsp;Fixed distance	all features that fall within specified distance of focal features<br><br>num of neighbours depends on density of features in area
Defining neighbours:&nbsp;Network distance	Travel routes around focal feature<br><br>Fixed distance (no bridge) vs realistic
Contiguity: Raster/Vector polygons vs points	Only to polygons, because points have no edges
Delaunay triangulation	Contiguity but for points<br><br>Generates thiessen polygons on points, then use contiguity edges corners method
Defining neighbours: Contiguity edges (Rook)	shared border w/ focal feature considered a neighbour<br><br>(directly next to each other)
Defining neighbours:&nbsp;Contiguity edges corners (Queen)	border or corner shared feature considered neighbour
Spatial analysis issues	- MAUP, neighbourhood definition<br>- Boundary problem<br>- Spatial sampling<br>- Tobler
Cluster analysis	finding areas with unexpectedly high values, or finding groups of features with similar characteristics/values/locations, or finding point patterns in the landscape
GIS ML tool types:	- prediction<br>- classification<br>- clustering
density-based clustering	grouping of observations based on feature locations
Tobler's first law of geog	Everything is related to everything else,<br>but near things are more related than distant things
Modifiable Areal Unit Problem (MAUP)	Combined effects of scale and aggregation<br><br>how things are zoned AND the scale of geographic unit (then aggregated) can change outcomes<br><br>explore alternate zoning effects and hierarchical models
Spatial weights matrix	File that quantifies spatial relationships (neighbourhood/s) among a set of features
Traditional statistical tests can often be applied to spatial data, BUT...	doesn't account for Tobler's first law<br>and other spatial relationships<br><br>also, datasets w/ very different distributions can produce the same summary statistics
central tendency: Median	"""Middle"" value, often better choice for skewed data<br><ul><li>common for socio{demographic,economic} data</li><li>exact centre of distribution</li></ul>"
central tendency: What do outliers do?	Strongly influence the mean (pull it toward the extreme values); the median is more robust
Dispersion: range	difference between min and max values in a distribution<br>Heavily impacted by outliers (frequency of values not considered)
Dispersion: Standard deviation	sqrt(variance)<br><br>variance: sum of (observation - mean)<sup>2</sup> over all observations / number of observations (n, or n - 1 for a sample)
z-scores (Standard score)	standard deviations above/below mean<br><br>( observation - mean ) / stddev
central tendency: mean centre	spatial average: the mean of all x coordinates and the mean of all y coordinates (add up, divide by number of features)
AI	ability of a machine to perform tasks traditionally requiring human intelligence
Machine learning	set of tools, algos, and techniques to allow computers to learn patterns in data and acquire info w/o human explicitly programming the process
Deep learning	Using trainable algos in the form of artificial neural networks (inspired by how human brain works)
DBSCAN	"# of features to be considered a cluster, max search distance<ul><li>FIXED SEARCH DISTANCE, to find clusters of similar densities</li><li>fastest computationally</li></ul>"
HDBSCAN	uses series of nested clusters and chooses levels that create stable clusters having as many members as possible<br><br><ul><li>can find clusters of varying densities</li><li>most data-driven</li></ul>
OPTICS	Uses reachability plot for distances between neighbours, peaks = big spatial jump, separates clusters<br><br><ul><li>ex 2 peaks in a row: noise point</li><li>can adjust sensitivity</li><li>most computationally-intensive</li></ul>
DBSCAN vs HDBSCAN	DBSCAN struggles w/ different densities unlike HDBSCAN, where search distances can vary
multivariate clustering	GROUPING of observations based on feature attributes
trad vs. spatially constrained multivariate clustering	traditional:<br><ul><li>not explicitly spatial, but can be applied to spatial problems/ data</li><li>Features in group more alike, but groups may not be spatially contiguous</li></ul>SC:<br><ul><li>Explicitly incorporate geography</li><li>AKA spatially contiguous groups but maybe less alike</li></ul>
MC: k-means	Finds <i>k</i>&nbsp;groups in data based on feature attributes<br><br><ul><li>think n variables plotted in an n-D space; clusters are found from there</li><li>boxplot + map (but not inherently spatial)</li></ul>
SC-MvC: Minimum spanning tree	features laid out in data / 'spatial' space, connected based on how far they are based on location and attribute values<br><br>links are THEN broken in ways that keep clusters as distinct as possible
complete spatial randomness	reference spatial distribution, simulates random pattern<br><br>you can compare observations to CSR
negative spatial autocorrelation	similar values scattered across space, things closer together likely to have diff values<br><br>underlying process causes REGULAR DISPERSION
random pattern	mix of clustering and dispersion
positive spatial autocorrelation	similar values clustered in space<br>things closer together likely to have similar values<br><br>underlying process leads to clustering
p-value	probability of seeing a pattern at least this extreme if the underlying process were random<br><br>(low p-value: high certainty that the pattern is not random)
Spatial autocorrelation: TEST TYPES	<ul><li>Global: whole study area, generalizes as summary statistic, NO MAP</li><li>Local: global on a subset of the study area</li><li>Scan Statistics: search multiple subsets, return where clustering/dispersal</li></ul>
Moran's I	Clustering, dispersion, both?<br><br>Clustered +ve<br>Dispersal -ve<br>z-score: high/low suggests pattern unlikely to be random
Getis-Ord G	Clustering?<br><br>G: Spatial density, high values indicate high value clustering, low ...
Getis-Ord Local Gi* (Scan statistic)	ID Hot and cold spots (clusters) relative to mean ACROSS study area at different confidence levels (p-value)<br><br>is local nbh average (z-score) significantly different from global average?<br>could have low features included in hot spot
Anselin Local Moran's I (LISA) (Scan statistic)	Clusters: high-high, low-low<br>Outliers: high-low, low-high (a feature doesn't match w/ rest of nbh)<br><br>Is nbh average significantly different from global average, and<br>is feature value significantly different from nbh average?
Bivariate Moran's I	Clusters across 2 variables<br><br>high-high, low-low, high-low, low-high clusters<br><br>better than comparing visually
linear mean: orientation vs direction	orientation: angle of the line only, ignoring which end is the start<br>direction: angle that respects start-to-end movement (reported with average length)
Spatiotemporal pattern mining	Finding patterns in space and through time by analyzing snapshots of data
Problem w/ thematic/ hotspot map for each period of time	Effectiveness of visual analysis decreases as volume of data increases<br><br>- difficult to visually quantify trends/change<br>- assumes each time period is completely independent (Jan doesn't affect Feb)
Time-series analysis	looking at how spatial patterns have changed over time
Space-time cube	3D cube<br>x,y: location grid or polygon aggregation<br>z: time
Space-time cube: Bin	individual unit: unique spatiotemporal extent
Space-time cube: location	column of bins: same location, different temporal extents
Space-time cube: Time Slice	row of bins that share the same temporal extent (imagine off the top: most recent)<br><br>standalone analysis: 1 slice
Space-time cube: Aggregation	Polygons: probably should standardize<br><br>Bins (same size: no need to standardize):<br>- fishnet / square grid: quicker than hexagon<br>- hexagon: more edges = more neighbours, closest approximation to circle that fits nicely
Space-time cube: Modifiable temporal unit problem	Similar to MAUP but over time: How we aggregate time can affect our analysis<br><br>Results from data aggregated by month != results aggregated by year<br>Consider: seasonality of data, but also broader patterns<br><br>Also: Feb (fewer days = fewer car crashes)<br>Ensure the aggregation interval divides evenly into the data's time span, or chop off the oldest bit
Emerging hot spot analysis	Getis-Ord Gi*: Spatial + temporal (+-1) neighbours, compares to whole study area for hot/cold spots<br>Difference: z-score, Significance: p-value<br><br>Results:<br>3D: clustering/significance of clustering at each bin over time for entire space-time cube<br>2D: Top is summarized<br><br>17 categories (8+8+1): consider combining/removing categories depending on audience<br>can also examine a category (why is there oscillation?)
Local Outlier Analysis	"Extension of Anselin Local Moran's I: IDs clusters and local outliers in space + time (not as detailed as emerging hotspot analysis)<br><br>3D entire output:&nbsp;<i>bin value --&nbsp;neighbourhood value</i><br>2D summary output: no indication of <b>changes</b> to significance of clusters/outliers<br>- ""only"": has only ever been that<br>- multiple types, never significant"