Model Selection and Spatial Analysis

Sample Submission Note

This sample is an assignment submitted to obtain my Maximum Likelihood Estimation II certification (2025). This submission exemplifies my competence not only in coding to answer the questions in a presentable format, but also in providing substantive explanations of the tests and results.

The assignment was not originally meant to be submitted in Markdown or HTML, but I took the time to create these types of documents with additional explanations of concepts for each assignment for my own learning purposes, as well as others. I have shared these with friends and colleagues and they have received rave reviews! If the HTML version is of interest, please email me.

NOTE: Code chunks automatically appear above the associated subsection.

Assignment Introduction & Variables

For this assignment, you will conduct a cross-sectional spatial analysis utilizing U.S. state policy innovativeness data, located on the Canvas site. The DV is adopt_pct, which represents the average proportion of policy adoption in each state for the 20th century. For IVs, you will need to include the following variables: mean total population (totpop), urban percentage (urban_pct), legislative professionalism (legp_king), and the US state policy legislative index (inst6002).

1. Mapping

1.1. Create a map of adopt_pct by state.

#load packages as necessary
library(sf)
library(dplyr)

original_data <- import("Week 3_MLE II/MLEII2025_Assign3_updated/MLEII2025_Assign3/innovativeness.dta") |> 
  rename_with(tolower)

shape_data <- st_read("Week 3_MLE II/MLEII2025_Assign3_updated/MLEII2025_Assign3/s_01au07/s_01au07.shp") |>
  rename_with(tolower)

## Reading layer `s_01au07' from data source 
##   `/Users/rachelurban/Library/Mobile Documents/com~apple~CloudDocs/Desktop/YR1-4 UNL PhD/9 ICSPR Lectures/4_MLE2/Week 3_MLE II/MLEII2025_Assign3_updated/MLEII2025_Assign3/s_01au07/s_01au07.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 57 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1334 ymin: -14.60409 xmax: 179.7882 ymax: 71.39805
## Geodetic CRS:  NAD83

map_data_joined <- left_join(shape_data, original_data,
                             by = "state") |> 
  filter(!state %in% c("AK", "HI")) |> 
  drop_na(adopt_pct, totpop, urban_pct, legp_king, inst6002)

map_data_clean <- st_make_valid(map_data_joined)

#plot cleaned map

ggplot(map_data_clean) +
  geom_sf(aes(fill = adopt_pct), color = "white") +
  scale_fill_viridis_c(option = "plasma", na.value = "grey90") +
  labs(fill = "Policy Adoption (%)",
       title = "Average Policy Adoption by State") +
  theme_minimal() +
  coord_sf(crs = st_crs(5070)) +
  theme(legend.position = c(0.15, 0.25),   # (x, y) in plot coordinates
        legend.background = element_rect(fill = alpha("white", 0.7)),
        legend.title = element_text(size = 10),
        legend.text  = element_text(size = 8))

1.2. Provide a brief description of any clustering patterns you observe.

The clustering observed in the univariate map shows a diffusion of policy adoption in each state. Generally, the lighter colors on the viridis color scale (see legend) have a flow from one state to neighbors that get darker. One regional area that does not follow this pattern directly, however, is that of California and it’s geographic neighbors Nevada and Arizona. The starkest difference is between California (yellow) and Nevada (purple). This is not the diffusion we’d expect despite the behavioral diffusion hypothesis seems to stand across the rest of the country.

2. OLS Estimate + Diagnostics

Estimate an OLS model and perform the Moran’s I diagnostic test on the residuals. Make sure the spatial weights matrix is row-normalized.

Ordinary Least Squares (OLS) Coefficient Estimates
Predictor Variable	Coefficient Estimate	Std. Error	t-statistic	p-value	Lower CI (95%)	Upper CI (95%)
(Intercept)	0.013	0.004	3.271	0.002	0.005	0.021
totpop	0.001	0.000	1.455	0.153	0.000	0.001
urban_pct	0.034	0.008	4.339	0.000	0.018	0.050
legp_king	0.014	0.014	0.974	0.335	-0.015	0.042
inst6002	0.014	0.007	2.139	0.038	0.001	0.028

library(spdep)

# Queen contiguity neighbor list
neighbors_queen_list <- poly2nb(map_data_clean, 
                                queen = TRUE)

# Row-normalized spatial weights
weights_queen <- nb2listw(neighbors_queen_list, 
                     style = "W", 
                     zero.policy = TRUE)

weights_queen

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48 
## Number of nonzero links: 214 
## Percentage nonzero weights: 9.288194 
## Average number of links: 4.458333 
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 48 2304 48 23.94514 197.3707

# Extract residuals
ols_resid <- residuals(ols_model)

# Moran's I test
moran.test(ols_resid, weights_queen, zero.policy = TRUE)

## 
##  Moran I test under randomisation
## 
## data:  ols_resid  
## weights: weights_queen    
## 
## Moran I statistic standard deviate = 0.75353, p-value = 0.2256
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.05235897       -0.02127660        0.00954938

# Clean up presentation
## broom package gets weird (naming conventions & doesn't like 'rename()', so a 
## bullet proof way is building a df. )

m_test <- moran.test(ols_resid, weights_queen, zero.policy = TRUE)


moran_tidy <- data.frame(`Moran's I Statistic (I)` = as.numeric(m_test$estimate[1]),
                        `Expectation` = as.numeric(m_test$estimate[2]),
                        `Variance` = as.numeric(m_test$estimate[3]),
                        `Standard Deviate (Z)` = as.numeric(m_test$statistic),
                        `p-value` = as.numeric(m_test$p.value),
                        check.names = FALSE) |> 
  mutate(across(where(is.numeric), ~ round(., 4)))

# Render as a clean table matching the OLS table style
knitr::kable(moran_tidy, 
             caption = "Moran's I Test for Residual Spatial Autocorrelation")

Moran’s I Test for Residual Spatial Autocorrelation
Moran’s I Statistic (I)	Expectation	Variance	Standard Deviate (Z)	p-value
0.0524	-0.0213	0.0095	0.7535	0.2256

3. Spatial Estimation

3.1.1. Estimate a spatial error model. **Make sure the spatial weights matrix is row-normalized for both models.

spatial_error_model <- errorsarlm(adopt_pct ~ totpop + 
                                    urban_pct + 
                                    legp_king + 
                                    inst6002,
                                  data = map_data_clean,
                                  listw = weights_queen,
                                  zero.policy = TRUE)

#summary(spatial_error_model)

3.1.2. Estimate a spatial lag (of Y) model. **Make sure the spatial weights matrix is row-normalized for both models.

spatial_lag_model <- lagsarlm(adopt_pct ~ totpop + 
                                urban_pct + 
                                legp_king + 
                                inst6002,
                      data = map_data_clean,
                      listw = weights_queen,
                      zero.policy = TRUE)

#summary(spatial_lag_model)

3.2. Report the two spatial models and the OLS model in a single table.

library(stargazer) #load stargazer as necessary

stargazer(ols_model, spatial_error_model, spatial_lag_model,
          type = "latex",              
          column.labels = c("OLS (Baseline)", "Spatial Error", "Spatial Lag"),
          dep.var.labels = "Policy Adoption (\\%)",
          covariate.labels = c("Total Population", "Urban %", "Legislative Professionalism", "Institutional Score"),
          digits = 4,
          no.space = TRUE,
          header = FALSE,
          title = "Comparative Model Output Matrix")

3.3. Compare the results to the results from the OLS model.

Both the AIC and BIC values are used for testing models’ “fit,” mathematically penalizing them based on their complexity, either by the number of parameters or observations (explanations below). The AIC is better for predicting future outcomes as it tests the relative quality of the models, whereas BIC is better for structural interpretation of underlying mechanisms in a model as it more harshly tests models to find the “true” structure. However, because the latter is more conservative, avoid overcomplicating the model; the more you add, the more it is penalized.

Mathematical differences:

AIC (Akaike Information Criterion): Penalizes by \(2k\) (where \(k\) is the number of parameters).
BIC (Bayesian Information Criterion): Penalizes by \(k \ln(n)\) (where \(n\) is the number of observations/data points).

# For a simple/raw Normalized Weighted AIC visual: 
#AIC(ols_model, spatial_error_model, spatial_lag_model)

# For a simple/raw Normalized Weighted BIC visual: 
#BIC(ols_model, spatial_error_model, spatial_lag_model)

# Combine into table (again, use a data frame to avoid labeling issues): 
fit_comparison <- data.frame(Model = c("OLS Baseline", "Spatial Error", "Spatial Lag"),
                             AIC = c(AIC(ols_model), 
                                     AIC(spatial_error_model), 
                                     AIC(spatial_lag_model)),
                             BIC = c(BIC(ols_model), 
                                     BIC(spatial_error_model), 
                                     BIC(spatial_lag_model))) |> 
  mutate(across(where(is.numeric), ~ round(., 4)))

# Render structured table
knitr::kable(fit_comparison, 
      caption = "Comparative Model Fit and Information Criteria Statistics")

Comparative Model Fit and Information Criteria Statistics
Model	AIC	BIC
OLS Baseline	-339.7324	-328.5052
Spatial Error	-338.0127	-324.9143
Spatial Lag	-337.8497	-324.7513

According to the AIC & BIC values across the models, there are not large differences between the models, but the spatial lag (AIC = -337.85, BIC = -324.75) appears to perform better than the spatial error (AIC = -338.10, BIC = -324.91) and OLS (AIC = -339.73, BIC = -328.51) models.

4. Repeat questions #2 and #3 with a non-normalized spatial weights matrix. Are there any noticeable differences between the two sets of results?

AIC model values were estimated to minimize information loss and maximize predictive accuracy out-of-sample, in addition to the above explanation. Because AIC is more tolerant of complexity than BIC — especially given our small sample size (n = 57) — it guards against the risk of under-fitting critical spatial dependencies.

The OLS models do not perform differently between the normalized and non-normalized weights. The spatial error model with special weights performs slightly better (AIC = -337.82) than the normalized weights (AIC = -338.03). However, the spatial lag with normalized weights performed significantly better (AIC = -337.85) compared to the non-normalized weights (AIC = -341.06).

It appears that the spatial dependence is weak overall — which is why AIC changes are small and Moran’s I is low. The slight improvement in the spatial lag model with binary weights suggests that a few high-adoption states surrounded by other high-adoption states may be influencing results when we don’t scale the influence of neighbors down. Since both spatial error and lag models give similar conclusions to OLS, one could argue that there’s limited practical spatial dependence here, even though model fit changes slightly depending on the weight matrix.

4.1. OLS Non-normalized Replication

non_normalized_queen_B <- nb2listw(neighbors_queen_list, 
                            style = "B",                  #"B" = binary
                            zero.policy = TRUE)

# Summary:
non_normalized_queen_B

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 48 
## Number of nonzero links: 214 
## Percentage nonzero weights: 9.288194 
## Average number of links: 4.458333 
## 
## Weights style: B 
## Weights constants summary:
##    n   nn  S0  S1   S2
## B 48 2304 214 428 4296

# Binary OLS Model Estimation
ols_model_B <- lm(adopt_pct ~ totpop + 
                    urban_pct + 
                    legp_king + 
                    inst6002,
                  data = map_data_clean)

# Raw summary:
#summary(ols_model_B)

# Non-normalized weights from cleaned data
neighbors_queen_list_B <- poly2nb(map_data_clean, 
                                  queen = TRUE)

weights_queen_B <- nb2listw(neighbors_queen_list_B, 
                            style = "B", 
                            zero.policy = TRUE)

ols_resid_B <- residuals(ols_model_B)

moran.test(ols_resid_B, 
           weights_queen_B, 
           zero.policy = TRUE)

## 
##  Moran I test under randomisation
## 
## data:  ols_resid_B  
## weights: weights_queen_B    
## 
## Moran I statistic standard deviate = 0.50433, p-value = 0.307
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.024722801      -0.021276596       0.008319183

# Clean up presentation
## AGAIN: broom package gets weird (naming conventions & doesn't like 'rename()',
## so bullet proof way is building a df.)

m_test_B <- moran.test(ols_resid_B, 
                       weights_queen_B, 
                       zero.policy = TRUE)

moran_tidy_B <- data.frame(`Moran's I Statistic (I)` = as.numeric(m_test_B$estimate[1]),
                        `Expectation` = as.numeric(m_test_B$estimate[2]),
                        `Variance` = as.numeric(m_test_B$estimate[3]),
                        `Standard Deviate (Z)` = as.numeric(m_test_B$statistic),
                        `p-value` = as.numeric(m_test_B$p.value),
                        check.names = FALSE) |> 
  mutate(across(where(is.numeric), ~ round(., 4)))

# Render as a clean table matching the OLS table style
knitr::kable(moran_tidy_B, 
             caption = "Moran's I Test for Residual Spatial Autocorrelation with Non-normalized Weights")

Moran’s I Test for Residual Spatial Autocorrelation with Non-normalized Weights
Moran’s I Statistic (I)	Expectation	Variance	Standard Deviate (Z)	p-value
0.0247	-0.0213	0.0083	0.5043	0.307

4.2. Spatial Error Non-normalized Replication

spatial_error_model_non_norm <- errorsarlm(adopt_pct ~ totpop + 
                                             urban_pct + 
                                             legp_king + 
                                             inst6002,
                                           data = map_data_joined,
                                           listw = weights_queen_B,
                                           zero.policy = TRUE)


# Raw output
#summary(spatial_error_model_non_norm)

4.3. Spatial Lag Non-normalized Replication

spatial_lag_model_non_norm <- lagsarlm(adopt_pct ~ totpop + 
                                               urban_pct + 
                                               legp_king + 
                                               inst6002,
                                             data = map_data_joined,
                                             listw = weights_queen_B,
                                             zero.policy = TRUE)

# Raw output
#summary(spatial_lag_model_non_norm)

4.4. Compare the results to the results between the normalized and non-normalized models.

AIC model values were estimated to minimize information loss and maximize predictive accuracy out-of-sample. Because AIC is more tolerant of complexity than BIC — especially given our small sample size (n = 57) — it guards against the risk of under-fitting critical spatial dependencies.

knitr::kable(AIC(ols_model, ols_model_B, 
                 spatial_error_model, spatial_error_model_non_norm,
                 spatial_lag_model, spatial_lag_model_non_norm))

	df	AIC
ols_model	6	-339.7324
ols_model_B	6	-339.7324
spatial_error_model	7	-338.0127
spatial_error_model_non_norm	7	-337.8126
spatial_lag_model	7	-337.8497
spatial_lag_model_non_norm	7	-341.0551

5. Explain the consequences of modeling spatial interaction effects via an OLS model.

Fitting a standard OLS model on data that has true spatial dependence could lead to several problems in the results and in the subsequent interpretation.

First, the covariate coefficients can become biased when the DV is influenced by neighboring units (spatial lag), omitting that spatial interaction term creates endogeneity because it is correlated with the error term. This endogeneity makes the model statistically biased and otherwise unreliable because the omitted variable is mathematically linked to the calculated error.

Second, on the other hand, if the spatial dependence is in the error term rather than in Y, the coefficient estimates remain unbiased (meaning it is still statistically and mathematically reliable), but are not efficient anymore (i.e., making the “best use” of the data). The standard errors will be underestimated (i.e., calculating less uncertainty than there actually is), leading to inflated t-statistics (i.e., the measure of signal versus noise) will look unnaturally high and an increased risk of committing a Type I Error (i.e., the numbers incorrectly indicate there is statistical significance, when in reality, there is not).

Third, ignoring the spatial dependence (in this case, spatially-based patterns) may produce inflated R2’s and misleading residual diagnostics, leading to inferential inaccuracies (e.g., “fake” or incorrect accuracy) regarding the predictions, particularly where there is strong spatial clusters. Ultimately, these issues can lead to bad conclusions.