Introduction

This vignette covers advanced usage of the synthdid package, including custom regularization, time-varying covariates, diagnostic checks, and large-scale applications. It assumes familiarity with the basic package functionality covered in the main vignette.

Custom Regularization

Understanding Regularization Parameters

The synthdid estimator uses ridge regularization controlled by two parameters:

  • eta.omega: Controls regularization for unit weights (ω)
  • eta.lambda: Controls regularization for time weights (λ)

These are scaled by the estimated noise level σ to obtain the actual regularization strengths:

  • zeta.omega = eta.omega * noise.level
  • zeta.lambda = eta.lambda * noise.level
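As a concrete illustration of this scaling, here is a base-R sketch with made-up numbers; noise.level stands in for the package's internal σ estimate, and the eta values are hypothetical:

```r
# Hypothetical inputs: eta parameters and an estimated noise level sigma-hat
eta.omega <- 2^(1/4)    # e.g. (N1 * T1)^(1/4) with N1 = 1 treated unit, T1 = 2 periods
eta.lambda <- 1e-6
noise.level <- 5.3      # made-up sigma estimate

# The actual regularization strengths used by the optimizer
zeta.omega <- eta.omega * noise.level
zeta.lambda <- eta.lambda * noise.level
round(c(zeta.omega = zeta.omega, zeta.lambda = zeta.lambda), 6)
```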

Default Values

# Default regularization
# eta.omega = (N1 * T1)^(1/4)  where N1 = treated units, T1 = post-treatment periods
# eta.lambda = 1e-6  (essentially no regularization for time weights)

data(california_prop99)
setup <- panel.matrices(california_prop99)

tau.default <- synthdid_estimate(setup$Y, setup$N0, setup$T0)

When to Adjust Regularization

Increase regularization when:

  • Optimization doesn’t converge
  • Weights are extremely concentrated (a few units get all the weight)
  • You have many similar control units (collinearity)
  • Estimates are unstable across specifications

Decrease regularization when:

  • You want weights closer to the unregularized solution
  • The data have low noise (a good fit is achievable)
  • You have strong prior knowledge of appropriate controls

Examples of Custom Regularization

# More regularization on unit weights
tau.high.omega <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                                     eta.omega = 2)

# More regularization on time weights
tau.high.lambda <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                                      eta.lambda = 0.01)

# Both
tau.high.both <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                                    eta.omega = 2, eta.lambda = 0.01)

# Compare estimates
c(default = tau.default,
  high.omega = tau.high.omega,
  high.lambda = tau.high.lambda,
  high.both = tau.high.both)

Grid Search for Optimal Regularization

# Define grid
eta_omega_grid <- c(0.1, 0.5, 1, 2, 5)
eta_lambda_grid <- c(1e-6, 1e-4, 1e-3, 1e-2)

# Cross-validation function (simplified)
cv_error <- function(eta_omega, eta_lambda) {
  # Score a parameter pair by the size of its placebo estimate
  tau.placebo <- synthdid_placebo(
    synthdid_estimate(setup$Y, setup$N0, setup$T0,
                      eta.omega = eta_omega, eta.lambda = eta_lambda)
  )
  # The placebo estimate should be close to zero
  abs(c(tau.placebo))
}

# Grid search (simplified example)
results <- expand.grid(eta_omega = eta_omega_grid, eta_lambda = eta_lambda_grid)
results$cv_error <- mapply(cv_error, results$eta_omega, results$eta_lambda)

# Best parameters
best_params <- results[which.min(results$cv_error), ]
print(best_params)

Time-Varying Covariates

Including Covariates

Time-varying covariates can improve estimates by controlling for confounders:

# Create a 3D array: N x T x K where K is number of covariates
N <- nrow(setup$Y)
T <- ncol(setup$Y)
K <- 2  # Two covariates

X <- array(dim = c(N, T, K))

# Fill with covariate data (example: economic indicators)
# X[,,1] <- gdp_matrix
# X[,,2] <- unemployment_matrix

# Estimate with covariates
tau.cov <- synthdid_estimate(setup$Y, setup$N0, setup$T0, X = X)

# Access covariate coefficients
beta <- attr(tau.cov, "weights")$beta
print(beta)

Memory Considerations with Covariates

Covariates can significantly increase memory usage:

# Estimate memory requirements
mem_no_cov <- synthdid_memory_estimate(N = 100, T = 50, K = 0)
mem_with_cov <- synthdid_memory_estimate(N = 100, T = 50, K = 10)

print(mem_no_cov)
print(mem_with_cov)

The package uses zero-copy memory views to minimize overhead, but large K can still be expensive.
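As a rough sanity check on those numbers, here is a base-R back-of-envelope calculation (not the package's actual accounting): a dense double-precision N × T × K array costs 8 bytes per entry.

```r
# Approximate size in MB of a dense N x T x K double-precision array (8 bytes/entry)
covariate_mb <- function(N, T, K) N * T * K * 8 / 2^20

covariate_mb(100, 50, 0)    # 0 MB: no covariate storage
covariate_mb(100, 50, 10)   # ~0.38 MB for the covariates alone
```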

Best Practices for Covariates

  1. Standardize covariates: Scale to similar magnitudes
  2. Avoid collinearity: Check correlations between covariates
  3. Use domain knowledge: Include covariates that theory suggests are important
  4. Check sensitivity: Compare estimates with and without covariates
tau.no.cov <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
tau.with.cov <- synthdid_estimate(setup$Y, setup$N0, setup$T0, X = X)

c(without.covariates = tau.no.cov, with.covariates = tau.with.cov)
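A minimal sketch of point 1, using a hypothetical base-R helper (not a package function): center and scale each covariate slice so the regularization treats them comparably.

```r
# Standardize each N x T slice of a 3D covariate array to mean 0, sd 1
standardize_covariates <- function(X) {
  for (k in seq_len(dim(X)[3])) {
    X[, , k] <- (X[, , k] - mean(X[, , k])) / sd(X[, , k])
  }
  X
}

# Demo on simulated covariates with very different scales
X_demo <- array(rnorm(4 * 3 * 2, mean = 100, sd = 20), dim = c(4, 3, 2))
X_std <- standardize_covariates(X_demo)
round(c(mean(X_std[, , 1]), sd(X_std[, , 1])), 10)  # ~0 and 1
```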

Diagnostic Checks

Pre-Treatment Fit Quality

Check how well the synthetic control matches the treated unit in pre-treatment periods:

tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0)

# Visualize fit
synthdid_plot(tau.hat)

# Quantify fit with RMSE
weights <- attr(tau.hat, "weights")
Y <- attr(tau.hat, "setup")$Y
N0 <- attr(tau.hat, "setup")$N0
T0 <- attr(tau.hat, "setup")$T0

# Synthetic control prediction in the pre-treatment period
Y_control_pre <- Y[1:N0, 1:T0]
# Average over treated units; drop = FALSE keeps a matrix even with one treated unit
Y_treated_pre <- colMeans(Y[(N0 + 1):nrow(Y), 1:T0, drop = FALSE])
Y_synth_pre <- as.vector(t(weights$omega) %*% Y_control_pre)

# RMSE
rmse <- sqrt(mean((Y_treated_pre - Y_synth_pre)^2))
print(rmse)

Weight Diagnostics

Examine the distribution of weights:

# Top control units
synthdid_controls(tau.hat, weight.type = "omega")

# Top time periods
synthdid_controls(tau.hat, weight.type = "lambda")

# Weight concentration (effective number of units/periods)
omega <- attr(tau.hat, "weights")$omega
lambda <- attr(tau.hat, "weights")$lambda

n_effective_units <- 1 / sum(omega^2)
n_effective_periods <- 1 / sum(lambda^2)

cat(sprintf("Effective N0: %.1f (out of %d)\n", n_effective_units, N0))
cat(sprintf("Effective T0: %.1f (out of %d)\n", n_effective_periods, T0))

Interpretation:

  • A low effective N or T suggests that a few units or periods drive the estimate
  • This can indicate sensitivity to specific controls
  • It may warrant robustness checks (leave-one-out, etc.)
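A worked base-R example of the effective-sample-size (inverse Herfindahl) calculation used above:

```r
# Uniform weights over 4 units: effective N = 4
omega_uniform <- rep(0.25, 4)
1 / sum(omega_uniform^2)          # 4

# One dominant unit: effective N is pulled toward 1
omega_concentrated <- c(0.85, 0.05, 0.05, 0.05)
1 / sum(omega_concentrated^2)     # ~1.37
```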

Parallel Trends

While synthdid doesn’t assume strict parallel trends, checking them is still informative:

# Plot trends for treated vs synthetic control
synthdid_plot(tau.hat, trajectory.alpha = 0.8)

# Formal placebo test
tau.placebo <- synthdid_placebo(tau.hat, treated.fraction = 0.5)

# Placebo should be close to zero if trends are parallel
c(estimate = tau.hat, placebo = tau.placebo)

# Plot placebo comparison
synthdid_placebo_plot(tau.hat, treated.fraction = 0.5)

Sensitivity Analysis

Leave-One-Out Analysis

# Remove each control unit and re-estimate
N0 <- attr(tau.hat, "setup")$N0
loo_estimates <- sapply(1:N0, function(i) {
  Y_loo <- setup$Y[-i, ]
  synthdid_estimate(Y_loo, N0 - 1, setup$T0)
})

# Check sensitivity
summary(loo_estimates)
plot(loo_estimates, main = "Leave-One-Out Estimates",
     ylab = "Treatment Effect", xlab = "Dropped Unit")
abline(h = tau.hat, col = "red", lty = 2)

Varying Time Window

# Try different pre-treatment window lengths, keeping the most recent t0
# pre-treatment periods together with all post-treatment periods
T0_values <- seq(10, setup$T0, by = 5)
window_estimates <- sapply(T0_values, function(t0) {
  keep <- c((setup$T0 - t0 + 1):setup$T0, (setup$T0 + 1):ncol(setup$Y))
  synthdid_estimate(setup$Y[, keep], setup$N0, t0)
})

plot(T0_values, window_estimates, type = "b",
     xlab = "Pre-treatment periods", ylab = "Treatment Effect")

Large-Scale Applications

Parallel Processing

For computationally intensive tasks (especially standard errors), use parallel processing:

library(future)

# Set up parallel backend
plan(multisession, workers = 4)  # Use 4 cores

# Bootstrap SE will automatically use parallel workers
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                              estimate_se = TRUE,
                              se_method = "bootstrap",
                              se_replications = 1000)

# Remember to close parallel workers when done
plan(sequential)

BLAS Thread Management

The package automatically manages BLAS threads to prevent oversubscription:

# This happens automatically, but you can verify:
# - During estimation, BLAS threads are limited
# - After estimation, original thread count is restored

# For manual control (advanced):
# source("R/blas_threads.R")
# original <- get_blas_num_threads()
# set_blas_num_threads(2)
# # ... computation ...
# set_blas_num_threads(original)
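For portable manual control outside this package, the same save/limit/restore pattern can be written with the RhpcBLASctl CRAN package (an assumption: RhpcBLASctl is not a dependency of this package, just a common way to manage BLAS threads from R):

```r
# Run an expression under a temporary BLAS thread limit, restoring the
# original setting even if the expression errors
with_blas_threads <- function(n, expr) {
  original <- RhpcBLASctl::blas_get_num_procs()
  RhpcBLASctl::blas_set_num_threads(n)
  on.exit(RhpcBLASctl::blas_set_num_threads(original))
  force(expr)
}

# Example usage (not run):
# with_blas_threads(2, synthdid_estimate(setup$Y, setup$N0, setup$T0))
```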

Memory-Efficient Workflows

For very large problems:

# 1. Estimate without SE first
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0, estimate_se = FALSE)

# 2. Save estimate
saveRDS(tau.hat, "tau_estimate.rds")

# 3. Compute SE in separate session with more memory
tau.hat <- readRDS("tau_estimate.rds")
se <- sqrt(vcov(tau.hat, method = "jackknife"))  # Jackknife is memory-efficient

# 4. Store SE as attribute
attr(tau.hat, "se") <- se
attr(tau.hat, "se_method") <- "jackknife"

Batch Processing Multiple Datasets

# Function to process one dataset
process_dataset <- function(data_file) {
  data <- readRDS(data_file)
  setup <- panel.matrices(data)

  tau <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                            estimate_se = TRUE, se_method = "jackknife")

  list(
    estimate = c(tau),
    se = attr(tau, "se"),
    converged = synthdid_converged(tau)
  )
}

# Process all files
data_files <- list.files("data/", pattern = "*.rds", full.names = TRUE)
results <- lapply(data_files, process_dataset)

# Summarize
results_df <- do.call(rbind, lapply(results, as.data.frame))
print(results_df)

Advanced Plotting

Customizing synthdid_plot

library(ggplot2)

# Basic plot
p <- synthdid_plot(tau.hat)

# Customize with ggplot2
p +
  ggtitle("California Prop 99: Effect on Cigarette Sales") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_color_manual(values = c("treated" = "darkred",
                                 "synthetic control" = "steelblue"))

Comparing Multiple Estimates

# Estimate with different methods
tau.did <- did_estimate(setup$Y, setup$N0, setup$T0)
tau.sc <- sc_estimate(setup$Y, setup$N0, setup$T0)
tau.sdid <- synthdid_estimate(setup$Y, setup$N0, setup$T0)

# Plot all three
estimates <- list(tau.did, tau.sc, tau.sdid)
synthdid_plot(estimates, facet = c("DID", "SC", "SynthDID"),
              facet.vertical = TRUE)

# Or overlay in same plot
synthdid_plot(estimates, facet = c("All", "All", "All"),
              alpha.multiplier = c(0.5, 0.5, 1))

Animated Plots

# Create animation showing effect of overlay parameter
overlay_values <- seq(0, 1, by = 0.1)

plots <- lapply(overlay_values, function(ov) {
  synthdid_plot(tau.hat, overlay = ov) +
    ggtitle(sprintf("Overlay = %.1f", ov))
})

# Combine into an animation (requires gganimate; note that gganimate
# animates a single ggplot over a frame variable, not a list of plots)
# library(gganimate)

Working with the Formula Interface

Advanced Formula Specifications

# Basic usage
result <- synthdid(PacksPerCapita ~ treated,
                   data = california_prop99,
                   index = c("State", "Year"))

# Extract and inspect components
coef(result)
confint(result)
summary(result)

# Predictions
predict(result, type = "counterfactual")
predict(result, type = "effect")

# Residual diagnostics
resids <- residuals(result, type = "pretreatment")
plot(resids, main = "Pre-treatment Residuals")

Model Comparison with Formula Interface

# Fit all three estimators
fit_did <- synthdid(PacksPerCapita ~ treated, data = california_prop99,
                    index = c("State", "Year"), method = "did")
fit_sc <- synthdid(PacksPerCapita ~ treated, data = california_prop99,
                   index = c("State", "Year"), method = "sc")
fit_sdid <- synthdid(PacksPerCapita ~ treated, data = california_prop99,
                     index = c("State", "Year"), method = "synthdid")

# Compare
cbind(
  DID = coef(fit_did),
  SC = coef(fit_sc),
  SynthDID = coef(fit_sdid)
)

Simulation Studies

Power Analysis

# Function to simulate data and test
simulate_and_test <- function(true_effect = 1, n_sim = 100) {
  data <- random.low.rank()

  # Add treatment effect
  N0 <- data$N0
  T0 <- data$T0
  data$Y[(N0+1):nrow(data$Y), (T0+1):ncol(data$Y)] <-
    data$Y[(N0+1):nrow(data$Y), (T0+1):ncol(data$Y)] + true_effect

  # Estimate
  tau.hat <- synthdid_estimate(data$Y, data$N0, data$T0,
                                estimate_se = TRUE, se_method = "jackknife")

  # Two-sided z-test
  se <- attr(tau.hat, "se")
  z_stat <- c(tau.hat) / se
  p_value <- 2 * pnorm(-abs(z_stat))

  list(estimate = c(tau.hat), se = se, p_value = p_value)
}

# Run simulations
set.seed(123)
sims <- replicate(100, simulate_and_test(true_effect = 1), simplify = FALSE)

# Power
power <- mean(sapply(sims, function(s) s$p_value < 0.05))
cat(sprintf("Power: %.2f\n", power))

# Coverage
coverage <- mean(sapply(sims, function(s) {
  ci_lower <- s$estimate - 1.96 * s$se
  ci_upper <- s$estimate + 1.96 * s$se
  ci_lower <= 1 && 1 <= ci_upper
}))
cat(sprintf("95%% CI Coverage: %.2f\n", coverage))

Placebo Studies

# Generate placebo distribution
n_placebos <- 100
placebo_estimates <- replicate(n_placebos, {
  data <- random.low.rank()  # No treatment effect
  synthdid_estimate(data$Y, data$N0, data$T0)
})

# Plot distribution
hist(placebo_estimates, breaks = 30, main = "Placebo Distribution",
     xlab = "Estimate", col = "lightblue")
abline(v = 0, col = "red", lwd = 2, lty = 2)

# Compare to actual estimate
abline(v = tau.hat, col = "darkblue", lwd = 2)
legend("topright", legend = c("True null", "Actual estimate"),
       col = c("red", "darkblue"), lwd = 2, lty = c(2, 1))

Performance Optimization

Early Stopping and Sparsification

The package implements several performance optimizations:

# Disable sparsification (for comparison)
tau.no.sparse <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                                    sparsify = NULL)

# Enable sparsification (default)
tau.with.sparse <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                                      sparsify = sparsify_function)

# Custom sparsification function
my_sparsify <- function(v) {
  # Zero out entries below max(v)/10 and renormalize; raising the
  # threshold (e.g. to max(v)/2) makes the rule more aggressive
  v[v <= max(v) / 10] <- 0
  v / sum(v)
}

tau.custom.sparse <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                                        sparsify = my_sparsify)
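To see what the custom rule does in isolation, here is a standalone base-R run (redefining the function so the snippet is self-contained):

```r
# Zero out entries at or below max(v)/10, then renormalize to sum to 1
my_sparsify <- function(v) {
  v[v <= max(v) / 10] <- 0
  v / sum(v)
}

w <- c(0.50, 0.30, 0.15, 0.04, 0.01)
round(my_sparsify(w), 3)  # 0.526 0.316 0.158 0.000 0.000
```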

Benchmark Different Approaches

library(microbenchmark)

# Compare SE methods
benchmark <- microbenchmark(
  bootstrap = vcov(tau.hat, method = "bootstrap", replications = 100),
  jackknife = vcov(tau.hat, method = "jackknife"),
  placebo = vcov(tau.hat, method = "placebo", replications = 100),
  times = 10
)

print(benchmark)

References

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic difference-in-differences. American Economic Review, 111(12), 4088-4118.
