Advanced Topics in synthdid
synthdid package authors
2026-02-14
Source: vignettes/advanced-topics.Rmd
Introduction
This vignette covers advanced usage of the synthdid
package, including custom regularization, time-varying covariates,
diagnostic checks, and large-scale applications. It assumes familiarity
with the basic package functionality covered in the main vignette.
Custom Regularization
Understanding Regularization Parameters
The synthdid estimator uses ridge regularization controlled by two parameters:
- eta.omega: Controls regularization for unit weights (ω)
- eta.lambda: Controls regularization for time weights (λ)

These are scaled by the estimated noise level σ to get the actual regularization:

- zeta.omega = eta.omega * noise.level
- zeta.lambda = eta.lambda * noise.level
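To make the scaling concrete, here is a small sketch with a hypothetical noise level (the value 15 below is made up; in practice σ is estimated from the control units' pre-treatment outcomes), for a panel shaped like california_prop99 (one treated unit, twelve treated periods):

```r
noise.level <- 15                # hypothetical sigma, for illustration only
eta.omega   <- (1 * 12)^(1/4)    # default: (N1 * T1)^(1/4), here ~1.86
eta.lambda  <- 1e-6              # default: essentially unregularized
zeta.omega  <- eta.omega  * noise.level  # actual unit-weight penalty, ~27.9
zeta.lambda <- eta.lambda * noise.level  # actual time-weight penalty, 1.5e-5
```

The point of the scaling is that eta values are comparable across datasets, while the zeta values actually used in the optimization adapt to how noisy the outcomes are.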
Default Values
# Default regularization
# eta.omega = (N1 * T1)^(1/4) where N1 = treated units, T1 = treated periods
# eta.lambda = 1e-6 (essentially no regularization for time weights)
data(california_prop99)
setup <- panel.matrices(california_prop99)
tau.default <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
When to Adjust Regularization
Increase regularization when:

- Optimization doesn’t converge
- Weights are extremely concentrated (a few units get all the weight)
- You have many similar control units (collinearity)
- Estimates are unstable across specifications

Decrease regularization when:

- You want weights closer to the unregularized solution
- The data have low noise (a good fit is achievable)
- You have strong prior knowledge of appropriate controls
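A quick, informal way to see this trade-off in your own data is to scan a few values of eta.omega and watch how concentrated the unit weights become. This is a sketch, assuming the setup object from above; 1/sum(omega^2) is the effective number of units carrying weight:

```r
# Sketch: how unit-weight concentration responds to eta.omega.
for (eta in c(0.1, 0.5, 1, 2, 5)) {
  tau <- synthdid_estimate(setup$Y, setup$N0, setup$T0, eta.omega = eta)
  omega <- attr(tau, "weights")$omega
  cat(sprintf("eta.omega = %4.1f  effective units = %5.1f  estimate = %6.2f\n",
              eta, 1 / sum(omega^2), c(tau)))
}
```

If the effective number of units is very small at your chosen eta.omega, that is a signal that more regularization (or a robustness check such as leave-one-out) is worth trying.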
Examples of Custom Regularization
# More regularization on unit weights
tau.high.omega <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
eta.omega = 2)
# More regularization on time weights
tau.high.lambda <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
eta.lambda = 0.01)
# Both
tau.high.both <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
eta.omega = 2, eta.lambda = 0.01)
# Compare estimates
c(default = tau.default,
high.omega = tau.high.omega,
high.lambda = tau.high.lambda,
high.both = tau.high.both)
Grid Search for Optimal Regularization
# Define grid
eta_omega_grid <- c(0.1, 0.5, 1, 2, 5)
eta_lambda_grid <- c(1e-6, 1e-4, 1e-3, 1e-2)
# Cross-validation function (simplified)
cv_error <- function(eta_omega, eta_lambda) {
# Use placebo test to assess quality
tau.placebo <- synthdid_placebo(
synthdid_estimate(setup$Y, setup$N0, setup$T0,
eta.omega = eta_omega, eta.lambda = eta_lambda)
)
# Placebo should be close to zero
abs(tau.placebo)
}
# Grid search (simplified example)
results <- expand.grid(eta_omega = eta_omega_grid, eta_lambda = eta_lambda_grid)
results$cv_error <- mapply(cv_error, results$eta_omega, results$eta_lambda)
# Best parameters
best_params <- results[which.min(results$cv_error), ]
print(best_params)
Time-Varying Covariates
Including Covariates
Time-varying covariates can improve estimates by controlling for confounders:
# Create a 3D array: N x T x K where K is number of covariates
N <- nrow(setup$Y)
T_periods <- ncol(setup$Y) # avoid masking base R's T (shorthand for TRUE)
K <- 2 # Two covariates
X <- array(dim = c(N, T_periods, K))
# Fill with covariate data (example: economic indicators)
# X[,,1] <- gdp_matrix
# X[,,2] <- unemployment_matrix
# Estimate with covariates
tau.cov <- synthdid_estimate(setup$Y, setup$N0, setup$T0, X = X)
# Access covariate coefficients
beta <- attr(tau.cov, "weights")$beta
print(beta)
Memory Considerations with Covariates
Covariates can significantly increase memory usage:
# Estimate memory requirements
mem_no_cov <- synthdid_memory_estimate(N = 100, T = 50, K = 0)
mem_with_cov <- synthdid_memory_estimate(N = 100, T = 50, K = 10)
print(mem_no_cov)
print(mem_with_cov)
The package uses zero-copy memory views to minimize overhead, but large K can still be expensive.
Best Practices for Covariates
- Standardize covariates: Scale to similar magnitudes
- Avoid collinearity: Check correlations between covariates
- Use domain knowledge: Include covariates that theory suggests are important
- Check sensitivity: Compare estimates with and without covariates
tau.no.cov <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
tau.with.cov <- synthdid_estimate(setup$Y, setup$N0, setup$T0, X = X)
c(without.covariates = tau.no.cov, with.covariates = tau.with.cov)
Diagnostic Checks
Pre-Treatment Fit Quality
Check how well the synthetic control matches the treated unit in pre-treatment periods:
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
# Visualize fit
synthdid_plot(tau.hat)
# Quantify fit with RMSE
weights <- attr(tau.hat, "weights")
Y <- attr(tau.hat, "setup")$Y
N0 <- attr(tau.hat, "setup")$N0
T0 <- attr(tau.hat, "setup")$T0
# Synthetic control prediction for treated unit in pre-period
Y_control_pre <- Y[1:N0, 1:T0]
Y_treated_pre <- Y[(N0+1):nrow(Y), 1:T0]
Y_synth_pre <- t(weights$omega) %*% Y_control_pre
# RMSE
rmse <- sqrt(mean((Y_treated_pre - Y_synth_pre)^2))
print(rmse)
Weight Diagnostics
Examine the distribution of weights:
# Top control units
synthdid_controls(tau.hat, weight.type = "omega")
# Top time periods
synthdid_controls(tau.hat, weight.type = "lambda")
# Weight concentration (effective number of units/periods)
omega <- attr(tau.hat, "weights")$omega
lambda <- attr(tau.hat, "weights")$lambda
n_effective_units <- 1 / sum(omega^2)
n_effective_periods <- 1 / sum(lambda^2)
cat(sprintf("Effective N0: %.1f (out of %d)\n", n_effective_units, N0))
cat(sprintf("Effective T0: %.1f (out of %d)\n", n_effective_periods, T0))
Interpretation:

- Low effective N or T suggests few units/periods drive the estimate
- Can indicate sensitivity to specific controls
- May warrant robustness checks (leave-one-out, etc.)
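The inverse Herfindahl index 1/sum(w^2) used above behaves as you would expect at the extremes, which a small standalone check makes clear:

```r
uniform <- rep(1 / 10, 10)               # weight spread evenly over 10 units
concentrated <- c(0.9, rep(0.1 / 9, 9))  # one unit gets 90% of the weight
1 / sum(uniform^2)                       # 10: every unit contributes
1 / sum(concentrated^2)                  # ~1.23: effectively about one unit
```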
Parallel Trends Assessment
While synthdid doesn’t assume strict parallel trends, checking them is still informative:
# Plot trends for treated vs synthetic control
synthdid_plot(tau.hat, trajectory.alpha = 0.8)
# Formal placebo test
tau.placebo <- synthdid_placebo(tau.hat, treated.fraction = 0.5)
# Placebo should be close to zero if trends are parallel
c(estimate = tau.hat, placebo = tau.placebo)
# Plot placebo comparison
synthdid_placebo_plot(tau.hat, treated.fraction = 0.5)
Sensitivity Analysis
Leave-One-Out Analysis
# Remove each control unit and re-estimate
N0 <- attr(tau.hat, "setup")$N0
loo_estimates <- sapply(1:N0, function(i) {
Y_loo <- setup$Y[-i, ]
synthdid_estimate(Y_loo, N0 - 1, setup$T0)
})
# Check sensitivity
summary(loo_estimates)
plot(loo_estimates, main = "Leave-One-Out Estimates",
ylab = "Treatment Effect", xlab = "Dropped Unit")
abline(h = tau.hat, col = "red", lty = 2)
Varying Time Window
# Try different pre-treatment window lengths
T0_values <- seq(10, setup$T0, by = 5)
window_estimates <- sapply(T0_values, function(t0) {
# Keep the t0 most recent pre-treatment periods plus all post-treatment periods
Y_window <- setup$Y[, (setup$T0 - t0 + 1):ncol(setup$Y), drop = FALSE]
synthdid_estimate(Y_window, setup$N0, t0)
})
plot(T0_values, window_estimates, type = "b",
xlab = "Pre-treatment periods", ylab = "Treatment Effect")
Large-Scale Applications
Parallel Processing
For computationally intensive tasks (especially standard errors), use parallel processing:
library(future)
# Set up parallel backend
plan(multisession, workers = 4) # Use 4 cores
# Bootstrap SE will automatically use parallel workers
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
estimate_se = TRUE,
se_method = "bootstrap",
se_replications = 1000)
# Remember to close parallel workers when done
plan(sequential)
BLAS Thread Management
The package automatically manages BLAS threads to prevent oversubscription:
# This happens automatically, but you can verify:
# - During estimation, BLAS threads are limited
# - After estimation, original thread count is restored
# For manual control (advanced):
# source("R/blas_threads.R")
# original <- get_blas_num_threads()
# set_blas_num_threads(2)
# # ... computation ...
# set_blas_num_threads(original)
Memory-Efficient Workflows
For very large problems:
# 1. Estimate without SE first
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0, estimate_se = FALSE)
# 2. Save estimate
saveRDS(tau.hat, "tau_estimate.rds")
# 3. Compute SE in separate session with more memory
tau.hat <- readRDS("tau_estimate.rds")
se <- sqrt(vcov(tau.hat, method = "jackknife")) # Jackknife is memory-efficient
# 4. Store SE as attribute
attr(tau.hat, "se") <- se
attr(tau.hat, "se_method") <- "jackknife"
Batch Processing Multiple Datasets
# Function to process one dataset
process_dataset <- function(data_file) {
data <- readRDS(data_file)
setup <- panel.matrices(data)
tau <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
estimate_se = TRUE, se_method = "jackknife")
list(
estimate = c(tau),
se = attr(tau, "se"),
converged = synthdid_converged(tau)
)
}
# Process all files
data_files <- list.files("data/", pattern = "\\.rds$", full.names = TRUE)
results <- lapply(data_files, process_dataset)
# Summarize
results_df <- do.call(rbind, lapply(results, as.data.frame))
print(results_df)
Advanced Plotting
Customizing synthdid_plot
library(ggplot2)
# Basic plot
p <- synthdid_plot(tau.hat)
# Customize with ggplot2
p +
ggtitle("California Prop 99: Effect on Cigarette Sales") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_manual(values = c("treated" = "darkred",
"synthetic control" = "steelblue"))
Comparing Multiple Estimates
# Estimate with different methods
tau.did <- did_estimate(setup$Y, setup$N0, setup$T0)
tau.sc <- sc_estimate(setup$Y, setup$N0, setup$T0)
tau.sdid <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
# Plot all three
estimates <- list(tau.did, tau.sc, tau.sdid)
synthdid_plot(estimates, facet = c("DID", "SC", "SynthDID"),
facet.vertical = TRUE)
# Or overlay in same plot
synthdid_plot(estimates, facet = c("All", "All", "All"),
alpha.multiplier = c(0.5, 0.5, 1))
Animated Plots
# Create animation showing effect of overlay parameter
overlay_values <- seq(0, 1, by = 0.1)
plots <- lapply(overlay_values, function(ov) {
synthdid_plot(tau.hat, overlay = ov) +
ggtitle(sprintf("Overlay = %.1f", ov))
})
# Combine into an animation (e.g. with the gganimate or magick packages,
# or by writing the plots out one by one as frames of a GIF)
Working with the Formula Interface
Advanced Formula Specifications
# Basic usage
result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"))
# Extract and inspect components
coef(result)
confint(result)
summary(result)
# Predictions
predict(result, type = "counterfactual")
predict(result, type = "effect")
# Residual diagnostics
resids <- residuals(result, type = "pretreatment")
plot(resids, main = "Pre-treatment Residuals")
Model Comparison with Formula Interface
# Fit all three estimators
fit_did <- synthdid(PacksPerCapita ~ treated, data = california_prop99,
index = c("State", "Year"), method = "did")
fit_sc <- synthdid(PacksPerCapita ~ treated, data = california_prop99,
index = c("State", "Year"), method = "sc")
fit_sdid <- synthdid(PacksPerCapita ~ treated, data = california_prop99,
index = c("State", "Year"), method = "synthdid")
# Compare
cbind(
DID = coef(fit_did),
SC = coef(fit_sc),
SynthDID = coef(fit_sdid)
)
Simulation Studies
Power Analysis
# Function to simulate data and test
simulate_and_test <- function(true_effect = 1) {
data <- random.low.rank()
# Add treatment effect
N0 <- data$N0
T0 <- data$T0
data$Y[(N0+1):nrow(data$Y), (T0+1):ncol(data$Y)] <-
data$Y[(N0+1):nrow(data$Y), (T0+1):ncol(data$Y)] + true_effect
# Estimate
tau.hat <- synthdid_estimate(data$Y, data$N0, data$T0,
estimate_se = TRUE, se_method = "jackknife")
# Test
se <- attr(tau.hat, "se")
z_stat <- c(tau.hat) / se
p_value <- 2 * pnorm(-abs(z_stat))
list(estimate = c(tau.hat), se = se, p_value = p_value)
}
# Run simulations
set.seed(123)
sims <- replicate(100, simulate_and_test(true_effect = 1), simplify = FALSE)
# Power
power <- mean(sapply(sims, function(s) s$p_value < 0.05))
cat(sprintf("Power: %.2f\n", power))
# Coverage
coverage <- mean(sapply(sims, function(s) {
ci_lower <- s$estimate - 1.96 * s$se
ci_upper <- s$estimate + 1.96 * s$se
ci_lower <= 1 && 1 <= ci_upper
}))
cat(sprintf("95%% CI Coverage: %.2f\n", coverage))
Placebo Studies
# Generate placebo distribution
n_placebos <- 100
placebo_estimates <- replicate(n_placebos, {
data <- random.low.rank() # No treatment effect
synthdid_estimate(data$Y, data$N0, data$T0)
})
# Plot distribution
hist(placebo_estimates, breaks = 30, main = "Placebo Distribution",
xlab = "Estimate", col = "lightblue")
abline(v = 0, col = "red", lwd = 2, lty = 2)
# Compare to actual estimate
abline(v = tau.hat, col = "darkblue", lwd = 2)
legend("topright", legend = c("True null", "Actual estimate"),
col = c("red", "darkblue"), lwd = 2, lty = c(2, 1))
Performance Optimization
Early Stopping and Sparsification
The package implements several performance optimizations:
# Disable sparsification (for comparison)
tau.no.sparse <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
sparsify = NULL)
# Enable sparsification (default)
tau.with.sparse <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
sparsify = sparsify_function)
# Custom sparsification function
my_sparsify <- function(v) {
# More aggressive sparsification
v[v <= max(v) / 10] <- 0
v / sum(v)
}
tau.custom.sparse <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
sparsify = my_sparsify)
References
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic difference-in-differences. American Economic Review, 111(12), 4088-4118.
See Also
- Main vignette: vignette("synthdid")
- Troubleshooting: vignette("troubleshooting")
- Convergence diagnostics: vignette("convergence-diagnostics")