Troubleshooting Guide for synthdid
synthdid package authors
2026-02-14
Source: vignettes/troubleshooting.Rmd
Introduction
This vignette provides practical guidance for diagnosing and
resolving common issues when using the synthdid package. It
covers convergence problems, numerical instability, memory management,
and guidance on choosing the appropriate estimator for your
application.
Convergence Issues
Checking Convergence
The synthdid package now includes built-in convergence diagnostics that are automatically computed during estimation (with zero overhead):
library(synthdid)
data(california_prop99)
setup <- panel.matrices(california_prop99)
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
# Quick convergence check
synthdid_converged(tau.hat)
# Detailed diagnostics
conv_info <- synthdid_convergence_info(tau.hat)
print(conv_info)
# Or use summary() which includes convergence status
summary(tau.hat)
What Causes Convergence Failures?
Common causes of convergence failures include:
- Insufficient iterations: the default max.iter = 1e4 may be too small for difficult problems
- Tight tolerances: the stopping criterion min.decrease may be too strict
- Ill-conditioned problems: extreme values or near-collinearity in the data
- Sparse data: very few control units or pre-treatment periods
Solutions for Convergence Problems
1. Increase Maximum Iterations
# Increase max.iter
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0, max.iter = 5e4)
synthdid_converged(tau.hat)
2. Relax Stopping Criterion
# Use a looser stopping criterion
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0, min.decrease = 1e-3)
3. Adjust Regularization
# Increase regularization to stabilize optimization
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                             eta.omega = 2, eta.lambda = 1e-5)
When Convergence Warnings Are Acceptable
Not all convergence warnings indicate problems:
- If the final decrease is very small (< 1e-8), the optimization is essentially converged
- If the estimate is stable across different starting points, non-convergence may be harmless
- For exploratory analysis, approximate convergence may be sufficient
Always check the robustness of your results when convergence warnings appear.
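To build intuition for why a tiny final decrease means the optimization is essentially done, here is a minimal, self-contained sketch (plain R, no synthdid required) of a descent loop with a min.decrease-style stopping rule. The function and objective are illustrative, not part of the package:

```r
# Minimize f(x) = (x - 3)^2 by gradient descent, stopping when the
# per-iteration decrease in the objective falls below `min.decrease`.
descend <- function(min.decrease = 1e-8, max.iter = 1e4) {
  f <- function(x) (x - 3)^2
  x <- 0
  prev <- f(x)
  for (i in seq_len(max.iter)) {
    x <- x - 0.1 * 2 * (x - 3)        # gradient step
    cur <- f(x)
    if (prev - cur < min.decrease) {  # analogous to min.decrease
      return(list(x = x, iter = i, converged = TRUE))
    }
    prev <- cur
  }
  list(x = x, iter = max.iter, converged = FALSE)
}
res <- descend()
res$converged  # TRUE: the loop stopped because improvements became negligible
```

Note that "converged" here means "further iterations buy almost nothing", which is exactly why a final decrease below roughly 1e-8 is usually acceptable even if a formal tolerance was not met.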
Numerical Stability Issues
Common Numerical Problems
Zero or Near-Zero Noise Level
When pre-treatment data has very little variation,
noise.level can be extremely small or zero, causing
numerical issues:
# Check noise level
Y <- setup$Y
N0 <- setup$N0
T0 <- setup$T0
diffs <- Y[1:N0, 2:T0] - Y[1:N0, 1:(T0-1)]
noise.level <- sd(c(diffs))
print(noise.level)
# If noise.level is very small, specify it manually
if (noise.level < 1e-8) {
  tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                               noise.level = 1e-6)
}
Extreme Values in Data
Large magnitude differences in your data can cause overflow or precision loss:
# Check data scale
summary(as.vector(setup$Y))
# Consider rescaling if values are very large or very small
Y_scaled <- setup$Y / 1000 # Scale down by factor of 1000
tau.hat <- synthdid_estimate(Y_scaled, setup$N0, setup$T0)
# Remember to scale back the estimate
tau.hat * 1000
Near-Singular Matrices
When control units are nearly identical or time periods are highly correlated:
# Check for near-collinearity in controls
cor_matrix <- cor(t(setup$Y[1:setup$N0, 1:setup$T0]))
max_cor <- max(abs(cor_matrix[upper.tri(cor_matrix)]))
print(max_cor)
# If max_cor > 0.99, consider removing redundant units
if (max_cor > 0.99) {
  warning("High collinearity detected among control units")
}
Solutions for Numerical Instability
- Use scaled optimization: normalize your outcome variable to have reasonable magnitude
- Increase regularization: higher eta.omega and eta.lambda stabilize the optimization
- Check data quality: remove units with missing or extreme values
- Use double precision: R uses double precision by default, but verify your data import didn't downcast
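The last two checks are easy to do in plain R. A sketch (base R only; rescale_outcome is an illustrative helper, not a package function):

```r
# Verify the outcome matrix is stored in double precision;
# an integer-typed import would silently limit precision.
Y_toy <- matrix(1:20 / 7, nrow = 4)  # toy outcome matrix
is.double(Y_toy)  # TRUE

# Rescale to unit standard deviation, keeping the factor so the
# estimated effect can be mapped back to the original scale.
rescale_outcome <- function(Y) {
  s <- sd(as.vector(Y))
  list(Y = Y / s, scale = s)  # estimate on $Y, then multiply tau by $scale
}
scaled <- rescale_outcome(Y_toy)
sd(as.vector(scaled$Y))  # 1, up to floating point
```

Returning the scale factor alongside the rescaled matrix makes it harder to forget the back-transformation shown earlier in this vignette.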
Memory Management
Estimating Memory Requirements
Use the built-in memory estimation tool before running large-scale analyses:
# Estimate memory for your problem size
mem <- synthdid_memory_estimate(
N = nrow(setup$Y),
T = ncol(setup$Y),
K = 0, # Number of covariates
replications = 200, # For bootstrap SE
include_se = TRUE
)
print(mem)
Memory-Intensive Operations
The most memory-intensive operations are:
- Bootstrap standard errors: Stores all replication estimates in memory
- Covariate problems: 3D arrays can be very large
- Parallel processing: Each worker needs a copy of the data
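As a rough back-of-envelope check (plain R; this formula is an approximation for intuition, not the package's internal accounting), each double takes 8 bytes, so the dominant allocations scale like N x T times the number of simultaneously held copies:

```r
# Approximate memory (in MB) for an N x T outcome matrix plus
# bootstrap copies, assuming each replication holds its own
# resampled N x T panel at the same time (an upper bound).
panel_mb <- function(N, T, replications = 0) {
  bytes <- 8 * N * T * (1 + replications)  # 8 bytes per double
  bytes / 2^20
}
panel_mb(1000, 100)                      # ~0.76 MB for the panel alone
panel_mb(1000, 100, replications = 200)  # ~153 MB if all copies coexist
```

This is why reducing replications, or switching to the jackknife, gives such large memory savings on big panels.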
Solutions for Memory Issues
1. Use Faster SE Methods
# Jackknife is much more memory-efficient than bootstrap
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                             estimate_se = TRUE, se_method = "jackknife")
2. Reduce Bootstrap Replications
# Use fewer replications for preliminary analysis
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0,
                             estimate_se = TRUE,
                             se_method = "bootstrap",
                             se_replications = 100)  # Instead of 200
3. Process in Batches
For extremely large problems, compute the estimate first, then SE separately:
# Compute estimate without SE
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0, estimate_se = FALSE)
# Compute SE later when you have time/resources
se <- sqrt(vcov(tau.hat, method = "jackknife"))
4. Monitor Memory Usage
# Check current memory usage
gc()
# Use memory profiling for large problems
Rprof(memory.profiling = TRUE)
tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
Rprof(NULL)
summaryRprof(memory = "both")
Choosing the Right Estimator
SC vs DID vs SynthDID
Use this decision tree to choose the appropriate estimator:
Use Difference-in-Differences (DID) when:
- You believe parallel trends holds for all control units
- You have many control units (averaging improves efficiency)
- Control units are relatively homogeneous
- You want the simplest, most interpretable estimator
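For intuition, the plain DID estimate is just a difference of differences of averages, which you can compute by hand. A self-contained base-R sketch on a toy panel (the layout conventions mirror this vignette: control rows first, pre-treatment columns first):

```r
# Toy panel with additive unit and time effects plus a treatment
# effect of 2. Rows 1:4 are controls, rows 5:6 treated;
# columns 1:3 are pre-treatment, columns 4:5 post-treatment.
Y <- outer(1:6, 1:5, `+`)
Y[5:6, 4:5] <- Y[5:6, 4:5] + 2
did_by_hand <- (mean(Y[5:6, 4:5]) - mean(Y[5:6, 1:3])) -
               (mean(Y[1:4, 4:5]) - mean(Y[1:4, 1:3]))
did_by_hand  # 2: the additive unit and time effects cancel exactly
```

Because the panel here is exactly two-way additive, parallel trends holds by construction and DID recovers the effect with no error; real data will not cancel so cleanly.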
tau.did <- did_estimate(setup$Y, setup$N0, setup$T0)
Use Synthetic Control (SC) when:
- Parallel trends may only hold for a weighted combination of controls
- You want to match pre-treatment trends exactly
- You have few treated units (ideally just one)
- Post-treatment extrapolation is a concern
tau.sc <- sc_estimate(setup$Y, setup$N0, setup$T0)
Use Synthetic Difference-in-Differences (SynthDID) when:
- You want to combine the benefits of both methods
- You have multiple treated units and multiple time periods
- You want robustness to both parallel trends violations and time-varying confounders
- You’re willing to accept slightly more complexity for better performance
tau.sdid <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
Comparing Estimators
It’s good practice to compare all three estimators:
# Compute all three
tau.did <- did_estimate(setup$Y, setup$N0, setup$T0)
tau.sc <- sc_estimate(setup$Y, setup$N0, setup$T0)
tau.sdid <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
# Compare estimates
estimates <- list(DID = tau.did, SC = tau.sc, SynthDID = tau.sdid)
sapply(estimates, function(x) c(x))
# Visualize differences
synthdid_plot(estimates, facet = c("DID", "SC", "SynthDID"))
Interpretation tips:
- If all three estimates are similar, your results are robust
- Large differences suggest the choice of estimator matters
- DID > SC suggests time weights help; SC > DID suggests unit weights help
- SynthDID is typically between the two extremes
When Estimates Diverge
If estimators give very different results:
- Check parallel trends: Plot pre-treatment trends for each method
- Inspect weights: Look at which units/periods get high weight
- Consider covariates: Time-varying confounders may explain differences
- Test sensitivity: Try different regularization parameters
- Look for structural breaks: Check if treatment effects vary over time
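Before working through these steps, it can help to quantify how far apart the estimators actually are. A package-free rule of thumb (estimates_differ and its threshold are illustrative, not a formal test):

```r
# Flag divergence when the spread across estimators is large
# relative to the magnitude of the average estimate.
estimates_differ <- function(ests, threshold = 0.5) {
  spread <- max(ests) - min(ests)
  spread / max(abs(mean(ests)), .Machine$double.eps) > threshold
}
estimates_differ(c(DID = -2.1, SC = -1.9, SynthDID = -2.0))  # FALSE: estimates agree
estimates_differ(c(DID = -3.0, SC = -1.0, SynthDID = -2.0))  # TRUE: worth investigating
```

Treat a TRUE here as a prompt to run the diagnostics below, not as evidence that any particular estimator is wrong.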
# Check unit weights
synthdid_controls(estimates)
# Check time weights
synthdid_controls(estimates, weight.type = "lambda")
# Plot unit-level effects
synthdid_units_plot(estimates)
Data Requirements and Limitations
Minimum Requirements
For reliable estimates, you need:
- At least 2 control units (more is better)
- At least 2 pre-treatment periods (more is better)
- At least 1 treated unit and 1 post-treatment period
- No missing values in the outcome matrix
- Finite values (no Inf, -Inf, or NaN)
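These checks are easy to automate before calling the estimator. A base-R sketch (validate_panel is an illustrative helper, not a package function; it assumes the row/column layout used throughout this vignette):

```r
# Validate an outcome matrix against the minimum requirements above:
# rows 1:N0 are controls, columns 1:T0 are pre-treatment periods.
validate_panel <- function(Y, N0, T0) {
  stopifnot(is.matrix(Y), N0 >= 2, T0 >= 2)  # >= 2 controls and pre-periods
  stopifnot(nrow(Y) > N0, ncol(Y) > T0)      # >= 1 treated unit and post-period
  if (any(!is.finite(Y))) stop("Y contains NA, NaN, or infinite values")
  invisible(TRUE)
}
validate_panel(matrix(rnorm(50), 10, 5), N0 = 8, T0 = 3)  # passes silently
```

Running such a check up front turns obscure optimization failures into immediate, readable errors.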
Known Limitations
- Staggered adoption: Current implementation assumes simultaneous treatment
- Time-varying treatment intensity: Binary treatment only
- Multiple treatments: Cannot handle multiple distinct interventions
- Unbalanced panels: Requires complete rectangular matrix
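Because the implementation assumes simultaneous adoption, it is worth verifying that in your long-format data before reshaping with panel.matrices. A base-R sketch (the column names here are assumptions about your data, not a package convention):

```r
# Check that every treated unit starts treatment in the same period.
df <- data.frame(
  unit    = rep(c("a", "b", "c"), each = 4),
  time    = rep(1:4, times = 3),
  treated = c(0, 0, 0, 0,  0, 0, 1, 1,  0, 0, 1, 1)
)
first_treated <- with(subset(df, treated == 1), tapply(time, unit, min))
length(unique(first_treated)) == 1  # TRUE: adoption is simultaneous
```

If this check returns FALSE, you have staggered adoption and should either restrict to one adoption cohort or use a method designed for staggered designs.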
Getting Help
If you’re still experiencing issues:
- Check the GitHub issues
- Review the main package vignette: vignette("synthdid")
- See the convergence diagnostics vignette: vignette("convergence-diagnostics")
- Check the advanced topics vignette: vignette("advanced-topics")
- File a bug report with a reproducible example
Creating a Reproducible Example
When reporting issues, include:
library(synthdid)
# Minimal data that reproduces the issue
Y <- matrix(rnorm(200), 20, 10)
N0 <- 15
T0 <- 7
# The command that fails or produces unexpected results
tau.hat <- synthdid_estimate(Y, N0, T0)
# Session info
sessionInfo()