Parallel Processing for Standard Errors
Source: vignettes/parallel-processing.Rmd
Introduction
Computing standard errors for synthetic difference-in-differences estimates can be computationally intensive, especially with bootstrap or placebo methods that require hundreds of replications. This vignette demonstrates:
- When to use parallel processing (and when not to)
- How to set up parallel processing with future
- Performance comparisons across different scenarios
- Automatic thread management to prevent performance issues
- Best practices for different dataset sizes
Quick Start
Sequential Processing (Default)
By default, synthdid uses sequential (single-core) processing:
# No setup needed - this is the default
result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")

When to use:
- Small datasets (< 50 units)
- Fast methods (jackknife)
- Interactive analysis
- When you need other cores for different tasks
Parallel Processing
For larger datasets or slower methods, parallel processing provides substantial speedups:
library(future)
# Set up parallel processing (do this once)
plan(multisession, workers = 4)
# Run the same code - automatically uses parallel processing
result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
# Clean up when done
plan(sequential)

When to use:
- Large datasets (> 50 units)
- Slow methods (bootstrap, placebo)
- Production pipelines
- When maximum speed is needed
Understanding Standard Error Methods
The three SE methods have different computational characteristics:
Jackknife (Fastest)
data(california_prop99)
# Jackknife: N iterations where N = number of units
result_jack <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")
#> Warning in value[[3L]](cond): jackknife standard errors require more than one
#> treated unit and at least two controls with weight.
summary(result_jack, fast = TRUE)
#> Call:
#> synthdid(formula = PacksPerCapita ~ treated, data = california_prop99,
#> index = c("State", "Year"), se = TRUE, se_method = "jackknife")
#>
#> Treatment Effect Estimate:
#> Estimate Std. Error t value Pr(>|t|)
#> treated -15.6 NA NA NA
#>
#> Dimensions:
#> Value
#> Treated units: 1.000
#> Control units: 38.000
#> Effective controls: 16.388
#> Post-treatment periods: 12.000
#> Pre-treatment periods: 19.000
#> Effective periods: 2.784
#>
#> Top Control Units (omega weights):
#> Weight
#> Nevada 0.124
#> New Hampshire 0.105
#> Connecticut 0.078
#> Delaware 0.070
#> Colorado 0.058
#>
#> Top Time Periods (lambda weights):
#> Weight
#> 1988 0.427
#> 1986 0.366
#> 1987 0.206
#>
#> Convergence Status:
#> Overall: NOT CONVERGED
#> Lambda: ✗ (10000/10000 iterations, 100.0% utilization)
#> Omega: ✗ (10000/10000 iterations, 100.0% utilization)
#>
#> Recommendation: Consider increasing max.iter or relaxing min.decrease.
#> Use synthdid_convergence_info() for detailed diagnostics.

Characteristics:
- Iterations: N (number of units) = 39 for California Prop 99
- Computation: Fast, deterministic
- Parallelization benefit: Low to moderate (fewer iterations)
- Recommendation: Usually fine with sequential processing
Bootstrap (Medium Speed)
# Bootstrap: User-specified replications (typically 200-1000)
result_boot <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)

Characteristics:
- Iterations: 200-1000 (user specified)
- Computation: Moderate, random sampling
- Parallelization benefit: High (many independent iterations)
- Recommendation: Use parallel for replications > 100
Placebo (Slowest)
# Placebo: User-specified replications (typically 100-500)
result_placebo <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "placebo",
se_replications = 100)

Characteristics:
- Iterations: 100-500 (user specified)
- Computation: Slowest (re-estimates with different treated units)
- Parallelization benefit: Very high (many expensive iterations)
- Recommendation: Almost always use parallel processing
Performance Comparison
Small Dataset Example
California Prop 99: 39 units, 31 time periods
Jackknife SE
# Sequential
plan(sequential)
system.time({
result_seq <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")
})
# Typical time: 3-5 seconds
# Parallel (4 cores)
plan(multisession, workers = 4)
system.time({
result_par <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")
})
# Typical time: 2-3 seconds
# Speedup: ~1.5-2x (modest benefit for small dataset)
plan(sequential)

Verdict: For jackknife on small datasets, sequential is often fine.
Bootstrap SE
# Sequential
plan(sequential)
system.time({
result_seq <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
})
# Typical time: 40-50 seconds
# Parallel (4 cores)
plan(multisession, workers = 4)
system.time({
result_par <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
})
# Typical time: 12-15 seconds
# Speedup: ~3-3.5x (excellent!)
plan(sequential)

Verdict: For bootstrap, parallel processing provides substantial benefits.
Large Dataset Example
For larger datasets (e.g., 100+ units), the benefits are even more pronounced:
# Simulate larger dataset
large_data <- simulate_dgp(N = 100, T = 40, treatment_quantile = 0.8)
# Sequential bootstrap
plan(sequential)
system.time({
result_seq <- synthdid(Y ~ treat, data = large_data,
index = c("unit", "time"),
se = TRUE, se_method = "bootstrap",
se_replications = 200)
})
# Typical time: 3-5 minutes
# Parallel bootstrap (8 cores)
plan(multisession, workers = 8)
system.time({
result_par <- synthdid(Y ~ treat, data = large_data,
index = c("unit", "time"),
se = TRUE, se_method = "bootstrap",
se_replications = 200)
})
# Typical time: 30-45 seconds
# Speedup: ~6-7x (near-linear scaling!)
plan(sequential)

Verdict: For large datasets, parallel processing is essential.
Setting Up Parallel Processing
Choosing the Number of Workers
The optimal number of workers depends on your system:
# Check available cores
parallel::detectCores()
#> [1] 4

Recommendations:
# Conservative: Leave one core for system
plan(multisession, workers = parallel::detectCores() - 1)
# Aggressive: Use all cores (may slow down other tasks)
plan(multisession, workers = parallel::detectCores())
# Specific: Choose exact number
plan(multisession, workers = 4)
# For shared systems: Be considerate
plan(multisession, workers = 2)

Parallel Processing Strategies
Strategy 1: Multisession (Recommended)
# Works on all platforms (Windows, Mac, Linux)
plan(multisession, workers = 4)

Pros:
- Cross-platform
- Isolated workers (safe)
- No shared memory issues

Cons:
- Slight overhead from data copying
- Each worker starts a fresh R session
Strategy 2: Multicore (Unix/Mac Only)
# Only works on Unix/Mac (not Windows)
plan(multicore, workers = 4)

Pros:
- Lower overhead (forked processes)
- Shared memory (faster)

Cons:
- Not available on Windows
- Can cause issues in RStudio
- May interfere with some packages
Recommendation: Use multisession unless you have a specific reason to use multicore.
Automatic Thread Management
The Thread Oversubscription Problem
When you use parallel processing with synthdid, there’s a potential performance trap: thread oversubscription.
What Happens Without Thread Management
Your system: 4 CPU cores
Your setup: plan(multisession, workers = 4)
Each R worker uses multi-threaded BLAS (default: 8 threads)
Total threads: 4 workers × 8 BLAS threads = 32 threads
Result: Threads compete for 4 cores → context switching → SLOW
Performance impact: Instead of 4x speedup, you might only get 1.5x!
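The arithmetic above can be sketched directly (the 4-core machine and the 8-thread BLAS default are illustrative assumptions; actual defaults depend on your BLAS build):

```r
workers <- 4        # future workers from plan(multisession, workers = 4)
blas_threads <- 8   # threads each worker's BLAS might spawn by default
cores <- 4          # physical cores on the machine

total_threads <- workers * blas_threads  # 32 threads competing
total_threads / cores                    # 8 threads per core -> context switching
```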
Automatic Solution in synthdid
The synthdid package automatically prevents this problem:
plan(multisession, workers = 4)
# synthdid automatically detects parallel processing and:
# 1. Sets BLAS to 1 thread per worker
# 2. Runs your computation efficiently
# 3. Restores original BLAS threads when done
result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
# Console output:
# "Parallel processing detected: Setting BLAS to single-threaded mode (was 8 threads)"

Result: 4 workers × 1 BLAS thread = 4 threads (optimal for 4 cores) → 3.5-4x speedup!
Recommended: Install RhpcBLASctl
For the most reliable thread control, install the optional
RhpcBLASctl package:
install.packages("RhpcBLASctl")

This package provides runtime control over BLAS threads and works with:
- OpenBLAS (most common on Linux)
- Intel MKL (high-performance systems)
- Apple Accelerate (macOS)
Without it, synthdid falls back to environment variables, which may require restarting R.
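As a sketch of what the environment-variable fallback amounts to (the variable names are common BLAS conventions, not synthdid settings; they must be set before the BLAS thread pool starts, e.g. in .Renviron or at the top of a fresh session):

```r
# Cap BLAS/OpenMP threads via environment variables; which one takes
# effect depends on the BLAS your R build links against.
Sys.setenv(
  OPENBLAS_NUM_THREADS = "1",  # OpenBLAS
  MKL_NUM_THREADS = "1",       # Intel MKL
  OMP_NUM_THREADS = "1"        # generic OpenMP fallback
)
Sys.getenv("OPENBLAS_NUM_THREADS")
```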
Verifying Thread Management
# Check if RhpcBLASctl is available
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
cat("BLAS threads:", RhpcBLASctl::blas_get_num_procs(), "\n")
cat("Thread management: Available\n")
} else {
cat("Thread management: Using environment variables\n")
cat("Recommendation: install.packages('RhpcBLASctl')\n")
}

Best Practices
Decision Tree: Sequential vs Parallel
Start here:
│
├─ Dataset < 50 units?
│ ├─ YES → SE method = jackknife?
│ │ ├─ YES → Use SEQUENTIAL (fast enough)
│ │ └─ NO → Use PARALLEL (bootstrap/placebo benefit)
│ │
│ └─ NO → Use PARALLEL (large datasets always benefit)
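The tree above can be encoded as a small helper (a sketch: `choose_plan`, its 50-unit threshold, and its return labels are illustrative, not part of the synthdid API):

```r
# Hypothetical helper encoding the rule of thumb above
choose_plan <- function(n_units, se_method) {
  if (n_units < 50 && se_method == "jackknife") "sequential" else "parallel"
}

choose_plan(39, "jackknife")   # small data + fast method -> sequential
choose_plan(39, "bootstrap")   # replications dominate -> parallel
choose_plan(120, "jackknife")  # large datasets always benefit -> parallel
```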
Practical Guidelines
Use Sequential When:
- Small datasets (< 50 units)
- Jackknife SE (already fast)
- Interactive analysis (quick iteration)
- Debugging (easier to trace errors)
- Other processes need CPU (being a good citizen)
Example:

# Interactive check with the fast jackknife SE
plan(sequential)
result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")
Use Parallel When:
- Large datasets (> 50 units)
- Bootstrap SE (many replications)
- Placebo SE (computationally intensive)
- Production pipelines (maximize throughput)
- Many estimates (running multiple models)
Example:
# Production analysis with bootstrap
library(future)
plan(multisession, workers = parallel::detectCores() - 1)
result <- synthdid(Y ~ treatment,
data = my_large_dataset,
index = c("unit", "time"),
se = TRUE,
se_method = "bootstrap",
se_replications = 500)
plan(sequential) # Clean up

Advanced: Running Multiple Estimates in Parallel
If you need to run synthdid on multiple datasets or specifications, you can parallelize at a higher level:
Example: Multiple Specifications
library(future)
library(furrr)
# Set up parallel processing
plan(multisession, workers = 4)
# Different outcome variables
outcomes <- c("PacksPerCapita", "AlcoholConsumption", "Healthcare")
# Run all specifications in parallel
results <- future_map(outcomes, function(outcome) {
formula <- as.formula(paste(outcome, "~ treated"))
# Note: Each worker runs synthdid sequentially
# (automatic thread management handles BLAS threads)
synthdid(formula,
data = my_data,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife") # Use jackknife since we parallelize at higher level
})
names(results) <- outcomes
plan(sequential)

Key insight: When parallelizing multiple estimates, use se_method = "jackknife" or se = FALSE to avoid nested parallelism.
Example: Cross-Validation
# Split data for cross-validation
cv_folds <- 5
# Run CV in parallel
cv_results <- future_map(1:cv_folds, function(fold) {
train_data <- subset(my_data, cv_fold != fold)
test_data <- subset(my_data, cv_fold == fold)
# Train on training set
model <- synthdid(Y ~ treat,
data = train_data,
index = c("unit", "time"),
se = FALSE) # Skip SE for speed
# Evaluate on test set
# ... your evaluation code ...
})

Troubleshooting
Issue: Parallel Processing Seems Slow
Possible causes:

- Thread oversubscription (should be automatic, but check):

# Check BLAS configuration
sessionInfo()$BLAS
# If you have RhpcBLASctl
RhpcBLASctl::blas_get_num_procs()

- Too many workers for the dataset size:

# Try fewer workers
plan(multisession, workers = 2)  # Instead of 8

- Small dataset where overhead dominates:

# Use sequential for small datasets
plan(sequential)
Issue: Parallel Processing Not Speeding Up
Check your setup:
# Verify parallel plan is active
print(future::plan())
# Should show: "multisession" or "multicore"
# Not: "sequential"
# Check number of workers
nbrOfWorkers()

Issue: RStudio Hangs with Multicore
Solution: Use multisession instead:
# Don't use multicore in RStudio
# plan(multicore, workers = 4) # May hang
# Use multisession instead
plan(multisession, workers = 4) # Works reliably

Performance Summary Table
Based on California Prop 99 dataset (39 units, 31 periods):
| Method | Replications | Sequential | Parallel (4 cores) | Speedup |
|---|---|---|---|---|
| Jackknife | 39 | 3-5 sec | 2-3 sec | 1.5-2x |
| Bootstrap | 200 | 40-50 sec | 12-15 sec | 3-3.5x |
| Bootstrap | 500 | 90-120 sec | 25-35 sec | 3-3.5x |
| Placebo | 100 | 60-80 sec | 18-25 sec | 3-3.5x |
For larger datasets (100 units):
| Method | Replications | Sequential | Parallel (8 cores) | Speedup |
|---|---|---|---|---|
| Jackknife | 100 | 20-30 sec | 5-8 sec | 3-4x |
| Bootstrap | 200 | 3-5 min | 30-45 sec | 6-7x |
| Placebo | 100 | 5-8 min | 45-70 sec | 6-7x |
Key takeaway: Larger datasets and more replications benefit most from parallel processing.
Complete Example Workflow
Here’s a complete analysis workflow using parallel processing:
library(synthdid)
library(future)
# 1. Load and prepare data
data(california_prop99)
# 2. Set up parallel processing
cat("Setting up parallel processing with", parallel::detectCores() - 1, "workers\n")
plan(multisession, workers = parallel::detectCores() - 1)
# 3. Quick estimate (no SE)
cat("\nStep 1: Quick estimate without SE...\n")
quick_result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = FALSE)
print(quick_result)
# 4. Full estimate with bootstrap SE
cat("\nStep 2: Computing bootstrap standard errors...\n")
cat("(This will use parallel processing automatically)\n")
system.time({
final_result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
})
# 5. View results
summary(final_result)
# 6. Compare with other methods
cat("\nStep 3: Comparing with DID and SC methods...\n")
did_result <- update(final_result, method = "did")
sc_result <- update(final_result, method = "sc")
# 7. Plot results
plot(final_result)
# 8. Clean up parallel processing
cat("\nCleaning up parallel processing...\n")
plan(sequential)
cat("\nAnalysis complete!\n")

Recommendations by Scenario
Scenario 1: Interactive Data Exploration
Setup: Working in RStudio, trying different specifications
# Keep it simple - sequential is fine
plan(sequential)
# Quick iterations
result1 <- synthdid(Y1 ~ treat, data = data, index = c("unit", "time"), se = FALSE)
result2 <- synthdid(Y2 ~ treat, data = data, index = c("unit", "time"), se = FALSE)
result3 <- synthdid(Y3 ~ treat, data = data, index = c("unit", "time"), se = FALSE)
# Add SE only to final model
final <- synthdid(Y1 ~ treat, data = data, index = c("unit", "time"),
se = TRUE, se_method = "jackknife")

Scenario 2: Production Pipeline
Setup: Automated analysis on server, maximum performance needed
library(future)
# Use all available cores
plan(multisession, workers = parallel::detectCores())
# Run comprehensive analysis
result <- synthdid(Y ~ treat,
data = large_dataset,
index = c("unit", "time"),
se = TRUE,
se_method = "bootstrap",
se_replications = 500)
# Save results
saveRDS(result, "results/synthdid_estimate.rds")
plan(sequential)

Scenario 3: Research Paper
Setup: Need robust SEs, have time to compute
library(future)
# Conservative: Leave cores for other tasks
plan(multisession, workers = parallel::detectCores() - 2)
# High-quality bootstrap SE
result <- synthdid(Y ~ treat,
data = paper_data,
index = c("unit", "time"),
se = TRUE,
se_method = "bootstrap",
se_replications = 1000) # More replications for publication
summary(result)
confint(result, level = 0.95)
plan(sequential)

Scenario 4: Shared Server
Setup: Working on shared computational resources
library(future)
# Be considerate: Use only a few cores
plan(multisession, workers = 4) # Even if 64 cores available
# Run analysis
result <- synthdid(Y ~ treat,
data = my_data,
index = c("unit", "time"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
plan(sequential)

Summary
Key Points
- Parallel processing provides 3-7x speedups for bootstrap and placebo SE methods
- Thread management is automatic - no configuration needed
- Use plan(multisession, workers = N) to enable parallel processing
- Install RhpcBLASctl for optimal thread control
- Choose workers wisely - typically detectCores() - 1
When to Use What
| Situation | Recommendation | Workers |
|---|---|---|
| Small dataset + jackknife | Sequential | 1 |
| Small dataset + bootstrap | Parallel | 2-4 |
| Large dataset + any SE | Parallel | 4-8 |
| Interactive work | Sequential | 1 |
| Production pipeline | Parallel | All - 1 |
| Shared system | Parallel | 2-4 |
Additional Resources
- See vignette("formula-interface") for the modern synthdid interface
- See THREAD_MANAGEMENT.md for technical details on thread management
- See ?future::plan for advanced parallel processing options