Parallel Processing for Standard Errors
Source: vignettes/parallel-processing.Rmd
Introduction
Computing standard errors for synthetic difference-in-differences estimates can be computationally intensive, especially with bootstrap or placebo methods that require hundreds of replications. This vignette demonstrates:
- When to use parallel processing (and when not to)
- How to set up parallel processing with future
- Performance comparisons across different scenarios
- Automatic thread management to prevent performance issues
- Best practices for different dataset sizes
Quick Start
Sequential Processing (Default)
By default, synthdid uses sequential (single-core) processing:
# No setup needed - this is the default
result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")

When to use:
- Small datasets (< 50 units)
- Fast methods (jackknife)
- Interactive analysis
- When you need other cores for different tasks
Parallel Processing
For larger datasets or slower methods, parallel processing provides substantial speedups:
library(future)
# Set up parallel processing (do this once)
plan(multisession, workers = 4)
# Run the same code - automatically uses parallel processing
result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
# Clean up when done
plan(sequential)

When to use:
- Large datasets (> 50 units)
- Slow methods (bootstrap, placebo)
- Production pipelines
- When maximum speed is needed
Understanding Standard Error Methods
The three SE methods have different computational characteristics:
Jackknife (Fastest)
data(california_prop99)
# Jackknife: N iterations where N = number of units
result_jack <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")
#> Warning in value[[3L]](cond): jackknife standard errors require more than one
#> treated unit and at least two controls with weight.
summary(result_jack, fast = TRUE)
#> Call:
#> synthdid(formula = PacksPerCapita ~ treated, data = california_prop99,
#> index = c("State", "Year"), se = TRUE, se_method = "jackknife")
#>
#> Treatment Effect Estimate:
#> Estimate Std. Error t value Pr(>|t|)
#> treated -15.6 NA NA NA
#>
#> Dimensions:
#> Value
#> Treated units: 1.000
#> Control units: 38.000
#> Effective controls: 16.388
#> Post-treatment periods: 12.000
#> Pre-treatment periods: 19.000
#> Effective periods: 2.784
#>
#> Top Control Units (omega weights):
#> Weight
#> Nevada 0.124
#> New Hampshire 0.105
#> Connecticut 0.078
#> Delaware 0.070
#> Colorado 0.058
#>
#> Top Time Periods (lambda weights):
#> Weight
#> 1988 0.427
#> 1986 0.366
#> 1987 0.206
#>
#> Convergence Status:
#> Overall: NOT CONVERGED
#> Lambda: ✗ (10000/10000 iterations, 100.0% utilization)
#> Omega: ✗ (10000/10000 iterations, 100.0% utilization)
#>
#> Recommendation: Consider increasing max.iter or relaxing min.decrease.
#> Use synthdid_convergence_info() for detailed diagnostics.

Characteristics:
- Iterations: N (number of units) = 39 for California Prop 99
- Computation: Fast, deterministic
- Parallelization benefit: Low to moderate (fewer iterations)
- Recommendation: Usually fine with sequential processing
Bootstrap (Medium Speed)
# Bootstrap: User-specified replications (typically 200-1000)
result_boot <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)

Characteristics:
- Iterations: 200-1000 (user specified)
- Computation: Moderate, random sampling
- Parallelization benefit: High (many independent iterations)
- Recommendation: Use parallel for replications > 100
Placebo (Slowest)
# Placebo: User-specified replications (typically 100-500)
result_placebo <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "placebo",
se_replications = 100)

Characteristics:
- Iterations: 100-500 (user specified)
- Computation: Slowest (re-estimates with different treated units)
- Parallelization benefit: Very high (many expensive iterations)
- Recommendation: Almost always use parallel processing
Performance Comparison
Small Dataset Example
California Prop 99: 39 units, 31 time periods
Jackknife SE
# Sequential
plan(sequential)
system.time({
result_seq <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")
})
# Typical time: 3-5 seconds
# Parallel (4 cores)
plan(multisession, workers = 4)
system.time({
result_par <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")
})
# Typical time: 2-3 seconds
# Speedup: ~1.5-2x (modest benefit for small dataset)
plan(sequential)

Verdict: For jackknife on small datasets, sequential is often fine.
Bootstrap SE
# Sequential
plan(sequential)
system.time({
result_seq <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
})
# Typical time: 40-50 seconds
# Parallel (4 cores)
plan(multisession, workers = 4)
system.time({
result_par <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
})
# Typical time: 12-15 seconds
# Speedup: ~3-3.5x (excellent!)
plan(sequential)

Verdict: For bootstrap, parallel processing provides substantial benefits.
Large Dataset Example
For larger datasets (e.g., 100+ units), the benefits are even more pronounced:
# Simulate larger dataset
large_data <- simulate_dgp(N = 100, T = 40, treatment_quantile = 0.8)
# Sequential bootstrap
plan(sequential)
system.time({
result_seq <- synthdid(Y ~ treat, data = large_data,
index = c("unit", "time"),
se = TRUE, se_method = "bootstrap",
se_replications = 200)
})
# Typical time: 3-5 minutes
# Parallel bootstrap (8 cores)
plan(multisession, workers = 8)
system.time({
result_par <- synthdid(Y ~ treat, data = large_data,
index = c("unit", "time"),
se = TRUE, se_method = "bootstrap",
se_replications = 200)
})
# Typical time: 30-45 seconds
# Speedup: ~6-7x (near-linear scaling!)
plan(sequential)

Verdict: For large datasets, parallel processing is essential.
Setting Up Parallel Processing
Choosing the Number of Workers
The optimal number of workers depends on your system:
# Check available cores
parallel::detectCores()
#> [1] 4

Recommendations:
# Conservative: Leave one core for system
plan(multisession, workers = parallel::detectCores() - 1)
# Aggressive: Use all cores (may slow down other tasks)
plan(multisession, workers = parallel::detectCores())
# Specific: Choose exact number
plan(multisession, workers = 4)
# For shared systems: Be considerate
plan(multisession, workers = 2)

Parallel Processing Strategies
Strategy 1: Multisession (Recommended)
# Works on all platforms (Windows, Mac, Linux)
plan(multisession, workers = 4)

Pros:
- Cross-platform
- Isolated workers (safe)
- No shared memory issues

Cons:
- Slight overhead from data copying
- Each worker starts a fresh R session
Strategy 2: Multicore (Unix/Mac Only)
# Only works on Unix/Mac (not Windows)
plan(multicore, workers = 4)

Pros:
- Lower overhead (forked processes)
- Shared memory (faster)

Cons:
- Not available on Windows
- Can cause issues in RStudio
- May interfere with some packages
Recommendation: Use multisession unless you have a specific reason to use multicore.
Automatic Thread Management
The Thread Oversubscription Problem
When you use parallel processing with synthdid, there’s a potential performance trap: thread oversubscription.
What Happens Without Thread Management
Your system: 4 CPU cores
Your setup: plan(multisession, workers = 4)
Each R worker uses multi-threaded BLAS (default: 8 threads)
Total threads: 4 workers × 8 BLAS threads = 32 threads
Result: Threads compete for 4 cores → context switching → SLOW
Performance impact: Instead of 4x speedup, you might only get 1.5x!
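The arithmetic above can be sketched directly (the 4-core machine and the 8-thread BLAS default are illustrative assumptions; actual defaults depend on your BLAS build):

```r
workers <- 4        # future workers from plan(multisession, workers = 4)
blas_threads <- 8   # threads each worker's BLAS might spawn by default
cores <- 4          # physical cores on the machine

total_threads <- workers * blas_threads  # 32 threads competing
total_threads / cores                    # 8 threads per core -> context switching
```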
Automatic Solution in synthdid
The synthdid package automatically prevents this problem:
plan(multisession, workers = 4)
# synthdid automatically detects parallel processing and:
# 1. Sets BLAS to 1 thread per worker
# 2. Runs your computation efficiently
# 3. Restores original BLAS threads when done
result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
# Console output:
# "Parallel processing detected: Setting BLAS to single-threaded mode (was 8 threads)"

Result: 4 workers × 1 BLAS thread = 4 threads (optimal for 4 cores) → 3.5-4x speedup!
Recommended: Install RhpcBLASctl
For the most reliable thread control, install the optional
RhpcBLASctl package:
install.packages("RhpcBLASctl")

This package provides runtime control over BLAS threads and works with:
- OpenBLAS (most common on Linux)
- Intel MKL (high-performance systems)
- Apple Accelerate (macOS)
Without it, synthdid falls back to environment variables, which may require restarting R.
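As a sketch of what the environment-variable fallback amounts to (the variable names are common BLAS conventions, not synthdid settings; they must be set before the BLAS thread pool starts, e.g. in .Renviron or at the top of a fresh session):

```r
# Cap BLAS/OpenMP threads via environment variables; which one takes
# effect depends on the BLAS your R build links against.
Sys.setenv(
  OPENBLAS_NUM_THREADS = "1",  # OpenBLAS
  MKL_NUM_THREADS = "1",       # Intel MKL
  OMP_NUM_THREADS = "1"        # generic OpenMP fallback
)
Sys.getenv("OPENBLAS_NUM_THREADS")
```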
Verifying Thread Management
# Check if RhpcBLASctl is available
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
cat("BLAS threads:", RhpcBLASctl::blas_get_num_procs(), "\n")
cat("Thread management: Available\n")
} else {
cat("Thread management: Using environment variables\n")
cat("Recommendation: install.packages('RhpcBLASctl')\n")
}

Best Practices
Decision Tree: Sequential vs Parallel
Start here:
│
├─ Dataset < 50 units?
│ ├─ YES → SE method = jackknife?
│ │ ├─ YES → Use SEQUENTIAL (fast enough)
│ │ └─ NO → Use PARALLEL (bootstrap/placebo benefit)
│ │
│ └─ NO → Use PARALLEL (large datasets always benefit)
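The tree above can be encoded as a small helper (a sketch: `choose_plan`, its 50-unit threshold, and its return labels are illustrative, not part of the synthdid API):

```r
# Hypothetical helper encoding the rule of thumb above
choose_plan <- function(n_units, se_method) {
  if (n_units < 50 && se_method == "jackknife") "sequential" else "parallel"
}

choose_plan(39, "jackknife")   # small data + fast method -> sequential
choose_plan(39, "bootstrap")   # replications dominate -> parallel
choose_plan(120, "jackknife")  # large datasets always benefit -> parallel
```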
Practical Guidelines
Use Sequential When:
- Small datasets (< 50 units)
- Jackknife SE (already fast)
- Interactive analysis (quick iteration)
- Debugging (easier to trace errors)
- Other processes need CPU (being a good citizen)
Example:

# Interactive check with the fast jackknife SE
plan(sequential)
result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife")
Use Parallel When:
- Large datasets (> 50 units)
- Bootstrap SE (many replications)
- Placebo SE (computationally intensive)
- Production pipelines (maximize throughput)
- Many estimates (running multiple models)
Example:
# Production analysis with bootstrap
library(future)
plan(multisession, workers = parallel::detectCores() - 1)
result <- synthdid(Y ~ treatment,
data = my_large_dataset,
index = c("unit", "time"),
se = TRUE,
se_method = "bootstrap",
se_replications = 500)
plan(sequential) # Clean up

Advanced: Running Multiple Estimates in Parallel
If you need to run synthdid on multiple datasets or specifications, you can parallelize at a higher level:
Example: Multiple Specifications
library(future)
library(furrr)
# Set up parallel processing
plan(multisession, workers = 4)
# Different outcome variables
outcomes <- c("PacksPerCapita", "AlcoholConsumption", "Healthcare")
# Run all specifications in parallel
results <- future_map(outcomes, function(outcome) {
formula <- as.formula(paste(outcome, "~ treated"))
# Note: Each worker runs synthdid sequentially
# (automatic thread management handles BLAS threads)
synthdid(formula,
data = my_data,
index = c("State", "Year"),
se = TRUE,
se_method = "jackknife") # Use jackknife since we parallelize at higher level
})
names(results) <- outcomes
plan(sequential)

Key insight: When parallelizing multiple estimates, use se_method = "jackknife" or se = FALSE to avoid nested parallelism.
Example: Cross-Validation
# Split data for cross-validation
cv_folds <- 5
# Run CV in parallel
cv_results <- future_map(1:cv_folds, function(fold) {
train_data <- subset(my_data, cv_fold != fold)
test_data <- subset(my_data, cv_fold == fold)
# Train on training set
model <- synthdid(Y ~ treat,
data = train_data,
index = c("unit", "time"),
se = FALSE) # Skip SE for speed
# Evaluate on test set
# ... your evaluation code ...
})

Troubleshooting
Issue: Parallel Processing Seems Slow
Possible causes:

- Thread oversubscription (should be automatic, but check):

# Check BLAS configuration
sessionInfo()$BLAS
# If you have RhpcBLASctl
RhpcBLASctl::blas_get_num_procs()

- Too many workers for the dataset size:

# Try fewer workers
plan(multisession, workers = 2)  # Instead of 8

- Small dataset where overhead dominates:

# Use sequential for small datasets
plan(sequential)
Issue: Parallel Processing Not Speeding Up
Check your setup:
# Verify parallel plan is active
print(future::plan())
# Should show: "multisession" or "multicore"
# Not: "sequential"
# Check number of workers
nbrOfWorkers()

Issue: RStudio Hangs with Multicore
Solution: Use multisession instead:
# Don't use multicore in RStudio
# plan(multicore, workers = 4) # May hang
# Use multisession instead
plan(multisession, workers = 4) # Works reliably

Performance Summary Table
Based on California Prop 99 dataset (39 units, 31 periods):
| Method | Replications | Sequential | Parallel (4 cores) | Speedup |
|---|---|---|---|---|
| Jackknife | 39 | 3-5 sec | 2-3 sec | 1.5-2x |
| Bootstrap | 200 | 40-50 sec | 12-15 sec | 3-3.5x |
| Bootstrap | 500 | 90-120 sec | 25-35 sec | 3-3.5x |
| Placebo | 100 | 60-80 sec | 18-25 sec | 3-3.5x |
For larger datasets (100 units):
| Method | Replications | Sequential | Parallel (8 cores) | Speedup |
|---|---|---|---|---|
| Jackknife | 100 | 20-30 sec | 5-8 sec | 3-4x |
| Bootstrap | 200 | 3-5 min | 30-45 sec | 6-7x |
| Placebo | 100 | 5-8 min | 45-70 sec | 6-7x |
Key takeaway: Larger datasets and more replications benefit most from parallel processing.
Complete Example Workflow
Here’s a complete analysis workflow using parallel processing:
library(synthdid)
library(future)
# 1. Load and prepare data
data(california_prop99)
# 2. Set up parallel processing
cat("Setting up parallel processing with", parallel::detectCores() - 1, "workers\n")
plan(multisession, workers = parallel::detectCores() - 1)
# 3. Quick estimate (no SE)
cat("\nStep 1: Quick estimate without SE...\n")
quick_result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = FALSE)
print(quick_result)
# 4. Full estimate with bootstrap SE
cat("\nStep 2: Computing bootstrap standard errors...\n")
cat("(This will use parallel processing automatically)\n")
system.time({
final_result <- synthdid(PacksPerCapita ~ treated,
data = california_prop99,
index = c("State", "Year"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
})
# 5. View results
summary(final_result)
# 6. Compare with other methods
cat("\nStep 3: Comparing with DID and SC methods...\n")
did_result <- update(final_result, method = "did")
sc_result <- update(final_result, method = "sc")
# 7. Plot results
plot(final_result)
# 8. Clean up parallel processing
cat("\nCleaning up parallel processing...\n")
plan(sequential)
cat("\nAnalysis complete!\n")

Recommendations by Scenario
Scenario 1: Interactive Data Exploration
Setup: Working in RStudio, trying different specifications
# Keep it simple - sequential is fine
plan(sequential)
# Quick iterations
result1 <- synthdid(Y1 ~ treat, data = data, index = c("unit", "time"), se = FALSE)
result2 <- synthdid(Y2 ~ treat, data = data, index = c("unit", "time"), se = FALSE)
result3 <- synthdid(Y3 ~ treat, data = data, index = c("unit", "time"), se = FALSE)
# Add SE only to final model
final <- synthdid(Y1 ~ treat, data = data, index = c("unit", "time"),
se = TRUE, se_method = "jackknife")

Scenario 2: Production Pipeline
Setup: Automated analysis on server, maximum performance needed
library(future)
# Use all available cores
plan(multisession, workers = parallel::detectCores())
# Run comprehensive analysis
result <- synthdid(Y ~ treat,
data = large_dataset,
index = c("unit", "time"),
se = TRUE,
se_method = "bootstrap",
se_replications = 500)
# Save results
saveRDS(result, "results/synthdid_estimate.rds")
plan(sequential)

Scenario 3: Research Paper
Setup: Need robust SEs, have time to compute
library(future)
# Conservative: Leave cores for other tasks
plan(multisession, workers = parallel::detectCores() - 2)
# High-quality bootstrap SE
result <- synthdid(Y ~ treat,
data = paper_data,
index = c("unit", "time"),
se = TRUE,
se_method = "bootstrap",
se_replications = 1000) # More replications for publication
summary(result)
confint(result, level = 0.95)
plan(sequential)

Scenario 4: Shared Server
Setup: Working on shared computational resources
library(future)
# Be considerate: Use only a few cores
plan(multisession, workers = 4) # Even if 64 cores available
# Run analysis
result <- synthdid(Y ~ treat,
data = my_data,
index = c("unit", "time"),
se = TRUE,
se_method = "bootstrap",
se_replications = 200)
plan(sequential)

Summary
Key Points
- Parallel processing provides 3-7x speedups for bootstrap and placebo SE methods
- Thread management is automatic - no configuration needed
- Use plan(multisession, workers = N) to enable parallel processing
- Install RhpcBLASctl for optimal thread control
- Choose workers wisely - typically detectCores() - 1
When to Use What
| Situation | Recommendation | Workers |
|---|---|---|
| Small dataset + jackknife | Sequential | 1 |
| Small dataset + bootstrap | Parallel | 2-4 |
| Large dataset + any SE | Parallel | 4-8 |
| Interactive work | Sequential | 1 |
| Production pipeline | Parallel | All - 1 |
| Shared system | Parallel | 2-4 |
Additional Resources
- See vignette("formula-interface") for the modern synthdid interface
- See THREAD_MANAGEMENT.md for technical details on thread management
- See ?future::plan for advanced parallel processing options