Skip to contents

We compare the Monte Carlo (MC) method with nonparametric bootstrapping (NB) using the simple mediation model with missing data using multiple imputation. One advantage of MC over NB is speed. This is because the model is only fitted once in MC whereas it is fitted many times in NB.

Data

n <- 1000
a <- 0.50
b <- 0.50
cp <- 0.25
s2_em <- 1 - a^2
s2_ey <- 1 - cp^2 - a^2 * b^2 - b^2 * s2_em - 2 * cp * a * b
em <- rnorm(n = n, mean = 0, sd = sqrt(s2_em))
ey <- rnorm(n = n, mean = 0, sd = sqrt(s2_ey))
X <- rnorm(n = n)
M <- a * X + em
Y <- cp * X + b * M + ey
df <- data.frame(X, M, Y)

# Create data set with missing values.

miss <- sample(1:dim(df)[1], 300)
df[miss[1:100], "X"] <- NA
df[miss[101:200], "M"] <- NA
df[miss[201:300], "Y"] <- NA

Multiple Imputation

Perform the appropriate multiple imputation approach to deal with missing values. In this example, we impute multivariate missing data under the normal model.

mi <- amelia(
  x = df,
  m = 5L,
  p2s = 0
)

Model Specification

The indirect effect is defined by the product of the slopes of paths X to M labeled as a and M to Y labeled as b. In this example, we are interested in the confidence intervals of indirect defined as the product of a and b using the := operator in the lavaan model syntax.

model <- "
  Y ~ cp * X + b * M
  M ~ a * X
  X ~~ X
  indirect := a * b
  direct := cp
  total := cp + (a * b)
"

Model Fitting

We can now fit the model using the sem() function from lavaan. We do not need to deal with missing values in this stage.

fit <- sem(data = df, model = model)

Monte Carlo Confidence Intervals (Multiple Imputation)

The fit lavaan object and mi mids object can then be passed to the MCMI() function from semmcci to generate Monte Carlo confidence intervals using multiple imputation as described in Pesigan and Cheung (2023).

MCMI(fit, R = 100L, alpha = 0.05, mi = mi)
#> Monte Carlo Confidence Intervals (Multiple Imputation Estimates)
#>             est     se   R   2.5%  97.5%
#> cp       0.2274 0.0303 100 0.1761 0.2971
#> b        0.5192 0.0332 100 0.4510 0.5769
#> a        0.4790 0.0292 100 0.4202 0.5305
#> X~~X     1.0613 0.0441 100 0.9796 1.1444
#> Y~~Y     0.5439 0.0241 100 0.4994 0.5876
#> M~~M     0.7642 0.0396 100 0.7030 0.8402
#> indirect 0.2486 0.0173 100 0.2160 0.2747
#> direct   0.2274 0.0303 100 0.1761 0.2971
#> total    0.4760 0.0291 100 0.4271 0.5509

Nonparametric Bootstrap Confidence Intervals (Multiple Imputation)

Nonparametric bootstrap confidence intervals can be generated in bmemLavaan using the following.

summary(
  bmemLavaan::bmem(data = df, model = model, method = "mi", boot = 100L, m = 5L)
)
#> 
#> Estimate method:                          multiple imputation
#> Sample size:                              1000      
#> Number of request bootstrap draws:        100       
#> Number of successful bootstrap draws:     100       
#> Type of confidence interval:              perc
#> 
#> Values of statistics:
#> 
#>                      Value      SE      2.5%     97.5%
#>   chisq               0.000    0.000    0.000    0.000   
#>   GFI                 1.000    0.000    1.000    1.000   
#>   AGFI                1.000    0.000    1.000    1.000   
#>   RMSEA               0.000    0.000    0.000    0.000   
#>   NFI                 1.000    0.000    1.000    1.000   
#>   NNFI                1.000    0.000    1.000    1.000   
#>   CFI                 1.000    0.000    1.000    1.000   
#>   BIC                 7742.967 81.777   7575.258 7857.675
#>   SRMR                0.000    0.000    0.000    0.000   
#> 
#> Estimation of parameters:
#> 
#>                      Estimate   SE      2.5%     97.5%
#> Regressions:
#>   Y ~
#>     X        (cp)     0.234    0.030    0.176    0.296
#>     M         (b)     0.513    0.032    0.460    0.570
#>   M ~
#>     X         (a)     0.476    0.030    0.426    0.540
#> 
#> Variances:
#>     X                 1.057    0.046    0.950    1.144
#>     Y                 0.556    0.027    0.488    0.600
#>     M                 0.755    0.035    0.684    0.813
#> 
#> 
#> 
#> Defined parameters:
#>     a*b    (indr)     0.244    0.020    0.206    0.285
#>     cp     (drct)     0.234    0.030    0.176    0.296
#>     cp+(*) (totl)     0.479    0.030    0.428    0.539

Benchmark

Arguments

Variables Values Notes
R 100 Number of Monte Carlo replications.
B 100 Number of bootstrap samples.
m 5 Number of imputations.

Benchmark

benchmark_mi_01 <- microbenchmark(
  MC = {
    fit <- sem(
      data = df,
      model = model
    )
    mi <- Amelia::amelia(
      x = df,
      m = m,
      p2s = 0
    )
    MCMI(
      fit,
      R = R,
      decomposition = "chol",
      pd = FALSE,
      mi = mi
    )
  },
  NB = bmemLavaan::bmem(
    data = df,
    model = model,
    method = "mi",
    boot = B,
    m = m
  ),
  times = 10
)

Summary of Benchmark Results

summary(benchmark_mi_01, unit = "ms")
#>   expr        min         lq       mean     median         uq        max neval
#> 1   MC   357.7178   369.8743   392.1105   373.3804   374.9043   580.4906    10
#> 2   NB 35608.5881 35675.6870 39860.3487 36334.5583 42444.1272 49886.7982    10

Summary of Benchmark Results Relative to the Faster Method

summary(benchmark_mi_01, unit = "relative")
#>   expr      min       lq     mean   median       uq      max neval
#> 1   MC  1.00000  1.00000   1.0000  1.00000   1.0000  1.00000    10
#> 2   NB 99.54379 96.45355 101.6559 97.31245 113.2132 85.93902    10

Plot

Benchmark - Monte Carlo Method with Precalculated Estimates and Multiple Imputation

fit <- sem(
  data = df,
  model = model
)
mi <- Amelia::amelia(
  x = df,
  m = m,
  p2s = 0
)
benchmark_mi_02 <- microbenchmark(
  MC = MCMI(
    fit,
    R = R,
    decomposition = "chol",
    pd = FALSE,
    mi = mi
  ),
  NB = bmemLavaan::bmem(
    data = df,
    model = model,
    method = "mi",
    boot = B,
    m = m
  ),
  times = 10
)

Summary of Benchmark Results

summary(benchmark_mi_02, unit = "ms")
#>   expr        min         lq       mean     median         uq        max neval
#> 1   MC   236.7124   241.3807   313.8474   247.8928   422.3154   423.4957    10
#> 2   NB 35447.8084 40962.4077 46805.9772 48673.6985 53452.4033 53530.8860    10

Summary of Benchmark Results Relative to the Faster Method

summary(benchmark_mi_02, unit = "relative")
#>   expr      min       lq     mean   median       uq      max neval
#> 1   MC   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000    10
#> 2   NB 149.7506 169.7004 149.1361 196.3498 126.5699 126.4024    10

Plot

References

Pesigan, I. J. A., & Cheung, S. F. (2023). Monte Carlo confidence intervals for the indirect effect with missing data. Behavior Research Methods, 56(3), 1678–1696. https://doi.org/10.3758/s13428-023-02114-4