Benchmark: Comparing the Monte Carlo Method with Nonparametric Bootstrapping (MI)
Ivan Jacob Agaloos Pesigan
2024-04-14
Source:vignettes/benchmark-mi.Rmd
benchmark-mi.Rmd
We compare the Monte Carlo (MC) method with nonparametric bootstrapping (NB) using the simple mediation model with missing data using multiple imputation. One advantage of MC over NB is speed. This is because the model is only fitted once in MC whereas it is fitted many times in NB.
Data
n <- 1000
a <- 0.50
b <- 0.50
cp <- 0.25
s2_em <- 1 - a^2
s2_ey <- 1 - cp^2 - a^2 * b^2 - b^2 * s2_em - 2 * cp * a * b
em <- rnorm(n = n, mean = 0, sd = sqrt(s2_em))
ey <- rnorm(n = n, mean = 0, sd = sqrt(s2_ey))
X <- rnorm(n = n)
M <- a * X + em
Y <- cp * X + b * M + ey
df <- data.frame(X, M, Y)
# Create data set with missing values.
miss <- sample(1:dim(df)[1], 300)
df[miss[1:100], "X"] <- NA
df[miss[101:200], "M"] <- NA
df[miss[201:300], "Y"] <- NA
Multiple Imputation
Perform the appropriate multiple imputation approach to deal with missing values. In this example, we impute multivariate missing data under the normal model.
mi <- amelia(
x = df,
m = 5L,
p2s = 0
)
Model Specification
The indirect effect is defined by the product of the slopes of paths
X
to M
labeled as a
and
M
to Y
labeled as b
. In this
example, we are interested in the confidence intervals of
indirect
defined as the product of a
and
b
using the :=
operator in the
lavaan
model syntax.
model <- "
Y ~ cp * X + b * M
M ~ a * X
X ~~ X
indirect := a * b
direct := cp
total := cp + (a * b)
"
Model Fitting
We can now fit the model using the sem()
function from
lavaan
. We do not need to deal with missing values in this
stage.
fit <- sem(data = df, model = model)
Monte Carlo Confidence Intervals (Multiple Imputation)
The fit
lavaan
object and mi
mids
object can then be passed to the MCMI()
function from semmcci
to generate Monte Carlo confidence
intervals using multiple imputation as described in Pesigan and Cheung
(2023).
MCMI(fit, R = 100L, alpha = 0.05, mi = mi)
#> Monte Carlo Confidence Intervals (Multiple Imputation Estimates)
#> est se R 2.5% 97.5%
#> cp 0.2274 0.0303 100 0.1761 0.2971
#> b 0.5192 0.0332 100 0.4510 0.5769
#> a 0.4790 0.0292 100 0.4202 0.5305
#> X~~X 1.0613 0.0441 100 0.9796 1.1444
#> Y~~Y 0.5439 0.0241 100 0.4994 0.5876
#> M~~M 0.7642 0.0396 100 0.7030 0.8402
#> indirect 0.2486 0.0173 100 0.2160 0.2747
#> direct 0.2274 0.0303 100 0.1761 0.2971
#> total 0.4760 0.0291 100 0.4271 0.5509
Nonparametric Bootstrap Confidence Intervals (Multiple Imputation)
Nonparametric bootstrap confidence intervals can be generated in
bmemLavaan
using the following.
summary(
bmemLavaan::bmem(data = df, model = model, method = "mi", boot = 100L, m = 5L)
)
#>
#> Estimate method: multiple imputation
#> Sample size: 1000
#> Number of request bootstrap draws: 100
#> Number of successful bootstrap draws: 100
#> Type of confidence interval: perc
#>
#> Values of statistics:
#>
#> Value SE 2.5% 97.5%
#> chisq 0.000 0.000 0.000 0.000
#> GFI 1.000 0.000 1.000 1.000
#> AGFI 1.000 0.000 1.000 1.000
#> RMSEA 0.000 0.000 0.000 0.000
#> NFI 1.000 0.000 1.000 1.000
#> NNFI 1.000 0.000 1.000 1.000
#> CFI 1.000 0.000 1.000 1.000
#> BIC 7742.967 81.777 7575.258 7857.675
#> SRMR 0.000 0.000 0.000 0.000
#>
#> Estimation of parameters:
#>
#> Estimate SE 2.5% 97.5%
#> Regressions:
#> Y ~
#> X (cp) 0.234 0.030 0.176 0.296
#> M (b) 0.513 0.032 0.460 0.570
#> M ~
#> X (a) 0.476 0.030 0.426 0.540
#>
#> Variances:
#> X 1.057 0.046 0.950 1.144
#> Y 0.556 0.027 0.488 0.600
#> M 0.755 0.035 0.684 0.813
#>
#>
#>
#> Defined parameters:
#> a*b (indr) 0.244 0.020 0.206 0.285
#> cp (drct) 0.234 0.030 0.176 0.296
#> cp+(*) (totl) 0.479 0.030 0.428 0.539
Benchmark
benchmark_mi_01 <- microbenchmark(
MC = {
fit <- sem(
data = df,
model = model
)
mi <- Amelia::amelia(
x = df,
m = m,
p2s = 0
)
MCMI(
fit,
R = R,
decomposition = "chol",
pd = FALSE,
mi = mi
)
},
NB = bmemLavaan::bmem(
data = df,
model = model,
method = "mi",
boot = B,
m = m
),
times = 10
)
Summary of Benchmark Results
summary(benchmark_mi_01, unit = "ms")
#> expr min lq mean median uq max neval
#> 1 MC 368.0412 437.2158 458.1068 475.4443 486.0285 492.7675 10
#> 2 NB 40535.6089 41156.2078 42297.1027 42077.2300 42965.1487 44748.3086 10
Summary of Benchmark Results Relative to the Faster Method
summary(benchmark_mi_01, unit = "relative")
#> expr min lq mean median uq max neval
#> 1 MC 1.0000 1.00000 1.00000 1.00000 1.00000 1.00000 10
#> 2 NB 110.1388 94.13248 92.33023 88.50086 88.40048 90.81018 10
Benchmark - Monte Carlo Method with Precalculated Estimates and Multiple Imputation
fit <- sem(
data = df,
model = model
)
mi <- Amelia::amelia(
x = df,
m = m,
p2s = 0
)
benchmark_mi_02 <- microbenchmark(
MC = MCMI(
fit,
R = R,
decomposition = "chol",
pd = FALSE,
mi = mi
),
NB = bmemLavaan::bmem(
data = df,
model = model,
method = "mi",
boot = B,
m = m
),
times = 10
)
Summary of Benchmark Results
summary(benchmark_mi_02, unit = "ms")
#> expr min lq mean median uq max neval
#> 1 MC 228.6275 286.7165 318.5973 333.1918 353.131 374.9326 10
#> 2 NB 41033.9483 43091.2840 44699.1419 44119.5682 44616.917 51380.0153 10
Summary of Benchmark Results Relative to the Faster Method
summary(benchmark_mi_02, unit = "relative")
#> expr min lq mean median uq max neval
#> 1 MC 1.0000 1.0000 1.0000 1.0000 1.0000 1.000 10
#> 2 NB 179.4795 150.2923 140.2998 132.4149 126.3467 137.038 10