In this example, we use the semmcci package to generate Monte Carlo confidence intervals from multiple imputation estimates, as described in Pesigan & Cheung (2023). We use data from a study by Elliot et al. (2007) on the effect of an intervention on healthy dietary behavior, mediated by knowledge of healthy dietary behavior. These data were used as an empirical example in Yuan and MacKinnon (2009) and Wu and Jia (2013).

Data

The data frame elliot2007, which is included in the manMCMedMiss package, has 354 cases and 3 variables:

  • x - Intervention group membership.
  • m - Knowledge of healthy dietary behavior (knowledge post-intervention minus knowledge pre-intervention).
  • y - Healthy dietary behavior (behavior post-intervention minus behavior pre-intervention).

knitr::kable(head(elliot2007))

 x     m          y
 0   1.0   1.666666
 0   0.0   4.000000
 0   0.0   0.000000
 0  -0.5  -0.666667
 0   0.5   1.333333
 0   1.0   2.333334

Amputation

We generate a data set with missing values from the complete data. The missing data mechanism is missing at random (MAR), and the proportion of missing cases is set to 0.10 (prop = 0.10).

set.seed(42)
data_missing <- AmputeData(
  elliot2007,
  mech = "MAR",
  prop = 0.10
)
knitr::kable(head(data_missing, n = 10))

  x          m          y
  0   1.000000   1.666666
  0   0.000000   4.000000
  0   0.000000   0.000000
  0  -0.500000  -0.666667
  0   0.500000   1.333333
  0   1.000000   2.333334
 NA   1.166667   3.333333
 NA   0.833333         NA
  0   1.666667   1.666667
  0  -0.333333   1.666666
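
To make the MAR mechanism concrete, the following base R sketch deletes values so that the probability of missingness depends only on an observed variable. It is an illustration under stated assumptions, not how AmputeData() works internally: the data frame toy, the missingness probabilities, and the choice to delete values in y only are all hypothetical.

```r
set.seed(42)

# Hypothetical complete data with the same shape as elliot2007
# (simulated values, not the real study data).
n <- 354
toy <- data.frame(
  x = rbinom(n, size = 1, prob = 0.5),
  m = rnorm(n),
  y = rnorm(n)
)

# MAR: the probability that y is missing depends only on x, which is
# always observed. The probabilities below are arbitrary choices.
p_miss <- ifelse(toy$x == 1, 0.15, 0.05)

toy_missing <- toy
toy_missing$y[runif(n) < p_miss] <- NA

# Proportion of incomplete cases overall and within each level of x
prop_incomplete <- mean(!complete.cases(toy_missing))
prop_by_group <- tapply(is.na(toy_missing$y), toy_missing$x, mean)
```

Because missingness is driven by x and not by the unobserved values of y themselves, the mechanism in this sketch is MAR rather than missing not at random.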

Multiple Imputation

Multiple imputation generates several completed data sets from the original data set with missing values. Several R packages, such as Amelia and mice, can be used to generate completed data sets. In the example below, we use the mice package to generate m = 100 completed data sets, imputing each incomplete variable by Bayesian linear regression (method = "norm").

library(mice)
mi <- mice(
  data_missing,
  method = "norm",
  m = 100,
  print = FALSE,
  seed = 42
)

Model Fitting

We fit the model using lavaan. We do not need to deal with missing values at this stage, because the imputations stored in mi are supplied to MCMI() in the next step. The output is saved in the object fit.

library(lavaan)
model <- "
  y ~ x + b * m
  m ~ a * x
  x ~~ x
  indirect := a * b
"
fit <- sem(
  model = model,
  data = data_missing
)
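
The defined parameter indirect := a * b is the product of the slope of m on x (a) and the slope of y on m, controlling for x (b). As a rough illustration of that product-of-coefficients logic, the sketch below computes the same quantity with two ordinary least squares regressions on simulated data. The data and the population coefficients used to generate them are hypothetical and are not taken from the study.

```r
set.seed(42)

# Hypothetical data following a simple mediation structure x -> m -> y.
# The generating coefficients (0.4, 0.1, 0.15) are arbitrary.
n <- 354
x <- rbinom(n, size = 1, prob = 0.5)
m <- 0.4 * x + rnorm(n)
y <- 0.1 * x + 0.15 * m + rnorm(n)

# a: effect of x on m
a <- coef(lm(m ~ x))[["x"]]

# b: effect of m on y, controlling for x
b <- coef(lm(y ~ x + m))[["m"]]

# Indirect effect as the product of the two paths
indirect <- a * b
```

The product a * b is generally not normally distributed even when a and b are, which is one reason simulation-based intervals are used for the indirect effect.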

Monte Carlo Confidence Intervals

The lavaan object fit and the mids object mi can then be passed to the semmcci::MCMI() function to generate Monte Carlo confidence intervals using multiple imputation. Setting alpha = 0.05 requests 95% confidence intervals.

library(semmcci)
MCMI(fit, mi = mi, alpha = 0.05)
#> Monte Carlo Confidence Intervals (Multiple Imputation Estimates)
#>             est     se     R    2.5%  97.5%
#> y~x      0.1434 0.1276 20000 -0.1024 0.3929
#> b        0.1516 0.0554 20000  0.0420 0.2613
#> a        0.3661 0.1275 20000  0.1136 0.6185
#> x~~x     0.2418 0.0190 20000  0.2047 0.2790
#> y~~y     1.1485 0.0894 20000  0.9713 1.3217
#> m~~m     1.2115 0.0944 20000  1.0269 1.3973
#> indirect 0.0557 0.0296 20000  0.0087 0.1228
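
The general idea behind an interval like the one for indirect can be sketched in base R: pool the per-imputation estimates and covariance matrices with Rubin's rules, draw parameter vectors from a multivariate normal distribution with the pooled mean and total covariance, and take percentiles of the simulated a * b. This is a minimal sketch under those assumptions, not the internal code of MCMI(); the function mc_ci_mi() and the input values in the usage example are hypothetical.

```r
# Hypothetical helper: Monte Carlo CI for a * b from multiply imputed results.
# est   : M x 2 matrix of per-imputation estimates with columns "a" and "b"
# vcovs : list of M sampling covariance matrices (2 x 2), one per imputation
mc_ci_mi <- function(est, vcovs, R = 20000L, alpha = 0.05) {
  M <- nrow(est)
  q_bar <- colMeans(est)                 # pooled point estimates
  u_bar <- Reduce(`+`, vcovs) / M        # within-imputation covariance
  b_mat <- stats::cov(est)               # between-imputation covariance
  t_mat <- u_bar + (1 + 1 / M) * b_mat   # total covariance (Rubin's rules)

  # Draw R parameter vectors from N(q_bar, t_mat) using the Cholesky factor
  z <- matrix(stats::rnorm(R * length(q_bar)), nrow = R)
  draws <- sweep(z %*% chol(t_mat), 2, q_bar, `+`)
  colnames(draws) <- colnames(est)

  # Percentile interval of the simulated indirect effect
  indirect <- draws[, "a"] * draws[, "b"]
  c(
    est = unname(q_bar["a"] * q_bar["b"]),
    stats::quantile(indirect, probs = c(alpha / 2, 1 - alpha / 2))
  )
}

# Usage with made-up inputs (three imputations, illustrative values only)
set.seed(42)
est <- matrix(
  c(0.35, 0.14,
    0.38, 0.16,
    0.36, 0.15),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("a", "b"))
)
vcovs <- replicate(3, diag(c(0.0163, 0.0031)), simplify = FALSE)
ci <- mc_ci_mi(est, vcovs)
```

The between-imputation term inflates the total covariance to reflect uncertainty due to the missing data, so the simulated interval is wider than one based on a single completed data set.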

References

Elliot, D. L., Goldberg, L., Kuehl, K. S., Moe, E. L., Breger, R. K., & Pickering, M. A. (2007). The PHLAME (Promoting Healthy Lifestyles: Alternative Models’ Effects) firefighter study: Outcomes of two models of behavior change. Journal of Occupational and Environmental Medicine, 49(2), 204–213. http://doi.org/10.1097/JOM.0b013e3180329a8d

Pesigan, I. J. A., & Cheung, S. F. (2023). Monte Carlo confidence intervals for the indirect effect with missing data. Behavior Research Methods. https://doi.org/10.3758/s13428-023-02114-4

Wu, W., & Jia, F. (2013). A new procedure to test mediation with missing data through nonparametric bootstrapping and multiple imputation. Multivariate Behavioral Research, 48(5), 663–691. http://doi.org/10.1080/00273171.2013.816235

Yuan, Y., & MacKinnon, D. P. (2009). Bayesian mediation analysis. Psychological Methods, 14(4), 301–322. http://doi.org/10.1037/a0016972