betaMC: Monte Carlo Method Combined with Multiple Imputation
Ivan Jacob Agaloos Pesigan
Source: vignettes/example-mc-mi.Rmd
The Monte Carlo Method combined with multiple imputation described in Pesigan and Cheung (2023) is implemented using the MCMI() function in the context of
linear regression. Multiple imputation is used to deal with missing
values in a data set. The vector of parameter estimates and the
corresponding sampling covariance matrix are estimated for each of the
imputed data sets. Results are combined to arrive at the pooled vector
of parameter estimates and the corresponding sampling covariance matrix.
The pooled estimates and sampling covariance matrix are then used to generate the Monte Carlo sampling distribution of the regression parameters. The output of the MCMI() function
can be passed to the BetaMC(), RSqMC(),
DeltaRSqMC(), SCorMC(), PCorMC(),
and DiffBetaMC() functions to generate confidence intervals
for various regression effect sizes.
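To make the pooling step concrete, the following is a minimal sketch of Rubin's rules for combining per-imputation results, assuming coefs is a list of coefficient vectors and vcovs the matching list of sampling covariance matrices (both names are purely illustrative; MCMI() handles the pooling internally).
# Illustrative sketch only; MCMI() performs the pooling internally.
pool_estimates <- function(coefs, vcovs) {
  m <- length(coefs)
  est <- Reduce(`+`, coefs) / m      # pooled coefficient vector (mean across imputations)
  within <- Reduce(`+`, vcovs) / m   # average within-imputation covariance
  devs <- lapply(coefs, function(x) tcrossprod(x - est))
  between <- Reduce(`+`, devs) / (m - 1)   # between-imputation covariance
  list(est = est, vcov = within + (1 + 1 / m) * between)   # total sampling covariance
}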
In this example, we use the data set and model from betaMC: Example Using the BetaMC Function. We use the mice::ampute() function to
generate a data set with missing values.
df <- betaMC::nas1982
set.seed(42)
df <- mice::ampute(df)$amp
df
#> QUALITY NFACUL NGRADS PCTSUPP PCTGRT NARTIC PCTPUB
#> 1 12 13 19 16 8 14 39
#> 2 23 29 72 67 3 61 66
#> 3 29 38 111 NA 13 68 68
#> 4 36 16 28 52 63 49 75
#> 5 44 40 104 64 53 NA 83
#> 6 21 14 28 59 29 65 79
#> 7 40 44 16 81 35 79 82
#> 8 42 NA 57 65 40 187 82
#> 9 24 16 18 87 19 NA 75
#> 10 30 37 41 43 8 NA 54
#> 11 20 20 45 26 25 49 50
#> 12 8 11 27 7 0 9 27
#> 13 NA 29 112 64 35 65 69
#> 14 14 14 57 10 0 11 43
#> 15 27 38 167 28 NA 196 84
#> 16 46 27 113 62 52 173 85
#> 17 NA 32 122 51 19 79 69
#> 18 42 NA 116 56 32 208 73
#> 19 33 32 54 49 19 120 69
#> 20 31 42 79 41 NA 114 71
#> 21 23 30 76 22 20 87 67
#> 22 18 NA 62 39 6 10 39
#> 23 29 41 98 41 12 101 66
#> 24 21 23 52 33 4 59 78
#> 25 45 NA 222 64 32 274 70
#> 26 25 26 63 39 NA 160 89
#> 27 18 16 24 4 31 39 63
#> 28 NA 38 154 55 34 84 63
#> 29 21 19 40 7 5 60 84
#> 30 24 16 18 25 63 31 63
#> 31 15 13 29 23 15 62 85
#> 32 15 23 41 51 4 24 NA
#> 33 36 32 69 65 16 122 75
#> 34 38 21 38 28 48 92 91
#> 35 32 28 90 70 36 117 61
#> 36 27 22 52 10 27 114 86
#> 37 16 20 80 46 10 19 40
#> 38 26 32 41 13 6 64 56
#> 39 NA 26 81 70 58 155 100
#> 40 26 40 81 42 10 70 68
#> 41 14 19 87 15 5 72 79
#> 42 12 17 26 9 6 15 59
#> 43 29 29 71 74 17 85 76
#> 44 34 27 20 0 29 79 57
#> 45 28 26 70 68 27 84 73
#> 46 NA 36 59 57 67 172 83
Multiple Imputation
mi <- mice::mice(
df,
m = 100,
seed = 42,
print = FALSE
)
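The call above produces m = 100 imputed data sets. If you want to inspect any of them, a completed data set can be extracted with mice::complete(); this optional step is not part of the original example.
# Optional: look at the first of the 100 completed data sets.
head(mice::complete(mi, action = 1))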
Regression
Fit the regression model using the lm() function. Note that this call does not handle the missing values; the fitted model (object) is updated with each imputed data set within the MCMI() function, as sketched after the call below.
object <- lm(QUALITY ~ NARTIC + PCTGRT + PCTSUPP, data = df)
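As noted above, MCMI() refits this model on each completed data set before pooling. A rough sketch of that refitting step, purely for illustration (MCMI() does the equivalent work internally), could look like the following.
# Illustration only: refit the same model on every completed data set.
fits <- lapply(
  mice::complete(mi, action = "all"),   # list of all completed data sets
  function(d) update(object, data = d)  # refit object on the completed data
)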
Monte Carlo Sampling Distribution of Parameters
hc3 <- MCMI(object, mi = mi, type = "hc3")
Standardized Regression Slopes
BetaMC(hc3)
#> Call:
#> BetaMC(object = hc3)
#>
#> Standardized regression slopes
#> type = "hc3"
#> est se R 0.05% 0.5% 2.5% 97.5% 99.5% 99.95%
#> NARTIC 0.5197 0.0787 20000 0.2267 0.3008 0.3499 0.6579 0.7014 0.7456
#> PCTGRT 0.4017 0.0859 20000 0.1058 0.1705 0.2270 0.5636 0.6125 0.6778
#> PCTSUPP 0.2327 0.0905 20000 -0.0861 -0.0144 0.0484 0.4024 0.4546 0.5359
Multiple Correlation Coefficients
RSqMC(hc3)
#> Call:
#> RSqMC(object = hc3)
#>
#> R-squared and adjusted R-squared
#> type = "hc3"
#> est se R 0.05% 0.5% 2.5% 97.5% 99.5% 99.95%
#> rsq 0.8154 0.0610 20000 0.5097 0.5978 0.6583 0.8976 0.9217 0.9451
#> adj 0.8022 0.0671 20000 0.4607 0.5575 0.6241 0.8873 0.9138 0.9396
Improvement in R-squared
DeltaRSqMC(hc3)
#> Call:
#> DeltaRSqMC(object = hc3)
#>
#> Improvement in R-squared
#> type = "hc3"
#> est se R 0.05% 0.5% 2.5% 97.5% 99.5% 99.95%
#> NARTIC 0.2070 0.0717 20000 0.0032 0.0243 0.0601 0.3448 0.4013 0.4730
#> PCTGRT 0.1270 0.0614 20000 0.0016 0.0108 0.0275 0.2654 0.3265 0.4089
#> PCTSUPP 0.0435 0.0346 20000 0.0000 0.0002 0.0019 0.1301 0.1736 0.2534
Semipartial Correlation Coefficients
SCorMC(hc3)
#> Call:
#> SCorMC(object = hc3)
#>
#> Semipartial correlations
#> type = "hc3"
#> est se R 0.05% 0.5% 2.5% 97.5% 99.5% 99.95%
#> NARTIC 0.4549 0.0858 20000 0.0570 0.1559 0.2451 0.5872 0.6335 0.6878
#> PCTGRT 0.3564 0.0890 20000 0.0396 0.1038 0.1657 0.5152 0.5714 0.6395
#> PCTSUPP 0.2085 0.0816 20000 -0.0751 -0.0124 0.0420 0.3607 0.4167 0.5034
Squared Partial Correlation Coefficients
PCorMC(hc3)
#> Call:
#> PCorMC(object = hc3)
#>
#> Squared partial correlations
#> type = "hc3"
#> est se R 0.05% 0.5% 2.5% 97.5% 99.5% 99.95%
#> NARTIC 0.5289 0.1169 20000 0.0159 0.0969 0.2207 0.6855 0.7432 0.7962
#> PCTGRT 0.4082 0.1221 20000 0.0080 0.0551 0.1228 0.5953 0.6606 0.7452
#> PCTSUPP 0.1917 0.1148 20000 0.0000 0.0007 0.0085 0.4390 0.5307 0.6277
Differences of Standardized Regression Slopes
DiffBetaMC(hc3)
#> Call:
#> DiffBetaMC(object = hc3)
#>
#> Differences of standardized regression slopes
#> type = "hc3"
#> est se R 0.05% 0.5% 2.5% 97.5% 99.5% 99.95%
#> NARTIC-PCTGRT 0.1179 0.1441 20000 -0.3857 -0.2584 -0.1715 0.3922 0.4777 0.5928
#> NARTIC-PCTSUPP 0.2869 0.1376 20000 -0.1701 -0.0728 0.0073 0.5486 0.6278 0.7173
#> PCTGRT-PCTSUPP 0.1690 0.1430 20000 -0.3190 -0.2025 -0.1103 0.4459 0.5356 0.6320
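Assuming the usual S3 methods that betaMC provides for these objects (coef(), vcov(), and confint()) are available, the estimates and confidence limits can also be extracted programmatically; for example:
# Extract estimates and 95% confidence limits from the standardized slopes.
out <- BetaMC(hc3)
coef(out)
confint(out, level = 0.95)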