betaMC: Monte Carlo Method Combined with Multiple Imputation

The Monte Carlo Method combined with multiple imputation described in (Pesigan-Cheung-2023?) is implemented using the MCMI() function in the context of linear regression. Multiple imputation is used to deal with missing values in a data set. The vector of parameter estimates and the corresponding sampling covariance matrix are estimated for each of the imputed data sets. Results are combined to arrive at the pooled vector of parameter estimates and the corresponding sampling covariance matrix. The pooled estimates are then used to generate the sampling distribution of regression parameters. The output of the MCMI() function can be passed to the BetaMC(), RSqMC(), DeltaRSqMC(), SCorMC(), PCorMC(), and DiffBetaMC() functions to generate confidence intervals for various regression effect sizes.

In this example, we use the data set and the model used in betaMC: Example Using the BetaMC Function. We use the mice::ampute() function to generate a data set with missing values.

library(betaMC)

df <- betaMC::nas1982

set.seed(42)
df <- mice::ampute(df)$amp
df
#>    QUALITY NFACUL NGRADS PCTSUPP PCTGRT NARTIC PCTPUB
#> 1       12     13     19      16      8     14     39
#> 2       23     29     72      67      3     61     66
#> 3       29     38    111      NA     13     68     68
#> 4       36     16     28      52     63     49     75
#> 5       44     40    104      64     53     NA     83
#> 6       21     14     28      59     29     65     79
#> 7       40     44     16      81     35     79     82
#> 8       42     NA     57      65     40    187     82
#> 9       24     16     18      87     19     NA     75
#> 10      30     37     41      43      8     NA     54
#> 11      20     20     45      26     25     49     50
#> 12       8     11     27       7      0      9     27
#> 13      NA     29    112      64     35     65     69
#> 14      14     14     57      10      0     11     43
#> 15      27     38    167      28     NA    196     84
#> 16      46     27    113      62     52    173     85
#> 17      NA     32    122      51     19     79     69
#> 18      42     NA    116      56     32    208     73
#> 19      33     32     54      49     19    120     69
#> 20      31     42     79      41     NA    114     71
#> 21      23     30     76      22     20     87     67
#> 22      18     NA     62      39      6     10     39
#> 23      29     41     98      41     12    101     66
#> 24      21     23     52      33      4     59     78
#> 25      45     NA    222      64     32    274     70
#> 26      25     26     63      39     NA    160     89
#> 27      18     16     24       4     31     39     63
#> 28      NA     38    154      55     34     84     63
#> 29      21     19     40       7      5     60     84
#> 30      24     16     18      25     63     31     63
#> 31      15     13     29      23     15     62     85
#> 32      15     23     41      51      4     24     NA
#> 33      36     32     69      65     16    122     75
#> 34      38     21     38      28     48     92     91
#> 35      32     28     90      70     36    117     61
#> 36      27     22     52      10     27    114     86
#> 37      16     20     80      46     10     19     40
#> 38      26     32     41      13      6     64     56
#> 39      NA     26     81      70     58    155    100
#> 40      26     40     81      42     10     70     68
#> 41      14     19     87      15      5     72     79
#> 42      12     17     26       9      6     15     59
#> 43      29     29     71      74     17     85     76
#> 44      34     27     20       0     29     79     57
#> 45      28     26     70      68     27     84     73
#> 46      NA     36     59      57     67    172     83

Multiple Imputation

mi <- mice::mice(
  df,
  m = 100,
  seed = 42,
  print = FALSE
)

Regression

Fit the regression model using the lm() function. Note that this does not deal with missing values. The fitted model (object) is updated with each imputed data within the MCMI() function.

object <- lm(QUALITY ~ NARTIC + PCTGRT + PCTSUPP, data = df)

Monte Carlo Sampling Distribution of Parameters

hc3 <- MCMI(object, mi = mi, type = "hc3")

Standardized Regression Slopes

BetaMC(hc3)
#> Call:
#> BetaMC(object = hc3)
#> 
#> Standardized regression slopes
#> type = "hc3"
#>            est     se     R   0.05%    0.5%   2.5%  97.5%  99.5% 99.95%
#> NARTIC  0.5197 0.0787 20000  0.2267  0.3008 0.3499 0.6579 0.7014 0.7456
#> PCTGRT  0.4017 0.0859 20000  0.1058  0.1705 0.2270 0.5636 0.6125 0.6778
#> PCTSUPP 0.2327 0.0905 20000 -0.0861 -0.0144 0.0484 0.4024 0.4546 0.5359

Multiple Correlation Coefficients

RSqMC(hc3)
#> Call:
#> RSqMC(object = hc3)
#> 
#> R-squared and adjusted R-squared
#> type = "hc3"
#>        est     se     R  0.05%   0.5%   2.5%  97.5%  99.5% 99.95%
#> rsq 0.8154 0.0610 20000 0.5097 0.5978 0.6583 0.8976 0.9217 0.9451
#> adj 0.8022 0.0671 20000 0.4607 0.5575 0.6241 0.8873 0.9138 0.9396

Improvement in R-squared

DeltaRSqMC(hc3)
#> Call:
#> DeltaRSqMC(object = hc3)
#> 
#> Improvement in R-squared
#> type = "hc3"
#>            est     se     R  0.05%   0.5%   2.5%  97.5%  99.5% 99.95%
#> NARTIC  0.2070 0.0717 20000 0.0032 0.0243 0.0601 0.3448 0.4013 0.4730
#> PCTGRT  0.1270 0.0614 20000 0.0016 0.0108 0.0275 0.2654 0.3265 0.4089
#> PCTSUPP 0.0435 0.0346 20000 0.0000 0.0002 0.0019 0.1301 0.1736 0.2534

Semipartial Correlation Coefficients

SCorMC(hc3)
#> Call:
#> SCorMC(object = hc3)
#> 
#> Semipartial correlations
#> type = "hc3"
#>            est     se     R   0.05%    0.5%   2.5%  97.5%  99.5% 99.95%
#> NARTIC  0.4549 0.0858 20000  0.0570  0.1559 0.2451 0.5872 0.6335 0.6878
#> PCTGRT  0.3564 0.0890 20000  0.0396  0.1038 0.1657 0.5152 0.5714 0.6395
#> PCTSUPP 0.2085 0.0816 20000 -0.0751 -0.0124 0.0420 0.3607 0.4167 0.5034

Squared Partial Correlation Coefficients

PCorMC(hc3)
#> Call:
#> PCorMC(object = hc3)
#> 
#> Squared partial correlations
#> type = "hc3"
#>            est     se     R  0.05%   0.5%   2.5%  97.5%  99.5% 99.95%
#> NARTIC  0.5289 0.1169 20000 0.0159 0.0969 0.2207 0.6855 0.7432 0.7962
#> PCTGRT  0.4082 0.1221 20000 0.0080 0.0551 0.1228 0.5953 0.6606 0.7452
#> PCTSUPP 0.1917 0.1148 20000 0.0000 0.0007 0.0085 0.4390 0.5307 0.6277

Differences of Standardized Regression Slopes

DiffBetaMC(hc3)
#> Call:
#> DiffBetaMC(object = hc3)
#> 
#> Differences of standardized regression slopes
#> type = "hc3"
#>                   est     se     R   0.05%    0.5%    2.5%  97.5%  99.5% 99.95%
#> NARTIC-PCTGRT  0.1179 0.1441 20000 -0.3857 -0.2584 -0.1715 0.3922 0.4777 0.5928
#> NARTIC-PCTSUPP 0.2869 0.1376 20000 -0.1701 -0.0728  0.0073 0.5486 0.6278 0.7173
#> PCTGRT-PCTSUPP 0.1690 0.1430 20000 -0.3190 -0.2025 -0.1103 0.4459 0.5356 0.6320

References

Dudgeon, P. (2017). Some improvements in confidence intervals for standardized regression coefficients. Psychometrika, 82(4), 928–951. https://doi.org/10.1007/s11336-017-9563-z

Ivan Jacob Agaloos Pesigan