---
title: "Using multiple imputation with regmedint"
author: "Kazuki Yoshida"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using multiple imputation with regmedint}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, message = FALSE, tidy = FALSE, echo = F}
## knitr configuration: http://yihui.name/knitr/options#chunk_options
library(knitr)
showMessage <- FALSE
showWarning <- TRUE
set_alias(w = "fig.width", h = "fig.height", res = "results")
opts_chunk$set(comment = "##", error= TRUE, warning = showWarning, message = showMessage,
               tidy = FALSE, cache = FALSE, echo = TRUE,
               fig.width = 7, fig.height = 7,
               fig.path = "man/figures")
```

Missing data is the norm in real-life data analysis. Multiple imputation via the `mice` package is a popular option in R. Here we introduce simple missingness and demonstrate use of `regmedint` along with `mice`.

## Missing data generation
For demonstration purpose, missing data is introduced here.

```{r}
set.seed(138087069)
library(regmedint)
library(tidyverse)
## Prepare dataset
data(vv2015)
vv2015 <- vv2015 %>%
    select(-cens) %>%
    ## Generate exposure-dependent missing of mediator
    mutate(logit_p_m_miss = -1 + 0.5 * x,
           p_m_miss = exp(logit_p_m_miss) / (1 + exp(logit_p_m_miss)),
           ## Indicator draw
           ind_m_miss = rbinom(n = length(p_m_miss), size = 1, prob = p_m_miss),
           m_true = m,
           m = if_else(ind_m_miss == 1L, as.numeric(NA), m))
```

## Truth fit
Taking the advantage of the simulated setting, the true model is fit here.

```{r}
regmedint_true <-
    regmedint(data = vv2015,
              ## Variables
              yvar = "y",
              avar = "x",
              mvar = "m_true",
              cvar = c("c"),
              eventvar = "event",
              ## Values at which effects are evaluated
              a0 = 0,
              a1 = 1,
              m_cde = 1,
              c_cond = 0.5,
              ## Model types
              mreg = "logistic",
              yreg = "survAFT_weibull",
              ## Additional specification
              interaction = TRUE,
              casecontrol = FALSE)
summary(regmedint_true)
```

## Naive complete case analysis

```{r}
regmedint_cca <- vv2015 %>%
    filter(!is.na(m)) %>%
    regmedint(data = .,
              ## Variables
              yvar = "y",
              avar = "x",
              mvar = "m",
              cvar = c("c"),
              eventvar = "event",
              ## Values at which effects are evaluated
              a0 = 0,
              a1 = 1,
              m_cde = 1,
              c_cond = 0.5,
              ## Model types
              mreg = "logistic",
              yreg = "survAFT_weibull",
              ## Additional specification
              interaction = TRUE,
              casecontrol = FALSE)
summary(regmedint_cca)
```

## Multiple imputation

This specific data setting is a little tricky in that the outcome variable is a censored survival time variable. Here we will use a log transformed survival time.

```{r}
library(mice)
vv2015_mod <- vv2015 %>%
    mutate(log_y = log(y)) %>%
    select(x,m,c,log_y,event)
## Run mice
vv2015_mice <- mice(data = vv2015_mod, m = 50, printFlag = FALSE)
## Object containig 50 imputed dataset
vv2015_mice
```
After creating such MI datasets, mediation analysis can be performed in each dataset.

```{r}
## Fit in each MI dataset
vv2015_mice_regmedint <-
    vv2015_mice %>%
    ## Stacked up dataset
    mice::complete("long") %>%
    as_tibble() %>%
    mutate(y = exp(log_y)) %>%
    group_by(.imp) %>%
    ## Nested data frame
    nest() %>%
    mutate(fit = map(data, function(data) {
        regmedint(data = data,
                  ## Variables
                  yvar = "y",
                  avar = "x",
                  mvar = "m",
                  cvar = c("c"),
                  eventvar = "event",
                  ## Values at which effects are evaluated
                  a0 = 0,
                  a1 = 1,
                  m_cde = 1,
                  c_cond = 0.5,
                  ## Model types
                  mreg = "logistic",
                  yreg = "survAFT_weibull",
                  ## Additional specification
                  interaction = TRUE,
                  casecontrol = FALSE)
    })) %>%
    ## Extract point estimates and variance estimates
    mutate(coef_fit = map(fit, coef),
           vcov_fit = map(fit, vcov))
vv2015_mice_regmedint
```

The results can be combined using the mitools package.

```{r}
regmedint_mi <- mitools::MIcombine(results = vv2015_mice_regmedint$coef_fit,
                                   variances = vv2015_mice_regmedint$vcov_fit)
regmedint_mi_summary <- summary(regmedint_mi)
```

## Comparison

We can observe the MI estimtates are generally more in alignment with the true estimates than the complete-case analysis estimates.

```{r}
cbind(true = coef(regmedint_true),
      cca = coef(regmedint_cca),
      mi = regmedint_mi_summary$results)
```
