---
title: "Some helper functions for statistical analysis"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Some helper functions for statistical analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

Many widely used and powerful statistical analysis commands --- such as `lm`, `glm`, `lme4::lmer`, etc --- have a simple and consistent calling syntax, often involving a "formula" (e.g., `y ~ x`), which makes them consistent, and easy to remember and apply.
Some other functions, even simple ones, don't use the formula syntax, or can be a bit awkward to use in some contexts, or require default values of arguments to be explicitly overridden.
In the `psyntur`, there are some tools that aim to make this functions easier to apply.

These functions and the accompanying data sets can be loaded with the usual `library` command.
```{r setup}
library(psyntur)
```

# Independent samples t-test with `t_test`

R's `stats::t.test` makes it easy to perform independent, paired, or one-sample t-tests.
For the independent sample t-test, the default is the Welch two sample t-test.
While arguably a good choice in practice, when t-tests are being taught to illustrate a simple example of normal linear model, the assumption of homogeneity of variance is used.
To use this with `t.test`, this requires `var.equal = TRUE` to be used.
The `t_test` function is `psyntur` is used when the standard independent t-test with homogeneity of variance is the desired default test.
For example, in the following, we use it with the `faithfulfaces` data set. 
```{r}
t_test(trustworthy ~ face_sex, data = faithfulfaces)
```

# Paired samples t-test with `paired_t_test`

For paired t-tests, the `paired_t_test` function can be used.
In this function, a formula is not used.
Instead, two variables in the same data frame, which are assumed to be paired in some manner, are used.
For example, the `pairedsleep` data set (included in `psyntur`) is as follows.
```{r}
pairedsleep
```
This gives the difference from control in number of hours slept by `r nrow(pairedsleep)` different patients when each took two different drugs.
These time differences under the two drugs are `y1` and `y2`.
A paired samples t-test can be performed as follows with this data.
```{r}
paired_t_test(y1, y2, data = pairedsleep)
```


# Pairwise t-tests with `pairwise_t_test`

For independent t-tests applied all pairs of a set of variables, to which p-value adjustments are applied, we can use `pairwise_t_test`.
For example, the following creates a categorical variable with four values, which are the interaction of two binary variables.
```{r}
data_df <- dplyr::mutate(vizverb, IV = interaction(task, response))
```
Independent samples t-tests with Bonferroni corrections on the `time` variable applied to all pairs of the four levels of the `IV` variable can be done as follows.
```{r}
pairwise_t_test(time ~ IV, data = data_df)
```

# Shipiro-Wilk test with `shapiro_test`

The Shapiro-Wilk test of normality can be applied to a single numeric vector in a data frame as in the following example.
```{r}
shapiro_test(time, data = data_df)
```
To test the normality of each subset of a variable, such as `time`, corresponding to the values of a categorical variable, we can use a `by` variable as in the following example.
```{r}
shapiro_test(time, by = IV, data = data_df)
```


