---
title: 'VeRUS: Verification of Reference Intervals Based on the Uncertainty of Sampling'
author: "Matthias Beck & Christopher M Rank"
date: "`r Sys.Date()`"

output: 
  html_document:
    theme: default
    toc: true
    toc_depth: 3

vignette: >
  %\VignetteIndexEntry{VeRUS: Verification of Reference Intervals Based on the Uncertainty of Sampling} 
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r global_options, echo=FALSE, eval=TRUE}
knitr::opts_chunk$set(fig.width = 7, fig.height = 5, fig.align = "center", echo = TRUE, eval = TRUE, warning = FALSE, message = FALSE)
# increasing the width of the stdout-stream
options(width = 200)
```

## Introduction

So far, most methods for the verification of reference intervals require the 
collection of reference samples that reasonably represent the local population. 
This is often not feasible due to the high costs and the time-consuming process 
of collecting these samples. The method **VeRUS** (Verification of Reference 
Intervals Based on the Uncertainty of Sampling) is a novel approach to verify 
reference intervals without the collection of samples. Instead, **VeRUS** 
combines the already available data from the laboratory information system with 
the uncertainty of sampling to verify the reference intervals. A detailed 
description of the method can be found in 
[https://doi.org/10.1515/cclm-2025-0728].

## Description of VeRUS 
At the core of VeRUS is the construction of uncertainty margins around the 
limits of the reference intervals. The uncertainty margins are defined as the 
approximated confidence interval of population quantiles at a given sample size. 
As the uncertainty margins are independent of the sample size of the reference
sample, they enable direct comparison of reference intervals derived from 
different sample sizes. 
These uncertainty margins can be shown when printing the result of *findRI*. 
For demonstration purposes, *testcase 1* is used as an example for RWD: The 
*print* and *plot* functions can now be used to display the uncertainty margins.

```{r loading_example, echo=TRUE}
library(refineR)
fit <- findRI(Data = testcase1)
print(fit, uncertaintyRegion = "uncertaintyMargin")
```
```{r plot_example, echo=TRUE}
# The uncertainty margins are shown in the plot
plot(fit, uncertaintyRegion = "uncertaintyMargin")
```

## Application of VeRUS

The standard use case of **VeRUS** requires only the result of refineR's *findRI* 
function and the candidate reference interval ,i.e., the one that is supposed to be verified. 
Suppose laboratory A wants to verify the reference interval [9.1, 31.5] for a 
certain biomarker. The verification can be done as follows:

1. Ensuring comparable analytical methods and demographics: 
    - Analytical methods: The analytical methods used in laboratory A should be comparable to those used to establish the reference interval.
    - Demographics: The local population of laboratory A should be similar to the population from which the reference interval was derived.
2. Careful filtering of the RWD according to Ammer et al. 2023 [https://doi.org/10.1093/jalm/jfac101].
3. Application of the *findRI* function to model the reference distribution from the filtered RWD.
4. Using the *verifyRI* function to verify the reference interval.


The most relevant parameters of the *verifyRI* function are:

- *RIdata*: (*RWDRI*) object (result of the *findRI* function) or a (*numeric*) vector specifying a reference interval.
- *RIcand*: (*RWDRI*) object or a (*numeric*) vector specifying the reference interval that needs to be verified.
- *RIperc*: (*numeric*) vector specifying the percentiles which define the reference interval (default: c(0.025, 0.975)).
- *printResults*: (*logical*) indicating whether to print results to the console.
- *generatePlot*: (*logical*) indicating whether to generate a verification plot.

```{r verification_1, echo=TRUE} 
# Suppose the currently used reference interval is [9.1, 31.5]
verifyRI(RIdata = fit, RIcand = c(9.1, 31.5))
```


As can be seen, the reference interval [9.1, 31.5] is verified by the method 
**VeRUS**. The first table that is printed shows the uncertainty margins around 
the point estimates of each reference limit. The second table that is printed 
shows the fraction of samples within the Data argument of `findRI` that are 
outside (below or above) or within the respective reference interval. 


The *verifyRI* function is flexible. If only the 99th percentile is relevant, 
one could verify it as follows: 
```{r one_sided, echo=TRUE}
verifyRI(RIdata = fit, RIcand = 67, RIperc = 0.99)
```


However, if more than one reference limit of a numeric *RIcand* is available, 
it is recommended to provide at least two of them as a vector (e.g., c(lower, upper))
and their corresponding percentiles as *RIperc*. Providing only a single value 
- if *RIdata* is an RWDRI object and *RICand* is a numeric vector - necessitates 
an imputation step which may lead to unexpected results.


### Possible Inputs of the *verifyRI* Function


The *verifyRI* function can be used to compare:

-	a RWDRI object (result of the *findRI* function) with a numeric reference interval 
-	two numeric reference intervals 
-	two RWDRI objects 

If the parameters describing the reference distribution from which the candidate 
RI is derived are known, a custom RWDRI object can be defined and provided to 
the *verifyRI* function to improve the accuracy of the verification.

```{r show_possible_inputs_of_verifyRI, fig.show='hold'}
# compare RWDRI object with a numeric reference interval
verifyRI(RIdata = fit, RIcand = c(9.1, 33.5),
 title = "Comparison of RWDRI with Numeric RI",
 printResults = FALSE)
# compare two numeric RIs
verifyRI(RIdata = c(4, 26), RIcand = c(1, 29),
 title = "Comparison of two Numeric RIs",
 printResults = FALSE)

# compare two RWDRI objects
# custom RWDRI object describing the reference distribution
# Mu and Sigma define the mean and standard deviation of a normal distribution.
# Then the inverse Box-Cox transformation is applied to the data with the power parameter Lambda.
# The Shift parameter is used to shift the distribution to the desired location.
custom_RWDRI <- list(Mu = 20, Sigma = 5, Lambda = 0.9, Shift = 0)
class(custom_RWDRI) <- "RWDRI"
verifyRI(RIdata = fit, RIcand = custom_RWDRI,
 title = "Comparison of two RWDRI objects",
 printResults = FALSE)
```

### Interpretation of the Results
The verification of each reference limit may result in one of three outcomes:

1.	The point estimate (PE) of each reference limit is within the corresponding 
uncertainty margin of the other reference interval. Here the greatest level of 
confidence in the verification is given. This is indicated by "Overlap with PE" 
in the first table and by the green color of the overlapping uncertainty margin.
2.	The uncertainty margins of the candidate RI and the local RI overlap but at 
least one point estimate is outside the uncertainty margin of the other interval. 
This is indicated by "Overlap of margins" in the first table and by the yellow 
color of the overlapping uncertainty margin.
3.	The uncertainty margins of the candidate RI and the local RI do not overlap. 
This is indicated by "No overlap" in the first table and by the red color of the 
non-overlapping uncertainty margin.

## Quantification of the Similarity of Two Reference Intervals
VeRUS can be used not only to verify reference intervals but also to determine the 
similarity of two reference intervals. The function *getRISimilarity* can be used for this.

```{r similarity, echo=TRUE}
# Suppose the currently used reference interval is [9.1, 55]
getRISimilarity(RIdata = fit, RIcand = c(9.1, 55))
```


RI similarity is quantified using uncertainty margins (UMs) around reference 
limits (RLs). UMs, which approximate quantile confidence intervals, narrow with 
larger sample sizes. The similarity level of two RIs is quantified by the 
largest sample size, i.e. the smallest UMs, for which the UMs of all 
corresponding RLs overlap. As sample sizes are not a particularly intuitive measure 
of similarity, it is also expressed as an s-value. The s-value presents the 
level of similarity analogous to the p-value: s ≤ 0.05 relates to overlapping 
CIs of RIs established with the standard non-parametric method with N=120 and 
therefore indicates the minimal sample size for which the RIs are considered 
equivalent. Smaller s-values indicate a higher similarity of the RIs and 
therefore higher certainty of the equivalence of the RIs. The s-value is 
calculated as follows: *s = 6 / Nmin* where Nmin is the maximum sample size for 
which the UMs of all corresponding RLs overlap.

## Advanced Applications

### Invisible Output of the *verifyRI* Function

The *verifyRI* function returns an invisible object. It is a named list with 
the following elements: 

- *testPassedPointEst*	(*logical*) indicating whether all point estimates are within the corresponding uncertainty margins i.e. highest level of confidence in the verification
- *testPassedMargins*   (*logical*) indicating whether all uncertainty margins overlap
- *RIVerificationTab*   (*data.frame*) summarizing the verification results

```{r output_verifyRI, echo=TRUE}
# We deactivate the printing of the results and plotting
verification_result <- verifyRI(RIdata = fit, RIcand = c(9.1, 33), printResults = FALSE, generatePlot = FALSE)
verification_tab <- verification_result$RIVerificationTab
# The verification table also contains the values of the power parameter (Lambda) of each estimated model.
# The Lambda values of the models of the candidate distribution and the local distribution are the same when and RWDRI object is compared to a numeric candidate RI.
print(paste0("All Lambdas are equal: ", identical(verification_tab$RICandLambda, verification_tab$RIdataLambda)))
# For this example, the Lambda values are irrelevant, as they are the same.
cols_of_interest <- colnames(verification_tab)
cols_of_interest <- cols_of_interest[!cols_of_interest %in% c("RICandLambda", "RIdataLambda")]

knitr::kable(verification_tab[, cols_of_interest])
```


### Non-Default Parameters to Define Uncertainty Margins

By default, the uncertainty margins approximate the 90% confidence intervals 
expected when estimating the population quantiles *RIperc* with a sample size 
of 120 which is the minimum sample size recommended for the non-parametric 
direct approach of establishing RIs. The width of the uncertainty margins can 
be adjusted with the parameters:

- *n*      (*integer*) indicating the sample size for which the sampling uncertainty shall be taken into account. Default is 120
- *UMprop* (*numeric*) defining the desired confidence level for the uncertainty margins. Default is 0.9 (90% approximated confidence interval).

```{r adjust_uncertainty_margins, fig.show='hold'}

verifyRI(RIdata = custom_RWDRI, RIcand = c(9.1, 31.5), title = "n=120, UMprop=0.9")
# increasing n may be desired to get a more strict verification
verifyRI(RIdata = custom_RWDRI, RIcand = c(9.1, 31.5), n = 1000, UMprop = 0.95, title = "n=1000, UMprop=0.95")
```

By default, an asymmetry correction is applied to the uncertainty margins. This
correction may be deactivated by setting the parameter *asymmetryCorr* to
*FALSE*. This is only relevant for highly skewed distributions.

The *n*, *UMprop*, and *asymmetryCorr* parameters can also be set in the
*getRI*, *print*, and *plot* functions.
```{r adjust_uncertainty_margins_2, echo=TRUE}
plot(fit, uncertaintyRegion = "uncertaintyMargin", n = 1000, UMprop = 0.95, asymmetryCorr = TRUE)
getRI(fit, n = 1000, UMprop = 0.95, asymmetryCorr = TRUE)
print(fit, uncertaintyRegion = "uncertaintyMargin", n = 1000, UMprop = 0.95, asymmetryCorr = TRUE)
```

### Customization of the Plot

The plot of *verifyRI* can be customized by setting the parameters:

- *xlab*  (character) The label of the x-axis.
- *title* (character) The title of the plot.
- *Scale* (character) The scale of the x-axis. Possible values are "original", "splitXAxis", and "transformed". Default is "original".
- *candLabel* (character) The label for the candidate reference interval.
- *dataLabel* (character) The label for the local reference interval.

```{r adjust_plot_1, echo=TRUE, fig.show='hold'}
scaleOptions <- c("original", "splitXAxis", "transformed")
labels <- c("default", "Laboratory", "Timepoint")
for(i in 1:3) {

if(i == 1){
  candLabel <- dataLabel <- NULL
}
if(i == 2){
  candLabel <- "Lab A"
  dataLabel <- "Lab B"
}
if(i == 3){
  candLabel <- "Timepoint A"
  dataLabel <- "Timepoint B"
}

  verifyRI(
    RIdata = fit, RIcand = c(9.1, 33), printResults = FALSE,
    xlab = paste("x-axis scale:", scaleOptions[i]),
    title = paste("Verification plot with", scaleOptions[i], "scale"),
    Scale = scaleOptions[i],
    candLabel = candLabel,
    dataLabel = dataLabel
  )
}
```


### Advanced Parameters for *getRISimilarity*

The *getRISimilarity* function can be customized with the following parameters:

- *UMprop*        (numeric) The desired confidence level for the uncertainty margins.
  Default is 0.9 (90% confidence interval).
- *asymmetryCorr* (logical) If TRUE, an asymmetry correction is applied to the
  uncertainty margins. Default is TRUE.
- *Overlap*       (*character*) either "OverlapMargins" (uncertainty margins overlap) or "OverlapPointEst" (point estimates are within uncertainty margins). Default is "OverlapMargins"

```{r similarity_2, echo=TRUE}
# Suppose the currently used reference interval is [9.1, 55]
getRISimilarity(RIdata = fit, RIcand = c(9.1, 55), UMprop = 0.95, asymmetryCorr = FALSE, Overlap = "OverlapPointEst")
```


### Alternative Verification Method

Haeckel et al. proposed using **Equivalence Limits** [https://doi.org/10.1515/labmed-2016-0002] 
to verify reference intervals. Equivalence Limits can be applied by setting the parameter *marginType* to *EL*.

```{r equivalence_limits, echo=TRUE}
verifyRI(
  RIdata = fit, RIcand = c(9.1, 33), printResults = TRUE,
  marginType = "EL",
  title = "Verification with Equivalence Limits"
)
```


## References
Beck, M., Dufey, F., Ammer, T., Schützenmeister, A., Zierk, J., Rank, C. M., &
Rauh, M. (2025). VeRUS: verification of reference intervals based on the
uncertainty of sampling. Clinical Chemistry and Laboratory Medicine. 
https://doi.org/10.1515/cclm-2025-0728.

Ammer, T., Schuetzenmeister, A., Rank, C.M., Doyle, K. Estimation of Reference
Intervals from Routine Data Using the refineR Algorithm — A Practical Guide. The
Journal of Applied Laboratory Medicine, 8(1):84-91 (2023).
https://doi.org/10.1093/jalm/jfac101.

Haeckel, Rainer, Wosniok, Werner and Arzideh, Farhad. Equivalence limits of
reference intervals for partitioning of population data. Relevant differences of
reference limits. LaboratoriumsMedizin. 40(3):199-205 (2016).
https://doi.org/10.1515/labmed-2016-0002.