---
title: "recor: Rearrangement Correlation Coefficient"
author: "Xinbo Ai"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{recor: Rearrangement Correlation Coefficient}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

<span style="background-color: #f0f0f0;">**Pearson's $r$ is undoubtedly the gold measure for linear dependence. Now, it might be the gold measure also for nonlinear monotone dependence, if adjusted.**</span>  

## Overview

*recor* is an R package that implements the **Rearrangement Correlation Coefficient ($r^\#$)**, an adjusted version of Pearson's correlation coefficient designed to accurately measure arbitrary monotone dependence relationships (both linear and nonlinear). Based on cutting-edge statistical research, this package addresses the underestimation problem of traditional correlation coefficients in nonlinear monotone scenarios. The rearrangement correlation is derived from a tighter inequality than the classical Cauchy-Schwarz inequality, providing sharper bounds and expanded capture range.

## Features 

- 🎯 **Extended Capture Range**: From linear to arbitrary monotone dependence.
- 📊 **High Precision Measurement**: More accurate strength measurement than classical coefficients.
- 🔄 **Backward Compatibility**: Reverts to Pearson's $r$ in linear scenarios, and to Spearman's $\rho$ when calculated on ranks. 
- 🚀 **Efficient Implementation**: Optimized computation with C++ backend.
- 📈 **Multiple Input Support**: Automatically handles various input types (vector, matrix, data.frame) consistently with ```stats::cor()```.


## Quick Start
### Basic Usage

```r
library(recor)

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
recor(x, y)
#> [1] 1

# Nonlinear monotone relationship
x <- c(1, 2, 3, 4, 5)
y <- c(1, 8, 27, 65, 125) # y = x^3
recor(x, y) # Higher value than Pearson's r
#> [1] 1
cor(x, y)
#> [1] 0.944458


# Matrix example
set.seed(123)
mat <- matrix(rnorm(100), ncol = 5)
colnames(mat) <- LETTERS[1:5]
recor(mat) # 5x5 correlation matrix
#>    A           B          C          D          E
#> A  1.00000000 -0.09511994 -0.1283021  0.1243721 -0.2328551
#> B -0.09511994  1.00000000  0.1022576  0.2381745  0.3780232
#> C -0.12830211  0.10225762  1.0000000 -0.1523651 -0.3603780
#> D  0.12437205  0.23817455 -0.1523651  1.0000000 -0.1289523
#> E -0.23285513  0.37802315 -0.3603780 -0.1289523  1.0000000


# Two matrices
mat1 <- matrix(rnorm(50), ncol = 5)
mat2 <- matrix(rnorm(50), ncol = 5)
recor(mat1, mat2) # 5x5 cross-correlation matrix
#>       [,1]         [,2]        [,3]        [,4]        [,5]
#> [1,]  0.0001379295  0.019273397 -0.14776094 -0.01203410  0.14712263
#> [2,] -0.0850363746  0.135125063 -0.10799623  0.35026884  0.20233183
#> [3,] -0.2825948208 -0.020383616 -0.31990514 -0.33267352 -0.48254414
#> [4,]  0.4067584970 -0.008022853  0.08223935  0.02728547  0.37567963
#> [5,]  0.5566966868 -0.059564374  0.03296252  0.22249817 -0.03009148

# data.frame
recor(iris[, 1:4])
#>                Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length    1.0000000  -0.1210250    0.9156110   0.8445397
#> Sepal.Width    -0.1210250   1.0000000   -0.4628225  -0.3909946
#> Petal.Length    0.9156110  -0.4628225    1.0000000   0.9694665
#> Petal.Width     0.8445397  -0.3909946    0.9694665   1.0000000
```

## Theoretical Foundation

### Mathematical Definition
The rearrangement correlation coefficient is based on rearrangement inequality theorems that provide tighter bounds than the Cauchy-Schwarz inequality. Mathematically, for samples $x$ and $y$, it is defined as:

${r^\# }\left( {x,y} \right) = \frac{{{s_{x,y}}}}{{\left| {{s_{{x^ \uparrow },{y^ \updownarrow }}}} \right|}}$

Where:

- ${{s_{x,y}}}$ is the sample covariance between $x$ and $y$
- ${{x^ \uparrow }}$ denotes the increasing rearrangement of $x$
- ${{y^ \updownarrow }}$ denotes either ${y^ \uparrow }$ (increasing rearrangement of $y$) if ${{s_{x,y}}} \ge 0$, or ${y^ \downarrow }$ (decreasing rearrangement of $y$) if ${{s_{x,y}}} < 0$.

### R Implementation
${r^\# }$ can be computed in R as follows:
```r
recor <- function(x, y = NULL) {
    recor_vector <- function(x, y) {
        numerator <- cov(x, y)
        if (numerator >= 0) {
            denominator <- abs(cov(
                sort(x, decreasing = FALSE),
                sort(y, decreasing = FALSE)
            ))
        } else {
            denominator <- abs(cov(
                sort(x, decreasing = FALSE),
                sort(y, decreasing = TRUE)
            ))
        }
        numerator / denominator
    }

    if (is.matrix(x) || is.data.frame(x)) {
        x <- as.matrix(x)
        if (is.null(y)) {
            p <- ncol(x)
            result <- matrix(1, nrow = p, ncol = p)
            rownames(result) <- colnames(result) <- colnames(x)

            for (i in 1:p) {
                for (j in 1:p) {
                    if (i != j) {
                        result[i, j] <- result[j, i] <- recor_vector(x[, i], x[, j])
                    }
                }
            }
            return(result)
        } else if (is.matrix(y) || is.data.frame(y)) {
            y <- as.matrix(y)
            if (nrow(x) != nrow(y)) {
                stop("The number of rows of x and y must be the same")
            }

            p <- ncol(x)
            q <- ncol(y)
            result <- matrix(0, nrow = p, ncol = q)
            rownames(result) <- colnames(x)
            colnames(result) <- colnames(y)

            for (i in 1:p) {
                for (j in 1:q) {
                    result[i, j] <- recor_vector(x[, i], y[, j])
                }
            }
            return(result)
        }
    }

    if (is.null(y)) {
        stop("y is needed when x is a vector")
    }

    if (length(x) != length(y)) {
        stop("x and y must have the same length")
    }

    if (length(x) < 2) {
        stop("x and y must have at least two elements")
    }

    recor_vector(x, y)
}
```
It is to be noted that the above R implementation is for illustrative purposes only. The actual *recor* package employs a highly optimized C++ backend to ensure efficient computation.

### Intuitive Example
Do we need a new monotone measure given that rank-based measures such as Spearman's $\rho$ can already measure monotone dependence? The answser is YES in sense that r# has a higher resolution and is more accurate. To take a simple example, let  $x = (4, 3, 2, 1)$  and

- $y_1 = (5, 4, 3, 2)$
- $y_2 = (5, 4, 3, 3.25)$ 
- $y_3 = (5, 4, 3, 3.50)$
- $y_4 = (5, 4, 3, 3.75)$
- $y_5 = (5, 4, 3, 4.50)$

Obviously, $y_1$ and $x$ behaves exactly in the same way, with their values getting small and small step
by step. The behavior of $y_2, y_3, y_4$ and $y_5$ are becoming more and more different from that of $x$. However, the $\rho$ values are all the same for $y_2, y_3, y_4$. In contrast, the $r^\#$ values can reveal all these differences exactly.
```r
x <- c(4, 3, 2, 1) 

y_list <- list(y1 = c(5, 4, 3, 2.00),
               y2 = c(5, 4, 3, 3.25),
               y3 = c(5, 4, 3, 3.50),
               y4 = c(5, 4, 3, 3.75),
               y5 = c(5, 4, 3, 4.50))

# recor
lapply(y_list, recor, x)
#> $y1
#> [1] 1
#> 
#> $y2
#> [1] 0.9259259
#> 
#> $y3
#> [1] 0.8461538
#> 
#> $y4
#> [1] 0.76
#> 
#> $y5
#> [1] 0.3846154

#cor
lapply(y_list, cor, x, method = "spearman")
#> $y1
#> [1] 1
#> 
#> $y2
#> [1] 0.8
#> 
#> $y3
#> [1] 0.8
#> 
#> $y4
#> [1] 0.8
#> 
#> $y5
#> [1] 0.4
```

## References

Ai, X. (2024). Adjust Pearson's r to Measure Arbitrary Monotone Dependence. In *Advances in Neural Information Processing Systems* (Vol. 37, pp. 37385-37407).

## License

This project is licensed under GPL-3. 

## Support & Feedback

- 📧 **Email Support**: axb@bupt.edu.cn
- 🐛 **Issue Reporting**: [GitHub Issues](https://github.com/byaxb/recor/issues)
- 📚 **Documentation**: [Complete Documentation](https://github.com/byaxb/recor/wiki)

## Citation

If you use this package in your research, please cite our work as: <br>

```bibtex
@inproceedings{NEURIPS2024_41c38a83,
 author = {Ai, Xinbo},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
 pages = {37385--37407},
 publisher = {Curran Associates, Inc.},
 title = {Adjust Pearson\textquotesingle s r to Measure Arbitrary Monotone Dependence},
 url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/41c38a83bd97ba28505b4def82676ba5-Paper-Conference.pdf},
 volume = {37},
 year = {2024}
}
```

---

*recor: Making Correlation Measurement More Accurate*
