% -*- Mode: noweb; noweb-default-code-mode: R-mode; -*-
%\SweaveUTF8
\documentclass[11pt]{article}
\usepackage{graphicx}
\usepackage[utf8]{inputenc}
%% \usepackage{germanU}
%% \usepackage[noae]{Sweave}
\usepackage[a4paper]{geometry}  %% , text={14.5cm,22cm}
\usepackage{color} %uncomment BF
\usepackage{booktabs} % nice tables with \toprule \middlerule \bottomrule
\usepackage{amsmath} % for align
% \usepackage{wasysym} % for promille sign
% \usepackage{amssymb}
% \usepackage[textfont=it,font=small,labelfont=it]{caption}
\interfootnotelinepenalty=10000 % prevent LaTex from two-sided footnotes
\usepackage{relevance-descr}
\usepackage{parskip}
%\VignetteEngine{knitr::knitr}
% \usepackage{knitr}
%\VignetteDepends{relevance}
%\VignetteIndexEntry{'Calculate Relevance and Significance Measures'}
\setlength{\topmargin}{-10mm}
\addtolength{\textheight}{30mm}
\addtolength{\oddsidemargin}{-5mm}
\addtolength{\textwidth}{15mm}%%--- 15.0 + 1.5 = 16.5

%% ================================================================

\begin{document}
%% \bibliography{regrbib}
%% not for knitr:
%% \SweaveOpts{concordance=TRUE,width=9,height=6, echo=false}
\setkeys{Gin}{width=0.95\textwidth}
\baselineskip 15pt

\title{\vspace*{-10mm}
Package \T{relevance} for calculating Relevance and Significance Measures
as well as Success of Replication}
\author{Werner A. Stahel, ETH Zurich}
\maketitle

\begin{abstract}\noindent
  Relevance and significance measures are characteristics of statistical
  results that lead to an informative inference. 
  The relevance measure is based on the specification of a threshold of 
  relevance and indicates whether a result is to be called 
  (scientifically) relevant, negligible, or ambiguous.
  
  The package \T{relevance} calculates these measures for a simple
  comparison of two samples as well as for many regression models 
  and provides a suitable printing method and a plotting method for the
  terms of a model.
  
  Reproducibility is a basic requirement in empirical science.
  Replication studies have become popular, with differing ideas on how to
  measure their success to reproduce the results of the original study.
  The package implements measures based on the relevance measure.

\end{abstract}

<<preliminary, echo=F, message=F>>=
## library(plgraphics, lib.loc="/u/stahel/R/regdevelop/pkg/plgraphics.Rcheck")
library(relevance) ##, lib.loc="/u/stahel/R/regdevelop/pkg/relevance.Rcheck")
## options(warn=1)
@ 
\section{Introduction to the relevance measure}

This package implements the concepts of relevance and significance as
introduced by Stahel (2021). 
They allow for meaningful statistical inference beyond the questionable 
common practice of Null Hypothesis Significance Testing that is in turn
often reduced to citing a p-value.

\Tit{The problem.}
Consider the problem of estimating an \emph{effect}, for example,
a mean (an expected value), a difference of means between two samples, 
or a regression coefficient.

\Tit{The Zero Hypothesis Testing Paradox.}
In common practice, statistical inference is reduced to testing whether
the effect might be zero, and the respective p-value is provided as the
result. This has been widely criticized as being too simple an answer.
In fact, it relates to a question that is not scientifically
meaningful, as shown by the ``Zero Hypothesis Testing Paradox'':
When a study is undertaken to find a difference between samples or some 
influence between variables, the
\emph{true} effect---e.g., the difference between the expected values of
two samples---will never be precisely zero. 
Therefore, the strawman hypothesis of zero true effect could
in almost all reasonable applications be rejected if one had the
patience and resources to obtain enough observations.
Thus, the question that is answered mutates to:
``Did we produce sufficiently many observations to prove
the (alternative) hypothesis that was true on an a priori basis?''
This does not seem to be a fascinating task.

\Vneed{30mm}
\Tit{Relevance.}
The scientifically meaningful question is whether the effect is
\emph{relevant}, and this needs the specification of a 
\emph{relevance threshold} $\zeta$.
The \emph{relevance measure} is defined as the ratio of the 
effect $\eff$ and the threshold,
\[
  \Rl{} = \eff\big/\zeta
  \;.
\]
It is thus a parameter of the model. It is estimated by plugging in the
estimated effect, $\wh\eff$,
\[
  \Rl e=\wh\eff/\zeta
  \;,
\]
and a confidence interval is obtained in the
same manner from the confidence interval for the effect parameter.
Its limits are called

\Quad  $\Rl s$, ``secured relevance'': the lower end;
\\
\Quad  $\Rl p$, ``potential relevance'': the upper end.
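
As a rough illustration, the three numbers can be computed by hand for a
single sample. The following is a minimal sketch in base R and not the
package's implementation: it uses the second group of the \T{sleep} data,
assumes the threshold $\zeta=0.1$ for a standardized effect, and, for
simplicity, treats the estimated standard deviation as a fixed scale when
transforming the confidence interval.
The object names \T{extra2}, \T{zeta} and \T{rl} are chosen for this
illustration only.
<<sketchRelevance>>=
## Illustrative sketch in base R; not the package's implementation.
extra2 <- sleep[sleep$group == 2, "extra"]  # one sample from the sleep data
zeta <- 0.1                   # assumed threshold for a standardized effect
s  <- sd(extra2)              # scale, treated as fixed (simplifying assumption)
ci <- t.test(extra2)$conf.int # 95% confidence interval for the mean
rl <- c(Rle = mean(extra2) / s / zeta,  # estimated relevance
        Rls = ci[1] / s / zeta,         # secured relevance: lower end
        Rlp = ci[2] / s / zeta)         # potential relevance: upper end
rl
@
The values printed by \T{onesample} may differ from this sketch,
since the package may treat the standardization differently.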

\Tit{Significance.}
Let us return to the problem of testing a null hypothesis,
and even to the case of testing $\eff=0$.
The common way to express the result is to provide the p-value.
However, this measure is more difficult to interpret than needed.
We have been trained to compare it to the ``level'' of $5\%$ and
celebrate if it is \emph{below}. It is thus a measure of lack of
significance, and the desired range is just $0\le p\le 0.05$.
We also developed the skill of judging the values in this range
as to ``how significant'' the result is.

In ``ancient'' times, before the computer produced p-values readily,
statisticians examined the test statistics and then compared them to
corresponding ``critical values.'' In the widespread
case of the t test, they used the t statistic as an
informal quantitative measure of significance of an effect by comparing it 
to the number 2, which is approximately the critical value
for moderate to large numbers of degrees of freedom.

The significance measure $\Sg0$ picks up this idea, but standardizes
with the actual critical value,
\[
  \Sg0 = \wh\eff \big/ (q\;\mbox{se})
\;,
\]
where $\mbox{se}$ is the standard error of $\wh\eff$ and $q$ is the
appropriate quantile.
Then, the test rejects the null hypothesis $\eff=0$ whenever $|\Sg0|>1$,
and $\Sg0$ is proportional to the estimated effect.
It is thus interpretable in a quantitative way as a measure of significance
without special training.
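
The definition can be checked by hand for a one-sample t test.
The following minimal sketch in base R is not the package's code;
it assumes a two-sided test at the $5\%$ level, so that $q$ is the
$97.5\%$ quantile of the t distribution, and again uses the second group of
the \T{sleep} data. The object names are chosen for this illustration only.
<<sketchSig0>>=
## Illustrative sketch in base R; not the package's implementation.
extra2 <- sleep[sleep$group == 2, "extra"]  # one sample from the sleep data
n  <- length(extra2)
se <- sd(extra2) / sqrt(n)                  # standard error of the mean
q  <- qt(0.975, df = n - 1)                 # assumed two-sided 5% critical value
Sig0 <- mean(extra2) / (q * se)             # t statistic divided by q
Sig0
## |Sig0| > 1 and p < 0.05 express the same test decision
c(abs(Sig0) > 1, t.test(extra2)$p.value < 0.05)
@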

\Tit{Regression models.}
In regression, there are different ways to characterize the
relevance of the individual terms of the model.
Firstly, for scalar predictors, the coefficient is the obvious
effect to examine. 
An alternative is the effect of dropping the predictor from the model, 
which also reflects its collinearity with the other predictors and
generalizes to the case where the predictor is a factor 
(or another term with more than one degree of freedom),
thus also encompassing \emph{analysis of variance.}
A third aspect is the relevance of the term for prediction of the 
target variable.
For details, see Stahel (2021).

\Tit{Choice of Relevance Thresholds.}
As noted above, the new relevance measure presupposes the choice of a 
relevance threshold. 
Ideally, this threshold is determined for each scientific question
on the basis of specific knowledge about the phenomenon that is modeled.
Since this is a severe burden, Stahel (2021) proposes some conventions 
for most common statistical models that may be used as a standard, 
like the testing level of $5\%$ is for classical null hypothesis testing.
(Note that the latter choice also affects the relevance measures 
$\Rl s$ and $\Rl p$.)

The convention consists of first determining an appropriate
``effect scale'' for the model at hand
and then setting a relevance threshold for it.
Table~\ref{tab:recommendation}, taken from Stahel (2021), collects the
proposed effect scales and thresholds.
The symbol $\%\ell$ indicates that the threshold refers to a log scale.
For small effects on the log scale, these transform to the respective
percentages in the original scale.
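
A quick calculation in base R illustrates how close the two scales are:
a difference of $\zeta$ on the log scale corresponds to a relative change of
$\exp(\zeta)-1$ in the original scale.
The vector \T{zeta} below simply collects the log-scale thresholds that
appear in the table.
<<sketchLogScale>>=
## Relative change (in percent) implied by a difference on the log scale
zeta <- c(0.005, 0.05, 0.1)       # log-scale thresholds from the table
pct <- 100 * (exp(zeta) - 1)      # corresponding percentages
round(pct, 2)
@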

\begin{table}[hh]
  \caption{Models, recommended effect scales and relevance thresholds
  \hfill\break} %% ???
  \label{tab:recommendation}
  \centering
  \begin{tabular}{|l|c|c|c|c|}
    Problem&Rl type&Basic model
      &Effect $\eff=g\fn{\theta}$&Rel.\ thresh.\ $\zeta$ \Hline{&&&&}
    One, or two &stand& $\N\fn{\mu,\sigma^2}$ & $\mu/\sigma$ & $10\,\%$ \\[-2pt]
    \Quad paired samples &&&&\Hline{&&&&} %\\[3pt]
    Two independent &stand& $\N\fn{\mu_k,\sigma^2}$ & $d=(\mu_1-\mu_0)/\sigma$
                    & $20\,\%$ \\
    \Quad samples&&&$\eff=d/2$& $10\,\%$ \Hline{&&&&} %\\[5pt]
    %% (\mu_1-\mu_0)\sqrt{\nu_0\nu_1}/\sigma
    Regression  && $Y_i=\alpha+\vc x_i\tr\vc\beta+\eps_i$ &&\\
     && $\eps_i\sim\N\fn{0,\sigma^2}$ &&\\[-8pt]
    \Quad coefficient effect &coef&  & $\beta_j \delta_j\big/\sigma$&$10\,\%$ \\
    \Quad drop effect &drop&  & $\eta_J$&$10\,\%$ \\
    \Quad prediction effect &pred&&$-\oneover2\log\fn{1-R^2}$ &$0.5\,\%\ell$ or
                                                          $5\,\%\ell$ \Hline{&&&&} %\\[5pt]
    Relative Difference&rel
           & $\log\fn Y\sim
             \N\fn{\mu_k,\sigma^2}$ & $\log\fn{\mu_1/\mu_0}$
                                 &$10\,\%\ell$\Hline{&&&&}
    Proportion  &prop& $\Bin\fn{n, p}$ &  
           $\log\fn{p/(1-p)}$ & $10\,\%\ell$\Hline{&&&&}
%%-     Simple regression  & $Y_i=\alpha+\beta x_i+E$ & \\
%%-            & $ E\sim\N\fn{0,\sigma^2}$
%%-       & $\beta \sqrt{\MS X}/\sigma$&$10\,\%$ \\[5pt]
    Logistic regression %, coefficients
           &prop& $\logit\fn{P\fn{Y_i=1}} = \alpha+\vc x_i\tr\vc\beta$
      & $\vc\beta_j s_j$ & $10\,\%\ell$\Hline{&&&&} %\\[8pt] 
    Correlation&corr
           & $\vc Y\sim
             \N_2\fn{\vc\mu, \Sig}$ &&\\
           && $\rho=\Sig_{12}/\sqrt{\Sig_{11}\Sig_{22}}$
           & $\oneover2 \log\fn{\frac{1+\rho}{1-\rho}}$&$10\,\%\ell$
      %% &better: $-\oneover2\log\fn{1-\rho^2}$&$0.5\,\%\ell$ or $5\,\%\ell$\\[5pt]
    \\[5pt]\hline
  \end{tabular}
\end{table}

In the package, the thresholds used by default are given by 
<<rlvthres>>=
getOption("rlv.threshold")
@ 
and can be modified by setting these options again, see below.

\Tit{A classification of results.}
Based on the relevance measure and its confidence interval -- 
or on the confidence interval for the original parameter and its 
position relative to the threshold -- the result can be classified into the
following cases:
\begin{itemize}
\item   \T{"Rlv"}, if the effect is statistically proven to be
  larger than the threshold, that is, if the confidence interval
  for the parameter lies above the threshold, so that
  $\Rl s>1$ (which implies $\wh\eff>\zeta$),
\item
  \T{"Amb"} if the confidence interval contains the threshold
  and thus $\Rl s<1\le\Rl p$, 
\item
  \T{"Ngl"} if the interval only covers values
  lower than the threshold, but contains \T{0}, 
  $0\le\Rl p<1$,  
  and
  \item
  \T{"Ctr"} if the interval only contains negative values,
  $\Rl p<0$.
\end{itemize}
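
The four cases can be written as a small decision rule.
The function \T{classifyRlv} below is a hypothetical helper written for
this illustration and is not part of the package; it assumes that the
secured and potential relevances are given as two numbers
and assigns the boundary case $\Rl s=1$ to \T{"Amb"}.
<<sketchClassify>>=
## Hypothetical helper for illustration; not a function of the package.
classifyRlv <- function(Rls, Rlp) {
  if (Rls > 1)        "Rlv"   # interval lies entirely above the threshold
  else if (Rlp >= 1)  "Amb"   # interval contains the threshold
  else if (Rlp >= 0)  "Ngl"   # interval below the threshold, contains 0
  else                "Ctr"   # interval contains only negative values
}
classifyRlv(Rls = 1.5,  Rlp = 3.0)
classifyRlv(Rls = -0.2, Rlp = 0.6)
@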

%% =======================================================================
\section{Replication}
\label{sec:replication}
Replication is a basic concept of science.
Any empirical result should be tested by repeating the study in an
independent situation and examining if the results lead to the same
conclusion again. 
The independent repetition is called a ``replication'' and provides new
data that can be similar to the data of the ``original study'' or quite
different. 

This section describes how to apply the relevance measure to the problem
of comparing the replication with the original.
It can be skipped if this problem is not in the focus.

The conclusion that the replication is successful, contradicts the
original, or is ambiguous is based on two aspects:
\begin{itemize}
\item[A.]
  The conclusions from the replication results may be the same as those
  obtained in the original study.
\item[B.]
  The data from the replication may be ``compatible'' with those of the
  original study.  
\end{itemize}
The first criterion relies on a statistical inference from the replication
data, whereas the second is based on an inference about the difference
between the two studies.
The concept of relevance is essential for both of these aspects.
%\cite{StaW22} 
Stahel (2022) describes the approach in detail.

In essence, the conclusion of the comparison, based on both aspects,
can again be expressed by a classification.
Let $\IEff$ be the confidence interval from the replication and
$\IEDS$, the confidence interval for the difference between studies.
Then, the result, assuming a positive original effect, is a 
\begin{itemize}
\item[(Cnf)] 
  \Term{Confirmation,} if $\IEff$ only contains relevant
  values (case Rlv), and
  the negative standardized effect difference EDS is small 
  (cases Ngl or Amb);
  if $\IEff$ is only significant (Amb.Sig) and
  the estimate $\wh\eff_1$ is larger than the relevance threshold,
  we call it a \Term{weak confirmation} (CnfW),
\item[(Att)]
  \Term{Attenuation,} if $\IEff$ lies on the same side of 0
  as in the original study (Rlv or Amb.Sig) and $\IEDS$ is relevant (Rlv),
%%-   If the estimated EDS is clearly negative (Rlv), one could call %%  
%%-   it a \Term{clear attenuation}---even if $\IEff$ turns out relevant (Rlv).
\item[(Enh)]
  \Term{Enhancement,} if the replication suggests a clearly
  stronger effect, that is, case (Rlv) for $\IEff$
  and significantly positive EDS (Ctr);
  this will be rare,  
\item[(Amb)]
  \Term{Ambiguous,} if $\IEff$ covers the relevance threshold and
  it also covers zero (Amb) or the estimate $\wh\eff_1$ is below the
  relevance threshold,
%%-   and EDS is high, or
%%-   if $\IEff$ covers 0, regardless of the EDS,
\item[(Anh)]
  \Term{Annihilation,} if $\IEff$ covers only irrelevant values (Ngl), 
\item[(Ctr)]
  \Term{Contradiction,} if all values of $\IEff$ have the
  opposite sign (Ctr),
\item[(Drp)]
  \Term{Dropout,} if the replication failed to mimic
  the experimental or observational setup.
\end{itemize}
These cases are collected in Table \ref{tab:classification}.

\begin{table}[h]
  \centering
  \begin{tabular}{r|ccc}
    \multicolumn{1}{r||}{Effect estimate}
    &\multicolumn{3}{c}{Effect Difference (standardized), $\IEDS$}\\
    \multicolumn{1}{r||}{ $\IEff$ in replication}
    & relevant, Rlv & Amb or Ngl & contradicting, Ctr \\\hline
    relevant, Rlv  & attenuation, Att & confirmation, Cnf & enhancement, Enh \\
    significant, Sig& attenuation, Att & weak conf., CnfW$^*$ & ---  \\
    ambiguous, Amb & ambiguous, Amb & ambiguous, Amb & --- \\
    negligible, Ngl& annihilation, Anh& annihilation, Anh$^{**}$ &  --- \\
    contradicting, Ctr& contradiction, Ctr&  --- & --- 
  \end{tabular}
  \caption{\label{tab:classification}Classification
    of results of a replication of a relevant effect, based on the
    classification of the confidence interval $\IEff$ for the effect
    in the replication and the confidence interval $\IEDS$ of the EDS.
    It is assumed that the original effect was relevant or at least significant.
    Then, the cases marked --- cannot occur.
    $^*$ This conclusion also requires $\Rl e\ge1$; otherwise, it counts as
    ambiguous. \
    $^{**}$ This cannot occur if the original effect was relevant.
  }
\end{table}
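
The table can be read as a simple lookup.
The following sketch encodes it in base R; the function \T{classifyRepl}
and its arguments are hypothetical names used for this illustration
and do not reproduce the package's function \T{replication}.
Cells marked --- in the table are returned as \T{NA},
and the row ``significant'' is labelled \T{Amb.Sig} as in the list above.
<<sketchReplClass>>=
## Hypothetical lookup for illustration; not the package's implementation.
## eff: class of the replication interval; eds: class of the EDS interval;
## Rle: estimated relevance in the replication (needed for CnfW only)
classifyRepl <- function(eff, eds, Rle = NA) {
  tab <- rbind(
    Rlv     = c(Rlv = "Att", AmbNgl = "Cnf", Ctr = "Enh"),
    Amb.Sig = c("Att", "CnfW", NA),
    Amb     = c("Amb", "Amb",  NA),
    Ngl     = c("Anh", "Anh",  NA),
    Ctr     = c("Ctr", NA,     NA))
  col <- if (eds %in% c("Amb", "Ngl")) "AmbNgl" else eds
  res <- unname(tab[eff, col])
  ## a weak confirmation additionally requires Rle >= 1
  if (identical(res, "CnfW") && !isTRUE(Rle >= 1)) res <- "Amb"
  res
}
classifyRepl("Rlv", "Ngl")
classifyRepl("Amb.Sig", "Amb", Rle = 0.8)
@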

%% =======================================================================
\section{Functions}
\label{sec:functions}

\subsection{Function \T{twosamples}}
(\T{onesample} is a synonym.)
This function provides inference for the comparison of two samples, paired
or unpaired, and also for a single sample.
Its call mimics \T{t.test}.

<<twosamples>>=
  t.test(sleep[sleep$group == 1, "extra"], sleep[sleep$group == 2, "extra"])
( r.sleep <- 
    twosamples(sleep[sleep$group == 1, "extra"], sleep[sleep$group == 2, "extra"])
)
@ 
The output shows the estimated effect and its confidence interval together
with the relevance measures. 
The estimated relevance $\Rl e$ compares the standardized effect
$\wb X/S =$\Sexpr{r.sleep["effect"]}/\Sexpr{r.sleep["se"]*sqrt(18)},
where $S$ is the estimated standard deviation of the observations,
to its relevance threshold $0.1$.
The classical results, t test statistic, standard error and p value 
are also calculated, but not shown with the default printing options.
They can also be obtained, as well as the significance \T{Sig0},
by changing options (for details, see Section \ref{sec:options}),
<<sleep2>>=
t.oldopt <- options(show.inference = "classical")
r.sleep
options(t.oldopt)  ##  restore the old options
@ 
The function also calculates inference about the mean of a single sample.
It accepts the formula version of arguments: The statement
\T{twosamples(extra~group, data=sleep)} yields the same results as the 
more complicated call above.
It also compares two samples of binary data, resulting in inference 
based on Fisher's test. A single sample leads to binomial inference. 
See the Examples section below.

%% =========================================================
\subsection{Function \T{correlation}}
Inference about a correlation coefficient is produced by the function
\T{correlation}. It is based on \T{cor.test} from the \T{stats} package
and thus allows for choosing Spearman's nonparametric correlation.
<<correlation>>=
correlation(iris[1:50,1:2], method="spearman")
@
%% =========================================================
\subsection{Function \T{termtable}}
For regression models with a linear predictor, the basic function is
\T{termtable}, which is applied to a model fit object.
For each term reflecting a scalar predictor, its result contains the 
ordinary and standardized coefficient, their confidence intervals,
significance against 0, p-value, and relevances.
For all types of terms, with one or more degrees of freedom, it adds
the relevances for dropping the term and 
for its contribution to prediction.

Since this leads to more than 20 columns, the print method selects columns
according to \Hneed{50mm} \T{getOption("show.inference")}.

<<termtable>>=
  data(swiss, package="datasets")
  rr <- lm(Fertility ~ . , data = swiss)
  rt <- termtable(rr)
  rt
  names(rt) ## The result of termtable has 24 columns
  ## The following statements are commented out to avoid excessive output
  ##   str(rt)  
  ##   data.frame(rt) ## or  print(rt, show="all")
    ## This avoids selection and preparation of columns by 'print.inference'. 
@ 
Again, other results can be selected using options.
<<termtablePrint>>=
  t.oldopt <- options(show.inference = "classical")
  rt
  options(t.oldopt)  ##  restore the old options
@ 

\Tit{Plot.}
Objects of class \T{inference} have a specific plotting method that shows 
the confidence interval(s) on the relevance scale. Here is an example.
<<plotInference, fig.height=4, fig.width=9>>=
plot(rt)
@

\subsection{Function \T{termeffects}}
For terms with more than one degree of freedom, notably for factors with
more than two levels,
the function \T{termeffects} calculates effects of levels and 
respective inference measures. 
As seen here, there are \T{print} and \T{plot} methods for the resulting objects.
<<termeffects, fig.width=9, fig.height=6>>=
  data(d.blast)
  r.blast <-
    lm(log10(tremor)~location+log10(distance)+log10(charge), 
       data=d.blast)
  ( rte <- termeffects(r.blast) )

  plot(termeffects(r.blast))  ## plot effects for terms with >1 df
@ 

\subsection{Function \T{inference}}
This function generates statistics describing relevance and significance 
for several situations, mainly for regression models. 
When it is applied to a model fit object, it calls \T{termtable} 
and \T{termeffects} and stores the \T{summary} of the object.
The corresponding printing method includes a final part that describes 
the global aspects of the model as shown here.
<<inference>>=
  ( rr <- inference(r.blast) )
@
\T{\!\!inference} also applies to other situations where an estimate, 
its standard error and the number of observations are available.

\subsection{Function \T{replication}}
This function generates inference for a replication study based on the
inference results for the original and replication studies and 
the inference on the difference between the two, 
as described in Section \ref{sec:replication}.

The function produces an object of class \T{inference} and 
\T{replication}.
There are \T{print} and \T{plot} methods for this class.

Here is a series of examples taken from the classical set of replications
reported by the ``Open Science Collaboration'' in 2015:
Open Science Collaboration (2015). 
``Estimating the reproducibility of psychological science.''
Science 349, 943-952.
The 10 studies involving a one-sample or paired-samples test were selected.
<<replication>>=
data(d.osc15Onesample)
to <- structure(d.osc15Onesample[,c("effecto","teststatistico","no")],
      names=c("effect","teststatistic","n"))
tr <- structure(d.osc15Onesample[,c("effectr","teststatisticr","nr")],
      names=c("effect","teststatistic","n"))
( rr <- replication(to, tr, rlv.threshold=0.1) )
plot(rr)
@ 

This plot shows the inference about the difference between studies.
The following shows the results for the original and the replication study next to each other.
<<replication.plgroups>>=
plot(attr(rr, "estimate"), refline=c(0,1),
     label2=attr(rr, "rplclass"), xlab="relevance")
@

\subsection{Generally useful functions}
The package includes several functions that are not directly related to 
relevance or significance; see their help pages for details and examples.

\T{\!showd} allows for inspecting a data.frame or vector in a 
brief informative way.
<<show>>=
  showd(d.blast)
@
\T{\!\!logst} is a version of a ``started log'' that copes with zeros and 
even with negative values in a suitable way, and 
\T{asinp} implements an appropriate transformation for percentages.

Functions that apply to data with missing values (\T{NA}s) are
\T{sumNA, dropNA, replaceNA} and \T{formatNA}.

%% =====================================================================
\section{Options}
\label{sec:options}

The package works with some specific options, see \T{?relevance.options}.
The more important ones are the following. 
\begin{itemize}
\item \T{rlv.threshold}:
vector of relevance thresholds for 
  \begin{itemize}
  \item \T{rel}: a relative effect, that is, a change in a parameter 
    expressed as a percentage of the parameter, %% !!! log scale?
  \item \T{stand}: an effect standardized by a standard deviation, 
    like Cohen's d for two samples,
  \item \T{prop}: a proportion, expressed in logit units,
  \item \T{corr}: a correlation coefficient,
  \item \T{coef}: a coefficient in the linear predictor of a regression model,
  \item \T{drop}: the effect of dropping a term from a regression model,
  \item \T{pred}: the effect of a term on the prediction accuracy.
  \end{itemize}
\item \T{show.inference}: selects the inference items to be presented by the 
  \T{print} methods.
  Currently, three styles are implemented:
  \begin{itemize}
  \item \T{relevance}: selects the columns determined by
    \T{getOption("show.simple.relevance")}, \linebreak
    \T{getOption("show.terms.relevance")}
    and \T{getOption("show.termeffects.relevance")}, for the three 
    print methods (see below), respectively; 
    these are the important columns for inference based on relevance;
  \item \T{classical} and \T{test}: these select 
    \T{getOption("show.?.classical")}, 
    in the same manner, suitable for inference based on p values or
    significance, respectively.
  \end{itemize}
  The choice of any elements of the vector resulting from a call of
  \T{twosamples} or any columns of a \T{termtable} object is achieved by 
  typing, e.g.,\\
  \T{options(show.inference=c("classical","Sig0","Rls"))}.
\item \T{rlv.symbols} and \T{p.symbols}:
  symbols to be used for characterizing \T{Rls} or p-values, respectively,
\item \T{digits.reduced}: digits used for relevance and significance measures 
  and test statistics. These numbers are rounded to \T{digits.reduced} decimals,
  \T{p-values} to one more.
\item  \T{na.print}: symbol to print \T{NA} values.
\end{itemize}
The package's defaults can always be restored by typing
\T{options(relevance.options)}.

Here is an example of choosing different output columns for a \T{termtable} object.
<<getOption>>=
  t.opt <- options(show.terms.relevance=c("coef", "dropRls", "dropRls.symbol"))
  rt
  
## restore the old options
  options(t.opt) ## the former options
  options(relevance.options) ## restore the package's defaults
@ 

\subsection{Function \T{print}} 
These options are used when calling the \T{print} methods on the objects 
produced by the functions in Section \ref{sec:functions}.
These objects are either of class \T{inference} or \T{termeffects}.
The methods \T{print.inference} and \T{print.termeffects}
accept an argument \T{show} that acts as if the corresponding 
printing options had been changed. Thus,\\
\T{print(rt, show=c("coef", "dropRls", "dropRls.symbol"))}
leads to the output shown above.

The printing methods convert their first argument into 
printable form by producing an object of
class \T{printInference}. They terminate by calling the method
\T{print.printInference}, which in turn produces the output---unless 
\T{print=FALSE} is set.
This two-step procedure allows for editing the output in the following
manner: 
<<printlist>>=
rpr <- print(termeffects(r.blast), print=FALSE)
attr(rpr, "head") <- sub("lm", "Linear Regression", attr(rpr, "head"))
rpr
@ 


%% ============================================================================
\section{Examples}
Here, we document the examples that appear in the basic reference.

\subsection{sleep data}

<<sleep>>=
data(sleep)
dd <- subset(sleep, group==2)
onesample(60*dd$extra, rlv.threshold=60, standardize=FALSE)
@

\subsection{Anchoring}
The experiment is described as follows:
``Students were asked to guesstimate the height of
Mount Everest. One group was `anchored' by telling them that it was more
than 2000 feet, the other group was told that it was less than 45,500 feet.
The hypothesis was that respondents would be influenced by their `anchor,'
such that the first group would produce smaller numbers than the second''.
The true height is 29,029 feet.

The inference about the difference between the groups and a graphical display are obtained as follows.
<<anchoring, fig.height=4, fig.width=9>>=
data(d.everest)
rr <- twosamples(log(y)~g, data=d.everest, var.equal=TRUE)
print(rr, show="classical")
rr

pltwosamples(log(y)~g, data=d.everest)
@

\subsection{Blasting}
When digging a tunnel in a populated area, it is important to make sure that the blasting does not damage nearby buildings. 
To this end, the tremor caused by the blastings is measured in the basement of such houses, along with the distance and the charge used,
and a model is used to predict the resulting tremor.
The dataset \T{d.blast} contains such data for a freeway tunnel beneath a Swiss city. 
The logarithmic tremor is modelled as a linear function of the logarithmic distance and charge,
plus an additive adjustment for the house where the measurements are taken (factor \T{location}).
For the example in the paper, a subset is used, and \T{time}, a rescaled calendar day, is appended. 
%% We do not show plots here since they have been displayed above, using the whole dataset.
<<blast, fig.height=3>>=
dd <- d.blast[seq(1,388,3),]
dd <- na.omit(dd[dd$location %in% paste("loc",c(1,2,4),sep=""),])
dd$time <- as.numeric(dd$date-min(dd$date))/365

rlm <- lm(log10(tremor)~location+log10(distance)+log10(charge)+time, data=dd,
          contrasts=list(location="contr.sum"))
( rt <- termtable(rlm) )
plot(rt)
@

%% ============================================================================
\subsection*{References}

Stahel, Werner A. (2021). 
\textit{New relevance and significance measures to replace p-values}. 
PLOS ONE, June 16, 2021, doi.org/10.1371/journal.pone.0252991

Stahel, Werner A. (2022). 
\textit{Replicability: Terminology, Measuring Success, and Strategy}.
Available in the documentation.

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: 
