---
title: "Nonlinear Regression (Archontoulis and Miguez) paper"
author: "Fernando Miguez"
date: "`r Sys.Date()`"
fig_width: 6
fig_height: 4
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Nonlinear Regression (Archontoulis and Miguez) paper}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.width = 7, fig.height = 6)
library(lattice)
library(nlme)
library(ggplot2)
library(nlraa)
```

# Introduction

The **nlraa** is distributed as part of publications that illustrates the
fit of nonlinear regression models. 

## Example

We start by looking at biomass accumulation data from an experiment conducted in
Greece by Danalatos and Archontoulis. 

```{r strsm}
data(sm)
str(sm)
head(sm)
```

The data represents Yield as harvested biomass for three crops: maize
(M), fiber sorghum (F) and sweet sorghum (S).

```{r sm-ggplot, echo = FALSE}
ggplot(data = sm, aes(y = Yield, x = DOY)) +
   facet_grid(. ~ Input) +
   geom_point(aes(fill=Crop, shape=Crop), size=2) +
   scale_shape_manual(values=c(24,21,1)) +
   scale_fill_manual(values = c("grey","black","black")) +
   scale_x_continuous("Day of the Year") +
   scale_y_continuous("Dry biomass (Mg/ha)") +
   theme_bw()
```

Before starting with the model fit we need to manipulate the data by creating an
index which describes the experimental unit (eu).  We also delete the DOY 141
when crops where planted.

```{r create-eu}
sm$eu <- with(sm, factor(Block):factor(Input):factor(Crop))
sm2 <- subset(sm, DOY != 141)
```

The next step is to create the **groupedData** which is a convenient structre
to be used throughout the fitting process in **nlme**.

```{r grouped-data}
smG <- groupedData(Yield ~ DOY | eu, data = sm2)
```

Originally, Danalatos et al. (2009) fitted the beta growth function as
described by Yin et al. (2003). In **nlraa** we provide the **selfStart**
function **SSbgf** to improve the fitting process.

```{r nls-list-sm}
fit.lis <- nlsList(Yield ~ SSbgf(DOY, w.max, t.e, t.m), data = smG)
## But this works better
## Added 2020/1/2
fit.lis.rp <- nlsList(Yield ~ SSbgrp(DOY, w.max, lt.e, ldt), data = smG) 
```

There are three crops, two levels of agronomic input and four blocks
which results in 24 possible combinations. We fitted the model to
these 24 experimental units and obtained apparent convergence in
20 (Note: was only 10 in the original paper, but this improved dramatically when I recomputed the partial derivatives 2020/1/3). Still, this suggests that some modifications are needed.

```{r nls-list-plot, echo = FALSE}
print(plot(fit.lis))
print(plot(intervals(fit.lis)))
```

From the residuals plot we see some evidence of the inadequacy of the
model. In particular the model over predicts at low values. We relax
the convergence criteria to achieve a fitted model.

```{r relax-control}
fit.me <- nlme(fit.lis, control = list(maxIter = 100, msMaxIter = 300, pnlsMaxIter = 20))
```

Despite the message, we do obtain a 'partially' fitted model.

```{r plot-resid-nlme, echo = FALSE}
print(plot(fit.me))
print(plot(augPred(fit.me, level = 0:1)))
```

A modified beta growth function proposed by Yin et. al (2003) -- included in the
errata -- allows for a delayed start of growth by modifying the $t_b$ parameter.

$$ 
y = w_b + (w_{max} - w_b) \left (1 + \frac{t_e - t}{t_e - t_m} \right ) \left (\frac{t - t_b}{t_e - t_b} \right )^\frac{t_e - t_b}{t_e - t_m}
$$

$$
t_b < t_m < t_e 
$$

We include this as **bgf2** but not the selfStart version at this
point. We also fix the $w_b$ and the $t_b$ parameters, so they are not
part of the fitting process. There are good reasons for this: We know
the initial biomass is minimal (seed weight) and we know the day of
planting (it does not need to be optimized).

```{r bgf2}
fit.lis2 <- nlsList(Yield ~ bgf2(DOY, w.max, w.b = 0, t.e, t.m, t.b = 141),
                    data = smG,
                    start = c(w.max = 30, t.e=280, t.m=240))
```

```{r plot-bgf2, echo = FALSE}
print(plot(fit.lis2))
```

The previous figure shows a much lower bias at lower values.

We proceed to fit the non-linear mixed model and then we simplify the
variance-covariance random effects structure.

```{r nlme-update}
fit.me2 <- nlme(fit.lis2)
## Error message, but the next model is the one we care about
fit2.me2 <- update(fit.me2, random = pdDiag(w.max + t.e + t.m ~ 1))
anova(fit.me2, fit2.me2)
## The second model is simpler and it seems to be marginally better than 
## the orginial, but we need to keep in mind that the simpler model
## converges much more easily
```

Some of the covariances might be significant, but we'll look at this later. We
will next include the effects of Crop type and Input in the fixed part of the
model. We want to know how the parameters are affected by the treatment effects.

```{r nlme-update-two}
fe <- fixef(fit2.me2) ## Some starting values with visual help
fit3.me2 <- update(fit2.me2, fixed = list(w.max + t.e + t.m ~ Crop),
                  start = c(fe[1], -10, 20, fe[2], -40, 0, fe[3], -40, 0))
## We next include the Input
fe2 <- fixef(fit3.me2)
fit4.me2 <- update(fit3.me2, fixed = list(w.max + t.e + t.m
                               ~ Crop + Input),
                  start = c(fe2[1:3], 0, fe2[4:6], 0, fe2[7:9], 0))
## and the interaction
fe3 <- fixef(fit4.me2)
fit5.me2 <- update(fit4.me2,
                   fixed = list(w.max + t.e + t.m
                     ~ Crop + Input + Crop:Input),
                  start = c(fe3[1:4], 0, 0,
                            fe3[5:8], 0, 0,
                            fe3[9:12], 0, 0))
```

The current model displays some evidence of unequal variance as shown
in the figure. The amount of dispersion around zero is smaller for low fitted values and the amount for large fitted values is larger.

```{r fit5-plot, echo = FALSE}
print(plot(fit5.me2))
```

We fit two models one where the variance depends on the Crop (since visually the
crops are so different) and another one where it does not depend on the Crop.

```{r fit6-and-fit7}
fit6.me2 <- update(fit5.me2,
                   weights = varPower(form = ~ fitted(.) | Crop))

fit7.me2 <- update(fit6.me2, weights = varPower(form = ~ fitted(.)))

anova(fit6.me2, fit7.me2)
```

Model **fit6.me2** is better according to the AIC criteria and the likelihood
ratio test.

```{r fit6.me2}
fit6.me2
```

Since random effects are almost zero. We remove them from the model and use the
**gnls** function which is specifically written for models without random effects.

```{r gnls}
## Random effects are almost zero
fit8.me2 <- gnls(Yield ~ bgf2(DOY, w.max, t.e, t.m, w.b=0, t.b=141),
                 data = smG,
                 params = list(w.max + t.e + t.m ~ Crop + Input
                                                   + Crop:Input),
                 weights = varPower(form = ~ fitted(.) | Crop),
                 start = fixef(fit7.me2))
anova(fit6.me2, fit8.me2)
```

Model **fit8.me2** is better than **fit6.me2** according to AIC and BIC.

```{r anova-fit8}
anova(fit8.me2)
```

This shows that the Crop, Input and interaction are significant for
all terms except for the **t.m** parameter.

Residuals look good with much less overprediction at lower values. The
autocorrelation does not appear to be a concern (not shown).

```{r plot-fit8}
print(plot(fit8.me2))
```

We finalize the fitting exercise by plotting observed and predicted values.

```{r fit8-fitted}
smG$prds <- fitted(fit8.me2)

doys <- 168:303
ndat <- expand.grid(DOY=doys, Crop= unique(smG$Crop), Input=c(1,2))
ndat$preds <- predict(fit8.me2, newdata = ndat)

## Here I'm just removing prediction for maize that go beyond
## day of the year 270
ndat2 <- ndat
ndat2[ndat2$Crop == "M" & ndat2$DOY > 270,"preds"] <- NA
ndat2 <- na.omit(ndat2)
```

```{r fit8-fitted-plot, echo = FALSE}
 ggplot(data = smG, aes(y = Yield, x = DOY)) +
  facet_grid(. ~ Input) +
   geom_point(aes(fill=Crop, shape=Crop), size=2) +
   geom_line(aes(x = DOY, y = preds, linetype = Crop), data=ndat2) +
   scale_shape_manual(values=c(24,21,1)) +
   scale_fill_manual(values = c("grey","black","black")) +
   scale_x_continuous("Day of the Year") +
   scale_y_continuous("Dry biomass (Mg/ha)") +
   theme_bw()
```