-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
175 lines (114 loc) · 9.79 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
---
output: github_document
bibliography: inst/REFERENCES.bib
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
```{r packages, echo = FALSE, include=FALSE}
# packages
library(magrittr)
library(stringr)
library(purrr)
```
```{r metadata-manuscript, echo = FALSE, include=FALSE}
# Manuscript metadata
manuscript1_yaml <- rmarkdown::yaml_front_matter("analysis/paper/001-paper-main.Rmd")
manuscript2_yaml <- rmarkdown::yaml_front_matter("analysis/paper/002-reply-main.Rmd")
## info to display
# manuscript1
manuscript1 <- list()
manuscript1$authors <-
manuscript1_yaml$author %>%
purrr::map_chr(function(x) paste0(x$given_name, " ", x$surname)) %>%
knitr::combine_words()
manuscript1$year <-
paste0("(", eval(parse(text = stringr::str_extract(manuscript1_yaml$date, pattern = "^\\d+"))), "):")
manuscript1$title <-
manuscript1_yaml$title %>%
paste0(".")
manuscript1$journal <-
eval(parse(text = manuscript1_yaml$journal %>% stringr::str_remove("^`r ") %>% stringr::str_remove("`$"))) %>%
names()
manuscript1$doi <-
manuscript1_yaml$doi %>%
paste0(".")
# manuscript2
manuscript2 <- list()
manuscript2$authors <- "Henning Teickner and Klaus-Holger Knorr"
manuscript2$year <-
paste0("(", eval(parse(text = stringr::str_extract(manuscript2_yaml$date, pattern = "^\\d+"))), "):")
manuscript2$title <-
manuscript2_yaml$title %>%
paste0("_", ., "._")
manuscript2$journal <-
eval(parse(text = manuscript2_yaml$journal %>% stringr::str_remove("^`r ") %>% stringr::str_remove("`$"))) %>%
names()
manuscript2$doi <-
manuscript2_yaml$doi %>%
paste0(".")
```
[](https://zenodo.org/badge/latestdoi/465653387) )`-brightgreen.svg)
# hklmirs
This repository contains the data and code for our two manuscripts:
> Henning Teickner, and Klaus-Holger Knorr (2022): _Improving Models to Predict Holocellulose and Klason Lignin Contents for Peat Soil Organic Matter with Mid-Infrared Spectra_. SOIL 8 (2): 699–715. DOI: [10.5194/soil-8-699-2022](https://doi.org/10.5194/soil-8-699-2022).
> `r paste(manuscript2$authors, "(2023):", manuscript2$title, "Not peer-reviewed preprint.")`
### How to cite
Please cite this compendium as:
> Henning Teickner and Klaus-Holger Knorr, (`r format(Sys.Date(), "%Y")`). _Compendium of R code and data for "`r manuscript1_yaml$title`" and "`r manuscript2_yaml$title`"_. Accessed `r format(Sys.Date(), "%d %b %Y")`. Online at <https://doi.org/10.5281/zenodo.6325760>
## Contents
The **analysis** directory contains:
- [:file\_folder: paper](/analysis/paper): R Markdown source documents needed to reproduce the manuscript, including figures and tables. The main script is [001-paper-main.Rmd](analysis/paper/001-paper-main.Rmd). This script produces both manuscripts and the corresponding supplementary information. Additional scripts are:
- [002-paper-m-original-models.Rmd](analysis/paper/002-paper-m-original-models.Rmd): Computes the original models used in @Hodgkins.2018 and models with the same model structure, but as Bayesian models.
- [003-paper-m-gaussian-beta.Rmd](analysis/paper/003-paper-m-gaussian-beta.Rmd): Computes models assuming a Beta distribution for holocellulose and Klason lignin contents and compares them to the original models.
- [004-paper-m-reduce-underfitting.Rmd](analysis/paper/004-paper-m-reduce-underfitting.Rmd): Extents the Beta regression models by including additional variables (additional peaks) or using a different approach (using measured spectral intensities of binned spectra instead of extracted peaks), and validates these models using LOO-CV.
- [005-paper-m-minerals.Rmd](analysis/paper/005-paper-m-minerals.Rmd): Uses the models from `003-paper-m-gaussian-beta.Rmd` to test how accurate a model for holocellulose content is which is also calibrated on training samples with higher mineral contents.
- [006-paper-m-prediction-domain.Rmd](analysis/paper/006-paper-m-prediction-domain.Rmd): Analyzes the prediction domain [@Wadoux.2021] of the original models and the modified models and identifies under which conditions models extrapolate for peat and vegetation samples from @Hodgkins.2018.
- [007-paper-m-prediction-differences.Rmd](analysis/paper/007-paper-m-prediction-differences.Rmd): Compares predictions for the training data and the peat and vegetation data from @Hodgkins.2018 for the original models from @Hodgkins.2018 and the modified models from `004-paper-m-reduce-underfitting.Rmd`.
- [008-paper-supplementary.Rmd](analysis/paper/008-paper-supplementary.Rmd): Computes supplementary analyses and figures for the first manuscript.
- [001-reply-main.Rmd](analysis/paper/001-reply-main.Rmd): This is the main script for manuscript 2. It is compiled from within `001-paper-main.Rmd` and produces the supplementary information S1 for manuscript 2.
- [002-reply-main.Rmd](analysis/paper/002-reply-main.Rmd): This script produces the document for manuscript 2. It is compiled from within `001-reply-main.Rmd`.
- [:file\_folder: data](/analysis/data): Data used in the analysis. Note that raw data is not stored in [:file\_folder: raw_data](/analysis/data/raw_data) (empty folder), but in [:file\_folder: /inst/extdata](/inst/extdata). [:file\_folder: derived_data](/analysis/data/derived_data) contains derived data computed from the scripts. The raw data are derived from @Hodgkins.2018.
- [:file\_folder: stan_models](/analysis/stan_models): The Stan model used in `001-reply-main.Rmd`.
The other folders in this directory follow the standard naming scheme and function of folders in R packages. There are the following directories and files:
- `README.md`/`README.Rmd`: Readme for the compendium.
- `DESCRIPTION`: The R package DESCRIPTION file for the compendium.
- `NAMESPACE`: The R package NAMESPACE file for the compendium.
- `LICENSE.md`: Details on the license for the code in the compendium.
- `CONTRIBUTING.md` and `CONDUCT.md`: Files with information on how to contribute to the compendium.
- `Dockerfile`: Dockerfile to build a Docker image for the compendium.
- `.Rbuildignore`, `.gitignore`, `.dockerignore`: Files to ignore during R package building, to ignore by Git, and to ignore while building a Docker image, respectively.
- `renv.lock`: renv lock file (Lists all R package dependencies and versions and can be used to restore the R package library using renv). `renv.lock` was created by running `renv::snapshot()` in the R package directory and it uses the information included in the `DESCRIPTION` file.
- `.Rprofile`: Code to run upon opening the R-project.
- `R`, `man`, `inst`, `data-raw`, `data`, `src`: Default folders for making the R package run.
- Folder `inst/extdata`: Folder with the raw data used for the analyses. All files in this folder are derived from @Hodgkins.2018.
## How to run in your broswer or download and run locally
You can download the compendium as a zip from from these URLs: <https://github.com/henningte/hklmirs/> or <https://doi.org/10.5281/zenodo.6325760>
Or you can install this compendium as an R package, hklmirs, from GitHub with:
```{r installation, eval=FALSE}
remotes::install_github("henningte/hklmirs")
```
## How to use
#### Reproduce the analyses
To reproduce the analyses for the paper, open the RStudio project included in this research compendium and run the Rmarkdown script in `analysis/paper/001-paper-main.rmd`.
Running the whole script takes about 12 hours and occupies additional disk space of ~2 Gb.
Alternatively, the [Dockerfile](Dockerfile) can be used to build a Docker image from which all analyses can be reproduced. The Dockerfile ensures that all required dependencies are installed (e.g. specific R packages; this is managed using the R package [renv](https://rstudio.github.io/renv/articles/renv.html)).
The [Dockerfile](Dockerfile) provides instructions how to build a Docker image from the Dockerfile and how to run the image in a Docker container. It occupies disk space of ~7 Gb.
When the Docker image runs in a container, go to `localhost:8787` in your Browser. You will find an RStudio interface where you can log in with username `rstudio` and password `hkl`. Here you can find the Rmarkdown scripts (`hklmirs/analysis/paper/001-paper-main.rmd`) as described above.
### Licenses
**Text and figures :** [CC-BY-4.0](http://creativecommons.org/licenses/by/4.0/)
**Code :** See the [DESCRIPTION](DESCRIPTION) file
**Data :** [CC-0](http://creativecommons.org/publicdomain/zero/1.0/) attribution requested in reuse. See the sources section for licenses for data derived from external sources and how to give credit to the original author(s) and the source.
### Sources
All files in `inst/extdata` are derived from @Hodgkins.2018. These data are licensed under the [CC-BY 4.0](http://creativecommons.org/licenses/by/4.0/) license (see https://www.nature.com/articles/s41467-018-06050-2#rightslink).
The format of this research compendium is inspired by @Marwick.2018 and was created with rrtools [@Marwick.2019]. The Rmarkdown template for the main article is from the rticles package [@Allaire.2020].
### Contributions
We welcome contributions from everyone. Before you get started, please see our [contributor guidelines](CONTRIBUTING.md). Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.
### Funding
This study was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) grant no. KN 929/23-1 to Klaus-Holger Knorr and grant no. PE 1632/18-1 to Edzer Pebesma. We acknowledge support from the Open Access Publication Fund of the University of Münster.
### References