---
title: "Getting Started with rcdf"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{rcdf}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## What is RCDF?

Think of an RCDF file as a **locked cabinet** for your data. You can store multiple datasets inside it, and the cabinet can only be opened with the right key. When you share the file with a colleague, they cannot see the contents unless you also give them a copy of the key.

Under the hood, an RCDF file is a single portable archive (`.rcdf`) that holds your datasets as compressed, encrypted files along with a small metadata record. The encryption happens automatically — you never have to configure it yourself.

---

## Setup

```{r, eval=FALSE}
library(rcdf)
```

The package ships with a sample RCDF file and matching keys so you can try everything without creating your own files first:

```{r, eval=FALSE}
sample_dir  <- system.file("extdata", package = "rcdf")
sample_rcdf <- file.path(sample_dir, "mtcars.rcdf")
sample_key  <- file.path(sample_dir, "sample-private-key.pem")
```

---

## Step 1 — Create your encryption keys

Before saving data, you need a **key pair** — two files that work together:

| File | What it does | Who should keep it |
|---|---|---|
| `public-key.pem` | Locks (encrypts) your data | Safe to share |
| `private-key.pem` | Unlocks (decrypts) your data | Keep this secret |

Generate a new pair with one function call:

```{r, eval=FALSE}
pub_key <- generate_rsa_keys(
  path     = "~/my-keys",         # folder where both files are saved
  password = "a-strong-password"  # optional, but protects the private key
)
# pub_key holds the path to the public key — you will pass it to write_rcdf()
```

> **Important:** Your private key is the only way to open your files later. Back it up and store it securely. A password manager or encrypted drive is a good choice.

---

## Step 2 — Save your data

Organise your datasets into a named list, then call `write_rcdf()`:

```{r, eval=FALSE}
# Create a container for your tables
my_data <- rcdf_list()

# Add any data frames you want to store together
my_data$households <- data.frame(id = 1:100, region = sample(c("North", "South"), 100, TRUE))
my_data$survey     <- data.frame(id = 1:100, score  = rnorm(100))

# Save everything as a single encrypted file
write_rcdf(
  data    = my_data,
  path    = "~/data/survey-2024.rcdf",
  pub_key = pub_key              # path returned by generate_rsa_keys()
)
```

This creates `survey-2024.rcdf` — a single, portable file containing all your tables, fully encrypted and ready to share.

---

## Step 3 — Read the data back

To open an RCDF file, provide the matching private key:

```{r, eval=FALSE}
survey_data <- read_rcdf(
  path           = "~/data/survey-2024.rcdf",
  decryption_key = "~/my-keys/private-key.pem",
  password       = "a-strong-password"   # only needed if you set one
)

# Each table comes back as a data frame inside the list
head(survey_data$households)
head(survey_data$survey)
```

Try it right now using the bundled sample data:

```{r, eval=FALSE}
sample_data <- read_rcdf(
  path           = sample_rcdf,
  decryption_key = file.path(sample_dir, "sample-private-key-pw.pem"),
  password       = "1234"
)

head(sample_data$mtcars)
```

### Reading a whole folder at once

If you have many RCDF files in one folder, load them all in one call. Tables that share a name across files are automatically stacked into a single table:

```{r, eval=FALSE}
all_data <- read_rcdf(
  path           = "~/data/monthly-exports/",
  decryption_key = "~/my-keys/private-key.pem",
  password       = "a-strong-password",
  recursive      = FALSE   # set TRUE to also search sub-folders
)
```

---

## Step 4 — Export to other formats

Once you have the data in R, export it to whatever format your team needs:

```{r, eval=FALSE}
write_rcdf_as(
  data    = survey_data,
  path    = "~/exports/survey-2024",
  formats = c("csv", "xlsx")   # create both at once
)
# Result:
#   ~/exports/survey-2024/CSV/households.csv
#   ~/exports/survey-2024/CSV/survey.csv
#   ~/exports/survey-2024/Excel/households.xlsx
#   ~/exports/survey-2024/Excel/survey.xlsx
```

**All supported formats:**

| Format | Argument to `formats` | Output |
|---|---|---|
| CSV | `"csv"` | one `.csv` per table |
| TSV | `"tsv"` | one `.txt` per table |
| JSON | `"json"` | one `.json` per table |
| Parquet | `"parquet"` | one `.parquet` per table |
| Excel | `"xlsx"` | one `.xlsx` per table |
| Stata | `"dta"` | one `.dta` per table |
| SPSS | `"sav"` | one `.sav` per table |
| SQLite | `"sqlite"` | one `.db` database with all tables |

You can also call the individual functions directly (e.g. `write_rcdf_csv()`, `write_rcdf_xlsx()`) if you need more control. Run `?write_rcdf_as` to see all options.

---

## Checking a file without decrypting it

You can inspect when an RCDF file was created, which package version wrote it, and integrity checksums — all without the private key:

```{r, eval=FALSE}
meta <- get_attrs(sample_rcdf)
meta$created_at   # when the file was created
meta$version      # package version used to create it
meta$checksum     # per-table checksums for integrity verification
```

---

## Next steps

- **Working with multiple files** — see `vignette("merging-rcdf")` for loading or combining RCDF files from different sources.
- **Full function reference** — run `?write_rcdf`, `?read_rcdf`, or `?write_rcdf_as` in the R console for all available options.
