---
title: "gwas2crispr: From GWAS to CRISPR-ready Files"
pagetitle: "gwas2crispr: From GWAS to CRISPR-ready Files"
output: rmarkdown::html_vignette
vignette: >
    %\VignetteIndexEntry{gwas2crispr: From GWAS to CRISPR-ready Files}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE,
  message = FALSE,
  warning = FALSE
)
```

## Overview

`gwas2crispr` prepares genome-wide association study (GWAS) results for downstream clustered regularly interspaced short palindromic repeats (CRISPR) workflows.

The package retrieves significant single-nucleotide polymorphisms (SNPs) for supported GWAS Catalog trait identifiers from the EMBL-EBI GWAS Catalog REST API v2 and returns CRISPR-ready outputs for the GRCh38/hg38 human genome build.

The main outputs are:

* comma-separated values (CSV) tables,
* Browser Extensible Data (BED) files,
* optional FASTA sequence files.

The public argument is still named `efo_id` for backward compatibility, although it is no longer limited to EFO identifiers. In gwas2crispr 0.1.5, selected EFO, MONDO, and NCIT identifiers are supported when they are available through the GWAS Catalog API. HP, Orphanet, and ORPHA identifiers are accepted for compatibility with selected records.

Example accepted formats include `EFO_0001663`, `EFO:0001663`, `MONDO_0007254`, `MONDO:0007254`, `NCIT_C4872`, and `NCIT:C4872`.
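
The formats above share one shape: an ontology prefix, then `_` or `:`, then a local identifier. The sketch below checks that shape before a query is sent. The helper `looks_like_trait_id()` is hypothetical and not part of gwas2crispr; it only tests the text of the identifier and says nothing about whether the GWAS Catalog has records for it.

```{r}
# Hypothetical helper, not exported by gwas2crispr.
# Checks only the textual shape: PREFIX, then "_" or ":", then a local ID.
looks_like_trait_id <- function(x) {
  grepl("^(EFO|MONDO|NCIT)[_:][A-Za-z0-9]+$", x)
}

ids <- c(
  "EFO_0001663", "EFO:0001663",
  "MONDO_0007254", "MONDO:0007254",
  "NCIT_C4872", "NCIT:C4872",
  "0001663" # no prefix, so the check fails
)

looks_like_trait_id(ids)

# Rewrite the colon form as the underscore form for consistent bookkeeping.
sub(":", "_", ids, fixed = TRUE)
```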

## Installation

Install from CRAN:

```{r}
install.packages("gwas2crispr")
```

FASTA output needs these optional Bioconductor packages:

```{r}
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install(c(
  "Biostrings",
  "GenomeInfoDb",
  "BSgenome.Hsapiens.UCSC.hg38"
))
```
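
Because these packages are optional, it can help to confirm that they are installed before expecting FASTA output. The following is a minimal sketch using base R only; the variable names are illustrative.

```{r}
fasta_pkgs <- c("Biostrings", "GenomeInfoDb", "BSgenome.Hsapiens.UCSC.hg38")

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed <- vapply(fasta_pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

if (!all(installed)) {
  message(
    "FASTA output will be skipped; missing packages: ",
    paste(fasta_pkgs[!installed], collapse = ", ")
  )
}
```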

Install the development version from GitHub:

```{r}
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

devtools::install_github("leopard0ly/gwas2crispr")
```

## Fetch GWAS associations

```{r}
library(gwas2crispr)

gwas_data <- fetch_gwas(
  efo_id  = "EFO_0000707",
  p_cut   = 1e-6,
  verbose = FALSE
)

names(gwas_data)
head(gwas_data$associations)
```

Selected non-EFO identifiers use the same argument name when supported by the GWAS Catalog API:

```{r, eval=FALSE}
fetch_gwas(efo_id = "MONDO_0007254", p_cut = 5e-8, verbose = FALSE)
fetch_gwas(efo_id = "NCIT_C4872", p_cut = 5e-8, verbose = FALSE)
```
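
The examples above use two thresholds, `1e-6` and `5e-8`. The toy sketch below shows how the choice changes the number of retained rows. The data frame, its column names, and the `<=` comparison are assumptions made for illustration; they are not the package's actual schema or filtering rule.

```{r}
# Toy data with hypothetical column names; not output from fetch_gwas().
toy <- data.frame(
  snp     = c("snp_a", "snp_b", "snp_c", "snp_d"),
  p_value = c(3e-9, 4e-8, 7e-7, 2e-4)
)

# Assumed rule for this sketch: keep rows with p_value at or below p_cut.
keep_below <- function(df, p_cut) {
  df[df$p_value <= p_cut, , drop = FALSE]
}

nrow(keep_below(toy, 1e-6)) # the more permissive threshold
nrow(keep_below(toy, 5e-8)) # the stricter threshold
```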

## Run without writing files

By default, no files are written. With `out_prefix = NULL`, the results are only returned in memory.

```{r}
res <- run_gwas2crispr(
  efo_id     = "EFO_0000707",
  p_cut      = 1e-6,
  flank_bp   = 300,
  out_prefix = NULL,
  verbose    = FALSE
)

res$summary
head(res$snps_full)
head(res$bed)
```
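
To make `flank_bp` concrete, the sketch below computes a BED-style window around a single position. BED intervals use a 0-based start and an exclusive end. The helper `flank_to_bed()` is hypothetical and assumes a symmetric window around a 1-based SNP coordinate; the package's own interval arithmetic may differ.

```{r}
# Hypothetical helper, not the package's internal code.
# pos is a 1-based coordinate; BED start is 0-based and end is exclusive.
flank_to_bed <- function(chrom, pos, flank_bp = 300L) {
  data.frame(
    chrom = chrom,
    start = pmax(pos - 1L - flank_bp, 0L), # clamp at the chromosome start
    end   = pos + flank_bp,
    stringsAsFactors = FALSE
  )
}

win <- flank_to_bed("chr1", 1000000L, flank_bp = 300L)
win

# Window width in base pairs: the SNP itself plus flank_bp on each side.
win$end - win$start
```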

## Write files safely

To write output files, provide `out_prefix`. In examples, build the prefix under `tempdir()` so that nothing is written to the working directory.

```{r}
out_prefix <- file.path(tempdir(), "lung")

res <- run_gwas2crispr(
  efo_id     = "EFO_0000707",
  p_cut      = 1e-6,
  flank_bp   = 300,
  out_prefix = out_prefix,
  verbose    = FALSE
)

res$written
```

With this prefix, the expected output paths are:

```{r}
paste0(out_prefix, "_snps_full.csv")
paste0(out_prefix, "_snps_hg38.bed")
paste0(out_prefix, "_snps_flank300.fa")
```

The FASTA file is created only when the optional genome packages listed under Installation are available.
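
Since the FASTA file may be absent, a quick existence check shows which files were actually produced. This sketch uses base R only and rebuilds the paths from the same prefix and suffixes shown above.

```{r}
out_prefix <- file.path(tempdir(), "lung")

expected <- paste0(
  out_prefix,
  c("_snps_full.csv", "_snps_hg38.bed", "_snps_flank300.fa")
)

# One row per expected file, with a flag for whether it exists on disk.
status <- data.frame(
  file   = basename(expected),
  exists = file.exists(expected)
)
status

# Preview the CSV only if it was written.
if (file.exists(expected[1])) {
  head(utils::read.csv(expected[1]))
}
```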

## Output structure

```{r}
names(res)
```

The elements used in this vignette can be accessed directly:

```{r}
res$summary
res$snps_full
res$bed
res$fasta
res$written
```

## Session information

```{r, eval=TRUE}
sessionInfo()
```