---
title: "Parallelisation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Parallelisation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dineR)
```

_This vignette covers parallel vs sequential execution in the estimation function._

*dineR* supports parallel execution of the ADMM optimisation across the regularisation path via the `cores` argument of the `estimation` function. By default, the function runs sequentially (`cores = 1`). Setting `cores` to a value greater than 1 distributes the lambda values across the specified number of worker processes, which can yield substantial reductions in wall-clock time for medium-to-large problems.

## Switching Between Sequential and Parallel Mode

The `cores` argument is the only change required to move between modes. The value must be a positive integer no greater than `parallel::detectCores() - 1`.

```{r, eval = FALSE}
# Check how many cores are available on your machine
parallel::detectCores()
```

```{r}
# Generate data for the examples below
data <- data_generator(n_X = 150, p = 100, seed = 42)
X <- data$X
Y <- data$Y
```

**Sequential** (default, `cores = 1`):

```{r, eval = FALSE}
result_seq <- estimation(X, Y, nlambda = 15, cores = 1)
result_seq$elapse  # elapsed time in seconds
```

**Parallel** (`cores` set to the desired number of workers):

```{r, eval = FALSE}
result_par <- estimation(X, Y, nlambda = 15, cores = 4)
result_par$elapse  # elapsed time in seconds
```

Both calls return identical results: `cores` affects only computation time, not the estimates or any other output field.

## Performance Benchmarks

The table below reports wall-clock times and speed-up factors measured on an Apple M-series machine (8 logical cores) using 4 parallel workers. Each scenario uses the default LASSO loss and no tuning.


| Scenario | Observations (n) | Dimensions (p) | Lambda values (nlambda) | Sequential | Parallel (4 cores) | Speed-up |
|:---------|:----------------:|:--------------:|:-----------------------:|:----------:|:------------------:|:--------:|
| Small    | 100              | 50             | 10                      | 0.68 s     | 0.70 s             | 1.0×     |
| Medium   | 150              | 100            | 15                      | 3.89 s     | 1.76 s             | 2.2×     |
| Large    | 200              | 150            | 20                      | 22.4 s     | 6.98 s             | 3.2×     |
| High-dim | 100              | 200            | 20                      | 17.9 s     | 5.71 s             | 3.1×     |
| Many-λ   | 150              | 100            | 30                      | 7.72 s     | 2.77 s             | 2.8×     |

The code used to produce these results is provided below for reproducibility:

```{r, eval = FALSE}
library(dineR)

run_bench <- function(label, n_X, p, nlambda, cores_par) {
  data <- data_generator(n_X = n_X, p = p, seed = 42)
  X <- data$X
  Y <- data$Y

  r_seq <- estimation(X, Y, nlambda = nlambda, cores = 1)
  r_par <- estimation(X, Y, nlambda = nlambda, cores = cores_par)

  cat(sprintf(
    "[%s] seq: %.3fs | par(%d cores): %.3fs | speed-up: %.2fx\n",
    label, r_seq$elapse, cores_par, r_par$elapse,
    r_seq$elapse / r_par$elapse
  ))
}

run_bench("Small",    n_X = 100, p = 50,  nlambda = 10, cores_par = 4)
run_bench("Medium",   n_X = 150, p = 100, nlambda = 15, cores_par = 4)
run_bench("Large",    n_X = 200, p = 150, nlambda = 20, cores_par = 4)
run_bench("High-dim", n_X = 100, p = 200, nlambda = 20, cores_par = 4)
run_bench("Many-lam", n_X = 150, p = 100, nlambda = 30, cores_par = 4)
```

## Interpreting the Results

Several patterns emerge from the benchmarks:

- **Small problems** (`p = 50`, `nlambda = 10`): parallelisation provides no benefit. The overhead of spawning worker processes and distributing work exceeds the per-lambda computation time. Sequential execution is preferred here.
- **Medium-to-large problems** (`p ≥ 100` or `nlambda ≥ 15`): consistent 2–3× speed-ups are observed with 4 cores. Gains are driven by both the dimensionality (`p` increases the per-lambda solve time) and the number of lambda values (`nlambda` increases the number of independent tasks that can be distributed).
- **Speed-up scales with workload**: the largest absolute savings occur for high-dimensional or fine-grained regularisation paths, where each lambda solve is computationally expensive.

## Choosing the Number of Cores

A practical rule of thumb:

- Use `cores = 1` when `p < 100` and `nlambda < 15`.
- Use `cores = min(nlambda, parallel::detectCores() - 1)` for larger problems, reserving one core for the main R process.

```{r, eval = FALSE}
# Recommended cores selection for larger problems
nlambda <- 20  # example path length; replace with your own value
n_cores <- min(nlambda, parallel::detectCores() - 1)
result  <- estimation(X, Y, nlambda = nlambda, cores = n_cores)
```

Note that `estimation` will automatically reduce `cores` to `nlambda` if more cores are requested than there are lambda values, and will warn accordingly.
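
The rule of thumb above can be wrapped in a small helper. The sketch below is illustrative only: `choose_cores` is a hypothetical function, not part of dineR, and its thresholds are copied directly from the rule of thumb. As an added assumption, it falls back to sequential execution when `parallel::detectCores()` returns `NA` (which its documentation allows) and never returns fewer than one core.

```{r, eval = FALSE}
# Hypothetical helper (not part of dineR): pick a `cores` value
# following the rule of thumb in this section.
choose_cores <- function(p, nlambda, available = parallel::detectCores()) {
  # detectCores() may return NA when the count is unknown
  if (is.na(available)) {
    return(1L)
  }
  # Small problems: worker start-up overhead outweighs any gain
  if (p < 100 && nlambda < 15) {
    return(1L)
  }
  # Larger problems: at most one worker per lambda value,
  # leaving one core for the main R process (but never below 1)
  as.integer(max(1, min(nlambda, available - 1)))
}

choose_cores(p = 50,  nlambda = 10, available = 8)  # 1
choose_cores(p = 100, nlambda = 15, available = 8)  # 7
choose_cores(p = 200, nlambda = 4,  available = 8)  # 4
```

The result can be passed straight to the `cores` argument, for example `estimation(X, Y, nlambda = 20, cores = choose_cores(ncol(X), 20))`, assuming the columns of `X` correspond to the `p` variables.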