A modern implementation of the Super Learner algorithm for ensemble learning and model stacking

## What’s sl3?

sl3 is a modern implementation of the Super Learner algorithm of @vdl2007super. The Super Learner algorithm performs ensemble learning in one of two fashions:

1. The “discrete” Super Learner selects the best prediction algorithm from a supplied library of learning algorithms (“learners” in the sl3 nomenclature), that is, the algorithm that minimizes the cross-validated risk with respect to an appropriate loss function.
2. The “ensemble” Super Learner assigns weights to the learning algorithms in a user-supplied library to create a combination of these learners that minimizes the cross-validated risk with respect to an appropriate loss function. This notion of weighted combinations has also been called stacked regression [@breiman1996stacked].
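
The two variants can be illustrated with a minimal base R sketch. It does not use the sl3 API: the simulated data, the three toy learners, and the softmax parametrization of the weights are all assumptions made for illustration, and sl3's own metalearners may solve the weighting problem differently.

```r
set.seed(1)

# simulated data (illustrative only, not taken from the sl3 examples)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame(y = 1 + 2 * x1 - x2^2 + rnorm(n), x1 = x1, x2 = x2)

# a small hypothetical "library": each learner takes training data and
# new data, and returns predictions for the new data
learners <- list(
  mean_only = function(train, new) rep(mean(train$y), nrow(new)),
  linear    = function(train, new) predict(lm(y ~ x1 + x2, data = train), new),
  quadratic = function(train, new) {
    predict(lm(y ~ x1 + x2 + I(x2^2), data = train), new)
  }
)

# V-fold cross-validation: collect out-of-fold predictions for each learner
V <- 5
fold_id <- sample(rep(seq_len(V), length.out = n))
cv_preds <- matrix(NA_real_, nrow = n, ncol = length(learners),
                   dimnames = list(NULL, names(learners)))
for (v in seq_len(V)) {
  held_out <- fold_id == v
  for (name in names(learners)) {
    cv_preds[held_out, name] <-
      learners[[name]](dat[!held_out, ], dat[held_out, ])
  }
}

# cross-validated risk of each learner under squared-error loss
cv_risk <- colMeans((dat$y - cv_preds)^2)

# discrete Super Learner: the single learner with the smallest CV risk
discrete_sl <- names(which.min(cv_risk))

# ensemble Super Learner: convex weights that minimize the CV risk of the
# weighted combination (softmax keeps weights non-negative, summing to one)
softmax <- function(a) {
  e <- exp(a - max(a))
  e / sum(e)
}
objective <- function(a) mean((dat$y - cv_preds %*% softmax(a))^2)
fit <- optim(rep(0, length(learners)), objective, method = "BFGS")
sl_weights <- setNames(softmax(fit$par), names(learners))
ensemble_risk <- fit$value

print(cv_risk)
print(discrete_sl)
print(round(sl_weights, 3))
```

Both variants start from the same out-of-fold predictions. The discrete version picks one column of `cv_preds`, while the ensemble version combines the columns.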

## Installation

Install the most recent stable release from GitHub via devtools:

    devtools::install_github("jeremyrcoyle/sl3")

## Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

## Documentation

The best places to start are the package vignettes.

## Examples

sl3 makes it straightforward to apply screening algorithms and learning algorithms, combine both types of algorithms into a stacked regression model, and cross-validate the whole process. The best way to understand this is to see the sl3 package in action:

    set.seed(49753)

    # packages we'll be using
    library(data.table)
    library(SuperLearner)
    #> Super Learner
    #> Version: 2.0-22
    #> Package created on 2017-07-18
    library(origami)
    library(sl3)

    # load example dataset
    data(cpp_imputed)

    # here are the covariates we are interested in and, of course, the outcome
    covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
                "sexn")
    outcome <- "haz"

    # create the sl3 task from the data, covariates, and outcome
    task <- make_sl3_Task(data = cpp_imputed, covariates = covars,
                          outcome = outcome, outcome_type = "continuous")

    # set up screeners and learners via built-in functions and pipelines
    slscreener <- make_learner(Lrnr_pkg_SuperLearner_screener, "screen.glmnet")
    glm_learner <- make_learner(Lrnr_glm)
    screen_and_glm <- make_learner(Pipeline, slscreener, glm_learner)
    lrnr_glmnet <- make_learner(Lrnr_glmnet)

    # stack learners into a model (including screeners and pipelines)
    learner_stack <- make_learner(Stack, lrnr_glmnet, glm_learner, screen_and_glm)
    stack_fit <- learner_stack$train(task)
    #> Loading required package: glmnet
    #> Loading required package: Matrix
    #> Loading required package: foreach
    #> Loaded glmnet 2.0-13

    # predictions from each learner in the stack, one column per learner
    preds <- stack_fit$predict()
    head(preds)
    #>    Lrnr_glmnet_NULL_deviance_10_1_100   Lrnr_glm
    #> 1:                         0.35345519 0.36298498
    #> 2:                         0.35345519 0.36298498
    #> 3:                         0.24554305 0.25993072
    #> 4:                         0.24554305 0.25993072
    #> 5:                         0.24554305 0.25993072
    #> 6:                         0.02953193 0.05680264
    #>    Lrnr_pkg_SuperLearner_screener_screen.glmnet___Lrnr_glm
    #> 1:                                              0.36228209
    #> 2:                                              0.36228209
    #> 3:                                              0.25870995
    #> 4:                                              0.25870995
    #> 5:                                              0.25870995
    #> 6:                                              0.05600958
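
To make the idea of cross-validating a whole pipeline concrete, here is a rough base R sketch of a screener followed by a GLM, with the screening step repeated inside every fold. It is not how sl3 implements `Pipeline`: the simulated data and the helper names `screen_top_k`, `fit_pipeline`, and `predict_pipeline` are hypothetical, and the correlation-based screener merely stands in for `screen.glmnet`.

```r
set.seed(2)

# simulated data (illustrative only): 2 informative covariates, 8 noise
n <- 300
p <- 10
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("x", seq_len(p))))
y <- 1.5 * X[, "x1"] - 2 * X[, "x2"] + rnorm(n)
dat <- data.frame(y = y, X)

# hypothetical screener: keep the k covariates most correlated with y
screen_top_k <- function(train, k = 2) {
  covs <- setdiff(names(train), "y")
  score <- abs(cor(train[, covs], train$y))[, 1]
  names(sort(score, decreasing = TRUE))[seq_len(k)]
}

# pipeline: screener followed by a GLM fit on the selected covariates
fit_pipeline <- function(train, k = 2) {
  selected <- screen_top_k(train, k)
  model <- glm(reformulate(selected, response = "y"), data = train)
  list(selected = selected, model = model)
}
predict_pipeline <- function(fit, new) predict(fit$model, newdata = new)

# cross-validate the entire pipeline: screening is redone in every fold,
# so the held-out observations never influence covariate selection
V <- 5
fold_id <- sample(rep(seq_len(V), length.out = n))
cv_pred <- numeric(n)
for (v in seq_len(V)) {
  held_out <- fold_id == v
  fit_v <- fit_pipeline(dat[!held_out, ])
  cv_pred[held_out] <- predict_pipeline(fit_v, dat[held_out, ])
}
cv_mse <- mean((dat$y - cv_pred)^2)

# refit on all data to see which covariates the screener keeps
full_fit <- fit_pipeline(dat)
print(full_fit$selected)
print(cv_mse)
```

Screening inside each fold is the key design choice. Selecting covariates once on the full data and cross-validating only the GLM would give an optimistically biased risk estimate.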

## Contributions

It is our hope that sl3 will grow to be widely used for creating stacked regression models and the cross-validation of pipelines that make up such models, as well as the variety of other applications in which the Super Learner algorithm plays a role. To that end, contributions are very welcome, though we ask that interested contributors consult our contribution guidelines prior to submitting a pull request.

## Citation

After using the sl3 R package, please cite the following:

    @misc{coyle2017sl3,
      author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
        Sofrygin, Oleg},
      title = {{sl3}: Modern Pipelines for Machine Learning and {Super
        Learning}},
      year  = {2017},
      howpublished = {\url{https://github.com/jeremyrcoyle/sl3}},
      url = {http://dx.doi.org/DOI_HERE},
      doi = {DOI_HERE}
    }

## License

The contents of this repository are distributed under the GPL-3 license. See file LICENSE for details.