| Title: | Targeted Inference |
|---|---|
| Description: | Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>). |
| Authors: | Klaus K. Holst [aut, cre], Benedikt Sommer [aut], Andreas Nordland [aut], Christian B. Pipper [ctb] |
| Maintainer: | Klaus K. Holst <[email protected]> |
| License: | Apache License (== 2.0) |
| Version: | 0.8 |
| Built: | 2026-05-25 12:14:55 UTC |
| Source: | https://github.com/kkholst/targeted |
AIPW for the mean (and linear projections of the EIF) with missing observations
aipw( response.model, propensity.model, formula = ~1, data, response_model = deprecated, propensity_model = deprecated, ... )aipw( response.model, propensity.model, formula = ~1, data, response_model = deprecated, propensity_model = deprecated, ... )
response.model |
Model for the response given covariates (learner or formula) |
propensity.model |
Optional missing data mechanism model (propensity model) (learner or formula) |
formula |
design specifying the OLS estimator with outcome given by the
EIF (see |
data |
data.frame |
response_model |
Deprecated. Use response.model instead. |
propensity_model |
Deprecated. Use treatment.model instead. |
... |
additional arguments (see |
m <- lava::lvm(y ~ x+z, r ~ x) |> lava::distribution(~ r, value = lava::binomial.lvm()) |> transform(y0~r+y, value = \(x) { x[x[,1]==0,2] <- NA; x[,2] }) d <- lava::sim(m,1e3,seed=1) aipw(y0 ~ x, data=d)m <- lava::lvm(y ~ x+z, r ~ x) |> lava::distribution(~ r, value = lava::binomial.lvm()) |> transform(y0~r+y, value = \(x) { x[x[,1]==0,2] <- NA; x[,2] }) d <- lava::sim(m,1e3,seed=1) aipw(y0 ~ x, data=d)
Assumption lean inference via cross-fitting (Double ML). See <doi:10.1111/rssb.12504
alean( response_model, exposure_model, data, link = "identity", g_model, nfolds = 1, silent = FALSE, mc.cores, ... )alean( response_model, exposure_model, data, link = "identity", g_model, nfolds = 1, silent = FALSE, mc.cores, ... )
response_model |
formula or learner object (formula => glm) |
exposure_model |
model for the exposure |
data |
data.frame |
link |
Link function (g) |
g_model |
Model for |
nfolds |
Number of folds |
silent |
supress all messages and progressbars |
mc.cores |
mc.cores Optional number of cores. parallel::mcmapply used instead of future |
... |
additional arguments to future.apply::future_mapply |
Let be the response variable, the exposure and
covariates. The target parameter is:
The response_model is the model for , and
exposure_model is the model for .
link specifies .
alean.targeted object
Klaus Kähler Holst
sim1 <- function(n, family=gaussian(), ...) { m <- lava::lvm() |> lava::distribution(~y, value=lava::binomial.lvm()) |> lava::regression('a', value=function(l) l) |> lava::regression('y', value=function(a,l) a + l) if (family$family=="binomial") lava::distribution(m, ~a) <- lava::binomial.lvm() lava::sim(m, n) } library(splines) f <- binomial() d <- sim1(1e4, family=f) e <- alean( response_model=learner_glm(y ~ a + bs(l, df=3), family=binomial), exposure_model=learner_glm(a ~ bs(l, df=3), family=f), data=d, link = "logit", mc.cores=1, nfolds=1 ) e e <- alean(response_model=learner_glm(y ~ a + l, family=binomial), exposure_model=learner_glm(a ~ l), data=d, link = "logit", mc.cores=1, nfolds=1) esim1 <- function(n, family=gaussian(), ...) { m <- lava::lvm() |> lava::distribution(~y, value=lava::binomial.lvm()) |> lava::regression('a', value=function(l) l) |> lava::regression('y', value=function(a,l) a + l) if (family$family=="binomial") lava::distribution(m, ~a) <- lava::binomial.lvm() lava::sim(m, n) } library(splines) f <- binomial() d <- sim1(1e4, family=f) e <- alean( response_model=learner_glm(y ~ a + bs(l, df=3), family=binomial), exposure_model=learner_glm(a ~ bs(l, df=3), family=f), data=d, link = "logit", mc.cores=1, nfolds=1 ) e e <- alean(response_model=learner_glm(y ~ a + l, family=binomial), exposure_model=learner_glm(a ~ l), data=d, link = "logit", mc.cores=1, nfolds=1) e
Augmented Inverse Probability Weighting estimator for the Average (Causal)
Treatment Effect. All nuisance models are here parametric (glm). For a more
general approach see the cate implementation. In this implementation
the standard errors are correct even when the nuisance models are
mis-specified (the influence curve is calculated including the term coming
from the parametric nuisance models). The estimate is consistent if either
the propensity model or the outcome model / Q-model is correctly specified.
ate( formula, data = parent.frame(), weights, offset, family = stats::gaussian(identity), nuisance = NULL, propensity = nuisance, all, labels = NULL, adjust.nuisance = TRUE, adjust.propensity = TRUE, ... )ate( formula, data = parent.frame(), weights, offset, family = stats::gaussian(identity), nuisance = NULL, propensity = nuisance, all, labels = NULL, adjust.nuisance = TRUE, adjust.propensity = TRUE, ... )
formula |
formula (see details below) |
data |
data.frame |
weights |
optional frequency weights |
offset |
optional offset (character or vector). can also be specified in the formula. |
family |
Exponential family argument for outcome model |
nuisance |
outcome regression formula (Q-model) |
propensity |
propensity model formula |
all |
when TRUE all standard errors are calculated (default TRUE when exposure only has two levels) |
labels |
optional treatment labels |
adjust.nuisance |
adjust for uncertainty due to estimation of outcome regression model parameters |
adjust.propensity |
adjust for uncertainty due to estimation of propensity regression model parameters |
... |
additional arguments to lower level functions |
The formula may either be specified as: response ~ treatment | nuisance-formula | propensity-formula
For example: ate(y~a | x+z+a | x*z, data=...)
Alternatively, as a list: ate(list(y~a, ~x+z, ~x*z), data=...)
Or using the nuisance (and propensity argument):
ate(y~a, nuisance=~x+z, ...)
An object of class 'ate.targeted' is returned. See
targeted-class for more details about this class and its
generic functions.
Klaus K. Holst
cate
m <- lava::lvm(y ~ a+x, a~x) |> lava::distribution(~y, value = lava::binomial.lvm()) |> lava::ordinal(K=4, ~a) |> transform(~a, value = factor) d <- lava::sim(m, 1e3, seed=1) # (a <- ate(y~a|a*x|x, data=d)) (a <- ate(y~a, nuisance=~a*x, propensity=~x, data = d)) # Comparison with randomized experiment m0 <- lava::cancel(m, a~x) lm(y~a-1, lava::sim(m0,2e4)) # Choosing a different contrast for the association measures summary(a, contrast=c(2,4))m <- lava::lvm(y ~ a+x, a~x) |> lava::distribution(~y, value = lava::binomial.lvm()) |> lava::ordinal(K=4, ~a) |> transform(~a, value = factor) d <- lava::sim(m, 1e3, seed=1) # (a <- ate(y~a|a*x|x, data=d)) (a <- ate(y~a, nuisance=~a*x, propensity=~x, data = d)) # Comparison with randomized experiment m0 <- lava::cancel(m, a~x) lm(y~a-1, lava::sim(m0,2e4)) # Choosing a different contrast for the association measures summary(a, contrast=c(2,4))
Calibration for multiclassication methods
calibration( pr, cl, weights = NULL, threshold = 10, method = "bin", breaks = nclass.Sturges, df = 3, ... )calibration( pr, cl, weights = NULL, threshold = 10, method = "bin", breaks = nclass.Sturges, df = 3, ... )
pr |
matrix with probabilities for each class |
cl |
class variable |
weights |
counts |
threshold |
do not calibrate if less then 'threshold' events |
method |
either 'isotonic' (pava), 'logistic', 'mspline' (monotone spline), 'bin' (local constant) |
breaks |
optional number of bins (only for method 'bin') |
df |
degrees of freedom (only for spline methods) |
... |
additional arguments to lower level functions |
...
An object of class 'calibration' is returned.
See calibration-class
for more details about this class and its generic functions.
Klaus K. Holst
sim1 <- function(n, beta=c(-3, rep(.5,10)), rho=.5) { p <- length(beta)-1 xx <- lava::rmvn0(n,sigma=diag(nrow=p)*(1-rho)+rho) y <- rbinom(n, 1, lava::expit(cbind(1,xx)%*%beta)) d <- data.frame(y=y, xx) names(d) <- c("y",paste0("x",1:p)) return(d) } set.seed(1) beta <- c(-2,rep(1,10)) d <- sim1(1e4, beta=beta) a1 <- naivebayes(y ~ ., data=d) a2 <- glm(y ~ ., data=d, family=binomial) ## a3 <- randomForest(factor(y) ~ ., data=d, family=binomial) d0 <- sim1(1e4, beta=beta) p1 <- predict(a1, newdata=d0) p2 <- predict(a2, newdata=d0, type="response") ## p3 <- predict(a3, newdata=d0, type="prob") c2 <- calibration(p2, d0$y, method="isotonic") c1 <- calibration(p1, d0$y, breaks=100) if (interactive()) { plot(c1) plot(c2,col="red",add=TRUE) abline(a=0,b=1) with(c1$xy[[1]], points(pred,freq,type="b", col="red")) } set.seed(1) beta <- c(-2,rep(1,10)) dd <- lava::csplit(sim1(1e4, beta=beta), k=3) mod <- naivebayes(y ~ ., data=dd[[1]]) p1 <- predict(mod, newdata=dd[[2]]) cal <- calibration(p1, dd[[2]]$y) p2 <- predict(mod, newdata=dd[[3]]) pp <- predict(c1, p2) cc <- calibration(pp, dd[[3]]$y) if (interactive()) {#' plot(cal) plot(cc, add=TRUE, col="blue") }sim1 <- function(n, beta=c(-3, rep(.5,10)), rho=.5) { p <- length(beta)-1 xx <- lava::rmvn0(n,sigma=diag(nrow=p)*(1-rho)+rho) y <- rbinom(n, 1, lava::expit(cbind(1,xx)%*%beta)) d <- data.frame(y=y, xx) names(d) <- c("y",paste0("x",1:p)) return(d) } set.seed(1) beta <- c(-2,rep(1,10)) d <- sim1(1e4, beta=beta) a1 <- naivebayes(y ~ ., data=d) a2 <- glm(y ~ ., data=d, family=binomial) ## a3 <- randomForest(factor(y) ~ ., data=d, family=binomial) d0 <- sim1(1e4, beta=beta) p1 <- predict(a1, newdata=d0) p2 <- predict(a2, newdata=d0, type="response") ## p3 <- predict(a3, newdata=d0, type="prob") c2 <- calibration(p2, d0$y, method="isotonic") c1 <- calibration(p1, d0$y, breaks=100) if (interactive()) { plot(c1) plot(c2,col="red",add=TRUE) abline(a=0,b=1) with(c1$xy[[1]], points(pred,freq,type="b", col="red")) } set.seed(1) beta <- c(-2,rep(1,10)) dd <- lava::csplit(sim1(1e4, beta=beta), k=3) mod <- naivebayes(y ~ ., data=dd[[1]]) p1 <- predict(mod, newdata=dd[[2]]) cal <- calibration(p1, dd[[2]]$y) p2 <- predict(mod, newdata=dd[[3]]) pp <- predict(c1, p2) cc <- calibration(pp, dd[[3]]$y) if (interactive()) {#' plot(cal) plot(cc, add=TRUE, col="blue") }
The functions calibration returns an object
of the class calibration.
An object of class 'calibration' is a list with at least the
following components:
estimated step-functions
(see stepfun) for each class
the unique classes
model/method type (string)
list of data.frame's with predictions (pr) and estimated probabilities of success (only for 'bin' method)
objects of the S3 class 'calibration'
The following S3 generic functions are available for an object
of class calibration:
predictApply calibration to new data.
plotPlot the calibration curves (reliability plot).
printBasic print method.
## See example(calibration) for examples## See example(calibration) for examples
Conditional Average Treatment Effect estimation with cross-fitting.
cate( response.model, treatment.model, cate.model = ~1, calibration.model = NULL, data, contrast, nfolds = 1, rep = 1, silent = FALSE, stratify = FALSE, mc.cores = NULL, var.type = "IC", second.order = TRUE, response_model = deprecated, cate_model = deprecated, propensity_model = deprecated, propensity.model = deprecated, treatment = deprecated, ... )cate( response.model, treatment.model, cate.model = ~1, calibration.model = NULL, data, contrast, nfolds = 1, rep = 1, silent = FALSE, stratify = FALSE, mc.cores = NULL, var.type = "IC", second.order = TRUE, response_model = deprecated, cate_model = deprecated, propensity_model = deprecated, propensity.model = deprecated, treatment = deprecated, ... )
response.model |
formula or learner object (formula => learner_glm) |
treatment.model |
formula or learner object (formula => learner_glm) |
cate.model |
formula specifying regression design for conditional average treatment effects |
calibration.model |
linear calibration model. Specify covariates in addition to predicted potential outcomes to include in the calibration. |
data |
data.frame |
contrast |
treatment contrast (default 1 vs 0) |
nfolds |
number of folds (positive integer), or a pre-specified list of
fold indices where each element is an integer vector of observation indices
forming a partition of |
rep |
number of replications of cross-fitting procedure by averaging estimates and influence functions from each replication |
silent |
suppress all messages and progressbars |
stratify |
if TRUE the response.model will be stratified by treatment |
mc.cores |
(optional) number of cores. parallel::mcmapply used instead of future |
var.type |
when equal to "IC" the asymptotic variance is derived from
the influence function. Otherwise, based on expressions in Bannick et al.
(2025) valid under different covariate-adaptive randomization schemes (only
available for ATE and when |
second.order |
add seconder order term to IF to handle misspecification of outcome models |
response_model |
Deprecated. Use response.model instead. |
cate_model |
Deprecated. Use cate.model instead. |
propensity_model |
Deprecated. Use treatment.model instead. |
propensity.model |
Deprecated. Use treatment.model instead. |
treatment |
Deprecated. Use cate.model instead. |
... |
additional arguments to future.apply::future_mapply |
We have observed data where is the response variable,
the binary treatment, and covariates. We further let
be a subset of the covariates. Define the conditional potential mean outcome
and let denote a parametric working model, then the target parameter is the
mean-squared error
cate.targeted object
Klaus Kähler Holst, Andreas Nordland
Mark J. van der Laan (2006) Statistical Inference for Variable Importance, The International Journal of Biostatistics.
Bannick, Shao & Liu et al. (2025) A General Form of Covariate Adjustment in Clinical Trials under Covariate-Adaptive Randomization, Biometrika.
sim1 <- function(n=1000, ...) { w1 <- rnorm(n) w2 <- rnorm(n) a <- rbinom(n, 1, plogis(-1 + w1)) y <- cos(w1) + w2*a + 0.2*w2^2 + a + rnorm(n) data.frame(y, a, w1, w2) } d <- sim1(5000) ## ATE cate(cate.model=~1, response.model=y~a*(w1+w2), treatment.model=a~w1+w2, data=d) ## CATE cate(cate.model=~1+w2, response.model=y~a*(w1+w2), treatment.model=a~w1+w2, data=d) ## Not run: ## superlearner example mod1 <- list( glm = learner_glm(y~w1+w2), gam = learner_gam(y~s(w1) + s(w2)) ) s1 <- learner_sl(mod1, nfolds=5) cate(cate.model=~1, response.model=s1, treatment.model=learner_glm(a~w1+w2, family=binomial), data=d, stratify=TRUE) ## End(Not run)sim1 <- function(n=1000, ...) { w1 <- rnorm(n) w2 <- rnorm(n) a <- rbinom(n, 1, plogis(-1 + w1)) y <- cos(w1) + w2*a + 0.2*w2^2 + a + rnorm(n) data.frame(y, a, w1, w2) } d <- sim1(5000) ## ATE cate(cate.model=~1, response.model=y~a*(w1+w2), treatment.model=a~w1+w2, data=d) ## CATE cate(cate.model=~1+w2, response.model=y~a*(w1+w2), treatment.model=a~w1+w2, data=d) ## Not run: ## superlearner example mod1 <- list( glm = learner_glm(y~w1+w2), gam = learner_gam(y~s(w1) + s(w2)) ) s1 <- learner_sl(mod1, nfolds=5) cate(cate.model=~1, response.model=s1, treatment.model=learner_glm(a~w1+w2, family=binomial), data=d, stratify=TRUE) ## End(Not run)
Conditional average treatment effect estimation via Double Machine Learning
cate_link( treatment, link = "identity", response_model, propensity_model, importance_model, contrast = c(1, 0), data, nfolds = 5, type = "dml1", ... )cate_link( treatment, link = "identity", response_model, propensity_model, importance_model, contrast = c(1, 0), data, nfolds = 5, type = "dml1", ... )
treatment |
formula specifying treatment and variables to condition on |
link |
Link function |
response_model |
SL object |
propensity_model |
SL object |
importance_model |
SL object |
contrast |
treatment contrast (default 1 vs 0) |
data |
data.frame |
nfolds |
Number of folds |
type |
'dml1' or 'dml2' |
... |
additional arguments to SuperLearner |
cate.targeted object
Klaus Kähler Holst & Andreas Nordland
# Example 1: sim1 <- function(n=1e4, seed=NULL, return_model=FALSE, ...){ suppressPackageStartupMessages(require("lava")) if (!is.null(seed)) set.seed(seed) m <- lava::lvm() distribution(m, ~x) <- gaussian.lvm() distribution(m, ~v) <- gaussian.lvm(mean = 10) distribution(m, ~a) <- binomial.lvm("logit") regression(m, "a") <- function(v, x){.1*v + x} distribution(m, "y") <- gaussian.lvm() regression(m, "y") <- function(a, v, x){v+x+a*x+a*v*v} if (return_model) return(m) lava::sim(m, n = n) } if (require("SuperLearner",quietly=TRUE)) { d <- sim1(n = 1e3, seed = 1) e <- cate_link(data=d, type = "dml2", treatment = a ~ v, response_model = y~ a*(x + v + I(v^2)), importance_model = SL(D_ ~ v + I(v^2)), nfolds = 10) summary(e) # the true parameters are c(1,1) }# Example 1: sim1 <- function(n=1e4, seed=NULL, return_model=FALSE, ...){ suppressPackageStartupMessages(require("lava")) if (!is.null(seed)) set.seed(seed) m <- lava::lvm() distribution(m, ~x) <- gaussian.lvm() distribution(m, ~v) <- gaussian.lvm(mean = 10) distribution(m, ~a) <- binomial.lvm("logit") regression(m, "a") <- function(v, x){.1*v + x} distribution(m, "y") <- gaussian.lvm() regression(m, "y") <- function(a, v, x){v+x+a*x+a*v*v} if (return_model) return(m) lava::sim(m, n = n) } if (require("SuperLearner",quietly=TRUE)) { d <- sim1(n = 1e3, seed = 1) e <- cate_link(data=d, type = "dml2", treatment = a ~ v, response_model = y~ a*(x + v + I(v^2)), importance_model = SL(D_ ~ v + I(v^2)), nfolds = 10) summary(e) # the true parameters are c(1,1) }
The functions cv returns an object of the type
cross_validated.
An object of class 'cross_validated' is a list with at least the
following components:
An array with the model score(s) evaluated for each fold,
repetition, and model estimates
(see estimate.default)
Names (character vector) of the models
number of repetitions of the CV
Number of folds of the CV
objects of the S3 class 'cross_validated'
The following S3 generic functions are available for an object of
class cross_validated:
coefExtract average model scores from the cross-validation procedure.
printBasic print method.
summarySummary of the cross-validation procedure.
'
# See example(cv) for examples# See example(cv) for examples
Conditional Relative Risk estimation via Double Machine Learning
crr( treatment, response_model, propensity_model, importance_model, contrast = c(1, 0), data, nfolds = 5, type = "dml1", ... )crr( treatment, response_model, propensity_model, importance_model, contrast = c(1, 0), data, nfolds = 5, type = "dml1", ... )
treatment |
formula specifying treatment and variables to condition on |
response_model |
SL object |
propensity_model |
SL object |
importance_model |
SL object |
contrast |
treatment contrast (default 1 vs 0) |
data |
data.frame |
nfolds |
Number of folds |
type |
'dml1' or 'dml2' |
... |
additional arguments to SuperLearner |
cate.targeted object
Klaus Kähler Holst & Andreas Nordland
sim1 <- function(n=1e4, seed=NULL, return_model=FALSE, ...){ suppressPackageStartupMessages(require("lava")) if (!is.null(seed)) set.seed(seed) m <- lava::lvm() distribution(m, ~x) <- gaussian.lvm() distribution(m, ~v) <- gaussian.lvm(mean = 10) distribution(m, ~a) <- binomial.lvm("logit") regression(m, "a") <- function(v, x){.1*v + x} distribution(m, "y") <- gaussian.lvm() regression(m, "y") <- function(a, v, x){v+x+a*x+a*v*v} if (return_model) return(m) lava::sim(m, n = n) } d <- sim1(n = 2e3, seed = 1) if (require("SuperLearner",quietly=TRUE)) { e <- crr(data=d, type = "dml2", treatment = a ~ v, response_model = learner_glm(y~ a*(x + v + I(v^2))), importance_model = learner_glm(D_ ~ v + I(v^2)), propensity_model = learner_glm(a ~ x + v + I(v^2), family=binomial), nfolds = 2) summary(e) # the true parameters are c(1,1) }sim1 <- function(n=1e4, seed=NULL, return_model=FALSE, ...){ suppressPackageStartupMessages(require("lava")) if (!is.null(seed)) set.seed(seed) m <- lava::lvm() distribution(m, ~x) <- gaussian.lvm() distribution(m, ~v) <- gaussian.lvm(mean = 10) distribution(m, ~a) <- binomial.lvm("logit") regression(m, "a") <- function(v, x){.1*v + x} distribution(m, "y") <- gaussian.lvm() regression(m, "y") <- function(a, v, x){v+x+a*x+a*v*v} if (return_model) return(m) lava::sim(m, n = n) } d <- sim1(n = 2e3, seed = 1) if (require("SuperLearner",quietly=TRUE)) { e <- crr(data=d, type = "dml2", treatment = a ~ v, response_model = learner_glm(y~ a*(x + v + I(v^2))), importance_model = learner_glm(D_ ~ v + I(v^2)), propensity_model = learner_glm(a ~ x + v + I(v^2), family=binomial), nfolds = 2) summary(e) # the true parameters are c(1,1) }
Predict the cumulative hazard/survival function for a survival model
cumhaz( object, newdata, times = NULL, individual.time = FALSE, extend = FALSE, ... )cumhaz( object, newdata, times = NULL, individual.time = FALSE, extend = FALSE, ... )
object |
Survival model object: phreg, coxph, rfsrc, ranger, survSuperLearner |
newdata |
data.frame |
times |
numeric vector: Time points at which the survival model is evaluated. If NULL, the time points associated with the survival model is used. |
individual.time |
logical: If TRUE the survival object is evaluated at different time points for each row in newdata. The number of rows in newdata and the length of times must be the same. |
extend |
if TRUE, prints information for all specified 'times’, even if there are no subjects left at the end of the specified ‘times’ (see survival::summary.survfit). |
... |
Additional arguments. |
List with elements:
time: numeric vector
chf: cumulative hazard function. If individual.time = FALSE, matrix with dimension (nrow(newdata), length(times)). If individual.time = TRUE, vector of length length(times).
surv: survival function, exp(-chf).
dchf: t(diff(rbind(0, t(chf))))
Klaus K. Holst, Andreas Nordland
Generic cross-validation function
## Default S3 method: cv( object, data, response = NULL, nfolds = 5, rep = 1, weights = NULL, model.score = scoring, seed = NULL, shared = NULL, args.pred = NULL, args.future = list(), mc.cores, silent = FALSE, ... )## Default S3 method: cv( object, data, response = NULL, nfolds = 5, rep = 1, weights = NULL, model.score = scoring, seed = NULL, shared = NULL, args.pred = NULL, args.future = list(), mc.cores, silent = FALSE, ... )
object |
List of learner objects |
data |
data.frame or matrix |
response |
Response variable (vector or name of column in |
nfolds |
Number of folds (nfolds=0 simple test/train split into two folds 1:([n]/2), ([n]+1/2):n with last part used for testing) |
rep |
Number of repetitions (default 1) |
weights |
Optional frequency weights |
model.score |
Model scoring metric (default: MSE / Brier score). Must be a function with arguments response and prediction, and may optionally include weights, object and newdata arguments |
seed |
Random seed (argument parsed to future_Apply::future_lapply) |
shared |
Function applied to each fold with results send to each model |
args.pred |
Optional arguments to prediction function (see details below) |
args.future |
Arguments to future.apply::future_mapply |
mc.cores |
Optional number of cores. parallel::mcmapply used instead of future |
silent |
suppress all messages and progressbars |
... |
Additional arguments parsed to elements in |
object should be list of objects of class learner.
Alternatively, each element of models should be a list with a fitting
function and a prediction function.
The response argument can optionally be a named list where the name is
then used as the name of the response argument in models. Similarly, if data
is a named list with a single data.frame/matrix then this name will be used
as the name of the data/design matrix argument in models.
An object of class 'cross_validated' is returned. See
cross_validated-class for more details about this class and
its generic functions.
Klaus K. Holst
m <- list(learner_glm(Sepal.Length~1), learner_glm(Sepal.Length~Species), learner_glm(Sepal.Length~Species + Petal.Length)) x <- cv(m, rep=10, data=iris) xm <- list(learner_glm(Sepal.Length~1), learner_glm(Sepal.Length~Species), learner_glm(Sepal.Length~Species + Petal.Length)) x <- cv(m, rep=10, data=iris) x
Cross-validation estimation of the generalization error of the super learner and each of the separate models in the ensemble. Both the chosen model scoring metrics as well as the model weights of the stacked ensemble.
## S3 method for class 'learner_sl' cv(object, data, nfolds = 5, rep = 1, model.score = scoring, ...)## S3 method for class 'learner_sl' cv(object, data, nfolds = 5, rep = 1, model.score = scoring, ...)
object |
(learner_sl) Instantiated learner_sl object. |
data |
data.frame or matrix |
nfolds |
Number of folds (nfolds=0 simple test/train split into two folds 1:([n]/2), ([n]+1/2):n with last part used for testing) |
rep |
Number of repetitions (default 1) |
model.score |
Model scoring metric (default: MSE / Brier score). Must be a function with arguments response and prediction, and may optionally include weights, object and newdata arguments |
... |
Additional arguments parsed to elements in |
sim1 <- function(n = 5e2) { x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) data.frame(y, x1, x2) } sl <- learner_sl(list( "mean" = learner_glm(y ~ 1), "glm" = learner_glm(y ~ x1), "glm2" = learner_glm(y ~ x1 + x2) )) cv(sl, data = sim1(), rep = 2)sim1 <- function(n = 5e2) { x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) data.frame(y, x1, x2) } sl <- learner_sl(list( "mean" = learner_glm(y ~ 1), "glm" = learner_glm(y ~ x1), "glm2" = learner_glm(y ~ x1 + x2) )) cv(sl, data = sim1(), rep = 2)
Cast warning for deprecated function argument names
deprecate_arg_warn(old, new, fun, vers)deprecate_arg_warn(old, new, fun, vers)
old |
deprecated argument name |
new |
argument that should be used instead |
fun |
function name where arguments are deprecated |
vers |
version when argument is deprecated |
Deprecated argument names
response_model |
Deprecated. Use response.model instead. |
propensity_model |
Deprecated. Use treatment.model instead. |
cate_model |
Deprecated. Use cate.model instead. |
treatment |
Deprecated. Use cate.model instead. |
propensity.model |
Deprecated. Use treatment.model instead. |
Extract design matrix from data.frame and formula
design( formula, data, ..., intercept = FALSE, response = TRUE, rm_envir = FALSE, specials = NULL, specials.call = NULL, levels = NULL, design.matrix = TRUE )design( formula, data, ..., intercept = FALSE, response = TRUE, rm_envir = FALSE, specials = NULL, specials.call = NULL, levels = NULL, design.matrix = TRUE )
formula |
formula |
data |
data.frame |
... |
additional arguments (e.g, specials such weights, offsets, ...) |
intercept |
(logical) If FALSE an intercept is not included in the design matrix |
response |
(logical) if FALSE the response variable is dropped |
rm_envir |
Remove environment |
specials |
character vector specifying functions in the formula that should be marked as special in the terms object |
specials.call |
(call) specials optionally defined as a call-type |
levels |
a named list of character vectors giving the full set of levels to be assumed for each factor |
design.matrix |
(logical) if FALSE then only response and specials are
returned. Otherwise, the design.matrix |
An object of class 'design'
Klaus Kähler Holst
Let denote the clinical outcome, the binary treatment
variable, baseline covariates, the failure time,
and the cause of failure.
The following are our two target parameters
estimate_truncatedscore( data, mod.y, mod.r, mod.a, mod.event, time, cause = NULL, cens.code = 0, naive = FALSE, control = list(), ... )estimate_truncatedscore( data, mod.y, mod.r, mod.a, mod.event, time, cause = NULL, cens.code = 0, naive = FALSE, control = list(), ... )
data |
(data.frame) |
mod.y |
(formula or learner) Model for clinical outcome given T>time. Using a formula specifies a glm with an identity link (see example). |
mod.r |
(formula or learner) Model for missing data mechanism for clinical outcome at T=time. Using a formula specifies a glm with a log link. |
mod.a |
(formula or learner) Treatment model (in RCT should just be 'a ~ 1'). Using a formula specifies a glm with a log link. |
mod.event |
(formula) Model for time-to-event process ('Event(time,status) ~ x'). |
time |
(numeric) Landmark time. |
cause |
(integer) Primary event (in the 'status' variable of the 'Event' statement). |
cens.code |
(integer) Censoring code. |
naive |
(logical) If TRUE, the unadjusted estimates ignoring baseline covariates is returned as the attribute 'naive'. |
control |
(list) optimization routine parameters. |
... |
Additional arguments passed to mets::binregATE. |
lava::estimate.default object
Klaus Kähler Holst
data(truncatedscore) mod1 <- learner_glm(y ~ a * (x1 + x2)) mod2 <- learner_glm(r ~ a * (x1 + x2), family = binomial) a <- estimate_truncatedscore( data = truncatedscore, mod.y = mod1, mod.r = mod2, mod.a = a ~ 1, mod.event = mets::Event(time, status) ~ x1+x2, time = 2 ) s <- summary(a, noninf.t = -0.1) print(s) parameter(s) # the above is equivalent to # a <- estimate_truncatedscore( # data = truncatedscore, # mod.y = y ~ a * (x1 + x2), # mod.r = r ~ a * (x1 + x2), # mod.a = a ~ 1, # mod.event = mets::Event(time, status) ~ x1+x2, # time = 2 # )data(truncatedscore) mod1 <- learner_glm(y ~ a * (x1 + x2)) mod2 <- learner_glm(r ~ a * (x1 + x2), family = binomial) a <- estimate_truncatedscore( data = truncatedscore, mod.y = mod1, mod.r = mod2, mod.a = a ~ 1, mod.event = mets::Event(time, status) ~ x1+x2, time = 2 ) s <- summary(a, noninf.t = -0.1) print(s) parameter(s) # the above is equivalent to # a <- estimate_truncatedscore( # data = truncatedscore, # mod.y = y ~ a * (x1 + x2), # mod.r = r ~ a * (x1 + x2), # mod.a = a ~ 1, # mod.event = mets::Event(time, status) ~ x1+x2, # time = 2 # )
Similar to expand.grid function, this function creates all combinations
of the input arguments but returns the result as a list.
expand.list(..., INPUT = NULL, envir = NULL)expand.list(..., INPUT = NULL, envir = NULL)
... |
input variables |
INPUT |
optional list of variables |
envir |
environment environment to evalute formulas in |
list
Klaus Kähler Holst
expand.list(x = 2:4, z = c("a", "b"))expand.list(x = 2:4, z = c("a", "b"))
Fit survival nuisance models
fit_survival_models( data, response, censoring, response_call = "phreg", response_args = list(), censoring_call = "phreg", censoring_args = list() )fit_survival_models( data, response, censoring, response_call = "phreg", response_args = list(), censoring_call = "phreg", censoring_args = list() )
data |
data.frame |
response |
Response formula (e.g., Surv(time, event) ~ A + W) |
censoring |
Censoring formula (e.g., Surv(time, event == 0) ~ A + W)) |
response_call |
Model call for the response model (e.g. "mets::phreg") |
response_args |
Additional arguments passed to the response model |
censoring_call |
Similar to response_callb |
censoring_args |
Similar to response_args |
List with elements T_model and C_model
Andreas Nordland, Klaus K. Holst
Computes an approximation of , where
is a survival function, for a selection of start and stop time
points.
int_surv(times, surv, start = 0, stop = max(times), extend = FALSE)int_surv(times, surv, start = 0, stop = max(times), extend = FALSE)
times |
Numeric vector, sorted time points. |
surv |
Numeric vector, values of a survival function evaluated at time
points given by |
start |
Numeric vector, start of the integral. |
stop |
Numeric vector, end of the integral. |
extend |
should the integral be extended beyond the last observed time point |
Numeric vector, value of the integral.
Andreas Nordland
Interface for statistical and machine learning models to be used for nuisance model estimation in targeted learning.
The following list provides an overview of constructors for many commonly used models.
Regression and classification: learner_glm, learner_gam, learner_grf,
learner_hal, learner_glmnet_cv, learner_svm, learner_xgboost,
learner_mars
Regression: learner_isoreg
Classification: learner_naivebayes
Ensemble (super learner): learner_sl
infoOptional information/name of the model
clearRemove fitted model from the learner object
fitReturn estimated model object.
formulaReturn model formula. Use learner$update() to update the formula.
predict.filterReturn instantiated prediction filter function
predict.filter.generatorReturn prediction filter generator function
learner$new()Create a new prediction model object
learner$new( formula = NULL, estimate, predict = stats::predict, predict.args = NULL, estimate.args = NULL, info = NULL, specials = c(), formula.keep.specials = FALSE, predict.filter = function(data, ...) function(pred, newdata, ...) pred, intercept = FALSE )
formulaformula specifying outcome and design matrix
estimatefunction for fitting the model. This must be a function with response, 'y', and design matrix, 'x'. Alternatively, a function with a formula and data argument. See the examples section.
predictprediction function (must be a function of model object, 'object', and new design matrix, 'newdata')
predict.argsoptional arguments to prediction function
estimate.argsoptional arguments to estimate function
infooptional description of the model
specialsoptional specials terms (weights, offset, id, subset, ...) passed on to design
formula.keep.specialsif TRUE then special terms defined by
specials will be removed from the formula before it is being passed to
the estimate print.function()
predict.filterfunction to post-process predictions. Useful to bound predictions or handle NAs. The argument is experimental and its behavior may change in the future.
intercept(logical) include intercept in design matrix
learner$estimate()Estimation method
learner$estimate(data, ..., store = TRUE)
datadata.frame
...Additional arguments to estimation and prediction filter generator function
storeLogical determining if estimated model should be stored inside the class.
learner$predict()Prediction method
learner$predict(newdata, ..., object = NULL)
newdatadata.frame
...Additional arguments to prediction method and prediction filter function
objectOptional model fit object
learner$update()Update formula
learner$update(formula)
formulaformula or character which defines the new response
learner$print()Print method
learner$print()
learner$summary()Summary method to provide more extensive information than learner$print().
learner$summary()
summarized_learner object, which is a list with the following elements:
description of the learner
formula specifying outcome and design matrix
function for fitting the model
arguments to estimate function
function for making predictions from fitted model
arguments to predict function
provided special terms
include intercept in design matrix
lr <- learner_glm(y ~ x, family = "nb") lr$summary() lr_sum <- lr$summary() # store returned summary in new object names(lr_sum) print(lr_sum)
learner$response()Extract response from data
learner$response(data, eval = TRUE, ...)
datadata.frame
evalwhen FALSE return the untransformed outcome (i.e., return 'a' if formula defined as I(a==1) ~ ...)
...additional arguments to design
learner$design()Generate design object (design matrix and response) from data
learner$design(data, ...)
datadata.frame
...additional arguments to design
learner$opt()Get options
learner$opt(arg)
argname of option to get value of
learner$clone()The objects of this class are cloneable with this method.
learner$clone(deep = FALSE)
deepWhether to make a deep clone.
Klaus Kähler Holst, Benedikt Sommer
data(iris) rf <- function(formula, ...) { learner$new(formula, info = "grf::probability_forest", estimate = function(x, y, ...) { grf::probability_forest(X = x, Y = y, ...) }, predict = function(object, newdata) { predict(object, newdata)$predictions }, estimate.args = list(...) ) } args <- expand.list( num.trees = c(100, 200), mtry = 1:3, formula = c(Species ~ ., Species ~ Sepal.Length + Sepal.Width) ) models <- lapply(args, function(par) do.call(rf, par)) x <- models[[1]]$clone() x$estimate(iris) predict(x, newdata = head(iris)) # Reduce Ex. timing a <- targeted::cv(models, data = iris) cbind(coef(a), attr(args, "table")) # defining learner via function with arguments y (response) # and x (design matrix) f1 <- learner$new( estimate = function(y, x) lm.fit(x = x, y = y), predict = function(object, newdata) newdata %*% object$coefficients ) # defining the learner via arguments formula and data f2 <- learner$new( estimate = function(formula, data, ...) glm(formula, data, ...) ) # generic learner defined from function (predict method derived per default # from stats::predict f3 <- learner$new( estimate = function(dt, ...) { lm(y ~ x, data = dt) } ) ## ------------------------------------------------ ## Method `learner$summary()` ## ------------------------------------------------ lr <- learner_glm(y ~ x, family = "nb") lr$summary() lr_sum <- lr$summary() # store returned summary in new object names(lr_sum) print(lr_sum)data(iris) rf <- function(formula, ...) { learner$new(formula, info = "grf::probability_forest", estimate = function(x, y, ...) { grf::probability_forest(X = x, Y = y, ...) }, predict = function(object, newdata) { predict(object, newdata)$predictions }, estimate.args = list(...) ) } args <- expand.list( num.trees = c(100, 200), mtry = 1:3, formula = c(Species ~ ., Species ~ Sepal.Length + Sepal.Width) ) models <- lapply(args, function(par) do.call(rf, par)) x <- models[[1]]$clone() x$estimate(iris) predict(x, newdata = head(iris)) # Reduce Ex. timing a <- targeted::cv(models, data = iris) cbind(coef(a), attr(args, "table")) # defining learner via function with arguments y (response) # and x (design matrix) f1 <- learner$new( estimate = function(y, x) lm.fit(x = x, y = y), predict = function(object, newdata) newdata %*% object$coefficients ) # defining the learner via arguments formula and data f2 <- learner$new( estimate = function(formula, data, ...) glm(formula, data, ...) ) # generic learner defined from function (predict method derived per default # from stats::predict f3 <- learner$new( estimate = function(dt, ...) { lm(y ~ x, data = dt) } ) ## ------------------------------------------------ ## Method `learner$summary()` ## ------------------------------------------------ lr <- learner_glm(y ~ x, family = "nb") lr$summary() lr_sum <- lr$summary() # store returned summary in new object names(lr_sum) print(lr_sum)
Construct learners from a grid of parameters
learner_expand_grid(fun, args, names = TRUE, params = FALSE)learner_expand_grid(fun, args, names = TRUE, params = FALSE)
fun |
(function) A function that returns a learner. |
args |
(list) Parameters that generate a grid of parameters with
expand.list, where the set of parameters are then passed on to |
names |
(logical or character) If FALSE, then return a list without names. If TRUE, a named list is returned (see details). |
params |
(logical) If FALSE, then no information about the parameters
defined by |
list
lrs <- learner_expand_grid( learner_xgboost, list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)) ) lrs # use info of constructed learner as names lrs <- learner_expand_grid( learner_xgboost, list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)), names = "xgboost" ) names(lrs) # use xgboost instead of info field for names lrs <- learner_expand_grid( learner_xgboost, list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)), names = "xgboost", params = TRUE ) names(lrs) # also add parameters to names lrs <- learner_expand_grid( learner_xgboost, list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)), names = FALSE ) names(lrs) # unnamed list since names = FALSElrs <- learner_expand_grid( learner_xgboost, list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)) ) lrs # use info of constructed learner as names lrs <- learner_expand_grid( learner_xgboost, list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)), names = "xgboost" ) names(lrs) # use xgboost instead of info field for names lrs <- learner_expand_grid( learner_xgboost, list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)), names = "xgboost", params = TRUE ) names(lrs) # also add parameters to names lrs <- learner_expand_grid( learner_xgboost, list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)), names = FALSE ) names(lrs) # unnamed list since names = FALSE
Constructs learner class object for fitting generalized additive models with mgcv::gam.
learner_gam( formula, info = "mgcv::gam", family = gaussian(), select = FALSE, gamma = 1, learner.args = NULL, ... )learner_gam( formula, info = "mgcv::gam", family = gaussian(), select = FALSE, gamma = 1, learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
family |
This is a family object specifying the distribution and link to use in
fitting etc (see |
select |
If this is |
gamma |
Increase this beyond 1 to produce smoother models. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to mgcv::gam. |
learner object.
n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) d0 <- data.frame(y, x1, x2) lr <- learner_gam(y ~ s(x1) + x2) lr$estimate(d0) if (interactive()) { plot(lr$fit) }n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) d0 <- data.frame(y, x1, x2) lr <- learner_gam(y ~ s(x1) + x2) lr$estimate(d0) if (interactive()) { plot(lr$fit) }
Constructs a learner class object for fitting generalized
linear models with stats::glm and MASS::glm.nb. Negative binomial
regression is supported with family = "nb" (or alternatively family = "negbin").
learner_glm( formula, info = "glm", family = gaussian(), learner.args = NULL, ... )learner_glm( formula, info = "glm", family = gaussian(), learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
family |
a description of the error distribution and link
function to be used in the model. For |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to stats::glm or MASS::glm.nb. |
learner object.
n <- 5e2 x <- rnorm(n) w <- 50 + rexp(n, rate = 1 / 5) y <- rpois(n, exp(2 + 0.5 * x + log(w)) * rgamma(n, 1 / 2, 1 / 2)) d0 <- data.frame(y, x, w) lr <- learner_glm(y ~ x) # linear Gaussian model lr$estimate(d0) coef(lr$fit) # negative binomial regression model with offset (using MASS::glm.nb) lr <- learner_glm(y ~ x + offset(log(w)), family = "nb") lr$estimate(d0) coef(lr$fit) lr$predict(data.frame(x = 1, w = c(1, 5))) # response scale lr$predict(data.frame(x = 1, w = c(1, 5)), type = "link") # link scalen <- 5e2 x <- rnorm(n) w <- 50 + rexp(n, rate = 1 / 5) y <- rpois(n, exp(2 + 0.5 * x + log(w)) * rgamma(n, 1 / 2, 1 / 2)) d0 <- data.frame(y, x, w) lr <- learner_glm(y ~ x) # linear Gaussian model lr$estimate(d0) coef(lr$fit) # negative binomial regression model with offset (using MASS::glm.nb) lr <- learner_glm(y ~ x + offset(log(w)), family = "nb") lr$estimate(d0) coef(lr$fit) lr$predict(data.frame(x = 1, w = c(1, 5))) # response scale lr$predict(data.frame(x = 1, w = c(1, 5)), type = "link") # link scale
Constructs a learner class object for fitting entire lasso or
elastic-net regularization paths for various linear and non-linear regression
models with glmnet::cv.glmnet. Predictions are returned for the value of
lambda that gives minimum cvm. That is, glmnet::predict.cv.glmnet is
called with s = "lambda.min".
learner_glmnet_cv( formula, info = "glmnet::cv.glmnet", family = gaussian(), lambda = NULL, alpha = 1, nfolds = 10, learner.args = NULL, ... )learner_glmnet_cv( formula, info = "glmnet::cv.glmnet", family = gaussian(), lambda = NULL, alpha = 1, nfolds = 10, learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
family |
Either a character string representing
one of the built-in families, or else a |
lambda |
Optional user-supplied lambda sequence; default is
|
alpha |
The elasticnet mixing parameter, with
|
nfolds |
number of folds - default is 10. Although |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Other arguments that can be passed to glmnet, for example |
learner object.
# continuous outcome n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) lp <- x1 + x2*x1 + cos(x1) y <- rnorm(n, lp, sd = 2) d0 <- data.frame(y, x1, x2) lr <- learner_glmnet_cv(y ~ x1 + x2) lr$estimate(d0, nfolds = 3) lr$predict(data.frame(x1 = c(0, 1), x2 = 1)) # count outcome with different exposure time w <- 50 + rexp(n, rate = 1 / 5) y <- rpois(n, exp(0.5 * x1 - 1 * x2 + log(w)) * rgamma(n, 1 / 2, 1 / 2)) d0 <- data.frame(y, x1, x2, w) lr <- learner_glmnet_cv(y ~ x1 + x2 + offset(log(w)), family = "poisson") lr$estimate(d0, nfolds = 3) lr$predict(data.frame(x1 = 1, x2 = 1, w = c(1, 5)))# continuous outcome n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) lp <- x1 + x2*x1 + cos(x1) y <- rnorm(n, lp, sd = 2) d0 <- data.frame(y, x1, x2) lr <- learner_glmnet_cv(y ~ x1 + x2) lr$estimate(d0, nfolds = 3) lr$predict(data.frame(x1 = c(0, 1), x2 = 1)) # count outcome with different exposure time w <- 50 + rexp(n, rate = 1 / 5) y <- rpois(n, exp(0.5 * x1 - 1 * x2 + log(w)) * rgamma(n, 1 / 2, 1 / 2)) d0 <- data.frame(y, x1, x2, w) lr <- learner_glmnet_cv(y ~ x1 + x2 + offset(log(w)), family = "poisson") lr$estimate(d0, nfolds = 3) lr$predict(data.frame(x1 = 1, x2 = 1, w = c(1, 5)))
Constructs a learner class object for fitting generalized
random forest models with grf::regression_forest or
grf::probability_forest. As shown in the examples, the constructed learner
returns predicted class probabilities of class 2 in case of binary
classification. A n times p matrix, with n being the number of
observations and p the number of classes, is returned for multi-class
classification.
learner_grf( formula, num.trees = 2000, min.node.size = 5, alpha = 0.05, sample.fraction = 0.5, num.threads = 1, model = "grf::regression_forest", info = model, learner.args = NULL, ... )learner_grf( formula, num.trees = 2000, min.node.size = 5, alpha = 0.05, sample.fraction = 0.5, num.threads = 1, model = "grf::regression_forest", info = model, learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
num.trees |
Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. Default is 2000. |
min.node.size |
A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 5. |
alpha |
A tuning parameter that controls the maximum imbalance of a split. Default is 0.05. |
sample.fraction |
Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5. |
num.threads |
Number of threads used in training. By default, the number of threads is set to the maximum hardware concurrency. |
model |
(character) grf model to estimate. Usually regression_forest (grf::regression_forest) or probability_forest (grf::probability_forest). |
info |
(character) Optional information to describe the instantiated learner object. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to |
learner object.
n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) lp <- x2*x1 + cos(x1) yb <- rbinom(n, 1, lava::expit(lp)) y <- lp + rnorm(n, sd = 0.5**.5) d <- data.frame(y, yb, x1, x2) # regression lr <- learner_grf(y ~ x1 + x2) lr$estimate(d) lr$predict(head(d)) # binary classification lr <- learner_grf(as.factor(yb) ~ x1 + x2, model = "probability_forest") lr$estimate(d) lr$predict(head(d)) # predict class probabilities of class 2 # multi-class classification lr <- learner_grf(Species ~ ., model = "probability_forest") lr$estimate(iris) lr$predict(head(iris))n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) lp <- x2*x1 + cos(x1) yb <- rbinom(n, 1, lava::expit(lp)) y <- lp + rnorm(n, sd = 0.5**.5) d <- data.frame(y, yb, x1, x2) # regression lr <- learner_grf(y ~ x1 + x2) lr$estimate(d) lr$predict(head(d)) # binary classification lr <- learner_grf(as.factor(yb) ~ x1 + x2, model = "probability_forest") lr$estimate(d) lr$predict(head(d)) # predict class probabilities of class 2 # multi-class classification lr <- learner_grf(Species ~ ., model = "probability_forest") lr$estimate(iris) lr$predict(head(iris))
Constructs a learner class object for fitting a highly adaptive lasso model with hal9001::fit_hal.
learner_hal( formula, info = "hal9001::fit_hal", smoothness_orders = 0, reduce_basis = NULL, family = "gaussian", learner.args = NULL, ... )learner_hal( formula, info = "hal9001::fit_hal", smoothness_orders = 0, reduce_basis = NULL, family = "gaussian", learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
smoothness_orders |
An |
reduce_basis |
Am optional |
family |
A |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to hal9001::fit_hal. |
learner object.
## Not run: n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) d <- data.frame(y, x1, x2) lr <- learner_hal(y ~ x1 + x2, smoothness_orders = 0.5, reduce_basis = 1) lr$estimate(d) lr$predict(data.frame(x1 = 0, x2 = c(-1, 1))) ## End(Not run)## Not run: n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) d <- data.frame(y, x1, x2) lr <- learner_hal(y ~ x1 + x2, smoothness_orders = 0.5, reduce_basis = 1) lr$estimate(d) lr$predict(data.frame(x1 = 0, x2 = c(-1, 1))) ## End(Not run)
Constructs a learner class object for isotonic regression with isoregw.
learner_isoreg(formula, info = "targeted::isoregw", learner.args = NULL, ...)learner_isoreg(formula, info = "targeted::isoregw", learner.args = NULL, ...)
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to isoregw. |
learner object.
x <- runif(5e3, -5, 5) pr <- lava::expit(-1 + x) y <- rbinom(length(pr), 1, pr) d <- data.frame(y, x) lr <- learner_isoreg(y ~ x) lr$estimate(d) pr_iso <- lr$predict(d) if (interactive()) { plot(pr ~ x, cex=0.3) lines(sort(x), pr_iso[order(x)], col="red", type="s") }x <- runif(5e3, -5, 5) pr <- lava::expit(-1 + x) y <- rbinom(length(pr), 1, pr) d <- data.frame(y, x) lr <- learner_isoreg(y ~ x) lr$estimate(d) pr_iso <- lr$predict(d) if (interactive()) { plot(pr ~ x, cex=0.3) lines(sort(x), pr_iso[order(x)], col="red", type="s") }
Constructs a learner class object for fitting multivariate adaptive regression splines with earth::earth.
learner_mars( formula, info = "earth::earth", degree = 1, nprune = NULL, glm = NULL, learner.args = NULL, ... )learner_mars( formula, info = "earth::earth", degree = 1, nprune = NULL, glm = NULL, learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
degree |
Maximum degree of interaction (Friedman's |
nprune |
Maximum number of terms (including intercept) in the pruned model.
Default is NULL, meaning all terms created by the forward pass
(but typically not all terms will remain after pruning).
Use this to enforce an upper bound on the model size (that is less than |
glm |
NULL (default) or a list of arguments to pass on to |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to earth::earth. |
learner object.
# poisson regression n <- 5e2 x <- rnorm(n) w <- 50 + rexp(n, rate = 1 / 5) y <- rpois(n, exp(2 + 0.5 * x + log(w)) * rgamma(n, 1 / 2, 1 / 2)) d0 <- data.frame(y, x, w) lr <- learner_mars(y ~ x + offset(log(w)), degree = 2, glm = list(family = poisson()) ) lr$estimate(d0) lr$predict(data.frame(x = 0, w = c(1, 2)))# poisson regression n <- 5e2 x <- rnorm(n) w <- 50 + rexp(n, rate = 1 / 5) y <- rpois(n, exp(2 + 0.5 * x + log(w)) * rgamma(n, 1 / 2, 1 / 2)) d0 <- data.frame(y, x, w) lr <- learner_mars(y ~ x + offset(log(w)), degree = 2, glm = list(family = poisson()) ) lr$estimate(d0) lr$predict(data.frame(x = 0, w = c(1, 2)))
Constructs a learner class object for fitting a naive bayes
classifier with naivebayes. As shown in the examples, the constructed
learner returns predicted class probabilities of class 2 in case of binary
classification. A n times p matrix, with n being the number of
observations and p the number of classes, is returned for multi-class
classification.
learner_naivebayes( formula, info = "Naive Bayes", laplace.smooth = 0, kernel = FALSE, learner.args = NULL, ... )learner_naivebayes( formula, info = "Naive Bayes", laplace.smooth = 0, kernel = FALSE, learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
laplace.smooth |
Laplace smoothing |
kernel |
If TRUE a kernel estimator is used for numeric predictors (otherwise a gaussian model is used) |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to naivebayes. |
learner object.
n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- rbinom(n, 1, lava::expit(x2*x1 + cos(x1))) d <- data.frame(y, x1, x2) # binary classification lr <- learner_naivebayes(y ~ x1 + x2) lr$estimate(d) lr$predict(head(d)) # multi-class classification lr <- learner_naivebayes(Species ~ .) lr$estimate(iris) lr$predict(head(iris))n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- rbinom(n, 1, lava::expit(x2*x1 + cos(x1))) d <- data.frame(y, x1, x2) # binary classification lr <- learner_naivebayes(y ~ x1 + x2) lr$estimate(d) lr$predict(head(d)) # multi-class classification lr <- learner_naivebayes(Species ~ .) lr$estimate(iris) lr$predict(head(iris))
Constructs a learner class object for fitting a superlearner.
learner_sl( learners, info = NULL, nfolds = 5L, meta.learner = metalearner_nnls, model.score = mse, learner.args = NULL, ... )learner_sl( learners, info = NULL, nfolds = 5L, meta.learner = metalearner_nnls, model.score = mse, learner.args = NULL, ... )
learners |
(list) List of learner objects (i.e. learner_glm) |
info |
(character) Optional information to describe the instantiated learner object. |
nfolds |
(integer) Number of folds to use in cross-validation to estimate the ensemble weights. |
meta.learner |
(function) Algorithm to learn the ensemble weights
(default non-negative least squares). Must be a function of the response
(nx1 vector), |
model.score |
(function) Model scoring method (see learner) |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to superlearner |
learner object.
sim1 <- function(n = 5e2) { x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) data.frame(y, x1, x2) } d <- sim1() m <- list( "mean" = learner_glm(y ~ 1), "glm" = learner_glm(y ~ x1 + x2), "iso" = learner_isoreg(y ~ x1) ) s <- learner_sl(m, nfolds = 10) s$estimate(d) pr <- s$predict(d) if (interactive()) { plot(y ~ x1, data = d) points(d$x1, pr, col = 2, cex = 0.5) lines(cos(x1) + x1 ~ x1, data = d[order(d$x1), ], lwd = 4, col = lava::Col("darkblue", 0.3)) } print(s) # weights(s$fit) # score(s$fit) cvres <- cv(s, data = d, nfolds = 3, rep = 2) cvres # coef(cvres) # score(cvres)sim1 <- function(n = 5e2) { x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) data.frame(y, x1, x2) } d <- sim1() m <- list( "mean" = learner_glm(y ~ 1), "glm" = learner_glm(y ~ x1 + x2), "iso" = learner_isoreg(y ~ x1) ) s <- learner_sl(m, nfolds = 10) s$estimate(d) pr <- s$predict(d) if (interactive()) { plot(y ~ x1, data = d) points(d$x1, pr, col = 2, cex = 0.5) lines(cos(x1) + x1 ~ x1, data = d[order(d$x1), ], lwd = 4, col = lava::Col("darkblue", 0.3)) } print(s) # weights(s$fit) # score(s$fit) cvres <- cv(s, data = d, nfolds = 3, rep = 2) cvres # coef(cvres) # score(cvres)
This function creates a stratified learner from an existing
learner wrapper function such as learner_glm or learner_xgboost. The
stratification variable can be specified either using the stratify
argument (which can be given as a string "a" or a formula , for example ~
I(a==0)), or it can be defined as a special term directly in the formula, y
~ ... + stratify(a). The formula will subsequently be passed to the
learner_ wrapper without the stratify special term.
learner_stratify( formula, learner, stratify = NULL, info = NULL, learner.args = list(), ... )learner_stratify( formula, learner, stratify = NULL, info = NULL, learner.args = list(), ... )
formula |
formula specifying outcome and design matrix |
learner |
(learner) learner object |
stratify |
(character,formula) variables to stratify by |
info |
optional description of the model |
learner.args |
(list) optional arguments to the learner constructor |
... |
additional arguments passed to the learner constructor |
learner object
simdata <- function(n=1000) { a <- rbinom(n, 1, 0.5) x <- rnorm(n) y <- rbinom(n, 1, plogis(-1 + a + a * x)) data.frame(y, a, x) } d <- simdata() lr <- learner_stratify( y ~ x + stratify(a), learner_glm, family=binomial() ) lr$estimate(d) lr$predict(head(d))simdata <- function(n=1000) { a <- rbinom(n, 1, 0.5) x <- rnorm(n) y <- rbinom(n, 1, plogis(-1 + a + a * x)) data.frame(y, a, x) } d <- simdata() lr <- learner_stratify( y ~ x + stratify(a), learner_glm, family=binomial() ) lr$estimate(d) lr$predict(head(d))
Constructs a learner class object for fitting Cox proportional hazards models.
learner_surv_cox(formula, info = "mets::phreg", learner.args = NULL, ...)learner_surv_cox(formula, info = "mets::phreg", learner.args = NULL, ...)
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to lower level funtions |
learner object.
Klaus Kähler Holst
data(sTRACE, package="mets") mod <- learner_surv_cox(Surv(time, status>0) ~ sex + strata(age)) mod$estimate(sTRACE) mod$predict(head(sTRACE), times=5) # P(T>t|X)data(sTRACE, package="mets") mod <- learner_surv_cox(Surv(time, status>0) ~ sex + strata(age)) mod$estimate(sTRACE) mod$predict(head(sTRACE), times=5) # P(T>t|X)
Constructs a learner class object for random survival forests
learner_surv_rf( formula, info = "survival forest (ranger)", num.threads = 1L, learner.args = NULL, ... )learner_surv_rf( formula, info = "survival forest (ranger)", num.threads = 1L, learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
num.threads |
Number of threads. Use 0 for all available cores. Default is 2 if not set by options/environment variables (see below). |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Further arguments passed to or from other methods (currently ignored). |
learner object.
Klaus Kähler Holst
data(sTRACE, package="mets") mod <- learner_surv_rf(Surv(time, status>0) ~ sex + age) mod$estimate(sTRACE) mod$predict(head(sTRACE), times=5) # P(T>t|X)data(sTRACE, package="mets") mod <- learner_surv_rf(Surv(time, status>0) ~ sex + age) mod$estimate(sTRACE) mod$predict(head(sTRACE), times=5) # P(T>t|X)
Constructs a learner class object for fitting support vector
machines with e1071::svm. As shown in the examples, the constructed learner
returns predicted class probabilities of class 2 in case of binary
classification. A n times p matrix, with n being the number of
observations and p the number of classes, is returned for multi-class
classification.
learner_svm( formula, info = "e1071::svm", cost = 1, epsilon = 0.1, kernel = "radial", learner.args = NULL, ... )learner_svm( formula, info = "e1071::svm", cost = 1, epsilon = 0.1, kernel = "radial", learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
cost |
cost of constraints violation (default: 1)—it is the ‘C’-constant of the regularization term in the Lagrange formulation. |
epsilon |
epsilon in the insensitive-loss function (default: 0.1) |
kernel |
the kernel used in training and predicting. You
might consider changing some of the following parameters, depending
on the kernel type.
|
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to e1071::svm. |
learner object.
n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) lp <- x2*x1 + cos(x1) yb <- rbinom(n, 1, lava::expit(lp)) y <- lp + rnorm(n, sd = 0.5**.5) d <- data.frame(y, yb, x1, x2) # regression lr <- learner_svm(y ~ x1 + x2) lr$estimate(d) lr$predict(head(d)) # binary classification lr <- learner_svm(as.factor(yb) ~ x1 + x2) # alternative to transforming response variable to factor # lr <- learner_svm(yb ~ x1 + x2, type = "C-classification") lr$estimate(d) lr$predict(head(d)) # predict class probabilities of class 2 lr$predict(head(d), probability = FALSE) # predict labels # multi-class classification lr <- learner_svm(Species ~ .) lr$estimate(iris) lr$predict(head(iris))n <- 5e2 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) lp <- x2*x1 + cos(x1) yb <- rbinom(n, 1, lava::expit(lp)) y <- lp + rnorm(n, sd = 0.5**.5) d <- data.frame(y, yb, x1, x2) # regression lr <- learner_svm(y ~ x1 + x2) lr$estimate(d) lr$predict(head(d)) # binary classification lr <- learner_svm(as.factor(yb) ~ x1 + x2) # alternative to transforming response variable to factor # lr <- learner_svm(yb ~ x1 + x2, type = "C-classification") lr$estimate(d) lr$predict(head(d)) # predict class probabilities of class 2 lr$predict(head(d), probability = FALSE) # predict labels # multi-class classification lr <- learner_svm(Species ~ .) lr$estimate(iris) lr$predict(head(iris))
Constructs a learner class object for xgboost::xgboost.
learner_xgboost( formula, max_depth = 2L, learning_rate = 1, nrounds = 2L, subsample = 1, reg_lambda = 1, objective = "reg:squarederror", info = paste("xgboost", objective), learner.args = NULL, ... )learner_xgboost( formula, max_depth = 2L, learning_rate = 1, nrounds = 2L, subsample = 1, reg_lambda = 1, objective = "reg:squarederror", info = paste("xgboost", objective), learner.args = NULL, ... )
formula |
(formula) Formula specifying response and design matrix. |
max_depth |
(integer) Maximum depth of a tree. |
learning_rate |
(numeric) Learning rate. |
nrounds |
Number of boosting iterations / rounds. Note that the number of default boosting rounds here is not automatically tuned, and different problems will have vastly different optimal numbers of boosting rounds. |
subsample |
(numeric) Subsample ratio of the training instance. |
reg_lambda |
(numeric) L2 regularization term on weights. |
objective |
(character) Specify the learning task and the corresponding learning objective. See xgboost::xgboost for all available options. |
info |
(character) Optional information to describe the instantiated learner object. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to xgboost::xgboost. |
learner object.
n <- 1e3 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) lp <- x2*x1 + cos(x1) yb <- rbinom(n, 1, lava::expit(lp)) y <- lp + rnorm(n, sd = 0.5**.5) d0 <- data.frame(y, yb, x1, x2) # regression lr <- learner_xgboost(y ~ x1 + x2, nrounds = 5) lr$estimate(d0) lr$predict(head(d0)) # binary classification lr <- learner_xgboost(yb ~ x1 + x2, nrounds = 5, objective = "binary:logistic" ) lr$estimate(d0) lr$predict(head(d0)) # multi-class classification d0 <- iris d0$y <- as.numeric(d0$Species)- 1 lr <- learner_xgboost(y ~ ., objective = "multi:softprob", num_class = 3) lr$estimate(d0) lr$predict(head(d0))n <- 1e3 x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) lp <- x2*x1 + cos(x1) yb <- rbinom(n, 1, lava::expit(lp)) y <- lp + rnorm(n, sd = 0.5**.5) d0 <- data.frame(y, yb, x1, x2) # regression lr <- learner_xgboost(y ~ x1 + x2, nrounds = 5) lr$estimate(d0) lr$predict(head(d0)) # binary classification lr <- learner_xgboost(yb ~ x1 + x2, nrounds = 5, objective = "binary:logistic" ) lr$estimate(d0) lr$predict(head(d0)) # multi-class classification d0 <- iris d0$y <- as.numeric(d0$Species)- 1 lr <- learner_xgboost(y ~ ., objective = "multi:softprob", num_class = 3) lr$estimate(d0) lr$predict(head(d0))
Naive Bayes Classifier
naivebayes( formula, data, weights = NULL, kernel = FALSE, laplace.smooth = 0, prior = NULL, ... )naivebayes( formula, data, weights = NULL, kernel = FALSE, laplace.smooth = 0, prior = NULL, ... )
formula |
Formula with syntax: response ~ predictors | weights |
data |
data.frame |
weights |
optional frequency weights |
kernel |
If TRUE a kernel estimator is used for numeric predictors (otherwise a gaussian model is used) |
laplace.smooth |
Laplace smoothing |
prior |
optional prior probabilities (default estimated from data) |
... |
additional arguments to lower level functions |
An object of class 'naivebayes' is returned. See
naivebayes-class for more details about this class and
its generic functions.
Klaus K. Holst
library(data.table) data(iris) m <- naivebayes(Species ~ Sepal.Width + Petal.Length, data = iris) pr <- predict(m, newdata = iris) # using weights to reduce the size of the dataset n <- 5e2 x <- rnorm(n, sd = 2) > 0 y <- rbinom(n, 1, lava::expit(x)) # full data set d1 <- data.frame(y, x = as.factor(x > 0)) m1 <- naivebayes(y ~ x, data = d1) # reduced data set d2 <- data.table(d1)[, .(.N), by = .(y, x)] m2 <- naivebayes(y ~ x, data = d2, weights = d2$N) all(predict(m1, d1) == predict(m2, d1))library(data.table) data(iris) m <- naivebayes(Species ~ Sepal.Width + Petal.Length, data = iris) pr <- predict(m, newdata = iris) # using weights to reduce the size of the dataset n <- 5e2 x <- rnorm(n, sd = 2) > 0 y <- rbinom(n, 1, lava::expit(x)) # full data set d1 <- data.frame(y, x = as.factor(x > 0)) m1 <- naivebayes(y ~ x, data = d1) # reduced data set d2 <- data.table(d1)[, .(.N), by = .(y, x)] m2 <- naivebayes(y ~ x, data = d2, weights = d2$N) all(predict(m1, d1) == predict(m2, d1))
The functions naivebayes returns an object of the type
naivebayes.
An object of class 'naivebayes' is a list with at least
the following components:
Matrix with prior probabilities, i.e. marginal class probabilities Pr(class)
list of matrices with conditional probabilities of the features given the classes (one list element per class), Pr(x|class)
Names (character vector) of the classes
Names of predictors
Conditional model for each predictor
Model design object
The function call which instantiated the object
objects of the S3 class 'naivebayes'
The following S3 generic functions are available for an object
of class naivebayes:
predictPredict class probabilities for new features data.
printBasic print method.
## See example(naivebayes) for examples## See example(naivebayes) for examples
Find the non-dominated point of a set (minima of a point set).
nondom(x, ...)nondom(x, ...)
x |
matrix |
... |
additional arguments to lower level functions |
A point x dominates y if it is never worse and at least in one case strictly better. Formally, let f_i denote the ith coordinate of the condition (objective) function, then for all i: f_i(x)<=f_i(y) and there exists j: f_j(x)<f_j(y).
Based on the algorithm of Kung et al. 1975.
matrix
Klaus Kähler Holst
rbind( c(1.0, 0.5), c(0.0, 1.0), c(1.0, 0.0), c(0.5, 1.0), c(1.0, 1.0), c(0.8, 0.8)) |> nondom()rbind( c(1.0, 0.5), c(0.0, 1.0), c(1.0, 0.0), c(0.5, 1.0), c(1.0, 1.0), c(0.8, 0.8)) |> nondom()
Pooled Adjacent Violators Algorithm
pava(y, x = numeric(0), weights = numeric(0))pava(y, x = numeric(0), weights = numeric(0))
y |
response variable |
x |
(optional) predictor vector (otherwise y is assumed to be a priori sorted according to relevant predictor) |
weights |
weights (optional) weights |
List with index (idx) of jump points and values (value) at each jump point.
Klaus K. Holst
x <- runif(5e3, -5, 5) pr <- lava::expit(-1 + x) y <- rbinom(length(pr), 1, pr) pv <- pava(y, x) plot(pr ~ x, cex=0.3) with(pv, lines(sort(x)[index], value, col="red", type="s"))x <- runif(5e3, -5, 5) pr <- lava::expit(-1 + x) y <- rbinom(length(pr), 1, pr) pv <- pava(y, x) plot(pr ~ x, cex=0.3) with(pv, lines(sort(x)[index], value, col="red", type="s"))
Kernel density estimator predictions
## S3 method for class 'density' predict(object, xnew, ...)## S3 method for class 'density' predict(object, xnew, ...)
object |
density object |
xnew |
New data on which to make predictions for |
... |
additional arguments to lower level functions |
Klaus K. Holst
Naive Bayes Classifier predictions
## S3 method for class 'naivebayes' predict(object, newdata, expectation = NULL, threshold = c(0.001, 0.001), ...)## S3 method for class 'naivebayes' predict(object, newdata, expectation = NULL, threshold = c(0.001, 0.001), ...)
object |
density object |
newdata |
new data on which to make predictions |
expectation |
Variable to calculate conditional expectation wrt probabilities from naivebayes classifier |
threshold |
Threshold parameters. First element defines the threshold on the probabilities and the second element the value to set those truncated probabilities to. |
... |
Additional arguments to lower level functions |
Klaus K. Holst
Obtains predictions for ensemble model or individual learners.
## S3 method for class 'superlearner' predict(object, newdata, all.learners = FALSE, ...)## S3 method for class 'superlearner' predict(object, newdata, all.learners = FALSE, ...)
object |
(superlearner) Fitted superlearner object. |
newdata |
(data.frame) Data in which to look for variables with which to predict. |
all.learners |
(logical) If FALSE (default), then return the predictions from the ensemble model. Otherwise, return predictions of from all individual learners. |
... |
Not used. |
numeric (all.learners = FALSE) or matrix (all.learners = TRUE)
Estimation of the Average Treatment Effect among Responders
RATE( response, post.treatment, treatment, data, M = 5, pr.treatment, treatment.level, SL.args.response = list(family = gaussian(), SL.library = c("SL.mean", "SL.glm")), SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")), preprocess = NULL, efficient = TRUE, ... )RATE( response, post.treatment, treatment, data, M = 5, pr.treatment, treatment.level, SL.args.response = list(family = gaussian(), SL.library = c("SL.mean", "SL.glm")), SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")), preprocess = NULL, efficient = TRUE, ... )
response |
Response formula (e.g, Y ~ D*A) |
post.treatment |
Post treatment marker formula (e.g., D ~ W) |
treatment |
Treatment formula (e.g, A ~ 1) |
data |
data.frame |
M |
Number of folds in cross-fitting (M=1 is no cross-fitting) |
pr.treatment |
(optional) Randomization probability of treatment. |
treatment.level |
Treatment level in binary treatment (default 1) |
SL.args.response |
Arguments to SuperLearner for the response model |
SL.args.post.treatment |
Arguments to SuperLearner for the post treatment indicator |
preprocess |
(optional) Data preprocessing function |
efficient |
If TRUE, the estimate will be efficient. If FALSE, the estimate will be a simple plug-in estimate. |
... |
Additional arguments to lower level functions |
estimate object
Andreas Nordland, Klaus K. Holst
Estimation of the Average Treatment Effect among Responders for Survival Outcomes
RATE.surv( response, post.treatment, treatment, censoring, tau, data, M = 5, pr.treatment, call.response, args.response = list(), SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")), call.censoring, args.censoring = list(), preprocess = NULL, ... )RATE.surv( response, post.treatment, treatment, censoring, tau, data, M = 5, pr.treatment, call.response, args.response = list(), SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")), call.censoring, args.censoring = list(), preprocess = NULL, ... )
response |
Response formula (e.g., Surv(time, event) ~ D + W). |
post.treatment |
Post treatment marker formula (e.g., D ~ W) |
treatment |
Treatment formula (e.g, A ~ 1) |
censoring |
Censoring formula (e.g., Surv(time, event == 0) ~ D + A + W)). |
tau |
Time-point of interest, see Details. |
data |
data.frame |
M |
Number of folds in cross-fitting (M=1 is no cross-fitting) |
pr.treatment |
(optional) Randomization probability of treatment. |
call.response |
Model call for the response model (e.g. "mets::phreg"). |
args.response |
Additional arguments to the response model. |
SL.args.post.treatment |
Arguments to SuperLearner for the post treatment indicator |
call.censoring |
Similar to call.response. |
args.censoring |
Similar to args.response. |
preprocess |
(optional) Data preprocessing function |
... |
Additional arguments to lower level functions |
Estimation of
under right censoring based on plug-in estimates of
and .
An efficient one-step estimator of is constructed
using the efficient influence function
An efficient one-step estimator of is constructed using the
efficient influence function
estimate object
Andreas Nordland, Klaus K. Holst
For a user defined function , computes the integral
dM^c(u|X), where $S^c$ is the censoring
time survival function and $M^c$ is the censoring is the right censoring
martingale with the Doob-Meyer decomposition , where
is the counting process and is the compensator .
rcai( T_model, C_model, data, time, event, tau, H_constructor, sample = 0, blocksize = 0, return_all = FALSE, ... )rcai( T_model, C_model, data, time, event, tau, H_constructor, sample = 0, blocksize = 0, return_all = FALSE, ... )
T_model |
model for event time |
C_model |
model for censoring |
data |
data.frame |
time |
time variable |
event |
event variable |
tau |
stopping time |
H_constructor |
function H(u|X) |
sample |
approximate integral by subsampling jump-times |
blocksize |
evaluate cumhaz in chunks of size blocksize |
return_all |
if TRUE then bot counting process N and compensator term L are returned |
... |
additional arguments passed to lower level functions |
vector with integral from 0 to all jump-times
Andreas Nordland
Risk regression with binary exposure and nuisance model for the odds-product.
Let be the binary exposure, the set of covariates, and
the binary response variable, and define .
The target parameter is either the relative risk
or the risk difference
We assume a target parameter model given by either
or
and similarly a working linear nuisance model for the odds-product
.
A propensity model for is also fitted using a logistic
regression working model
If both the odds-product model and the propensity model are correct the estimator is efficient. Further, the estimator is consistent in the union model, i.e., the estimator is double-robust in the sense that only one of the two models needs to be correctly specified to get a consistent estimate.
riskreg( formula, nuisance = ~1, propensity = ~1, target = ~1, data, weights, type = "rr", optimal = TRUE, std.err = TRUE, start = NULL, mle = FALSE, ... )riskreg( formula, nuisance = ~1, propensity = ~1, target = ~1, data, weights, type = "rr", optimal = TRUE, std.err = TRUE, start = NULL, mle = FALSE, ... )
formula |
formula (see details below) |
nuisance |
nuisance model (formula) |
propensity |
propensity model (formula) |
target |
(optional) target model (formula) |
data |
data.frame |
weights |
optional weights |
type |
type of association measure (rd og rr) |
optimal |
If TRUE optimal weights are calculated |
std.err |
If TRUE standard errors are calculated |
start |
optional starting values |
mle |
Semi-parametric (double-robust) estimate or MLE (TRUE gives MLE) |
... |
additional arguments to unconstrained optimization routine (nlminb) |
The 'formula' argument should be given as response ~ exposure
| target-formula | nuisance-formula or response ~ exposure | target
| nuisance | propensity
E.g., riskreg(y ~ a | 1 | x+z | x+z, data=...)
Alternatively, the model can specifed using the target, nuisance and
propensity arguments: riskreg(y ~ a, target=~1, nuisance=~x+z, ...)
The riskreg_fit function can be used with matrix inputs rather than
formulas.
An object of class 'riskreg.targeted' is returned. See
targeted-class for more details about this class and its
generic functions.
Klaus K. Holst
Richardson, T. S., Robins, J. M., & Wang, L. (2017). On modeling and estimation for the relative risk and risk difference. Journal of the American Statistical Association, 112(519), 1121–1130. http://dx.doi.org/10.1080/01621459.2016.1192546
m <- lava::lvm(a[-2] ~ x, z ~ 1, lp.target[1] ~ 1, lp.nuisance[-1] ~ 2*x) |> lava::distribution(~a, value=lava::binomial.lvm("logit")) |> lava::binomial.rr("y","a","lp.target","lp.nuisance") d <- sim(m,5e2,seed=1) I <- model.matrix(~1, d) X <- model.matrix(~1+x, d) with(d, riskreg_mle(y, a, I, X, type="rr")) with(d, riskreg_fit(y, a, nuisance=X, propensity=I, type="rr")) riskreg(y ~ a | 1, nuisance=~x , data=d, type="rr") ## Model with same design matrix for nuisance and propensity model: with(d, riskreg_fit(y, a, nuisance=X, type="rr")) ## a <- riskreg(y ~ a, target=~z, nuisance=~x, ## propensity=~x, data=d, type="rr") a <- riskreg(y ~ a | z, nuisance=~x, propensity=~x, data=d, type="rr") a predict(a, d[1:5,]) riskreg(y ~ a, nuisance=~x, data=d, type="rr", mle=TRUE)m <- lava::lvm(a[-2] ~ x, z ~ 1, lp.target[1] ~ 1, lp.nuisance[-1] ~ 2*x) |> lava::distribution(~a, value=lava::binomial.lvm("logit")) |> lava::binomial.rr("y","a","lp.target","lp.nuisance") d <- sim(m,5e2,seed=1) I <- model.matrix(~1, d) X <- model.matrix(~1+x, d) with(d, riskreg_mle(y, a, I, X, type="rr")) with(d, riskreg_fit(y, a, nuisance=X, propensity=I, type="rr")) riskreg(y ~ a | 1, nuisance=~x , data=d, type="rr") ## Model with same design matrix for nuisance and propensity model: with(d, riskreg_fit(y, a, nuisance=X, type="rr")) ## a <- riskreg(y ~ a, target=~z, nuisance=~x, ## propensity=~x, data=d, type="rr") a <- riskreg(y ~ a | z, nuisance=~x, propensity=~x, data=d, type="rr") a predict(a, d[1:5,]) riskreg(y ~ a, nuisance=~x, data=d, type="rr", mle=TRUE)
Binary regression models with right censored outcomes
riskreg_cens( response, censoring, treatment = NULL, prediction = NULL, data, newdata, tau, type = "risk", M = 1, call.response = "phreg", args.response = list(), call.censoring = "phreg", args.censoring = list(), preprocess = NULL, efficient = TRUE, control = list(), ... )riskreg_cens( response, censoring, treatment = NULL, prediction = NULL, data, newdata, tau, type = "risk", M = 1, call.response = "phreg", args.response = list(), call.censoring = "phreg", args.censoring = list(), preprocess = NULL, efficient = TRUE, control = list(), ... )
response |
Response formula (e.g., Surv(time, event) ~ D + W). |
censoring |
Censoring formula (e.g., Surv(time, event == 0) ~ D + A + W)). |
treatment |
Optional treatment model (learner) |
prediction |
Optional prediction model (learner) |
data |
data.frame. |
newdata |
Optional data.frame. In this case the uncentered influence function evalued in 'newdata' is returned with nuisance parameters obtained from 'data'. |
tau |
Time-point of interest, see Details. |
type |
"risk", "treatment", "brier" |
M |
Number of folds in cross-fitting (M=1 is no cross-fitting). |
call.response |
Model call for the response model (e.g. "mets::phreg"). |
args.response |
Additional arguments to the response model. |
call.censoring |
Similar to call.response. |
args.censoring |
Similar to args.response. |
preprocess |
(optional) Data pre-processing function. |
efficient |
If FALSE an IPCW estimator is returned |
control |
See details |
... |
Additional arguments to lower level data pre-processing functions. |
The one-step estimator depends on the calculation of an integral
wrt. the martingale process corresponding to the counting process N(t) =
I(C>min(T,tau)). This can be decomposed into an integral wrt the counting
process, and the compensator where the
latter term can be computational intensive to calculate. Rather than
calculating this integral in all observed time points, we can make a
coarser evaluation which can be controlled by setting
control=(sample=N). With N=0 the (computational intensive)
standard evaluation is used.
estimate object
Klaus K. Holst, Andreas Nordland
Extract average cross-validated score of individual learners
## S3 method for class 'superlearner' score(x, ...)## S3 method for class 'superlearner' score(x, ...)
x |
(superlearner) Fitted model. |
... |
Not used. |
Predictive model scoring
scoring( response, ..., type = "quantitative", levels = NULL, metrics = NULL, weights = NULL, names = NULL, object = NULL, newdata = NULL, messages = 1 )scoring( response, ..., type = "quantitative", levels = NULL, metrics = NULL, weights = NULL, names = NULL, object = NULL, newdata = NULL, messages = 1 )
response |
Observed response |
... |
model predictions (continuous predictions or class probabilities (matrices)) |
type |
continuous or categorical response (the latter is automatically chosen if response is a factor, otherwise a continuous response is assumed) |
levels |
(optional) unique levels in response variable |
metrics |
which metrics to report |
weights |
optional frequency weights |
names |
(optional) character vector of the model names in the output. If omitted these will be taken from the names of the ellipsis argument (...) |
object |
optional model object |
newdata |
(optional) data.frame on which to evaluate the model performance |
messages |
controls amount of messages/warnings (0: none) |
Numeric matrix of dimension m x p, where m is the number of different models and p is the number of model metrics
data(iris) set.seed(1) dat <- lava::csplit(iris,2) g1 <- naivebayes(Species ~ Sepal.Width + Petal.Length, data=dat[[1]]) g2 <- naivebayes(Species ~ Sepal.Width, data=dat[[1]]) pr1 <- predict(g1, newdata=dat[[2]], wide=TRUE) pr2 <- predict(g2, newdata=dat[[2]], wide=TRUE) table(colnames(pr1)[apply(pr1,1,which.max)], dat[[2]]$Species) table(colnames(pr2)[apply(pr2,1,which.max)], dat[[2]]$Species) scoring(dat[[2]]$Species, pr1=pr1, pr2=pr2) ## quantitative response: scoring(response=1:10, prediction=rnorm(1:10))data(iris) set.seed(1) dat <- lava::csplit(iris,2) g1 <- naivebayes(Species ~ Sepal.Width + Petal.Length, data=dat[[1]]) g2 <- naivebayes(Species ~ Sepal.Width, data=dat[[1]]) pr1 <- predict(g1, newdata=dat[[2]], wide=TRUE) pr2 <- predict(g2, newdata=dat[[2]], wide=TRUE) table(colnames(pr1)[apply(pr1,1,which.max)], dat[[2]]$Species) table(colnames(pr2)[apply(pr2,1,which.max)], dat[[2]]$Species) scoring(dat[[2]]$Species, pr1=pr1, pr2=pr2) ## quantitative response: scoring(response=1:10, prediction=rnorm(1:10))
SuperLearner wrapper for learner
SL( formula = ~., ..., SL.library = c("SL.mean", "SL.glm"), binomial = FALSE, data = NULL, info = "SuperLearner" )SL( formula = ~., ..., SL.library = c("SL.mean", "SL.glm"), binomial = FALSE, data = NULL, info = "SuperLearner" )
formula |
Model design |
... |
Additional arguments for SuperLearner::SuperLearner |
SL.library |
character vector of prediction algorithms |
binomial |
boolean specifying binomial or gaussian family (default FALSE) |
data |
Optional data.frame |
info |
model information (optional) |
learner object
Klaus Kähler Holst
Softmax transformation
softmax(x, log = FALSE, ref = TRUE, ...)softmax(x, log = FALSE, ref = TRUE, ...)
x |
Input matrix (e.g., linear predictors of multinomial logistic model) |
log |
Return on log-scale (default FALSE) |
ref |
Add reference level (add 0 column to x) |
... |
Additional arguments to lower level functions |
Numeric matrix of dimension n x p, where n= nrow(x) and
p = ncol(x) + (ref==TRUE)
Solve ODE with Runge-Kutta method (RK4)
solve_ode(ode_ptr, input, init, par = 0)solve_ode(ode_ptr, input, init, par = 0)
ode_ptr |
pointer (externalptr) to C++ function or an R function |
input |
Input matrix. 1st column specifies the time points |
init |
Initial conditions |
par |
Parameters defining the ODE (parsed to ode_ptr) |
The external point should be created with the function
targeted::specify_ode.
Matrix with solution
Klaus Kähler Holst
specify_ode
example(specify_ode)example(specify_ode)
Define compiled code for ordinary differential equation.
specify_ode(code, fname = NULL, pname = c("dy", "x", "y", "p"))specify_ode(code, fname = NULL, pname = c("dy", "x", "y", "p"))
code |
string with the body of the function definition (see details) |
fname |
Optional name of the exported C++ function |
pname |
Vector of variable names (results, inputs, states, parameters) |
The model (code) should be specified as the body of of C++ function.
The following variables are defined bye default
(see the argument pname)
Vector with derivatives, i.e. the rhs of the ODE (the result).
Vector with the first element being the time, and the following elements additional exogenous input variables,
Vector with the dependent variable
Parameter vector
All variables are treated as Armadillo (http://arma.sourceforge.net/)
vectors/matrices.
As an example consider the Lorenz Equations
We can specify this model as
ode <- 'dy(0) = p(0)*(y(1)-y(0));
dy(1) = y(0)*(p(1)-y(2));
dy(2) = y(0)*y(1)-p(2)*y(2);'
dy <- specify_ode(ode)
As an example of model with exogenous inputs consider the following ODE:
This could be specified as
mod <- 'double t = x(0);
dy = p(0) + p(1)*y + p(2)*x(1)*y + p(3)*x(1)*t;'
dy <- specify_ode(mod)
pointer (externalptr) to C++ function
Klaus Kähler Holst
solve_ode
ode <- paste0( "dy(0) = p(0)*(y(1)-y(0));", "dy(1) = y(0)*(p(1)-y(2));", "dy(2) = y(0)*y(1)-p(2)*y(2);", collapse="\n" ) # Reduce test time dy <- specify_ode(ode) tt <- seq(0, 100, length.out=2e4) yy <- solve_ode(dy, input=tt, init=c(1, 1, 1), par=c(10, 28, 8/3))ode <- paste0( "dy(0) = p(0)*(y(1)-y(0));", "dy(1) = y(0)*(p(1)-y(2));", "dy(2) = y(0)*y(1)-p(2)*y(2);", collapse="\n" ) # Reduce test time dy <- specify_ode(ode) tt <- seq(0, 100, length.out=2e4) yy <- solve_ode(dy, input=tt, init=c(1, 1, 1), par=c(10, 28, 8/3))
This is a special function that identifies stratification variables when they appear on the right hand side of a formula.
stratify(..., na.group = FALSE, shortlabel, sep = ", ")stratify(..., na.group = FALSE, shortlabel, sep = ", ")
... |
any number of variables. All must be the same length. |
na.group |
a logical variable, if |
shortlabel |
if |
sep |
the character used to separate groups, in the created label |
When used outside of a coxph formula the result of the function
is essentially identical to the interaction function,
though the labels from strata are often more verbose.
a new factor, whose levels are all possible combinations of the factors supplied as arguments.
survival::strata, learner_stratify, interaction
a <- factor(rep(1:3, 4), labels=c("low", "medium", "high")) b <- factor(rep(1:4, 3)) levels(stratify(b)) levels(stratify(a, b, shortlabel=TRUE))a <- factor(rep(1:3, 4), labels=c("low", "medium", "high")) b <- factor(rep(1:4, 3)) levels(stratify(b)) levels(stratify(a, b, shortlabel=TRUE))
This function creates a predictor object (class learner) from a list of existing learner objects. When estimating this model a stacked prediction will be created by weighting together the predictions of each of the initial learners The weights are learned using cross-validation.
superlearner( learners, data, nfolds = 10, meta.learner = metalearner_nnls, model.score = mse, mc.cores = NULL, future.seed = TRUE, silent = TRUE, name.prefix = NULL, ... )superlearner( learners, data, nfolds = 10, meta.learner = metalearner_nnls, model.score = mse, mc.cores = NULL, future.seed = TRUE, silent = TRUE, name.prefix = NULL, ... )
learners |
(list) List of learner objects (i.e. learner_glm) |
data |
(data.frame) Data containing the response variable and covariates. |
nfolds |
(integer) Number of folds to use in cross-validation to estimate the ensemble weights. |
meta.learner |
(function) Algorithm to learn the ensemble weights
(default non-negative least squares). Must be a function of the response
(nx1 vector), |
model.score |
(function) Model scoring method (see learner) |
mc.cores |
(integer) If not NULL, then parallel::mcmapply is used with
|
future.seed |
(logical or integer) Argument passed on to future.apply::future_lapply. If TRUE, then .Random.seed is used if it holds a L'Ecuyer-CMRG RNG seed, otherwise one is created randomly. |
silent |
(logical) Suppress all messages and progressbars |
name.prefix |
(character) Prefix used to name learner objects in
|
... |
Additional arguments to parallel::mclapply or future.apply::future_lapply. |
Luedtke & van der Laan (2016) Super-Learning of an Optimal Dynamic Treatment Rule, The International Journal of Biostatistics.
predict.superlearner weights.superlearner score.superlearner
sim1 <- function(n = 5e2) { x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) data.frame(y, x1, x2) } m <- list( "mean" = learner_glm(y ~ 1), "glm" = learner_glm(y ~ x1 + x2) ) sl <- superlearner(m, data = sim1(), nfolds = 2) predict(sl, newdata = sim1(n = 5)) predict(sl, newdata = sim1(n = 5), all.learners = TRUE)sim1 <- function(n = 5e2) { x1 <- rnorm(n, sd = 2) x2 <- rnorm(n) y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5) data.frame(y, x1, x2) } m <- list( "mean" = learner_glm(y ~ 1), "glm" = learner_glm(y ~ x1 + x2) ) sl <- superlearner(m, data = sim1(), nfolds = 2) predict(sl, newdata = sim1(n = 5)) predict(sl, newdata = sim1(n = 5), all.learners = TRUE)
Treatment level estimating functions for survival outcomes under right censoring
survival_treatment_level_estfun( type = "risk", data, tau, survival_models, treatment_model, control )survival_treatment_level_estfun( type = "risk", data, tau, survival_models, treatment_model, control )
type |
Character string, outcome of interest: "risk": P(T <= tau|A=a), "surv": P(T > tau|A=a) |
data |
data.frame |
tau |
Numeric, time-point of interest |
survival_models |
List of survival models, see fit_survival_models() |
treatment_model |
Treatment model, see fit_treatment_model() |
control |
List of control parameters, list(sample, blocksize) |
List with matrix elements estfun, or, and ipw.
Andreas Nordland
The functions riskreg and ate
returns an object of the type targeted.
An object of class 'targeted' is a list with at least the
following components:
An estimate object with the target parameter
estimates (see estimate.default)
Object returned from the applied optimization routine
number of parameters of the model (target and nuisance)
String describing the model
objects of the S3 class 'targeted'
The following S3 generic functions are available for an object of
class targeted:
coefExtract target coefficients of the estimated model.
vcovExtract the variance-covariance matrix of the target parameters.
ICExtract the estimated influence function.
printPrint estimates of the target parameters.
summaryExtract information on both target parameeters and estimated nuisance model.
'
## See example(riskreg) for examples## See example(riskreg) for examples
Extract model component from design object
## S3 method for class 'design' terms(x, specials, ...)## S3 method for class 'design' terms(x, specials, ...)
x |
design object |
specials |
extract variables marked as special (e.g., "offset", "weights", ...) |
... |
Additional arguments to lower level functions |
Calculating test statistics and p-values for the signed Wald intersection test given by
with individual hypotheses for each
coordinate of given by for some
non-inferiority margin , . #
test_intersection_sw( par, vcov, noninf = 0, weights = 1, nsim.null = 10000, index = NULL, control = list(), par.name = "theta" )test_intersection_sw( par, vcov, noninf = 0, weights = 1, nsim.null = 10000, index = NULL, control = list(), par.name = "theta" )
par |
(numeric) parameter estimates or |
vcov |
(matrix) asymptotic variance estimate |
noninf |
(numeric) non-inferiority margins |
weights |
(numeric) optional weights |
nsim.null |
(integer) number of sample used in Monte-Carlo simulation |
index |
(integer) subset of parameters to test |
control |
(list) arguments to alternating projection algorithm. See details section. |
par.name |
(character) parameter names in output |
The constrained least squares problem is solved using Dykstra's
algorithm. The following parameters for the optimization can be controlled
via the control list argument: dykstra_niter sets the maximum number of
iterations (default 500), dykstra_tol convergence tolerance of the
alternating projection algorithm (default 1e-7), pinv_tol tolerance for
calculating the pseudo-inverse matrix (default
length(par).Machine$double.epsmax(eigenvalue)).
htest object
Klaus Kähler Holst, Christian Bressen Pipper
Christian Bressen Pipper, Andreas Nordland & Klaus Kähler Holst (2025) A general approach to construct powerful tests for intersections of one-sided null-hypotheses based on influence functions. arXiv: https://arxiv.org/abs/2511.07096.
test_zmax_onesided lava::test_wald lava::closed_testing
S <- matrix(c(1, 0.5, 0.5, 2), 2, 2) thetahat <- c(0.5, -0.2) test_intersection_sw(thetahat, S, nsim.null = 1e5) test_intersection_sw(thetahat, S, weights = NULL) ## Not run: # only on 'lava' >= 1.8.2 e <- estimate(coef = thetahat, vcov = S, labels = c("p1", "p2")) lava::closed_testing(e, test_intersection_sw, noninf = c(-0.1, -0.1)) |> summary() ## End(Not run)S <- matrix(c(1, 0.5, 0.5, 2), 2, 2) thetahat <- c(0.5, -0.2) test_intersection_sw(thetahat, S, nsim.null = 1e5) test_intersection_sw(thetahat, S, weights = NULL) ## Not run: # only on 'lava' >= 1.8.2 e <- estimate(coef = thetahat, vcov = S, labels = c("p1", "p2")) lava::closed_testing(e, test_intersection_sw, noninf = c(-0.1, -0.1)) |> summary() ## End(Not run)
Calculating test statistics and p-values for the onesided Zmax / minP test.z
Given parameter estimates with approximate assymptotic covariance matrix
, let , where
. The Zmax
test statistic is then , and the
null-hypothesis is with
non-inferiority margin , for which the p-value
is calculated as where is the CDF
of the multivariate normal distribution with mean zero and correlation
matrix .
test_zmax_onesided(par, vcov, noninf = 0, index = NULL, par.name = "theta")test_zmax_onesided(par, vcov, noninf = 0, index = NULL, par.name = "theta")
par |
(numeric) parameter estimates or |
vcov |
(matrix) asymptotic variance estimate |
noninf |
(numeric) non-inferiority margins |
index |
(integer) subset of parameters to test |
par.name |
(character) parameter names in output |
htest object
Christian Bressen Pipper, Klaus Kähler Holst
test_intersection_sw() lava::test_wald()
lava::closed_testing()
Simulated data inspired by the FLOW study (Perkovic 2024)...elt() The following variables are considered in this simulated data set
time: time of first event in years (first major irreversible kidney
event or non-related death)
status: event type at first major irreversible kidney event (=1),
non-related death (=2), or right censoring (=0)
y: clinical outcome measurement (eGFR) at landmark time (:=2)
r: missing indicator for y (1 if observed, 0 if either t<2 or if the
outcome was not measured for other reasons)
a: binary treatment (1 := active, 0 := placebo)
x1: covariate, clinical outcome at baseline (eGFR)
x2: covariate, binary treatment usage indicator (1: SGLT2 treatment,
0: none).
The actual failure times and censoring times are also included
(failure.time, cens.time), and the full-data outcome (y0) given t>2.
Simulated data
Perkovic, V., Tuttle, K. R., Rossing, P., Mahaffey, K. W., Mann, J. F., Bakris, G., Baeres, F. M., Idorn, T., Bosch-Traberg, H., Lausvig, N. L., and Pratley, R. (2024). Effects of semaglutide on chronic kidney disease in patients with type 2 diabetes. New England Journal of Medicine, 391(2):109–121.
data(truncatedscore)data(truncatedscore)
Extract ensemble weights
## S3 method for class 'superlearner' weights(object, ...)## S3 method for class 'superlearner' weights(object, ...)
object |
(superlearner) Fitted model. |
... |
Not used. |