Title: Targeted Inference
Description: Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), and assumption-lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).
Authors: Klaus K. Holst [aut, cre], Benedikt Sommer [aut], Andreas Nordland [aut]
Maintainer: Klaus K. Holst <[email protected]>
License: Apache License (== 2.0)
Version: 0.6
Built: 2025-01-21 10:25:53 UTC
Source: https://github.com/kkholst/targeted
AIPW for the mean (and linear projections of the EIF) with missing observations
aipw(response_model, propensity_model, formula = ~1, data, ...)
response_model |
Model for the response given covariates (ml_model or formula) |
propensity_model |
Optional missing data mechanism model (propensity model) (ml_model or formula) |
formula |
design specifying the OLS estimator with outcome given by the EIF |
data |
data.frame |
... |
additional arguments (see cate) |
m <- lvm(y ~ x+z, r ~ x)
distribution(m, ~r) <- binomial.lvm()
transform(m, y0 ~ r+y) <- function(x) { x[x[,1]==0, 2] <- NA; x[,2] }
d <- sim(m, 1e3, seed=1)
aipw(y0 ~ x, data=d)
Assumption lean inference via cross-fitting (Double ML). See <doi:10.1111/rssb.12504>.
alean( response_model, exposure_model, data, link = "identity", g_model, nfolds = 1, silent = FALSE, mc.cores, ... )
response_model |
formula or ml_model object (formula => glm) |
exposure_model |
model for the exposure |
data |
data.frame |
link |
Link function (g) |
g_model |
Model for |
nfolds |
Number of folds |
silent |
suppress all messages and progress bars |
mc.cores |
Optional number of cores. If given, parallel::mcmapply is used instead of future.apply |
... |
additional arguments to future.apply::future_mapply |
Let Y be the response variable, A the exposure, and W covariates. The target parameter is
Psi(P) = E[Cov(A, g{E(Y|A,W)} | W)] / E[Var(A|W)].
The response_model is the model for E(Y|A,W), the exposure_model is the model for E(A|W), and link specifies the link function g.
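A small Monte Carlo sketch of this estimand (a toy data-generating process, not part of the package; with the identity link and E(Y|A,W) = A + W we get Cov(A, E(Y|A,W)|W) = Var(A|W), so the target parameter equals 1):
n <- 1e4
w <- rnorm(n); a <- w + rnorm(n); y <- a + w + rnorm(n)
alean(response_model=y ~ a + w, exposure_model=a ~ w, data=data.frame(y, a, w))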
alean.targeted object
Klaus Kähler Holst
sim1 <- function(n, family=gaussian(), ...) {
  m <- lvm() |>
    distribution(~ y, binomial.lvm()) |>
    regression('a', value=function(l) l) |>
    regression('y', value=function(a,l) a + l)
  if (family$family == "binomial")
    distribution(m, ~a) <- binomial.lvm()
  sim(m, n)
}
library(splines)
f <- binomial()
d <- sim1(1e4, family=f)
e <- alean(response_model=ML(y ~ a + bs(l, df=3), family=binomial),
           exposure_model=ML(a ~ bs(l, df=3), family=f),
           data=d, link="logit", mc.cores=1, nfolds=1)
e
e <- alean(response_model=ML(y ~ a + l, family=binomial),
           exposure_model=ML(a ~ l),
           data=d, link="logit", mc.cores=1, nfolds=1)
e
Augmented Inverse Probability Weighting estimator for the Average (Causal)
Treatment Effect. All nuisance models are here parametric (glm). For a more
general approach see the cate implementation. In this implementation the
standard errors are correct even when the nuisance models are misspecified,
since the influence curve includes the contribution from estimating the
parametric nuisance models. The estimate is consistent if either the
propensity model or the outcome model (Q-model) is correctly specified.
ate( formula, data = parent.frame(), weights, offset, family = stats::gaussian(identity), nuisance = NULL, propensity = nuisance, all, labels = NULL, ... )
formula |
Formula (see details below) |
data |
data.frame |
weights |
optional frequency weights |
offset |
optional offset (character or vector). can also be specified in the formula. |
family |
Exponential family argument for outcome model |
nuisance |
outcome regression formula (Q-model) |
propensity |
propensity model formula |
all |
If TRUE all standard errors are calculated (default TRUE when exposure only has two levels) |
labels |
Optional treatment labels |
... |
Additional arguments to lower level functions |
The formula may either be specified as: response ~ treatment | nuisance-formula | propensity-formula
For example: ate(y~a | x+z+a | x*z, data=...)
Alternatively, as a list: ate(list(y~a, ~x+z, ~x*z), data=...)
Or using the nuisance (and propensity argument):
ate(y~a, nuisance=~x+z, ...)
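The three specifications below are thus equivalent (a sketch assuming a data.frame d with columns y, a, x, z):
ate(y ~ a | x+z | x*z, data=d)
ate(list(y ~ a, ~x+z, ~x*z), data=d)
ate(y ~ a, nuisance=~x+z, propensity=~x*z, data=d)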
An object of class 'ate.targeted
' is returned. See
targeted-class
for more details about this class and its
generic functions.
Klaus K. Holst
cate
m <- lvm(y ~ a+x, a ~ x)
distribution(m, ~y) <- binomial.lvm()
m <- ordinal(m, K=4, ~a)
transform(m, ~a) <- factor
d <- sim(m, 1e3, seed=1)
(a <- ate(y ~ a | a*x | x, data=d))
## ate(y ~ a, nuisance=~a*x, propensity=~x, ...)
# Comparison with randomized experiment
m0 <- cancel(m, a ~ x)
lm(y ~ a-1, sim(m0, 2e4))
# Choosing a different contrast for the association measures
summary(a, contrast=c(2,4))
Calibration for multi-classification methods
calibration( pr, cl, weights = NULL, threshold = 10, method = "bin", breaks = nclass.Sturges, df = 3, ... )
pr |
matrix with probabilities for each class |
cl |
class variable |
weights |
counts |
threshold |
do not calibrate if there are fewer than 'threshold' events |
method |
either 'isotonic' (pava), 'logistic', 'mspline' (monotone spline), 'bin' (local constant) |
breaks |
optional number of bins (only for method 'bin') |
df |
degrees of freedom (only for spline methods) |
... |
additional arguments to lower level functions |
An object of class 'calibration
' is returned. See calibration-class
for more details about this class and its generic functions.
Klaus K. Holst
sim1 <- function(n, beta=c(-3, rep(.5, 10)), rho=.5) {
  p <- length(beta) - 1
  xx <- lava::rmvn0(n, sigma=diag(nrow=p)*(1-rho)+rho)
  y <- rbinom(n, 1, lava::expit(cbind(1, xx) %*% beta))
  d <- data.frame(y=y, xx)
  names(d) <- c("y", paste0("x", 1:p))
  return(d)
}
set.seed(1)
beta <- c(-2, rep(1, 10))
d <- sim1(1e4, beta=beta)
a1 <- NB(y ~ ., data=d)
a2 <- glm(y ~ ., data=d, family=binomial)
## a3 <- randomForest(factor(y) ~ ., data=d, family=binomial)
d0 <- sim1(1e4, beta=beta)
p1 <- predict(a1, newdata=d0)
p2 <- predict(a2, newdata=d0, type="response")
## p3 <- predict(a3, newdata=d0, type="prob")
c2 <- calibration(p2, d0$y, method="isotonic")
c1 <- calibration(p1, d0$y, breaks=100)
if (interactive()) {
  plot(c1)
  plot(c2, col="red", add=TRUE)
  abline(a=0, b=1)
  with(c1$xy[[1]], points(pred, freq, type="b", col="red"))
}
set.seed(1)
beta <- c(-2, rep(1, 10))
dd <- lava::csplit(sim1(1e4, beta=beta), k=3)
mod <- NB(y ~ ., data=dd[[1]])
p1 <- predict(mod, newdata=dd[[2]])
cal <- calibration(p1, dd[[2]]$y)
p2 <- predict(mod, newdata=dd[[3]])
pp <- predict(c1, p2)
cc <- calibration(pp, dd[[3]]$y)
if (interactive()) {
  plot(cal)
  plot(cc, add=TRUE, col="blue")
}
The function calibration returns an object of class 'calibration', a list with at least the following components:
estimated step-functions (see stepfun
) for each class
the unique classes
model/method type (string)
list of data.frames with predictions (pr) and estimated probabilities of success (only for the 'bin' method)
objects of the S3 class 'calibration
'
The following S3 generic functions are available for an object of class calibration:
predict
Apply calibration to new data.
plot
Plot the calibration curves (reliability plot).
print
Basic print method.
## See example(calibration) for examples
Conditional Average Treatment Effect estimation with cross-fitting.
cate( response.model, propensity.model, cate.model = ~1, contrast = c(1, 0), data, nfolds = 1, rep = 1, silent = FALSE, stratify = FALSE, mc.cores = NULL, ... )
response.model |
formula or ml_model object (formula => glm) |
propensity.model |
formula or ml_model object (formula => glm) |
cate.model |
formula specifying regression design for conditional average treatment effects |
contrast |
treatment contrast (default 1 vs 0) |
data |
data.frame |
nfolds |
Number of folds |
rep |
Number of replications of cross-fitting procedure |
silent |
suppress all messages and progress bars |
stratify |
If TRUE the response.model will be stratified by treatment |
mc.cores |
Optional number of cores. If given, parallel::mcmapply is used instead of future.apply |
... |
additional arguments to future.apply::future_mapply |
We have observed data (Y, A, W), where Y is the response variable, A the binary treatment, and W covariates. We further let V be a subset of the covariates. Define the conditional potential mean outcome
psi_a(P)(V) = E_P[E_P(Y | A=a, W) | V],
and let m(V; beta) denote a parametric working model. The target parameter is then the minimizer of the mean-squared error
beta(P) = argmin_beta E_P[{psi_1(P)(V) - psi_0(P)(V) - m(V; beta)}^2].
cate.targeted object
Klaus Kähler Holst, Andreas Nordland
Mark J. van der Laan (2006) Statistical Inference for Variable Importance, The International Journal of Biostatistics.
sim1 <- function(n=1000, ...) {
  w1 <- rnorm(n)
  w2 <- rnorm(n)
  a <- rbinom(n, 1, expit(-1 + w1))
  y <- cos(w1) + w2*a + 0.2*w2^2 + a + rnorm(n)
  data.frame(y, a, w1, w2)
}
d <- sim1(5000)
## ATE
cate(cate.model=~1,
     response.model=y ~ a*(w1+w2),
     propensity.model=a ~ w1+w2,
     data=d)
## CATE
cate(cate.model=~1+w2,
     response.model=y ~ a*(w1+w2),
     propensity.model=a ~ w1+w2,
     data=d)
## Not run:
## superlearner example
mod1 <- list(
  glm=predictor_glm(y ~ w1+w2),
  gam=predictor_gam(y ~ s(w1) + s(w2))
)
s1 <- predictor_sl(mod1, nfolds=5)
cate(cate.model=~1,
     response.model=s1,
     propensity.model=predictor_glm(a ~ w1+w2, family=binomial),
     data=d, stratify=TRUE)
## End(Not run)
Conditional average treatment effect estimation via Double Machine Learning
cate_link( treatment, link = "identity", response_model, propensity_model, importance_model, contrast = c(1, 0), data, nfolds = 5, type = "dml1", ... )
treatment |
formula specifying treatment and variables to condition on |
link |
Link function |
response_model |
SL object |
propensity_model |
SL object |
importance_model |
SL object |
contrast |
treatment contrast (default 1 vs 0) |
data |
data.frame |
nfolds |
Number of folds |
type |
'dml1' or 'dml2' |
... |
additional arguments to SuperLearner |
cate.targeted object
Klaus Kähler Holst & Andreas Nordland
# Example 1:
sim1 <- function(n=1e4, seed=NULL, return_model=FALSE, ...) {
  suppressPackageStartupMessages(require("lava"))
  if (!is.null(seed)) set.seed(seed)
  m <- lava::lvm()
  distribution(m, ~x) <- gaussian.lvm()
  distribution(m, ~v) <- gaussian.lvm(mean=10)
  distribution(m, ~a) <- binomial.lvm("logit")
  regression(m, "a") <- function(v, x) .1*v + x
  distribution(m, "y") <- gaussian.lvm()
  regression(m, "y") <- function(a, v, x) v + x + a*x + a*v*v
  if (return_model) return(m)
  lava::sim(m, n=n)
}
if (require("SuperLearner", quietly=TRUE)) {
  d <- sim1(n=1e3, seed=1)
  e <- cate_link(data=d, type="dml2", treatment=a ~ v,
                 response_model=y ~ a*(x + v + I(v^2)),
                 importance_model=SL(D_ ~ v + I(v^2)),
                 nfolds=10)
  summary(e)  # the true parameters are c(1,1)
}
The function cv returns an object of class 'cross_validated', a list with at least the following components:
An array with the model score(s) evaluated for each fold,
repetition, and model estimates
(see estimate.default
)
Names (character vector) of the models
number of repetitions of the CV
Number of folds of the CV
objects of the S3 class 'cross_validated
'
The following S3 generic functions are available for an object of
class cross_validated
:
coef
Extract average model scores from the cross-validation procedure.
print
Basic print method.
summary
Summary of the cross-validation procedure.
# See example(cv) for examples
Conditional Relative Risk estimation via Double Machine Learning
crr( treatment, response_model, propensity_model, importance_model, contrast = c(1, 0), data, nfolds = 5, type = "dml1", ... )
treatment |
formula specifying treatment and variables to condition on |
response_model |
SL object |
propensity_model |
SL object |
importance_model |
SL object |
contrast |
treatment contrast (default 1 vs 0) |
data |
data.frame |
nfolds |
Number of folds |
type |
'dml1' or 'dml2' |
... |
additional arguments to SuperLearner |
cate.targeted object
Klaus Kähler Holst & Andreas Nordland
sim1 <- function(n=1e4, seed=NULL, return_model=FALSE, ...) {
  suppressPackageStartupMessages(require("lava"))
  if (!is.null(seed)) set.seed(seed)
  m <- lava::lvm()
  distribution(m, ~x) <- gaussian.lvm()
  distribution(m, ~v) <- gaussian.lvm(mean=10)
  distribution(m, ~a) <- binomial.lvm("logit")
  regression(m, "a") <- function(v, x) .1*v + x
  distribution(m, "y") <- gaussian.lvm()
  regression(m, "y") <- function(a, v, x) v + x + a*x + a*v*v
  if (return_model) return(m)
  lava::sim(m, n=n)
}
d <- sim1(n=2e3, seed=1)
if (require("SuperLearner", quietly=TRUE)) {
  e <- crr(data=d, type="dml2", treatment=a ~ v,
           response_model=ML(y ~ a*(x + v + I(v^2))),
           importance_model=ML(D_ ~ v + I(v^2)),
           propensity_model=ML(a ~ x + v + I(v^2), family=binomial),
           nfolds=2)
  summary(e)  # the true parameters are c(1,1)
}
Generic cross-validation function
cv( models, data, response = NULL, nfolds = 5, rep = 1, weights = NULL, model.score = scoring, seed = NULL, shared = NULL, args.pred = NULL, args.future = list(), mc.cores, ... )
models |
List of fitting functions |
data |
data.frame or matrix |
response |
Response variable (vector or name of column in |
nfolds |
Number of folds (default 5). nfolds=0 splits the data in two halves, 1:(n/2) and (n/2+1):n, with the last half used for testing |
rep |
Number of repetitions (default 1) |
weights |
Optional frequency weights |
model.score |
Model scoring metric (default: MSE / Brier score). Must be a function with arguments response and prediction, and may optionally include weights, object and newdata arguments |
seed |
Random seed (argument passed to future.apply::future_lapply) |
shared |
Function applied to each fold with results passed to each model |
args.pred |
Optional arguments to prediction function (see details below) |
args.future |
Arguments to future.apply::future_mapply |
mc.cores |
Optional number of cores. parallel::mcmapply used instead of future |
... |
Additional arguments passed to the models in models |
models should be a list of objects of class ml_model. Alternatively, each element of models should be a list with a fitting function and a prediction function.
The response
argument can optionally be a named list where the name is
then used as the name of the response argument in models. Similarly, if data
is a named list with a single data.frame/matrix then this name will be used
as the name of the data/design matrix argument in models.
An object of class 'cross_validated
' is returned. See
cross_validated-class
for more details about this class and
its generic functions.
Klaus K. Holst
f0 <- function(data, ...) lm(..., data=data)
f1 <- function(data, ...) lm(Sepal.Length ~ Species, data=data)
f2 <- function(data, ...) lm(Sepal.Length ~ Species + Petal.Length, data=data)
x <- cv(list(m0=f0, m1=f1, m2=f2), rep=10, data=iris, formula=Sepal.Length ~ .)
x
Extract design matrix from data.frame and formula
design( formula, data, intercept = FALSE, rm_envir = FALSE, ..., specials = c("weights", "offset") )
formula |
formula |
data |
data.frame |
intercept |
If FALSE (default) an intercept is not included |
rm_envir |
Remove environment |
... |
additional arguments (e.g., specials such as weights, offset, subset) |
specials |
character vector specifying functions in the formula that should be marked as special in the terms object |
An object of class 'design'
Klaus Kähler Holst
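A minimal sketch using the built-in iris data:
des <- design(Sepal.Length ~ Petal.Length + Species, data=iris, intercept=TRUE)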
Let Y denote the clinical outcome, A the binary treatment variable, W the baseline covariates, T the failure time, and epsilon the cause of failure. The two target parameters are the treatment contrasts in the mean clinical outcome among subjects still event-free at the landmark time, E(Y | T > time, A=a), and in the cumulative incidence of the primary event at the landmark time, P(T <= time, epsilon = cause | A=a).
estimate_truncatedscore( data, mod.y, mod.r, mod.a, mod.event, time, cause = 1, cens.code = 0, naive = FALSE, ... )
data |
(data.frame) |
mod.y |
(formula or ml_model) Model for clinical outcome given T>time. Using a formula specifies a glm with an identity link (see example). |
mod.r |
(formula or ml_model) Model for missing data mechanism for clinical outcome at T=time. Using a formula specifies a glm with a log link. |
mod.a |
(formula or ml_model) Treatment model (in RCT should just be 'a ~ 1'). Using a formula specifies a glm with a log link. |
mod.event |
(formula) Model for time-to-event process ('Event(time,status) ~ x'). |
time |
(numeric) Landmark time. |
cause |
(integer) Primary event (in the 'status' variable of the 'Event' statement). |
cens.code |
(integer) Censoring code. |
naive |
(logical) If TRUE, the unadjusted estimates ignoring baseline covariates are returned as the attribute 'naive'. |
... |
Additional arguments passed to mets::binregATE. |
lava::estimate.default object
Klaus Kähler Holst
## Not run:
mod1 <- predictor_glm(y ~ a * (x1 + x2))
mod2 <- predictor_glm(r ~ a * (x1 + x2), family=binomial)
a <- estimate_truncatedscore(
  data=dat,
  mod.y=mod1,
  mod.r=mod2,
  mod.a=a ~ 1,
  mod.event=mets::Event(time, status) ~ a * (x1 + x2),
  time=2
)
s <- summary(a, noninf.t=-0.1)
print(s)
parameter(s)
# the above is equivalent to
a <- estimate_truncatedscore(
  data=dat,
  mod.y=y ~ a * (x1 + x2),
  mod.r=r ~ a * (x1 + x2),
  mod.a=a ~ 1,
  mod.event=mets::Event(time, status) ~ a * (x1 + x2),
  time=2
)
## End(Not run)
Similar to expand.grid
function, this function creates all combinations
of the input arguments but returns the result as a list.
expand.list(...)
... |
input variables |
list
Klaus Kähler Holst
expand.list(x=2:4, z=c("a","b"))
Wrapper for ml_model
ML(formula, model = "glm", ...)
formula |
formula |
model |
model (sl, rf, pf, glm, ...) |
... |
additional arguments to model object |
model 'sl' (SuperLearner::SuperLearner) args: SL.library, cvControl, family, method
model 'grf' (grf::regression_forest) args: num.trees, mtry, sample.weights, sample.fraction, min.node.size, ...
model 'grf.binary' (grf::probability_forest) args: num.trees, mtry, sample.weights, ...
model 'glm' args: family, weights, offset, ...
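Illustrative calls (a sketch; formulas and argument values are placeholders):
ML(y ~ x1 + x2, model="glm", family=binomial)
ML(y ~ x1 + x2, model="grf", num.trees=500)
ML(y ~ ., model="sl", SL.library=c("SL.mean", "SL.glm"))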
Provides standardized estimation and prediction methods
info
Optional information/name of the model
formals
List with formal arguments of estimation and prediction functions
formula
Formula specifying response and design matrix
args
additional arguments specified during initialization
description
optional description field
fit
Active binding returning estimated model object
new()
Create a new prediction model object
ml_model$new( formula = NULL, estimate, predict = stats::predict, predict.args = NULL, info = NULL, specials = c(), response.arg = "y", x.arg = "x", ... )
formula
formula specifying outcome and design matrix
estimate
function for fitting the model (must be a function of the response, 'y', and the design matrix, 'x'; alternatively, a function with a single 'formula' argument)
predict
prediction function (must be a function of model object, 'object', and new design matrix, 'newdata')
predict.args
optional arguments to prediction function
info
optional description of the model
specials
optional additional terms (weights, offset, id, subset, ...) passed to 'estimate'
response.arg
name of response argument
x.arg
name of design matrix argument
...
optional arguments to fitting function
estimate()
Estimation method
ml_model$estimate(data, ..., store = TRUE)
data
data.frame
...
Additional arguments to estimation method
store
Logical determining if estimated model should be stored inside the class.
predict()
Prediction method
ml_model$predict(newdata, ..., object = NULL)
newdata
data.frame
...
Additional arguments to prediction method
object
Optional model fit object
update()
Update formula
ml_model$update(formula, ...)
formula
formula or character which defines the new response
...
Additional arguments to lower level functions
print()
Print method
ml_model$print(...)
...
Additional arguments to lower level functions
response()
Extract response from data
ml_model$response(data, eval = TRUE, ...)
data
data.frame
eval
when FALSE return the untransformed outcome (i.e., return 'a' if formula defined as I(a==1) ~ ...)
...
additional arguments to 'design'
design()
Extract design matrix (features) from data
ml_model$design(data, ...)
data
data.frame
...
additional arguments to 'design'
opt()
Get options
ml_model$opt(arg, ...)
arg
name of option to get value of
...
additional arguments to lower level functions
clone()
The objects of this class are cloneable with this method.
ml_model$clone(deep = FALSE)
deep
Whether to make a deep clone.
Klaus Kähler Holst
predictor_sl
data(iris)
rf <- function(formula, ...)
  ml_model$new(formula, info="grf::probability_forest",
               estimate=function(x, y, ...) grf::probability_forest(X=x, Y=y, ...),
               predict=function(object, newdata) predict(object, newdata)$predictions,
               ...)
args <- expand.list(num.trees=c(100, 200), mtry=1:3,
                    formula=c(Species ~ ., Species ~ Sepal.Length + Sepal.Width))
models <- lapply(args, function(par) do.call(rf, par))
x <- models[[1]]$clone()
x$estimate(iris)
predict(x, newdata=head(iris))
# Reduce Ex. timing
a <- targeted::cv(models, data=iris)
cbind(coef(a), attr(args, "table"))
ff <- ml_model$new(estimate=function(y, x) lm.fit(x=x, y=y),
                   predict=function(object, newdata) newdata %*% object$coefficients)
## tmp <- ff$estimate(y, x=x)
## ff$predict(x)
Naive Bayes Classifier
NB( formula, data, weights = NULL, kernel = FALSE, laplace.smooth = 0, prior = NULL, ... )
formula |
Formula with syntax: response ~ predictors | weights |
data |
data.frame |
weights |
optional frequency weights |
kernel |
If TRUE a kernel estimator is used for numeric predictors (otherwise a gaussian model is used) |
laplace.smooth |
Laplace smoothing |
prior |
optional prior probabilities (default estimated from data) |
... |
additional arguments to lower level functions |
An object of class 'NB
' is returned. See
NB-class
for more details about this class and
its generic functions.
Klaus K. Holst
data(iris)
m2 <- NB(Species ~ Sepal.Width + Petal.Length, data=iris)
pr2 <- predict(m2, newdata=iris)
The function NB returns an object of class 'NB', a list with at least the following components:
Matrix with prior probabilities, i.e., the marginal class probabilities Pr(class)
list of matrices with conditional probabilities of the features given the classes (one list element per class), Pr(x|class)
Names (character vector) of the classes
objects of the S3 class 'NB
'
The following S3 generic functions are available for an object of class NB
:
predict
Predict class probabilities for new features data.
print
Basic print method.
## See example(NB) for examples
Find the non-dominated points of a set (the minima of a point set).
nondom(x, ...)
x |
matrix |
... |
additional arguments to lower level functions |
A point x dominates y if it is never worse and in at least one case strictly better. Formally, let f_i denote the i-th coordinate of the objective function; then x dominates y if f_i(x) <= f_i(y) for all i, and f_j(x) < f_j(y) for some j.
Based on the algorithm of Kung et al. (1975).
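A minimal R sketch of the dominance relation itself (the definition above, not the package's algorithm):
dominates <- function(x, y) all(x <= y) && any(x < y)
dominates(c(0, 0.5), c(1, 0.5))  # TRUE: never worse, strictly better in the first coordinate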
matrix
Klaus Kähler Holst
rbind(
  c(1.0, 0.5),
  c(0.0, 1.0),
  c(1.0, 0.0),
  c(0.5, 1.0),
  c(1.0, 1.0),
  c(0.8, 0.8)) |> nondom()
Pooled Adjacent Violators Algorithm
pava(y, x = numeric(0), weights = numeric(0))
y |
response variable |
x |
(optional) predictor vector (otherwise y is assumed to be a priori sorted according to relevant predictor) |
weights |
(optional) weights |
List with the indices (index) of the jump points and the values (value) at each jump point.
Klaus K. Holst
x <- runif(5e3, -5, 5)
pr <- lava::expit(-1 + x)
y <- rbinom(length(pr), 1, pr)
pv <- pava(y, x)
plot(pr ~ x, cex=0.3)
with(pv, lines(sort(x)[index], value, col="red", type="s"))
Kernel density estimator predictions
## S3 method for class 'density' predict(object, xnew, ...)
object |
density object |
xnew |
New data on which to make predictions |
... |
additional arguments to lower level functions |
Klaus K. Holst
Naive Bayes Classifier predictions
## S3 method for class 'NB' predict(object, newdata, expectation = NULL, threshold = c(0.001, 0.001), ...)
object |
NB object |
newdata |
new data on which to make predictions |
expectation |
Variable to calculate conditional expectation wrt probabilities from NB classifier |
threshold |
Threshold parameters. First element defines the threshold on the probabilities and the second element the value to set those truncated probabilities to. |
... |
Additional arguments to lower level functions |
Klaus K. Holst
This function creates a predictor object (class ml_model) from a list of existing ml_model objects. When estimating this model a stacked prediction will be created by weighting together the predictions of each of the initial models. The weights are learned using cross-validation.
predictor_sl( model.list, info = NULL, nfolds = 5L, meta.learner = metalearner_nnls, model.score = mse, ... )
model.list |
List of ml_model objects (i.e. predictor_glm) |
info |
Optional model description to store in model object |
nfolds |
Number of folds to use in cross validation |
meta.learner |
meta.learner function (default: non-negative least squares). Must be a function of the response (n x 1 vector) and the predictions of each model (n x p matrix) |
model.score |
model scoring method (see ml_model) |
... |
additional arguments to lower level functions |
Luedtke & van der Laan (2016) Super-Learning of an Optimal Dynamic Treatment Rule, The International Journal of Biostatistics.
ml_model predictor_glm predictor_xgboost
sim1 <- function(n = 5e2) {
  x1 <- rnorm(n, sd = 2)
  x2 <- rnorm(n)
  y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
  data.frame(y, x1, x2)
}
d <- sim1() |> mets::dsort(~x1)
m <- list(
  "mean" = predictor_glm(y ~ 1),
  "glm" = predictor_glm(y ~ x1 + x2),
  "iso" = predictor_isoreg(y ~ x1)
)
s <- predictor_sl(m, nfolds=10)
s$estimate(d)
pr <- s$predict(d)
if (interactive()) {
  plot(y ~ x1, data = d)
  points(d$x1, pr, col = 2, cex = 0.5)
  lines(cos(x1) + x1 ~ x1, data = d, lwd = 4, col = lava::Col("darkblue", 0.3))
}
print(s)
## weights(s)
## score(s)
cvres <- summary(s, data=d, nfolds=3, rep=2)
cvres
## coef(cvres)
## score(cvres)
Estimation of the Average Treatment Effect among Responders
RATE( response, post.treatment, treatment, data, family = gaussian(), M = 5, pr.treatment, treatment.level, SL.args.response = list(family = gaussian(), SL.library = c("SL.mean", "SL.glm")), SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")), preprocess = NULL, efficient = TRUE, ... )
response |
Response formula (e.g., Y~D*A) |
post.treatment |
Post treatment marker formula (e.g., D~W) |
treatment |
Treatment formula (e.g., A~1) |
data |
data.frame |
family |
Exponential family for response (default gaussian) |
M |
Number of folds in cross-fitting (M=1 is no cross-fitting) |
pr.treatment |
(optional) Randomization probability of treatment. |
treatment.level |
Treatment level in binary treatment (default 1) |
SL.args.response |
Arguments to SuperLearner for the response model |
SL.args.post.treatment |
Arguments to SuperLearner for the post treatment indicator |
preprocess |
(optional) Data preprocessing function |
efficient |
If TRUE, the estimate will be efficient. If FALSE, the estimate will be a simple plug-in estimate. |
... |
Additional arguments to lower level functions |
estimate object
Andreas Nordland, Klaus K. Holst
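A usage sketch (hypothetical data.frame dat with response y, post-treatment marker d, treatment a, and covariate w; in a randomized trial the randomization probability can be supplied directly):
RATE(response=y ~ d*a, post.treatment=d ~ w, treatment=a ~ 1,
     data=dat, pr.treatment=0.5)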
Estimation of the Average Treatment Effect among Responders for Survival Outcomes
RATE.surv( response, post.treatment, treatment, censoring, tau, data, M = 5, pr.treatment, call.response, args.response = list(), SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")), call.censoring, args.censoring = list(), preprocess = NULL, ... )
response |
Response formula (e.g., Surv(time, event) ~ D + W). |
post.treatment |
Post treatment marker formula (e.g., D ~ W). |
treatment |
Treatment formula (e.g., A ~ 1). |
censoring |
Censoring formula (e.g., Surv(time, event == 0) ~ D + A + W). |
tau |
Time-point of interest, see Details. |
data |
data.frame. |
M |
Number of folds in cross-fitting (M=1 is no cross-fitting). |
pr.treatment |
(optional) Randomization probability of treatment. |
call.response |
Model call for the response model (e.g. "mets::phreg"). |
args.response |
Additional arguments to the response model. |
SL.args.post.treatment |
Additional arguments to SuperLearner for the post treatment indicator model. |
call.censoring |
Similar to call.response. |
args.censoring |
Similar to args.response. |
preprocess |
(optional) Data pre-processing function. |
... |
Additional arguments to lower level data pre-processing functions. |
Estimation of the average treatment effect among responders at time tau under right censoring, based on plug-in estimates of the response and post-treatment models. Efficient one-step estimators are constructed using the corresponding efficient influence functions.
estimate object
Andreas Nordland, Klaus K. Holst
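A usage sketch (hypothetical data.frame dat; the survival package is assumed for the Surv syntax and mets for the nuisance models):
RATE.surv(response=Surv(time, event) ~ d + w,
          post.treatment=d ~ w,
          treatment=a ~ 1,
          censoring=Surv(time, event == 0) ~ d + a + w,
          tau=2, data=dat,
          call.response="mets::phreg",
          call.censoring="mets::phreg")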
Risk regression with binary exposure and nuisance model for the odds-product.
Let A be the binary exposure, V the set of covariates, and Y the binary response variable, and define p_a(v) = P(Y=1 | A=a, V=v), a in {0,1}.
The target parameter is either the relative risk
RR(v) = p_1(v)/p_0(v)
or the risk difference
RD(v) = p_1(v) - p_0(v).
We assume a target parameter model given by either
log{RR(v)} = alpha'v
or
arctanh{RD(v)} = alpha'v,
and similarly a working linear nuisance model for the log odds-product
phi(v) = log[p_0(v)p_1(v) / {(1-p_0(v))(1-p_1(v))}] = beta'v.
A propensity model for P(A=1 | V=v) is also fitted using a logistic regression working model:
logit{P(A=1 | V=v)} = gamma'v.
If both the odds-product model and the propensity model are correctly specified, the estimator is efficient. Further, the estimator is consistent in the union model: it is double-robust in the sense that only one of the two models needs to be correctly specified to obtain a consistent estimate.
riskreg( formula, nuisance = ~1, propensity = ~1, target = ~1, data, weights, type = "rr", optimal = TRUE, std.err = TRUE, start = NULL, mle = FALSE, ... )
formula |
formula (see details below) |
nuisance |
nuisance model (formula) |
propensity |
propensity model (formula) |
target |
(optional) target model (formula) |
data |
data.frame |
weights |
optional weights |
type |
type of association measure (rd or rr) |
optimal |
If TRUE optimal weights are calculated |
std.err |
If TRUE standard errors are calculated |
start |
optional starting values |
mle |
Semi-parametric (double-robust) estimate or MLE (TRUE gives MLE) |
... |
additional arguments to unconstrained optimization routine (nlminb) |
The 'formula' argument should be given as response ~ exposure
| target-formula | nuisance-formula
or response ~ exposure | target
| nuisance | propensity
E.g., riskreg(y ~ a | 1 | x+z | x+z, data=...)
Alternatively, the model can be specified using the target, nuisance and
propensity arguments: riskreg(y ~ a, target=~1, nuisance=~x+z, ...)
The riskreg_fit
function can be used with matrix inputs rather than
formulas.
An object of class 'riskreg.targeted
' is returned. See
targeted-class
for more details about this class and its
generic functions.
Klaus K. Holst
Richardson, T. S., Robins, J. M., & Wang, L. (2017). On modeling and estimation for the relative risk and risk difference. Journal of the American Statistical Association, 112(519), 1121–1130. http://dx.doi.org/10.1080/01621459.2016.1192546
m <- lvm(a[-2] ~ x, z ~ 1, lp.target[1] ~ 1, lp.nuisance[-1] ~ 2*x)
distribution(m, ~a) <- binomial.lvm("logit")
m <- binomial.rr(m, "y", "a", "lp.target", "lp.nuisance")
d <- sim(m, 5e2, seed=1)
I <- model.matrix(~1, d)
X <- model.matrix(~1+x, d)
with(d, riskreg_mle(y, a, I, X, type="rr"))
with(d, riskreg_fit(y, a, nuisance=X, propensity=I, type="rr"))
riskreg(y ~ a | 1, nuisance=~x, data=d, type="rr")
## Model with same design matrix for nuisance and propensity model:
with(d, riskreg_fit(y, a, nuisance=X, type="rr"))
## a <- riskreg(y ~ a, target=~z, nuisance=~x, propensity=~x, data=d, type="rr")
a <- riskreg(y ~ a | z, nuisance=~x, propensity=~x, data=d, type="rr")
a
predict(a, d[1:5, ])
riskreg(y ~ a, nuisance=~x, data=d, type="rr", mle=TRUE)
Binary regression models with right censored outcomes
riskreg_cens( response, censoring, treatment = NULL, prediction = NULL, data, newdata, tau, type = "risk", M = 1, call.response = "phreg", args.response = list(), call.censoring = "phreg", args.censoring = list(), preprocess = NULL, efficient = TRUE, control = list(), ... )
response |
Response formula (e.g., Surv(time, event) ~ D + W). |
censoring |
Censoring formula (e.g., Surv(time, event == 0) ~ D + A + W). |
treatment |
Optional treatment model (ml_model) |
prediction |
Optional prediction model (ml_model) |
data |
data.frame. |
newdata |
Optional data.frame. In this case the uncentered influence function evaluated in 'newdata' is returned with nuisance parameters obtained from 'data'. |
tau |
Time-point of interest, see Details. |
type |
"risk", "treatment", "rmst", "brier" |
M |
Number of folds in cross-fitting (M=1 is no cross-fitting). |
call.response |
Model call for the response model (e.g. "mets::phreg"). |
args.response |
Additional arguments to the response model. |
call.censoring |
Similar to call.response. |
args.censoring |
Similar to args.response. |
preprocess |
(optional) Data pre-processing function. |
efficient |
If FALSE an IPCW estimator is returned |
control |
See details |
... |
Additional arguments to lower level data pre-processing functions. |
The one-step estimator depends on the calculation of an integral with respect to the martingale process corresponding to the counting process N(t) = I(C > min(T, tau)). This can be decomposed into an integral with respect to the counting process and one with respect to its compensator, where the latter term can be computationally intensive to calculate. Rather than calculating this integral at all observed time points, a coarser evaluation can be used, controlled by setting control=list(sample=N). With N=0 the (computationally intensive) standard evaluation is used.
estimate object
Klaus K. Holst, Andreas Nordland
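An illustrative call (hypothetical data.frame d with columns time, event, w; the survival package is assumed for the Surv syntax):
riskreg_cens(response=Surv(time, event) ~ w,
             censoring=Surv(time, event == 0) ~ w,
             data=d, tau=2, type="risk",
             control=list(sample=100))  # coarsened evaluation of the martingale integral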
Predictive model scoring
scoring( response, ..., type = "quantitative", levels = NULL, metrics = NULL, weights = NULL, names = NULL, object = NULL, newdata = NULL, messages = 1 )
response |
Observed response |
... |
model predictions (continuous predictions or class probabilities (matrices)) |
type |
continuous or categorical response (the latter is automatically chosen if response is a factor, otherwise a continuous response is assumed) |
levels |
(optional) unique levels in response variable |
metrics |
which metrics to report |
weights |
optional frequency weights |
names |
optional names of the models (given as ...; alternatively these can be named arguments) |
object |
optional model object |
newdata |
optional new data.frame |
messages |
controls amount of messages/warnings (0: none) |
Numeric matrix of dimension m x p, where m is the number of different models and p is the number of model metrics
data(iris)
set.seed(1)
dat <- csplit(iris, 2)
g1 <- NB(Species ~ Sepal.Width + Petal.Length, data=dat[[1]])
g2 <- NB(Species ~ Sepal.Width, data=dat[[1]])
pr1 <- predict(g1, newdata=dat[[2]], wide=TRUE)
pr2 <- predict(g2, newdata=dat[[2]], wide=TRUE)
table(colnames(pr1)[apply(pr1, 1, which.max)], dat[[2]]$Species)
table(colnames(pr2)[apply(pr2, 1, which.max)], dat[[2]]$Species)
scoring(dat[[2]]$Species, pr1=pr1, pr2=pr2)
## quantitative response:
scoring(response=1:10, prediction=rnorm(1:10))
SuperLearner wrapper for ml_model
SL( formula = ~., ..., SL.library = c("SL.mean", "SL.glm"), binomial = FALSE, data = NULL, info = "SuperLearner" )
formula |
Model design |
... |
Additional arguments for SuperLearner::SuperLearner |
SL.library |
character vector of prediction algorithms |
binomial |
boolean specifying binomial or gaussian family (default FALSE) |
data |
Optional data.frame |
info |
model information (optional) |
ml_model object
Klaus Kähler Holst
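An illustrative call (requires the SuperLearner package; y, x1, x2 are placeholder column names):
m <- SL(y ~ x1 + x2, SL.library=c("SL.mean", "SL.glm"))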
Softmax transformation
softmax(x, log = FALSE, ref = TRUE, ...)
x |
Input matrix (e.g., linear predictors of multinomial logistic model) |
log |
Return on log-scale (default FALSE) |
ref |
Add reference level (add 0 column to x) |
... |
Additional arguments to lower level functions |
Numeric matrix of dimension n x p, where n = nrow(x) and p = ncol(x) + (ref==TRUE)
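A small sketch of the transformation (arbitrary linear predictors for two non-reference classes):
lp <- matrix(c(0.5, -0.2), nrow=1)
softmax(lp)  # 1 x 3 probability matrix; a zero column is added for the reference level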
Solve ODE with Runge-Kutta method (RK4)
solve_ode(ode_ptr, input, init, par = 0)
ode_ptr |
pointer (externalptr) to C++ function or an R function |
input |
Input matrix. 1st column specifies the time points |
init |
Initial conditions |
par |
Parameters defining the ODE (parsed to ode_ptr) |
The external pointer should be created with the function targeted::specify_ode.
Matrix with solution
Klaus Kähler Holst
specify_ode
example(specify_ode)
Extract model component from design object
## S3 method for class 'design' specials(object, which, ...)
object |
design object |
which |
model component (e.g., "offset", "weights", ...) |
... |
Additional arguments to lower level functions |
Define compiled code for ordinary differential equation.
specify_ode(code, fname = NULL, pname = c("dy", "x", "y", "p"))
code |
string with the body of the function definition (see details) |
fname |
Optional name of the exported C++ function |
pname |
Vector of variable names (results, inputs, states, parameters) |
The model (code) should be specified as the body of a C++ function. The following variables are defined by default (see the pname argument):
dy: Vector with the derivatives, i.e., the right-hand side of the ODE (the result).
x: Vector with the first element being the time and the following elements additional exogenous input variables.
y: Vector with the state (dependent) variables.
p: Parameter vector.
All variables are treated as Armadillo (http://arma.sourceforge.net/) vectors/matrices.
As an example, consider the Lorenz equations:
dx/dt = sigma*(y - x)
dy/dt = x*(rho - z) - y
dz/dt = x*y - beta*z
with parameters p = (sigma, rho, beta).
We can specify this model as
ode <- 'dy(0) = p(0)*(y(1)-y(0));
dy(1) = y(0)*(p(1)-y(2)) - y(1);
dy(2) = y(0)*y(1) - p(2)*y(2);'
dy <- specify_ode(ode)
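A sketch of solving this system with solve_ode (time grid, initial values, and parameter values are illustrative):
tt <- seq(0, 20, length.out=2e3)              # first input column is the time grid
yy <- solve_ode(dy, input=cbind(tt), init=c(1, 1, 1), par=c(10, 28, 8/3))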
As an example of a model with exogenous inputs, consider the ODE
dy/dt = p0 + p1*y + p2*x(t)*y + p3*x(t)*t,
where x(t) is an exogenous input. This could be specified as
mod <- 'double t = x(0);
dy = p(0) + p(1)*y + p(2)*x(1)*y + p(3)*x(1)*t;'
dy <- specify_ode(mod)
pointer (externalptr) to C++ function
Klaus Kähler Holst
solve_ode
The functions riskreg and ate return an object of class 'targeted', a list with at least the following components:
An estimate
object with the target parameter
estimates (see estimate.default
)
Object returned from the applied optimization routine
number of parameters of the model (target and nuisance)
String describing the model
objects of the S3 class 'targeted
'
The following S3 generic functions are available for an object of class targeted
:
coef
Extract target coefficients of the estimated model.
vcov
Extract the variance-covariance matrix of the target parameters.
IC
Extract the estimated influence function.
print
Print estimates of the target parameters.
summary
Extract information on both target parameters and estimated nuisance model.
## See example(riskreg) for examples
Signed intersection Wald test
test_intersectsignedwald( thetahat1, se1, thetahat2, se2, noninf1, noninf2, corr, alpha )
thetahat1 |
(numeric) parameter estimate 1 |
se1 |
(numeric) standard error of parameter estimate 1 |
thetahat2 |
(numeric) parameter estimate 2 |
se2 |
(numeric) standard error of parameter estimate 2 |
noninf1 |
(numeric) non-inferiority margin for parameter 1 |
noninf2 |
(numeric) non-inferiority margin for parameter 2 |
corr |
(numeric) correlation between parameter 1 and 2 |
alpha |
(numeric) nominal level |
list with Wald
Christian Bressen Pipper, Klaus Kähler Holst
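An illustrative call (made-up estimates, margins, and correlation):
test_intersectsignedwald(
  thetahat1=0.12, se1=0.05,
  thetahat2=-0.03, se2=0.04,
  noninf1=-0.1, noninf2=-0.1,
  corr=0.3, alpha=0.05
)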