---
title: "IPCW Cumulative Cost"
author: Klaus Holst & Thomas Scheike
date: "2026-05-24"
output:
rmarkdown::html_vignette:
fig_caption: no
header-includes:
- \usepackage{tikz}
vignette: >
%\VignetteIndexEntry{IPCW Cumulative Cost}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
Overview
========
We describe how to perform regression modelling for cumulative cost
\begin{align*}
{\cal U}(t) & = \int_0^t Z(s) dN(s)
\end{align*}
where $N(s)$ is a counting process that registers the times at which costs are realised and accumulated,
and $Z(t)$ is the cost (or mark) at the event times. The counting process can be
a mix of random and fixed times, and the data are represented in counting process format with
the marks/costs attached to the event times. There are many additional uses of
such cumulative processes; for example, when considering time lost in a recurrent events setting, which we return to
below.
We can estimate the marginal mean of the cumulative process
\begin{align*}
\nu(t) & = E ( {\cal U}(t) )
\end{align*}
possibly for strata with standard errors based on the derived influence function.
We provide semi-parametric regression modelling using the proportional model
\begin{align*}
E ( {\cal U}(t) | X) & = \Lambda_0(t) \exp( X^T \beta).
\end{align*}
In addition for a fixed time-point $t \in [0,\tau]$ we can estimate the mean given covariates
\begin{align*}
E ( {\cal U}(t) | X) & = \exp( X^T \beta)
\end{align*}
where $\tau$ is some maximum follow-up time.
- These quantities are estimated under independent right-censoring given $X$, using IPCW-adjusted estimating equations,
- similarly to the Ghosh-Lin model for recurrent events.
- A terminal event can be specified.
We also estimate the probability of exceeding thresholds over time
\begin{align*}
P ( {\cal U}(t) > k ) & = \mu_k(t),
\end{align*}
and in the setting with a terminal event this is based on a derived competing risks data structure that
keeps track of the competing terminal event.
\begin{tikzpicture}[
node distance=3.5cm,
box/.style={draw, rectangle, minimum width=2.8cm, minimum height=1cm, align=center}
]
% Nodes
\node[box] (start) {At Risk};
\node[box, right of=start, yshift=2.4cm] (exceed) {Exceed-k};
\node[box, right of=start, yshift=-2.4cm] (death) {D};
% Arrows with hazard labels
\draw[->, thick] (start) -- node[below] {\hspace{2 cm} $F_{exceed-k}(t)$} (exceed);
\draw[->, thick] (start) -- node[below] {$F_{D}(t)$} (death);
\end{tikzpicture}
Regression modelling of this quantity is also possible using competing risks
regression models, for example via the `cifreg` function in mets.
HF-action data
==============
Using the HF-action data, we simulate a severity score for each event.
``` r
library(mets)
data(hfactioncpx12)
hf <- hfactioncpx12
set.seed(1)
hf$severity <- abs((5+rnorm(741)*2))[hf$id]
## marginal mean using formula
outNZ <- recurrent_marginal(Event(entry,time,status)~strata(treatment)+cluster(id)
+marks(severity),hf,cause=1,death.code=2)
plot(outNZ,se=TRUE)
summary(outNZ,times=3)
#> [[1]]
#> new.time mean se CI-2.5% CI-97.5% strata
#> 682 3 10.3249 0.6044746 9.2056 11.5803 0
#>
#> [[2]]
#> new.time mean se CI-2.5% CI-97.5% strata
#> 601 3 9.33613 0.6429995 8.157232 10.68541 1
outN <- recurrent_marginal(Event(entry,time,status)~strata(treatment)+cluster(id),data=hf,
cause=1,death.code=2)
plot(outN,se=TRUE,add=TRUE)
```
``` r
summary(outN,times=3)
#> [[1]]
#> new.time mean se CI-2.5% CI-97.5% strata
#> 682 3 2.118496 0.1138572 1.906692 2.353829 0
#>
#> [[2]]
#> new.time mean se CI-2.5% CI-97.5% strata
#> 601 3 1.924062 0.1216577 1.699801 2.177912 1
```
For comparison we also compute the IPCW estimates with and without marks at time 3 using the linear model,
and note that they are identical. Standard errors are based on different formulae that are asymptotically
equivalent, and we note that they are very similar.
``` r
outNZ3 <- recregIPCW(Event(entry,time,status)~-1+treatment+cluster(id)+marks(severity),data=hf,
cause=1,death.code=2,time=3,cens.model=~strata(treatment),model="lin")
summary(outNZ3)
#> n events
#> 741 1281
#>
#> 741 clusters
#> coeffients:
#> Estimate Std.Err 2.5% 97.5% P-value
#> treatment0 10.32490 0.60444 9.14022 11.50959 0
#> treatment1 9.33613 0.64295 8.07597 10.59629 0
head(iid(outNZ3))
#> [,1] [,2]
#> 1 -0.0013037837 0.0000000
#> 2 0.0089249278 0.0000000
#> 3 0.0000000000 0.0211858
#> 4 -0.0056613816 0.0000000
#> 5 -0.0005288663 0.0000000
#> 6 -0.0342554114 0.0000000
outN3 <- recregIPCW(Event(entry,time,status)~-1+treatment+cluster(id),data=hf,cause=1,death.code=2,time=3,
cens.model=~strata(treatment),model="lin")
summary(outN3)
#> n events
#> 741 1281
#>
#> 741 clusters
#> coeffients:
#> Estimate Std.Err 2.5% 97.5% P-value
#> treatment0 2.11850 0.11385 1.89535 2.34164 0
#> treatment1 1.92406 0.12165 1.68564 2.16248 0
head(iid(outN3))
#> [,1] [,2]
#> 1 0.0004542472 0.000000000
#> 2 0.0009756994 0.000000000
#> 3 0.0000000000 0.009301496
#> 4 -0.0029668336 0.000000000
#> 5 -0.0001120764 0.000000000
#> 6 -0.0070693971 0.000000000
```
We also apply the semiparametric proportional cost model with IPCW adjustment:
``` r
propNZ <- recreg(Event(entry,time,status)~treatment+marks(severity)+cluster(id),data=hf,cause=1,death.code=2)
summary(propNZ)
#>
#> n events
#> 2132 1391
#>
#> 741 clusters
#> coefficients:
#> Estimate S.E. dU^-1/2 P-value
#> treatment1 -0.101014 0.089005 0.024304 0.2564
#>
#> exp(coefficients):
#> Estimate 2.5% 97.5%
#> treatment1 0.90392 0.75922 1.0762
plot(propNZ,main="Baselines")
GL <- recreg(Event(entry,time,status)~treatment+cluster(id),hf,cause=1,death.code=2)
summary(GL)
#>
#> n events
#> 2132 1391
#>
#> 741 clusters
#> coefficients:
#> Estimate S.E. dU^-1/2 P-value
#> treatment1 -0.110404 0.078656 0.053776 0.1604
#>
#> exp(coefficients):
#> Estimate 2.5% 97.5%
#> treatment1 0.89547 0.76754 1.0447
plot(GL,add=TRUE,col=2)
```
Those treated have 14\% lower cumulative severity and 11\% lower expected number of events.
Exceed threshold
================
Finally, we estimate the probability of exceeding cumulative severity thresholds of 1, 5, and 10:
``` r
ooNZ <- prob_exceed_recurrent(Event(entry,time,status)~strata(treatment)+cluster(id)+marks(severity),data=hf,
cause=1,death.code=2,exceed=c(1,5,10,20))
plot(ooNZ,strata=1)
plot(ooNZ,strata=2,add=TRUE)
```
``` r
summary(ooNZ,times=3)
#> $`0`
#> $`0`$prob
#> times
#> 3 2.99865085
#> N<1 3 0.06474819
#> exceed>=1 3 0.93525181
#> exceed>=5 3 0.90118229
#> exceed>=10 3 0.74537955
#> exceed>=20 3 0.49282394
#>
#> $`0`$se
#> times
#> 3 2.99865085
#> N<1 3 0.02657120
#> exceed>=1 3 0.02657120
#> exceed>=5 3 0.03013468
#> exceed>=10 3 0.04466101
#> exceed>=20 3 0.05149605
#>
#> $`0`$lower
#> times
#> [1,] 3 2.9986509
#> [2,] 3 0.1154033
#> [3,] 3 0.8845967
#> [4,] 3 0.8440133
#> [5,] 3 0.6627899
#> [6,] 3 0.4015580
#>
#> $`0`$upper
#> times
#> [1,] 3 2.99865085
#> [2,] 3 0.01119234
#> [3,] 3 0.98880766
#> [4,] 3 0.96222364
#> [5,] 3 0.83826056
#> [6,] 3 0.60483284
#>
#>
#> $`1`
#> $`1`$prob
#> times
#> 3 2.99865085
#> N<1 3 0.01770732
#> exceed>=1 3 0.98229268
#> exceed>=5 3 0.94934557
#> exceed>=10 3 0.79560979
#> exceed>=20 3 0.53319063
#>
#> $`1`$se
#> times
#> 3 2.99865085
#> N<1 3 0.01097663
#> exceed>=1 3 0.01097663
#> exceed>=5 3 0.01645312
#> exceed>=10 3 0.03727391
#> exceed>=20 3 0.05545584
#>
#> $`1`$lower
#> times
#> [1,] 3 2.99865085
#> [2,] 3 0.03898725
#> [3,] 3 0.96101275
#> [4,] 3 0.91763959
#> [5,] 3 0.72580801
#> [6,] 3 0.43486168
#>
#> $`1`$upper
#> times
#> [1,] 3 2.9986509
#> [2,] 3 0.0000000
#> [3,] 3 1.0000000
#> [4,] 3 0.9821470
#> [5,] 3 0.8721245
#> [6,] 3 0.6537533
```
Cumulative time lost for recurrent events
=========================================
The cumulative time lost for recurrent events is defined as
\begin{align*}
{\cal M}(t) = E\left[ \int_0^\tau (\tau-s) dN(s) \right] = \int_0^\tau \mu(s) ds
\end{align*}
where $\mu(t) = E( N(t) )$ is the marginal mean of the recurrent events at time $t$.
``` r
hf$lost5 <- 5-hf$time
RecLost <- recregIPCW(Event(entry,time,status)~-1+treatment+cluster(id)+marks(lost5),data=hf,
cause=1,death.code=2,time=5,cens.model=~strata(treatment),model="lin")
summary(RecLost)
#> n events
#> 741 1391
#>
#> 741 clusters
#> coeffients:
#> Estimate Std.Err 2.5% 97.5% P-value
#> treatment0 8.58300 0.42951 7.74118 9.42482 0
#> treatment1 7.66234 0.46400 6.75292 8.57177 0
head(iid(RecLost))
#> [,1] [,2]
#> 1 0.0016920221 0.00000000
#> 2 0.0073388996 0.00000000
#> 3 0.0000000000 0.02120478
#> 4 -0.0095548150 0.00000000
#> 5 -0.0005696809 0.00000000
#> 6 -0.0201750011 0.00000000
```
SessionInfo
============
``` r
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/kkzh/.asdf/installs/r/4.6.0/lib/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Copenhagen
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] timereg_2.0.7 survival_3.8-6 mets_1.3.10
#>
#> loaded via a namespace (and not attached):
#> [1] cli_3.6.6 knitr_1.51 rlang_1.2.0
#> [4] xfun_0.57 otel_0.2.0 future.apply_1.20.2
#> [7] listenv_0.10.1 lava_1.9.1 stats4_4.6.0
#> [10] grid_4.6.0 evaluate_1.0.5 yaml_2.3.12
#> [13] mvtnorm_1.3-7 numDeriv_2016.8-1.1 compiler_4.6.0
#> [16] codetools_0.2-20 Rcpp_1.1.1-1.1 ucminf_1.2.3
#> [19] future_1.70.0 lattice_0.22-9 digest_0.6.39
#> [22] parallelly_1.47.0 parallel_4.6.0 splines_4.6.0
#> [25] Matrix_1.7-5 tools_4.6.0 RcppArmadillo_15.2.6-1
#> [28] globals_0.19.1
```