Comparing dose-escalation designs by simulation

The escalation package by Kristian Brock. Documentation is hosted at https://brockk.github.io/escalation/

Introduction

This vignette focuses on the common task of comparing competing dose-escalation designs by simulation. Before reading on, be sure you have read the README file for a general explanation of how to compose dose-finding designs in escalation; and the Simulation vignette for a general introduction to using simulation.

Comparing competing designs

In simulate_compare, we implement the method of Sweeting et al. (2024) to efficiently compare dose-finding designs. The crux of this method is to ensure that the same simulated patients are used for each competing design so that, for instance, if a patient is given dose 2 by one design and experiences toxicity, then that patient will also experience toxicity if given dose 2 or above by any other design. Ensuring consistency across simulated iterates in this way reduces Monte Carlo error and allows much faster identification of differences between designs.

For example, let us compare the behaviour of the 3+3 and CRM designs investigated above. We start by defining the competing designs in a list with convenient names:

library(escalation)

target <- 0.25
skeleton <- c(0.05, 0.1, 0.25, 0.4, 0.6)

designs <- 
  list(
    "3+3" = get_three_plus_three(num_doses = 5),
    "CRM" = get_dfcrm(skeleton = skeleton, target = target) %>%
      stop_at_n(n = 12),
    "Stopping CRM" = get_dfcrm(skeleton = skeleton, target = target) %>%
      stop_at_n(n = 12) %>%
      stop_when_too_toxic(dose = 1, tox_threshold = 0.35, confidence = 0.8)
  )

Here we will compare three designs: 3+3; CRM without a toxicity stopping rule; and an otherwise identical CRM design with a toxicity stopping rule. The names we provide will be reused.

For illustration we use only a small number of replicates:

num_sims <- 20
true_prob_tox <- c(0.12, 0.27, 0.44, 0.53, 0.57)

sims <- simulate_compare(
  designs, 
  num_sims = num_sims, 
  true_prob_tox = true_prob_tox
)
#> Running 3+3 
#> Running CRM 
#> Running Stopping CRM

We can vertically stack the simulated performance of each design:

summary(sims)
#> # A tibble: 18 × 7
#>    dose     tox     n true_prob_tox prob_recommend prob_administer design      
#>    <ord>  <dbl> <dbl>         <dbl>          <dbl>           <dbl> <chr>       
#>  1 NoDose  0     0             0               0.2           0     3+3         
#>  2 1       0.6   3.75          0.12            0.6           0.463 3+3         
#>  3 2       1.45  3.45          0.27            0.2           0.426 3+3         
#>  4 3       0.45  0.9           0.44            0             0.111 3+3         
#>  5 4       0     0             0.53            0             0     3+3         
#>  6 5       0     0             0.57            0             0     3+3         
#>  7 NoDose  0     0             0               0             0     CRM         
#>  8 1       0.85  6.45          0.12            0.3           0.538 CRM         
#>  9 2       1     2.25          0.27            0.6           0.188 CRM         
#> 10 3       0.75  1.5           0.44            0.1           0.125 CRM         
#> 11 4       1.1   1.8           0.53            0             0.15  CRM         
#> 12 5       0     0             0.57            0             0     CRM         
#> 13 NoDose  0     0             0               0             0     Stopping CRM
#> 14 1       0.85  6.45          0.12            0.3           0.538 Stopping CRM
#> 15 2       1     2.25          0.27            0.6           0.188 Stopping CRM
#> 16 3       0.75  1.5           0.44            0.1           0.125 Stopping CRM
#> 17 4       1.1   1.8           0.53            0             0.15  Stopping CRM
#> 18 5       0     0             0.57            0             0     Stopping CRM

We also provide a convenient function to quickly visualise how the probability of selecting each dose in each design evolved as the simulations progressed:

convergence_plot(sims)

We can see immediately, for instance, that the designs generally agree that dose 2 is the best MTD candidate, and that the CRM designs are much more likely to recommend dose 3. We can be more precise by formally contrasting the probability of selecting each dose for each pair of designs:

library(dplyr)
library(ggplot2)

as_tibble(sims) %>% 
  filter(n %% 5 == 0) %>%
  ggplot(aes(x = n, y = delta)) +
  geom_point(size = 0.4) +
  geom_linerange(aes(ymin = delta_l, ymax = delta_u)) +
  geom_hline(yintercept = 0, linetype = "dashed", col = "red") +
  facet_grid(comparison ~ dose,
             labeller = labeller(
               .rows = label_both,
               .cols = label_both)
  )

The error bars here reflect 95% symmetric asymptotic normal confidence intervals. Change the alpha = 0.05 parameter when calling as_tibble(sims) to get confident intervals for a different significance level. We see that, even with the very small sample size of 50 simulated trials, the CRM designs are significantly more likely to recommend dose 3 than the 3+3 design. In contrast, in this scenario there is very little difference at all between the two CRM variants.

Working with PatientSamples

The idea at the core of the method by Sweeting et al. (2024) for efficient comparison of competing designs is to use the same patients across different designs. This reduces Monte Carlo error by examining where the designs differ in their recommendations given identical inputs.

This is achieved in escalation using classes inheriting from PatientSample. A single PatientSample reflects one particular state of the world, where patient i would reliably experience a toxicity or efficacy event if treated at a particular dose. The consistent occurrence of events is managed using latent uniform random variables. We show how to import and export these variables, and use them to infer potential outcomes at a range of doses.

To illustrate some of the fine-grained features of working with patient samples, we reproduce the BOIN12 example in Sweeting et al. (2024):

num_doses <- 5

designs <- list(

  "BOIN12 v1" = get_boin12(num_doses = num_doses,
                          phi_t = 0.35, phi_e = 0.25,
                          u2 = 40, u3 = 60,
                          c_t = 0.95, c_e = 0.9) %>%
    stop_when_n_at_dose(n = 12,  dose = 'any') %>%
    stop_at_n(n = 36),

  "BOIN12 v2" = get_boin12(num_doses = num_doses,
                          phi_t = 0.35, phi_e = 0.25,
                          u2 = 40, u3 = 60,
                          c_t = 0.85, c_e = 0.8
                          ) %>%
    stop_when_n_at_dose(n = 12,  dose = 'any') %>%
    stop_at_n(n = 36)
)

Exporting the underlying latent variables

By setting return_patient_samples = TRUE in the call to simulate_compare:

true_prob_tox <- c(0.05, 0.10, 0.15, 0.18, 0.45)
true_prob_eff <- c(0.40, 0.50, 0.52, 0.53, 0.53)

set.seed(2024)
x <- simulate_compare(
  designs = designs, 
  num_sims = 50, 
  true_prob_tox = true_prob_tox, 
  true_prob_eff = true_prob_eff, 
  return_patient_samples = TRUE
)
#> Running BOIN12 v1 
#> Running BOIN12 v2

we ensure the generated patient samples are exported in the returned object:

ps <- attr(x, "patient_samples")

There are 50 patient-samples because we performed 50 simulated iterates:

length(ps)
#> [1] 50

Each iterate is associated with one patient-sample.

For instance, the uniformly-distributed random variables that determine the occurrence of toxicity events in the first simulated iterate are:

ps[[1]]$tox_u
#>  [1] 0.69817312 0.30320913 0.51578548 0.61524718 0.05412517 0.65435328
#>  [7] 0.81482212 0.24535641 0.57055149 0.81225051 0.37904775 0.81430783
#> [13] 0.55000609 0.70191599 0.25793769 0.63087134 0.53763668 0.91237084
#> [19] 0.22604552 0.46627740 0.60173270 0.52517506 0.51566372 0.85122330
#> [25] 0.93486970 0.33551839 0.08832608 0.54574263 0.42418266 0.68010525
#> [31] 0.20996116 0.66860680 0.56893212 0.40129787 0.68108059 0.06243934

and the equivalent for efficacy events are:

ps[[1]]$eff_u
#>  [1] 0.70142031 0.90188285 0.86321521 0.29028973 0.62388601 0.71585945
#>  [7] 0.44400867 0.55189023 0.80299444 0.01705315 0.76742179 0.75409645
#> [13] 0.09794155 0.79412838 0.70943606 0.61495142 0.77500254 0.72021324
#> [19] 0.75763429 0.25464427 0.70966585 0.91368568 0.72269800 0.53335318
#> [25] 0.87993773 0.44162234 0.79396025 0.32815021 0.32048108 0.50633040
#> [31] 0.44140110 0.11780770 0.72699849 0.92028149 0.76116531 0.32360112

These values reflect the patient-specific propensities to toxicity and efficacy. For instance, we see that the first patient would always experience a toxicity if treated at a dose with associated true toxicity probability less than 0.6981731. Likewise, the same patient would experience efficacy if treated at a dose with associated true efficacy probability less than 0.7014203.

These latent variables can be exported using R’s many I/O functions and formats.

Calculating potential outcomes

We can calculate all potential outcomes for a list of patient-samples by calling get_potential_outcomes and providing the true event probabilities:

z <- get_potential_outcomes(
  patient_samples = ps,
  true_prob_tox = true_prob_tox,
  true_prob_eff = true_prob_eff
)

Note that were we working with tox-only dose-escalation designs like CRM or mTPI (for example), we would omit the true_prob_eff parameter.

For instance, the potential outcomes in the first simulated iterate are:

z[[1]]
#> $tox
#>       [,1] [,2] [,3] [,4] [,5]
#>  [1,]    0    0    0    0    0
#>  [2,]    0    0    0    0    1
#>  [3,]    0    0    0    0    0
#>  [4,]    0    0    0    0    0
#>  [5,]    0    1    1    1    1
#>  [6,]    0    0    0    0    0
#>  [7,]    0    0    0    0    0
#>  [8,]    0    0    0    0    1
#>  [9,]    0    0    0    0    0
#> [10,]    0    0    0    0    0
#> [11,]    0    0    0    0    1
#> [12,]    0    0    0    0    0
#> [13,]    0    0    0    0    0
#> [14,]    0    0    0    0    0
#> [15,]    0    0    0    0    1
#> [16,]    0    0    0    0    0
#> [17,]    0    0    0    0    0
#> [18,]    0    0    0    0    0
#> [19,]    0    0    0    0    1
#> [20,]    0    0    0    0    0
#> [21,]    0    0    0    0    0
#> [22,]    0    0    0    0    0
#> [23,]    0    0    0    0    0
#> [24,]    0    0    0    0    0
#> [25,]    0    0    0    0    0
#> [26,]    0    0    0    0    1
#> [27,]    0    1    1    1    1
#> [28,]    0    0    0    0    0
#> [29,]    0    0    0    0    1
#> [30,]    0    0    0    0    0
#> [31,]    0    0    0    0    1
#> [32,]    0    0    0    0    0
#> [33,]    0    0    0    0    0
#> [34,]    0    0    0    0    1
#> [35,]    0    0    0    0    0
#> [36,]    0    1    1    1    1
#> 
#> $eff
#>       [,1] [,2] [,3] [,4] [,5]
#>  [1,]    0    0    0    0    0
#>  [2,]    0    0    0    0    0
#>  [3,]    0    0    0    0    0
#>  [4,]    1    1    1    1    1
#>  [5,]    0    0    0    0    0
#>  [6,]    0    0    0    0    0
#>  [7,]    0    1    1    1    1
#>  [8,]    0    0    0    0    0
#>  [9,]    0    0    0    0    0
#> [10,]    1    1    1    1    1
#> [11,]    0    0    0    0    0
#> [12,]    0    0    0    0    0
#> [13,]    1    1    1    1    1
#> [14,]    0    0    0    0    0
#> [15,]    0    0    0    0    0
#> [16,]    0    0    0    0    0
#> [17,]    0    0    0    0    0
#> [18,]    0    0    0    0    0
#> [19,]    0    0    0    0    0
#> [20,]    1    1    1    1    1
#> [21,]    0    0    0    0    0
#> [22,]    0    0    0    0    0
#> [23,]    0    0    0    0    0
#> [24,]    0    0    0    0    0
#> [25,]    0    0    0    0    0
#> [26,]    0    1    1    1    1
#> [27,]    0    0    0    0    0
#> [28,]    1    1    1    1    1
#> [29,]    1    1    1    1    1
#> [30,]    0    0    1    1    1
#> [31,]    0    1    1    1    1
#> [32,]    1    1    1    1    1
#> [33,]    0    0    0    0    0
#> [34,]    0    0    0    0    0
#> [35,]    0    0    0    0    0
#> [36,]    1    1    1    1    1

We see that patient 1 would not have experienced a toxicity at any dose because their underlying toxicity propensity variable was quite high at 0.6981731. Gladly, however, the same patient would have experienced efficacy at any dose at or above dose-level 2.

Importing the underlying latent variables

We show above how to export the latent event variables to store of use them elsewhere. We can also import the latent variables, to mimic the patient population used in some external simulation, for instance.

To do so, we instantiate a list of PatientSample objects, with one for each simulated iterate. We will work with ten iterates in the interests of speed:

num_sims <- 10
ps <- lapply(1:num_sims, function(x) PatientSample$new())

We then call the set_eff_and_tox function on each to set the latent toxicity and efficacy propensities to the desired values. For illustration, let us simply sample uniform random variables and use those. In a more realistic example, we would set the latent variables to the values we wished to import, i.e. those values that were used in some external study or idealised values that we wish to use.

set.seed(2024)
for(i in seq_len(num_sims)) {
  tox_u_new <- runif(n = 50)
  eff_u_new <- runif(n = 50)
  ps[[i]]$set_eff_and_tox(tox_u = tox_u_new, eff_u = eff_u_new)
}

We have set the latent variable vectors to have length 50. This means we can work with up to 50 patients in each iterate. If the simulation tries to use a 51st patient, we will receive an error. When setting the latent variables in this way, ensure you use enough values to cover your maximum sample size.

To use the patient-samples we have created in simulated trials, we specify the patient_samples parameter:

x1 <- simulate_compare(
  designs = designs, 
  num_sims = length(ps), 
  true_prob_tox = true_prob_tox, 
  true_prob_eff = true_prob_eff, 
  patient_samples = ps
)
#> Running BOIN12 v1 
#> Running BOIN12 v2

Having specified the exact patient population in this way, if we run the simulation study a second time with the same patients:

x2 <- simulate_compare(
  designs = designs, 
  num_sims = length(ps), 
  true_prob_tox = true_prob_tox, 
  true_prob_eff = true_prob_eff, 
  patient_samples = ps
)
#> Running BOIN12 v1 
#> Running BOIN12 v2

we will see that the decisions within design are (usually) identical within a design. E.g.

all(
  recommended_dose(x1[["BOIN12 v1"]]) == 
    recommended_dose(x2[["BOIN12 v1"]])
)
#> [1] TRUE

The first BOIN12 variant recommends the exact same dose in the two batches because the patients are the same.

For completeness, when might the decisions vary despite the same patients being used? In a design that has inherent randomness. E.g. Wages & Tait’s design features adaptive randomisation, meaning that, even with identical patients, different behaviours will be seen.

Correlated toxicity and efficacy outcomes

Finally, let us introduce the CorrelatedPatientSample to sample correlated toxicity and efficacy events. To coerce a correlation of 0.5 between the latent uniform tox and eff event propensities, we run:

ps <- CorrelatedPatientSample$new(num_patients = 100, rho = 0.5)

We can observe

cor(ps$tox_u, ps$eff_u)
#> [1] 0.4827035

that the correlation is close to desired. Note however, that the correlation of the binary level events is not, in general, equal to the target value of rho, because they depend on the event probabilities:

tox <- sapply(seq_len(100), function(i) ps$get_patient_tox(i, prob_tox = 0.3))
eff <- sapply(seq_len(100), function(i) ps$get_patient_eff(i, prob_eff = 0.4))
cor(tox, eff)
#> [1] 0.3027588

Further refinements

Check out the Further refinements section of the Simulation vignette for further ways of refining the behaviour of simulations, including changing the cohort size and timing of patient arrivals, simulating the conclusion of partly-observed trials, tweaking the immediate next dose, returning all interim model fits, and working with large trials. The methods described there apply to both simulate_trials and simulate_compare.

References

Sweeting, Michael, Daniel Slade, Daniel Jackson, and Kristian Brock. 2024. “Potential Outcome Simulation for Efficient Head-to-Head Comparison of Adaptive Dose-Finding Designs.” arXiv Preprint arXiv:2402.15460.