```
library(rpact)
packageVersion("rpact")
```

# Planning and Analyzing a Group-Sequential Multi-Arm Multi-Stage Design with Binary Endpoint using rpact

# Introduction

This document provides an exemplary implementation of multi-arm-multi-stage (MAMS) design with binary endpoint using rpact. Concretely, the vignette consists of considering design implementation with respect to futility bounds on treatment effect scale, sample size calculations and simulations given different treatment selection approaches. Further, analysis using closed testing will be performed on generic data binary and survival data.

Note that since rpact itself doesn’t support landmark analysis, i.e. comparison of survival probabilities at a fixed point in time, using the Greenwoods standard error (SE) formula. Thus the first two analyses are based on the empirical event rates only. The R packages gestate and survival are then utilized to briefly show how one could merge the packages to perform the actually intended analysis using boundaries obtained by rpact and test statistics obtained by survival probabilities and standard errors estimated using gestate and survival.

For methodological and theoretical background, refer to “Group Sequential and Confirmatory Adaptive Designs in Clinical Trials” by Gernot Wassmer and Werner Brannath.

Before starting, load the rpact package and make sure the version of the package is at least 3.1.0:

`[1] '4.1.0'`

# Implementation of design

Consider we are interested in implementing a group sequential design with \(3\) stages, \(3\) treatment arms (\(2\) active, \(1\) control), \(2\) treatment arm comparisons (each active arm vs. the placebo arm), a binary endpoint, with a global one-sided \(\alpha=0.025\), and the power for each treatment arm comparison set to \(80\%\), hence \(1-\beta=0.8\). Additionally, let the critical boundaries be calculated using alpha-spending O’Brien & Fleming approach and for planning purposes, suppose equally distributed information rates. Further, non-binding futility bounds are to be set in the following way:

Considering the \(2\) active treatment arms, the goal of both arms is to confirm significant reduction in event (e.g. disease, death) rates as compared to a control arm. Thus, futility should appear exactly when there is only low reduction, no reduction or even increase seen in the data. Assume one designs a study where a treatment arm is futile in the first stage when there has only been a relative reduction of \(5\)% (i.e. rate ratio of \(0.95\)) and in the second stage when data indicates a relative reduction of \(10\)% only (i.e. rate ratio of \(0.9\)). For simplification, futility bounds are assumed to be independent. As rpact primarily uses futility bounds on \(z\)-scale, one needs to determined the \(z\)-scale value that correspond with the values listed above. Advantageously, the futility bounds on treatment effect scale are part of sample size calculation output. Therefore, rpact allows to determine the right \(z\)-value by basically examining various options for \(z\)-scale futility as input and decide on which values result in the needed treatment effect futility bounds. This can exemplarily be done by trying different \(z\)-scale futility bounds as input up until the sample size calculation command output (given certain treatment effect assumptions) indicates that the input corresponds with the desired treatment effect futility bound.

As an example on how to get the corresponding futility bound values on the different scales, see the example below:

First, one needs to initialize the design and basically take arbitrary futility bounds on \(z\)-scale as input:

```
# first and second stage futility on z-scale
<- 0.16
fut1 <- 0.39
fut2
<- getDesignGroupSequential(
d_fut kMax = 3,
alpha = 0.025,
beta = 0.2,
sided = 1,
typeOfDesign = "asOF",
informationRates = c(1 / 3, 2 / 3, 1),
futilityBounds = c(fut1, fut2),
bindingFutility = FALSE
)
```

Now, as mentioned, the sample size calculation output provides information about the futility boundaries on the treatment effect scale. Therefore, after assuming certain treatment effects, one needs to perform sample size calculation and extract the treatment effect futility bound from the respective output:

```
<- 0.1 # assumed rate in control
c_assum <- 0.5 # relative reduction that is to be detected with probability of 0.8
effect_assum
# rates indicate binary endpoint
<- getSampleSizeRates(
ssc_fut design = d_fut,
riskRatio = TRUE,
pi1 = c_assum * (1 - effect_assum),
pi2 = c_assum
)
$futilityBoundsEffectScale ssc_fut
```

```
[,1]
[1,] 0.9464954
[2,] 0.9085874
```

The values printed out above are now the futility bounds on treatment effect scale that do correspond with the values *fut1* and *fut2* from above. Since \(0.9465 < 0.95\) and \(0.9086 > 0.9\), the input value *fut1=\(0.16\)* seems to be slightly to large while *fut2=\(0.39\)* seems to be slightly to low, while this indicates how to adjust the input values such that one comes closer to the the true corresponding values.

Now, using a search algorithm or by simply trying different values, one results in futility bounds on \(z\)-scale summarized in the following table created using knitr:

First stage | Second stage | |
---|---|---|

Orig. treatment effect scale | 0.950 | 0.900 |

Approx. z-Value for given t-Value | 0.149 | 0.414 |

Corresponding t-scale Value | 0.950 | 0.900 |

Diff. between Approx. and Input | 0.000 | 0.000 |

In the table above, the *Orig. treatment effect scale* is the intended futility bound on treatment effect scale, *Approx. \(z\)-value* denotes the corresponding \(z\)-scale value approximation and *Corresponding \(t\)-scale Value* the actual \(t\)-scale value using the calculated approx. \(z\)-values as input.. One can see that the first stage futility bound on \(z\)-scale should approximately be chosen as \(0.14915\) and the second stage \(z\)-scale futility bound as \(0.41381\), resulting in the desired futility bound on treatment effect scale of approximately 0.95 in first and 0.9 in second stage, respectively. In the next chapter containing the sample size calculations, the respective outputs validate that these \(z\)-scale futility bounds are a reasonable choice.

Now that (approximate) futility bounds on \(z\)-scale are known, the above specified design can be entirely initialized, *kMax=3* indicates a study design of three stages.

```
# GSD with futility bounds according to above calculations
<- getDesignGroupSequential(
d kMax = 3,
alpha = 0.025,
beta = 0.2,
sided = 1,
typeOfDesign = "asOF",
informationRates = c(1 / 3, 2 / 3, 1),
futilityBounds = c(0.149145, 0.41381),
bindingFutility = FALSE
)|> summary() d
```

**Sequential analysis with a maximum of 3 looks (group sequential design)**

O’Brien & Fleming type alpha spending design, non-binding futility, one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.0833, ASN H1 0.8652, ASN H01 0.843, ASN H0 0.6133.

Stage | 1 | 2 | 3 |
---|---|---|---|

Planned information rate | 33.3% | 66.7% | 100% |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Cumulative power | 0.0213 | 0.4471 | 0.8000 |

Futility probabilities under H1 | 0.062 | 0.011 |

Simply printing out the defined object gives a nice overview of all relevant design parameters. Note that the adjusted \(\alpha\) of the last stage (\(=0.0231\)) is slightly lower that the predefined global \(\alpha\), corresponding to a critical values of \(1.993\) being slightly larger than \(1.96\), which is due to alpha-spending O’Brien Fleming adjustment. Further, the output basically provides an overview of the input parameters.

# Sample size calculation

When it comes to sample size calculations for designs with binary endpoints, rpact provides the command `getSampleSizeRates()`

. It should be noted upfront that the sample size calculation in rpact always only refers to only one treatment arm comparison.

The sample size calculations code applicable here has already been indirectly used to properly determine the \(z\)-scale futility boundaries whenever futility boundaries are only given on treatment effect scale. However, this sections’ purpose now is to provide some more detail on sample size calculations in MAMS-designs with binary endpoint.

Suppose the treatment effect under \(H_0\) is \(0\) or there is even a rate increase in active treatment group, meaning the difference \(\pi_1-\pi_2 \geq 0\), with \(\pi_1\) representing the assumed event rate in the treatment group and \(\pi_2\) being the assumed event rate in the reference/control group. Again, let the parameter setting be given as above and assume an expected relative reduction of event occurrence of \(50\%\) given \(\pi_2 = 0.1\) (hence \(\pi_2 = 0.1*0.5=0.05\)). Since we are interested in directly comparing risks, we set `riskRatio`

to `TRUE`

which particularly results in testing \(H_0: \frac{\pi_1}{\pi_2} \geq 1\) against \(H_1:\frac{\pi_1}{\pi_2} < 1\). The sample size per stage for one treatment arm comparison can then be calculated using the commands:

```
<- 0.1 # assumed rate in control
c_rate <- 0.5 # relative reduction that is to be detected with probability of 0.8
effect
# rates indicate binary endpoint
<- getSampleSizeRates(
d_sample design = d,
riskRatio = TRUE,
pi1 = c_rate * (1 - effect),
pi2 = c_rate
)|> summary() d_sample
```

**Sample size calculation for a binary endpoint**

Sequential analysis with a maximum of 3 looks (group sequential design), one-sided overall significance level 2.5%, power 80%. The results were calculated for a two-sample test for rates (normal approximation), H0: pi(1) / pi(2) = 1, H1: pi(1) = 0.05, control rate pi(2) = 0.1.

Stage | 1 | 2 | 3 |
---|---|---|---|

Planned information rate | 33.3% | 66.7% | 100% |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Efficacy boundary (t) | 0.061 | 0.476 | 0.643 |

Futility boundary (t) | 0.950 | 0.903 | |

Cumulative power | 0.0213 | 0.4471 | 0.8000 |

Number of subjects | 313.8 | 627.5 | 941.3 |

Expected number of subjects under H1 | 751.7 | ||

Overall exit probability (under H0) | 0.5594 | 0.1828 | |

Overall exit probability (under H1) | 0.0838 | 0.4366 | |

Exit probability for efficacy (under H0) | 0.0001 | 0.0059 | |

Exit probability for efficacy (under H1) | 0.0213 | 0.4258 | |

Exit probability for futility (under H0) | 0.5593 | 0.1769 | |

Exit probability for futility (under H1) | 0.0625 | 0.0108 |

Legend:

*(t)*: treatment effect scale

The variable *Futility boundary (t)* represents the futility bounds transferred to treatment effect scale. Therefore, one can see that the predefined bounds calculated above correspond with a rate ratio of \(0.950\) at the first stage (i.e. relative reduction of \(5\%\)) and a rate ratio of \(0.903\) at the second (i.e. relative reduction of approx. \(10\%\)), being actually pretty close to the intended ones. The slight differences to the results above, again, are due to assumed independence (i.e. futility bound calculations for the different stages are done on an independent basis). However, even using this simplification, results are equivalent with a precision of \(3\) digits or more.

Another important information provided by the output is that to achieve the desired study characteristics, such as keeping the \(\alpha\)-error controlled at \(0.025\) while obtaining power of \(1-\beta = 0.8\), approx. \(\frac{314}{2}=157\) study subjects are needed per arm per stage given equally spread information rates. Thus, one results in a maximum of \(3*3*157=1413\) subjects in case one assumes a design with \(3\) treatment arms, \(2\) being active and \(1\) being a control . Multiplications by \(3\) are then due to having \(3\) stages and \(2\) active treatment arms compared to \(1\) control arm. It should be noted here that this is just an approximation of the sample size needed to achieve a power of \(1-\beta=0.8\), while power here is defined as the probability of successfully detecting the assumed effect. Simulations in following chapters will indicate that this rather rough approximate sample size is actually rather conservative since the studies appear to be overpowered in simulations, while the term power has a different meaning here: when considering simulations of studies with multiple active treatment arms, power is referred to as the probability to claim success of at least one active treatment arm in the study.

The row *Exit probability for futility* indicates the rather low probability of early stopping due to futility (\(1\). stage: \(0.0625\), \(2\). stage: \(0.108\)), while this depends on the assumed treatment effects and the defined boundaries. *Efficacy boundary (t)*, just as *Efficacy boundary (z-value scale)* shows that an enormous decrease in events needs to be detected in the treatment group in order to reject \(H_0\) in the first stage (rate ratio of \(0.061\), i.e. relative reduction of \(1-0.061=0.939\)) which, again reflects the “conservatism” of alpha-spending O’Brien Fleming approach in early stages, however resulting in rather liberal and monotonically decreasing boundaries along the stages.

```
# boundary plots
par(mar = c(4, 4, .1, .1))
plot(d_sample, main = paste0(
"Boundaries against stage-wise ",
"cumul. sample sizes - 1"
type = 1)
), plot(d_sample, main = paste0(
"Boundaries against stage-wise ",
"cumul. sample sizes - 2"
type = 2) ),
```

Plotting the cumulative sample size against the applicable boundaries is a way of nicely visualizing these important study characteristics. On the left plot, one can see a plot having the boundaries on the y-axis on \(z\)-scale.. The dashed line represents the critical value of a fixed study design with one-sided testing and \(\alpha=0.025 \,\) (\(\Phi^{-1}(1-\alpha) \approx 1..96\), \(\Phi\) denoting the cumulative distribution function of the standardnormal distribution, \(\Phi^{-1}\) the corresponding quantile function). The right plot basically contains the same information, while the y-axis is now given on treatment effect scale. The red line represents the efficacy bound which needs to be crossed to obtain statistical significance at the applicable stage, the blue line represents the futility bounds. Note that the plot again indicates that low risk ratio values are desirable.

# Study simulations

Performing simulations prior to conducting the study is often reasonable since they allow for evaluating characteristics, such as power (please refer to definition above), assuming different scenarios or constellations of treatment effects. Thus, simulations enable to get a more holistic view on how and where the planned study could go in case the assumptions appear to be (approximately) true.

Simulation of the study design will be done using the function `getSimulationMultiArmRates()`

, which needs an Inverse Normal Design as input, similarly defined, with the add-on of containing information about how the analysis in the simulations should be performed:

```
# design as above, just as inverse normal
<- getDesignInverseNormal(
d_IN kMax = 3,
alpha = 0.025,
beta = 0.2,
sided = 1,
typeOfDesign = "asOF",
informationRates = c(1 / 3, 2 / 3, 1),
futilityBounds = c(0.149145, 0.41380800),
bindingFutility = FALSE
)
```

To perform study simulations for instance for power or probability-of-success evaluation, one need to assume (different) effect rates in the \(2\) active treatment arms, which need to be defined in a matrix-object. For binary data, die `effectMatrix`

refers to the actual event rate in each arm, but not to the the difference in event rates between control and active treatment arms. Further, number of iterations needs to be defined in advance.

```
# set number of iterations to be used in simulation
<- 100 # 10000
maxNumberOfIterations
# specify the scenarios, nrow: number of scenarios, ncol: number of treatment arms
<- matrix(c(0.100, 0.100, 0.05, 0.05, 0.055, 0.045),
effectMatrix byrow = TRUE, nrow = 3, ncol = 2
)
# first column: first treatment arm, second column: second treatment arm
show(effectMatrix)
```

```
[,1] [,2]
[1,] 0.100 0.100
[2,] 0.050 0.050
[3,] 0.055 0.045
```

Considering a design with \(2\) active treatment arms, both to be compared against a control arm, the effect matrix contains treatment effect in the first active arm in the first column and the effect in the second active arm in the second column respectively.

Next step would be to actually perform the simulation. Several variables need to be initialized in order to individualize the simulation according to the specific needs. Choosing `typeOfShape`

as `userDefined`

refers to the above defined effect matrix, however, one could also assume a *linear* relationship and initialize a vector of maximal assumed effects (`piMaxVector`

) in treatment groups. When some functional form of dose response curve are used, it should be noted that the `piMaxVector`

may be used as the maximum treatment effect on the whole dose response curve instead of observed dose range, which could be misleading. Therefore it is suggested to use the `userDefined`

shape. `directionUpper`

is set to `FALSE`

since low obtained rates of events do correspond with a better clinical outcome and reducing the rate is considered beneficial to the subjects. Intersection test to be performed is *Simes* , while other options would be given by e.g. *Bonferroni* or *Dunnett*, with *Dunnett* being the default setting. `typeOfSelection`

is initially set to `rbest`

with `rValue`

being 2, which means that at each stage, the \(2\) best treatment arms should be carried forward. `successCriterion = "all"`

means that to stop the study early due to efficacy, both active treatment arms need to be tested significantly at interim analysis and another option would be to set `successCriterion = "atLeastOne"`

to declare significant testing of one treatment arm to be sufficient for study success at interim. The vector `plannedSubjects`

contains the cumulative per arm per stage sample sizes previously calculated.

```
# first simulation
<- getSimulationMultiArmRates(
simulation design = d_IN,
activeArms = 2,
effectMatrix = effectMatrix,
typeOfShape = "userDefined",
piControl = 0.1,
intersectionTest = "Simes",
directionUpper = FALSE,
typeOfSelection = "rBest",
rValue = 2,
effectMeasure = "testStatistic",
successCriterion = "all",
plannedSubjects = c(157, 314, 471),
allocationRatioPlanned = 1,
maxNumberOfIterations = maxNumberOfIterations,
seed = 145873,
showStatistics = TRUE
)|> summary() simulation
```

**Simulation of a binary endpoint (multi-arm design)**

Sequential analysis with a maximum of 3 looks (inverse normal combination test design), one-sided overall significance level 2.5%. The results were simulated for a multi-arm comparisons for rates (2 treatments vs. control), H0: pi(i) - pi(control) = 0, power directed towards smaller values, H1: pi_max as specified, control rate pi(control) = 0.1, planned cumulative sample size = c(157, 314, 471), effect shape = user defined, intersection test = Simes, selection = r best, r = 2, effect measure based on test statistic, success criterion: all, simulation runs = 100, seed = 145873.

Stage | 1 | 2 | 3 |
---|---|---|---|

Fixed weight | 0.577 | 0.577 | 0.577 |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Reject at least one [1] | 0.0200 | ||

Reject at least one [2] | 0.7900 | ||

Reject at least one [3] | 0.9200 | ||

Rejected arms per stage [1] | |||

Treatment arm 1 | 0 | 0 | 0.0100 |

Treatment arm 2 | 0 | 0 | 0.0100 |

Rejected arms per stage [2] | |||

Treatment arm 1 | 0 | 0.4000 | 0.3200 |

Treatment arm 2 | 0 | 0.3200 | 0.3900 |

Rejected arms per stage [3] | |||

Treatment arm 1 | 0 | 0.3100 | 0.3900 |

Treatment arm 2 | 0.0100 | 0.4800 | 0.3800 |

Success per stage [1] | 0 | 0 | 0 |

Success per stage [2] | 0 | 0.2600 | 0.3800 |

Success per stage [3] | 0 | 0.2800 | 0.3700 |

Stage-wise number of subjects [1] | |||

Treatment arm 1 | 157.0 | 157.0 | 157.0 |

Treatment arm 2 | 157.0 | 157.0 | 157.0 |

Control arm | 157.0 | 157.0 | 157.0 |

Stage-wise number of subjects [2] | |||

Treatment arm 1 | 157.0 | 157.0 | 157.0 |

Treatment arm 2 | 157.0 | 157.0 | 157.0 |

Control arm | 157.0 | 157.0 | 157.0 |

Stage-wise number of subjects [3] | |||

Treatment arm 1 | 157.0 | 157.0 | 157.0 |

Treatment arm 2 | 157.0 | 157.0 | 157.0 |

Control arm | 157.0 | 157.0 | 157.0 |

Expected number of subjects under H1 [1] | 711.2 | ||

Expected number of subjects under H1 [2] | 1229.3 | ||

Expected number of subjects under H1 [3] | 1243.4 | ||

Overall exit probability [1] | 0.6300 | 0.2300 | |

Overall exit probability [2] | 0.0600 | 0.2700 | |

Overall exit probability [3] | 0.0400 | 0.2800 | |

Selected arms [1] | |||

Treatment arm 1 | 1.0000 | 0.3700 | 0.1400 |

Treatment arm 2 | 1.0000 | 0.3700 | 0.1400 |

Selected arms [2] | |||

Treatment arm 1 | 1.0000 | 0.9400 | 0.6700 |

Treatment arm 2 | 1.0000 | 0.9400 | 0.6700 |

Selected arms [3] | |||

Treatment arm 1 | 1.0000 | 0.9600 | 0.6800 |

Treatment arm 2 | 1.0000 | 0.9600 | 0.6800 |

Number of active arms [1] | 2.000 | 2.000 | 2.000 |

Number of active arms [2] | 2.000 | 2.000 | 2.000 |

Number of active arms [3] | 2.000 | 2.000 | 2.000 |

Conditional power (achieved) [1] | 0.0614 | 0.1061 | |

Conditional power (achieved) [2] | 0.3845 | 0.5980 | |

Conditional power (achieved) [3] | 0.4098 | 0.7031 | |

Exit probability for efficacy | 0.0213 | 0.4258 | |

Exit probability for futility [1] | 0.6300 | 0.2300 | |

Exit probability for futility [2] | 0.0600 | 0.0100 | |

Exit probability for futility [3] | 0.0400 | 0 |

Legend:

*(i)*: treatment arm i*[j]*: effect matrix row j (situation to consider)

In this output, the different input situations are indicated through \([\,j\,]\) while \(j=1,2,3\).

Under \(H_0\) with assumed treatment effect equal to the assumed effect in the control group (scenario 1), probability of rejecting at least one is low (i.e. committing type I error, *Reject at least one [1]* \(= 0.0181\)), while rejection probability (under alternative power) is high especially assuming high treatment effects in both treatment arms (under alternative, *Reject at least one [2]* \(= 0.8550\) for scenario \(2\)). Note also that, in any case, both treatment arms are selected as `rValue=2`

means that the two best treatment arms (i.e. all arms in this case) are selected, regardless of if the one or both treatment arms meet the applicable futility bounds. Further, the simulation output contains several information. That is, e.g. stage-wise number of subjects are calculated and probabilities for arms to be selected in each stage are provided for each assumed effect. Additionally, expected sample size, which is commonly used as optimality criterion of designs, stopping due to futility-probabilities and conditional power, defined as the probability of having a statistically significant result given the data observed thus far, are listed. Note that, since probability of futility stop is rather high in scenario \(1\) (first stage: \(0.5797\), second stage: \(0.1920\)), the corresponding *expected number of subjects* is approximately \(776\), lying far below the expected sample sizes in the other scenarios.

## Stage-wise treatment arm selection

Now, suppose the underlying treatment arm selection scheme is different and especially not straightforwardly covered by the other available and pre-defined options in rpact (i.e. `best`

, `rBest`

, `epsilon`

, `all`

). Then, rpact accounts for that by allowing users to implement a user defined treatment arm selection function thus being capable of covering various different selection approaches. The user needs to set `typeOfSelection`

to `userDefined`

and define a function used as input for the `selectArmsFunction`

-argument.

Say e.g. a treatment arm should be selected if and only if it doesn’t meet the futility bound, and if it does, function should be specified such that this arm is deselected at the applicable stage. See the different treatment arm selection approaches illustrated for the first stage depending on the different potential outcomes in the figures below, while the first diagram illustrates the procedure using `rBest`

and the second using the `userDefined`

scheme here, while the numbers represent the following potential first stage analysis results:

- Both active Treament Arms significant
- 1 active Treatment Arm significant, 1 active Treatment Arm non-significant
- 1 active Treatment Arm futile, 1 active Treatment Arm significant
- Both active Treatment Arms non-significant
- 1 active Treatment Arm futile, 1 active Treatment Arm non-significant
- Both active Treatment Arm futile

rBest, rValue=2

This first diagram graphically represent how treatment selection proceeds choosing *typeOfSelection=rBest* with *rValue=2* in case of the study design previously defined. Early study success can be obtained whenever both treatment arms can be tested significantly at interim, continuation of study with only non-significant (meaning neither stopped for efficacy, not for futility) active treatment arms is to be done whenever one of the \(2\) active arms is discontinued due to efficacy and if none of the arms can be tested significantly at interim, the study continues with both treatment arms as even arms crossing the futility bounds are to be carried forward using *rBest, rValue=2*.

userDefined

This second diagram illustrates the active treatment arm selection that is to be implemented through the user defined selection function. Efficacy stop occurs when both active treatment arms are tested significantly at interim, continuation with non-significant arm only happens only when one active arm is significant while the other arm is not, but neither futile. The study also terminates early with success of one arm when one active arm is futile while the other can be deemed superior to control, continuation with non-futile arms only is done when both arms are non-significant or one of them is non-significant while the other one is futile. Lastly, futility stop can also happen at interim.

Now that the futility bounds are defined in the design specification already one could believe that selection rule is only an add-on to the futility bound. However, this is not true since specifying selection rule overwrites the futility bound as continuation-criterion for treatment arms. Thus, even if futility bounds are pre-specified, to implement stopping due to futility in application of `getSimulationMultiArmRates()`

(but also `getSimulationMultiArmMeans()`

, `getSimulationMultiArmSurvival()`

), one could use these as input in the customized selection function. In this case, having the previously defined futility bounds of \(0.149145\) at the first interim and \(0.413808\) at the second respectively, there is a need to individualize the selection scheme along the different stages. It is important to note here that `effectMeasure = "testStatistic"`

needs to be set as the futility bounds have intentionally been transformed to and calculated on \(z\)-scale. Now, rpact enables implementation of this selection scheme by allowing *stage* as an argument in the selection function, additional to `effectVector`

:

```
# first row: first stage futility bounds, second stage: second stage futility bound
<- matrix(c(d_IN$futilityBounds, d_IN$futilityBounds), nrow = 2)
futility_bounds
# selection function
<- function(effectVector, stage) {
selection # if stage==1, compare to first stage fut. bounds,
# if stage==2, compare to second stage fut. bounds
<- switch(stage,
selectedArms >= futility_bounds[1, ]),
(effectVector >= futility_bounds[2, ])
(effectVector
)return(selectedArms)
}
<- getSimulationMultiArmRates(
simulation design = d_IN,
activeArms = 2,
effectMatrix = effectMatrix,
typeOfShape = "userDefined",
piControl = 0.1,
intersectionTest = "Simes",
directionUpper = FALSE,
typeOfSelection = "userDefined",
selectArmsFunction = selection,
effectMeasure = "testStatistic",
successCriterion = "all",
plannedSubjects = c(157, 314, 471),
allocationRatioPlanned = 1,
maxNumberOfIterations = maxNumberOfIterations,
seed = 145873,
showStatistics = TRUE
)|> summary() simulation
```

**Simulation of a binary endpoint (multi-arm design)**

Sequential analysis with a maximum of 3 looks (inverse normal combination test design), one-sided overall significance level 2.5%. The results were simulated for a multi-arm comparisons for rates (2 treatments vs. control), H0: pi(i) - pi(control) = 0, power directed towards smaller values, H1: pi_max as specified, control rate pi(control) = 0.1, planned cumulative sample size = c(157, 314, 471), effect shape = user defined, intersection test = Simes, selection = user defined, effect measure based on test statistic, success criterion: all, simulation runs = 100, seed = 145873.

Stage | 1 | 2 | 3 |
---|---|---|---|

Fixed weight | 0.577 | 0.577 | 0.577 |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Reject at least one [1] | 0.0200 | ||

Reject at least one [2] | 0.8200 | ||

Reject at least one [3] | 0.8100 | ||

Rejected arms per stage [1] | |||

Treatment arm 1 | 0 | 0 | 0.0200 |

Treatment arm 2 | 0 | 0 | 0 |

Rejected arms per stage [2] | |||

Treatment arm 1 | 0 | 0.4200 | 0.3300 |

Treatment arm 2 | 0.0100 | 0.3700 | 0.3000 |

Rejected arms per stage [3] | |||

Treatment arm 1 | 0.0100 | 0.2300 | 0.2800 |

Treatment arm 2 | 0.0100 | 0.4800 | 0.2900 |

Success per stage [1] | 0 | 0 | 0.0200 |

Success per stage [2] | 0 | 0.3400 | 0.4500 |

Success per stage [3] | 0.0100 | 0.2800 | 0.4200 |

Stage-wise number of subjects [1] | |||

Treatment arm 1 | 157.0 | 140.5 | 85.6 |

Treatment arm 2 | 157.0 | 103.3 | 71.4 |

Control arm | 157.0 | 157.0 | 157.0 |

Stage-wise number of subjects [2] | |||

Treatment arm 1 | 157.0 | 145.4 | 133.7 |

Treatment arm 2 | 157.0 | 147.1 | 122.1 |

Control arm | 157.0 | 157.0 | 157.0 |

Stage-wise number of subjects [3] | |||

Treatment arm 1 | 157.0 | 129.7 | 112.1 |

Treatment arm 2 | 157.0 | 148.5 | 151.4 |

Control arm | 157.0 | 157.0 | 157.0 |

Expected number of subjects under H1 [1] | 657.8 | ||

Expected number of subjects under H1 [2] | 1121.0 | ||

Expected number of subjects under H1 [3] | 1106.8 | ||

Overall exit probability [1] | 0.6200 | 0.2700 | |

Overall exit probability [2] | 0.0500 | 0.4100 | |

Overall exit probability [3] | 0.0800 | 0.3600 | |

Selected arms [1] | |||

Treatment arm 1 | 1.0000 | 0.3400 | 0.0600 |

Treatment arm 2 | 1.0000 | 0.2500 | 0.0500 |

Selected arms [2] | |||

Treatment arm 1 | 1.0000 | 0.8800 | 0.4600 |

Treatment arm 2 | 1.0000 | 0.8900 | 0.4200 |

Selected arms [3] | |||

Treatment arm 1 | 1.0000 | 0.7600 | 0.4000 |

Treatment arm 2 | 1.0000 | 0.8700 | 0.5400 |

Number of active arms [1] | 2.000 | 1.553 | 1.000 |

Number of active arms [2] | 2.000 | 1.863 | 1.630 |

Number of active arms [3] | 2.000 | 1.772 | 1.679 |

Conditional power (achieved) [1] | 0.0627 | 0.5144 | |

Conditional power (achieved) [2] | 0.4522 | 0.7123 | |

Conditional power (achieved) [3] | 0.4165 | 0.7539 | |

Exit probability for efficacy | 0.0213 | 0.4258 | |

Exit probability for futility [1] | 0.6200 | 0.2700 | |

Exit probability for futility [2] | 0.0500 | 0.0700 | |

Exit probability for futility [3] | 0.0700 | 0.0800 |

Legend:

*(i)*: treatment arm i*[j]*: effect matrix row j (situation to consider)

Depending on what stage the simulation is currently iterating through, the `switch()`

in the treatment selection function recognizes the stage changes and adapts the applicable futility bound by switching rows in the predefined futility bound matrix.

Again, assuming \(H_0\) to be true, the probability of falsely rejecting it is \(0.0181\), thus this simulation again indicates type I error rate control at \(\alpha=0.025\). The power assuming a \(50\%\) reduction is given through \(0.8304\). Since now, stopping of treatment arms is determined by crossing the futility bounds or not, the probability of having futility in the first stage increases to \(0.5968\) in the first and to \(0.2984\) in the second stage under \(H_0\) as compared to the first simulation. Having a different treatment selection scheme also result in lower expected number of subjects and, in opposite to the first simulation, the number of arms per scenario lies \(<2\) starting from stage \(2\) since this treatment selection allows for early discontinuation of study arms whereas in the first simulation, arms are to be carried forward regardless of potential futility.

In both simulations, with initially calculated 157 subjects per arm per stage and a relative event rate reduction of \(50\%\) as alternative, one can see that the study is overpowered (since *Reject at least one [j]*>0.8 in scenario \(2\)), which is due to the considered alternative having approximately the same effect in the two active treatment arms. For the second simulation, performing simulations through different potentially optimal sample sizes (in a sense that being the smallest sample size needed to achieve \(80\%\) power) indicate that one could save some subjects and still achieve the desired power. The following plot shows the minimum sample size that one could choose to achieve the power requirements, which appears to be about \(146\), however, one should keep in mind that this number might slightly be deviating from an analytically optimal solution due to deviations introduced by the simulation:

Further, it should be noted here that even if one explicitly has \(\pi_2=0..1\) under the null hypothesis, testing the hypothesis \(H_0: \frac{\pi_1}{\pi_2}\geq1\) means that whenever \(\pi_1\geq\pi_2\) with \(\pi_2, \, \pi_1 \in (0,1]\), one has a scenario where the null hypothesis is true. Consequently, one should properly check if the type I error rate is controlled not only assuming \(\pi_2=0.1\), but also considering the other scenarios when \(\pi_2 \neq 0.1\). For the second simulation, considering only the cases where rate equality holds, following plot obtained by simulation indicate that, given the study configurations here, the type I error rate is controlled under various control event rate assumptions:

# Illustrative analysis with generic Data

In the next chapters, different hypothetical binary endpoint datasets are generated and analyzed using the `getAnalysisResults()`

command. As previously mentioned, rpact itself doesn’t support landmark analysis using the Greenwoods standard error (SE) formula. Thus the first two analyses base on the empirical event rates only. Afterwards, gestate and survival are then used to show how one could merge the packages to perform the intended analysis using boundaries obtained by rpact and test statistics obtained by survival probabilities and standard errors estimated using gestate and survival.

## First theoretical path

In the first stage analysis, one first has to manually enter a dataset for data observed in the trial:

```
<- getDataset(
genData_1 events1 = 4,
events2 = 8,
events3 = 16,
sampleSizes1 = 153,
sampleSizes2 = 157,
sampleSizes3 = 156
)
|> summary() genData_1
```

**Dataset of multi-arm rates**

The dataset contains the sample sizes and events of two treatment groups and one control group.

Stage | 1 | 1 | 1 |
---|---|---|---|

Group | 1 | 2 | 3 |

Sample size | 153 | 157 | 156 |

Number of events | 4 | 8 | 16 |

This dataset is a generic realization of data from the first stage in a design with binary endpoint under the input assumption of having a \(50\%\) rate reduction in the treatment groups given \(\pi_2=0.1\). The highest index (\(3\)) corresponds with the events occurred in the control group and with the underlying sample sizes, respectively. The other indices represent active treatment groups. Data here is chosen such that one can see lower event rates in the active treatment groups. Note the slight imbalances in sample sizes, which might occur due to dropouts or recruitment issues.

Actual analysis of the first stage using *Simes* as intersection test goes as follows:

```
<- getAnalysisResults(
results_1 design = d_IN,
dataInput = genData_1,
directionUpper = FALSE,
intersectionTest = "Simes"
)
```

`|> summary() results_1 `

**Multi-arm analysis results for a binary endpoint (2 active arms vs. control)**

Sequential analysis with 3 looks (inverse normal combination test design), one-sided overall significance level 2.5%. The results were calculated using a multi-arm test for rates, Simes intersection test, normal approximation test. H0: pi(i) - pi(control) = 0 against H1: pi(i) - pi(control) < 0.

Stage | 1 | 2 | 3 |
---|---|---|---|

Fixed weight | 0.577 | 0.577 | 0.577 |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Cumulative effect size (1) | -0.076 | ||

Cumulative effect size (2) | -0.052 | ||

Cumulative treatment rate (1) | 0.026 | ||

Cumulative treatment rate (2) | 0.051 | ||

Cumulative control rate | 0.103 | ||

Stage-wise test statistic (1) | -2.730 | ||

Stage-wise test statistic (2) | -1.716 | ||

Stage-wise p-value (1) | 0.0032 | ||

Stage-wise p-value (2) | 0.0431 | ||

Adjusted stage-wise p-value (1, 2) | 0.0063 | ||

Adjusted stage-wise p-value (1) | 0.0032 | ||

Adjusted stage-wise p-value (2) | 0.0431 | ||

Overall adjusted test statistic (1, 2) | 2.493 | ||

Overall adjusted test statistic (1) | 2.730 | ||

Overall adjusted test statistic (2) | 1.716 | ||

Test action: reject (1) | FALSE | ||

Test action: reject (2) | FALSE | ||

Conditional rejection probability (1) | 0.2907 | ||

Conditional rejection probability (2) | 0.1204 | ||

95% repeated confidence interval (1) | [-0.212; 0.043] | ||

95% repeated confidence interval (2) | [-0.191; 0.079] | ||

Repeated p-value (1) | 0.1150 | ||

Repeated p-value (2) | 0.2429 |

Legend:

*(i)*: results of treatment arm i vs. control arm*(i, j, …)*: comparison of treatment arms ‘i, j, …’ vs. control arm

Although having obviously lower event rates in the first stage already, none of the \(2\) hypotheses can be rejected, which can manually be tested comparing e.g. the overall adjusted test statistics for the global intersection hypothesis of having no effect in neither one of the active treatment arms to the efficacy bound and see that \(2.493 < 3.710\), i.e. no rejection of global intersection. Neither stop due to futility appears since no futility boundary is crossed at stage \(1\), which means that both treatment arms are to be carried forward to a second stage analysis. For obtaining non-significance in stage \(1\), one could also compare repeated p-values to full \(\alpha\).

Proceeding to second stage, a generic dataset might look as follows, while the second vector entries represent second stage data:

```
# assuming there was no futility or efficacy stop study proceeds to randomize subjects
<- getDataset(
genData_2 events1 = c(4, 7),
events2 = c(8, 7),
events3 = c(16, 15),
sampleSizes1 = c(153, 155),
sampleSizes2 = c(157, 155),
sampleSizes3 = c(156, 155)
)|> summary() genData_2
```

**Dataset of multi-arm rates**

The dataset contains the sample sizes and events of two treatment groups and one control group. The total number of looks is two; stage-wise and cumulative data are included.

Stage | 1 | 1 | 1 | 2 | 2 | 2 |
---|---|---|---|---|---|---|

Group | 1 | 2 | 3 | 1 | 2 | 3 |

Stage-wise sample size | 153 | 157 | 156 | 155 | 155 | 155 |

Cumulative sample size | 153 | 157 | 156 | 308 | 312 | 311 |

Stage-wise number of events | 4 | 8 | 16 | 7 | 7 | 15 |

Cumulative number of events | 4 | 8 | 16 | 11 | 15 | 31 |

Here, again, event rates are lower in treatment groups.

```
<- getAnalysisResults(
results_2 design = d_IN,
dataInput = genData_2,
directionUpper = FALSE,
intersectionTest = "Simes"
)
```

`|> summary() results_2 `

**Multi-arm analysis results for a binary endpoint (2 active arms vs. control)**

Sequential analysis with 3 looks (inverse normal combination test design), one-sided overall significance level 2.5%. The results were calculated using a multi-arm test for rates, Simes intersection test, normal approximation test. H0: pi(i) - pi(control) = 0 against H1: pi(i) - pi(control) < 0.

Stage | 1 | 2 | 3 |
---|---|---|---|

Fixed weight | 0.577 | 0.577 | 0.577 |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Cumulative effect size (1) | -0.076 | -0.064 | |

Cumulative effect size (2) | -0.052 | -0.052 | |

Cumulative treatment rate (1) | 0.026 | 0.036 | |

Cumulative treatment rate (2) | 0.051 | 0.048 | |

Cumulative control rate | 0.103 | 0.100 | |

Stage-wise test statistic (1) | -2.730 | -1.770 | |

Stage-wise test statistic (2) | -1.716 | -1.770 | |

Stage-wise p-value (1) | 0.0032 | 0.0384 | |

Stage-wise p-value (2) | 0.0431 | 0.0384 | |

Adjusted stage-wise p-value (1, 2) | 0.0063 | 0.0384 | |

Adjusted stage-wise p-value (1) | 0.0032 | 0.0384 | |

Adjusted stage-wise p-value (2) | 0.0431 | 0.0384 | |

Overall adjusted test statistic (1, 2) | 2.493 | 3.014 | |

Overall adjusted test statistic (1) | 2.730 | 3.182 | |

Overall adjusted test statistic (2) | 1.716 | 2.464 | |

Test action: reject (1) | FALSE | TRUE | |

Test action: reject (2) | FALSE | FALSE | |

Conditional rejection probability (1) | 0.2907 | 0.7911 | |

Conditional rejection probability (2) | 0.1204 | 0.5133 | |

95% repeated confidence interval (1) | [-0.212; 0.043] | [-0.130; -0.005] | |

95% repeated confidence interval (2) | [-0.191; 0.079] | [-0.119; 0.011] | |

Repeated p-value (1) | 0.1150 | 0.0086 | |

Repeated p-value (2) | 0.2429 | 0.0274 |

Legend:

*(i)*: results of treatment arm i vs. control arm*(i, j, …)*: comparison of treatment arms ‘i, j, …’ vs. control arm

Performing the same comparisons as in stage \(1\), one can see: The global null is to be rejected (\(3.014 > 2.511\)). Subsequently, the hypothesis for the first active treatment is rejected, since \(3.182 > 2.511\) indicating that the first active arm performs better than control. The same result can be obtained by recognizing that the *repeated p-value (1)* of \(0.0086\) falls below \(0.025\). The consequence is that this arm can be discontinued early due to efficacy. However, since the second treatment arm is not tested significant (\(2.464 < 2.511\)) and early stopping for efficacy is true only when all active treatments appear to be significant, study will continue up to the the final stage with the second treatment arm only. Again, no futility stop occurs. It should be noted here that the *adjusted stage-wise p-values* cannot directly be used for testing, but the values do only base on the second stage data. Since an inverse normal combination test is performed with weights given by \(\frac{1}{3}\) here, the first and second stage p-values can be used to calculate the overall adjusted test statistics, exemplarily \(\frac{\sqrt{\frac{1}{3}}*\Phi^{-1}(1-0.0063)+\sqrt{\frac{1}{3}}*\Phi^{-1}(1-0.0384)}{\sqrt{\frac{2}{3}}} \approx 3.014\).

Third-stage-dataset:

```
<- getDataset(
genData_3 events1 = c(4, 7, NA),
events2 = c(8, 7, 6),
events3 = c(16, 15, 16),
sampleSizes1 = c(153, 155, NA),
sampleSizes2 = c(157, 155, 156),
sampleSizes3 = c(156, 155, 160)
)|> summary() genData_3
```

**Dataset of multi-arm rates**

The dataset contains the sample sizes and events of two treatment groups and one control group. The total number of looks is three; stage-wise and cumulative data are included.

Stage | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 |
---|---|---|---|---|---|---|---|---|---|

Group | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |

Stage-wise sample size | 153 | 157 | 156 | 155 | 155 | 155 | NA | 156 | 160 |

Cumulative sample size | 153 | 157 | 156 | 308 | 312 | 311 | NA | 468 | 471 |

Stage-wise number of events | 4 | 8 | 16 | 7 | 7 | 15 | NA | 6 | 16 |

Cumulative number of events | 4 | 8 | 16 | 11 | 15 | 31 | NA | 21 | 47 |

Final analysis:

```
<- getAnalysisResults(
results_3 design = d_IN,
dataInput = genData_3,
directionUpper = FALSE,
intersectionTest = "Simes"
)
```

`|> summary() results_3 `

**Multi-arm analysis results for a binary endpoint (2 active arms vs. control)**

Sequential analysis with 3 looks (inverse normal combination test design), one-sided overall significance level 2.5%. The results were calculated using a multi-arm test for rates, Simes intersection test, normal approximation test. H0: pi(i) - pi(control) = 0 against H1: pi(i) - pi(control) < 0.

Stage | 1 | 2 | 3 |
---|---|---|---|

Fixed weight | 0.577 | 0.577 | 0.577 |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Cumulative effect size (1) | -0.076 | -0.064 | |

Cumulative effect size (2) | -0.052 | -0.052 | -0.055 |

Cumulative treatment rate (1) | 0.026 | 0.036 | |

Cumulative treatment rate (2) | 0.051 | 0.048 | 0.045 |

Cumulative control rate | 0.103 | 0.100 | 0.100 |

Stage-wise test statistic (1) | -2.730 | -1.770 | |

Stage-wise test statistic (2) | -1.716 | -1.770 | -2.149 |

Stage-wise p-value (1) | 0.0032 | 0.0384 | |

Stage-wise p-value (2) | 0.0431 | 0.0384 | 0.0158 |

Adjusted stage-wise p-value (1, 2) | 0.0063 | 0.0384 | 0.0158 |

Adjusted stage-wise p-value (1) | 0.0032 | 0.0384 | |

Adjusted stage-wise p-value (2) | 0.0431 | 0.0384 | 0.0158 |

Overall adjusted test statistic (1, 2) | 2.493 | 3.014 | 3.702 |

Overall adjusted test statistic (1) | 2.730 | 3.182 | |

Overall adjusted test statistic (2) | 1.716 | 2.464 | 3.253 |

Test action: reject (1) | FALSE | TRUE | TRUE |

Test action: reject (2) | FALSE | FALSE | TRUE |

Conditional rejection probability (1) | 0.2907 | 0.7911 | |

Conditional rejection probability (2) | 0.1204 | 0.5133 | |

95% repeated confidence interval (1) | [-0.212; 0.043] | [-0.130; -0.005] | |

95% repeated confidence interval (2) | [-0.191; 0.079] | [-0.119; 0.011] | [-0.099; -0.013] |

Repeated p-value (1) | 0.1150 | 0.0086 | |

Repeated p-value (2) | 0.2429 | 0.0274 | 0.0006 |

Legend:

*(i)*: results of treatment arm i vs. control arm*(i, j, …)*: comparison of treatment arms ‘i, j, …’ vs. control arm

Considering either e.g. overall adjusted test statistics for intersection and single hypothesis or repeated p-values, or directly referring to *Test action: reject (2)*, one can conclude that the study claimed success since the first active treatment arm has already been tested significantly in the second stage while the second active treatment arm appears to be superior to control in the final stage.

Visualization of the analyses can be done graphically by illustrating the repeated confidence intervals, while one can see how the intervals keep narrowing along the stages because of increasing cumulative sample sizes and consistent trend observed across stages:

`|> plot(type = 2) results_3 `

## Second theoretical path

Assume that the first stage dataset does not differ from the first generic example (thus, nor the analysis results), then a second stage dataset could be:

```
<- getDataset(
genData_4 events1 = c(4, 9),
events2 = c(8, 23),
events3 = c(16, 15),
sampleSizes1 = c(153, 155),
sampleSizes2 = c(157, 155),
sampleSizes3 = c(156, 155)
)
```

Note the high event number in the second active treatment arm.

Analysis results:

```
<- getAnalysisResults(
results_4 design = d_IN,
dataInput = genData_4,
directionUpper = FALSE,
intersectionTest = "Simes"
)
```

`|> summary() results_4 `

**Multi-arm analysis results for a binary endpoint (2 active arms vs. control)**

Stage | 1 | 2 | 3 |
---|---|---|---|

Fixed weight | 0.577 | 0.577 | 0.577 |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Cumulative effect size (1) | -0.076 | -0.057 | |

Cumulative effect size (2) | -0.052 | 0.000 | |

Cumulative treatment rate (1) | 0.026 | 0.042 | |

Cumulative treatment rate (2) | 0.051 | 0.099 | |

Cumulative control rate | 0.103 | 0.100 | |

Stage-wise test statistic (1) | -2.730 | -1.275 | |

Stage-wise test statistic (2) | -1.716 | 1.385 | |

Stage-wise p-value (1) | 0.0032 | 0.1011 | |

Stage-wise p-value (2) | 0.0431 | 0.9170 | |

Adjusted stage-wise p-value (1, 2) | 0.0063 | 0.2023 | |

Adjusted stage-wise p-value (1) | 0.0032 | 0.1011 | |

Adjusted stage-wise p-value (2) | 0.0431 | 0.9170 | |

Overall adjusted test statistic (1, 2) | 2.493 | 2.352 | |

Overall adjusted test statistic (1) | 2.730 | 2.832 | |

Overall adjusted test statistic (2) | 1.716 | 0.234 | |

Test action: reject (1) | FALSE | FALSE | |

Test action: reject (2) | FALSE | FALSE | |

Conditional rejection probability (1) | 0.2907 | 0.4500 | |

Conditional rejection probability (2) | 0.1204 | 0.0009 | |

95% repeated confidence interval (1) | [-0.212; 0.043] | [-0.125; 0.003] | |

95% repeated confidence interval (2) | [-0.191; 0.079] | [-0.080; 0.075] | |

Repeated p-value (1) | 0.1150 | 0.0340 | |

Repeated p-value (2) | 0.2429 | 0.2429 |

Legend:

*(i)*: results of treatment arm i vs. control arm*(i, j, …)*: comparison of treatment arms ‘i, j, …’ vs. control arm

As one can see, the global null hypothesis can not be rejected since \(2.352 < 2.511\). However, the second treatment arm results in a stop due to futility, since, after not rejecting the global null, \(0.234 < 0.414\) as indicated by the overall adjusted test statistic. That means this study course would lead to continuation only with active treatment arm \(1\).

Subsequent final stage dataset:

```
<- getDataset(
genData_5 events1 = c(4, 9, 7),
events2 = c(8, 23, NA),
events3 = c(16, 15, 16),
sampleSizes1 = c(153, 155, 165),
sampleSizes2 = c(157, 155, NA),
sampleSizes3 = c(156, 155, 160)
)|> summary() genData_5
```

**Dataset of multi-arm rates**

The dataset contains the sample sizes and events of two treatment groups and one control group. The total number of looks is three; stage-wise and cumulative data are included.

Stage | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 |
---|---|---|---|---|---|---|---|---|---|

Group | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |

Stage-wise sample size | 153 | 157 | 156 | 155 | 155 | 155 | 165 | NA | 160 |

Cumulative sample size | 153 | 157 | 156 | 308 | 312 | 311 | 473 | NA | 471 |

Stage-wise number of events | 4 | 8 | 16 | 9 | 23 | 15 | 7 | NA | 16 |

Cumulative number of events | 4 | 8 | 16 | 13 | 31 | 31 | 20 | NA | 47 |

Note that recruitment for treatment arm \(2\) has been terminated after stage \(2\).

Final stage analysis:

```
<- getAnalysisResults(
results_5 design = d_IN,
dataInput = genData_5,
directionUpper = FALSE,
intersectionTest = "Simes"
)
```

`|> summary() results_5 `

**Multi-arm analysis results for a binary endpoint (2 active arms vs. control)**

Stage | 1 | 2 | 3 |
---|---|---|---|

Fixed weight | 0.577 | 0.577 | 0.577 |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Cumulative effect size (1) | -0.076 | -0.057 | -0.058 |

Cumulative effect size (2) | -0.052 | 0.000 | |

Cumulative treatment rate (1) | 0.026 | 0.042 | 0.042 |

Cumulative treatment rate (2) | 0.051 | 0.099 | |

Cumulative control rate | 0.103 | 0.100 | 0.100 |

Stage-wise test statistic (1) | -2.730 | -1.275 | -2.024 |

Stage-wise test statistic (2) | -1.716 | 1.385 | |

Stage-wise p-value (1) | 0.0032 | 0.1011 | 0.0215 |

Stage-wise p-value (2) | 0.0431 | 0.9170 | |

Adjusted stage-wise p-value (1, 2) | 0.0063 | 0.2023 | 0.0215 |

Adjusted stage-wise p-value (1) | 0.0032 | 0.1011 | 0.0215 |

Adjusted stage-wise p-value (2) | 0.0431 | 0.9170 | |

Overall adjusted test statistic (1, 2) | 2.493 | 2.352 | 3.089 |

Overall adjusted test statistic (1) | 2.730 | 2.832 | 3.481 |

Overall adjusted test statistic (2) | 1.716 | 0.234 | |

Test action: reject (1) | FALSE | FALSE | TRUE |

Test action: reject (2) | FALSE | FALSE | FALSE |

Conditional rejection probability (1) | 0.2907 | 0.4500 | |

Conditional rejection probability (2) | 0.1204 | 0.0009 | |

95% repeated confidence interval (1) | [-0.212; 0.043] | [-0.125; 0.003] | [-0.102; -0.017] |

95% repeated confidence interval (2) | [-0.191; 0.079] | [-0.080; 0.075] | |

Repeated p-value (1) | 0.1150 | 0.0340 | 0.0010 |

Repeated p-value (2) | 0.2429 | 0.2429 |

Legend:

*(i)*: results of treatment arm i vs. control arm*(i, j, …)*: comparison of treatment arms ‘i, j, …’ vs. control arm

Since per definition, success of one treatment arm suffices to declare study success, that is the conclusion one comes to due to significance of treatment arm \(1\). Stopping treatment arm \(2\) earlier therefore did not influence the positive result.

# Landmark analysis with Greenwood’s formula

A landmark analysis is an analysis where, given a predefined and fixed point in time, survival probabilities at that specific time point are to be compared between different groups, such as active treatment and control group.

As previously mentioned, rpact itself does not support the procedure of comparing survival probabilities (more precisely estimates) at a specific point in time with standard error being calculated by Greenwoods formula, but bases on simple rate comparisons only. However, a R package called gestate supports various survival analysis methods, including landmark analysis. Firstly, one needs to load the package:

`library(gestate)`

Then, since gestates’ functionality bases on curve subjects and the analysis approach bases on survival data, the distribution of the data from treatment and control groups need to be specified. Assume that the data in the control group follows an exponential distribution Exp(\(\lambda_c\))-distribution with \(\lambda_c=0.1\) and the treatment groups follow Exp(\(\lambda_{t1}\))- and Exp(\(\lambda_{t2}\))-distribution with \(\lambda_{t1}=\lambda_{t2}=0.1*0.5=0.05\) respectively (hazard ratio of \(0.5\), heuristically representing the assumed treatment effect of a \(50\%\) rate-reduction), while other options would be to assume Weibull or piecewise Exponential distribution. This is initialized using:

```
<- 0.5
effect
# initializing assumed distributions of different treatment arms as well as control
<- Exponential(lambda = 0.1)
dist_c <- Exponential(lambda = 0.1 * (1 - effect))
dist_t1 <- Exponential(lambda = 0.1 * (1 - effect)) dist_t2
```

Then, another curve that needs to be initialized is the recruitment curve. Gestate covers the potential assumptions of e.g. having instantaneous, linear or piecewise linear recruitment. In many applications, assuming linear (e.g. constant rate) recruitment might be suitable, thus this is what might be used here as well. Note that the subject number equal the numbers that have been calculated in chapter \(2\) and, since measuring study duration in month is common, one may assume a total recruitment length of \(18 \, month = 1.5 \, years\). The data will later on be used to estimate survival probabilities and standard errors, subsequently to be used as input for the rpact continuous endpoint module. Since the mean, standard deviation and sample sizes in the dataset input all need to be stage-wise because of the use of inverse normal combination test, data simulation will also be done separately per stage per arm, resulting in a maximum of 9 simulated datasets in this application. Further, since the information levels are equally spread along the three stages, each stage-wise dataset will base on a recruitment of \(157\) subjects and a recruitment length of \(6 \, month\). Assuming an individual observation time of \(1 \, month\), one results in a total duration of \(7 \, month\) per stage:

```
# equally sized arms and stages
<- 157
n
# maximum duration of each stage
<- 7
maxtime
# initialize recruitment for control group
<- LinearR(Nactive = 0, Ncontrol = n, rlength = 6)
recruitment_c
# initialize recruitment for treatment groups, here: equal
<- LinearR(Nactive = n, Ncontrol = 0, rlength = 6)
recruitment_t1 <- LinearR(Nactive = n, Ncontrol = 0, rlength = 6) recruitment_t2
```

As already mentioned in chapter \(1\) (initialization of design), the interim analyses are to be performed at \(I_1 = \frac{1}{3}\) and \(I_2 = \frac{2}{3}\) with \(I\) denoting the information fraction and the index denoting the stage. Since information here stands in \(1:1\) correlation with sample size one can conclude that interims should be performed after \(6\) month and after \(12\) month of recruitment. Censoring could be specified through the variables `active_dcurve`

, `control_dcurve`

, however, in trials where no or only few censoring/dropout might be a reasonable assumption, these variables can be left on default (*=Blank()*). The only remaining type of censoring is censoring due to end of observation, e.g. subjects having had no event before study ends are considered censored.

Having made assumptions on data distribution, recruitment and censoring, simulating patient level survival data works as follows. It should be noted here that following simulations and data manipulations are demonstrated for the first stage data, however, second and third stage data is simulated and manipulated similarly.

```
# simulating data for all arms and stages independently (needed for dataset input)
# first stage, all arms, last index represents the stage
<- simulate_trials(
example_data_long_t1_1 active_ecurve = dist_t1,
control_ecurve = Blank(),
rcurve = recruitment_t1,
assess = maxtime,
iterations = 1,
seed = 1,
detailed_output = TRUE
)
<- simulate_trials(
example_data_long_t2_1 active_ecurve = dist_t2,
control_ecurve = Blank(),
rcurve = recruitment_t2,
assess = maxtime,
iterations = 1,
seed = 2,
detailed_output = TRUE
)
<- simulate_trials(
example_data_long_c_1 active_ecurve = Blank(),
control_ecurve = dist_c,
rcurve = recruitment_c,
assess = maxtime,
iterations = 1,
seed = 3,
detailed_output = TRUE
)
kable(head(example_data_long_t1_1, 10))
```

Time | Censored | Trt | Iter | ETime | CTime | Rec_Time | Assess | Max_F | RCTime |
---|---|---|---|---|---|---|---|---|---|

4.136118 | 1 | 2 | 1 | 15.103637 | Inf | 2.8638823 | 7 | Inf | 4.136118 |

2.375578 | 1 | 2 | 1 | 23.632856 | Inf | 4.6244223 | 7 | Inf | 2.375578 |

2.914134 | 0 | 2 | 1 | 2.914134 | Inf | 0.1667227 | 7 | Inf | 6.833277 |

2.795905 | 0 | 2 | 1 | 2.795905 | Inf | 3.1638647 | 7 | Inf | 3.836135 |

1.718086 | 1 | 2 | 1 | 8.721372 | Inf | 5.2819144 | 7 | Inf | 1.718086 |

4.761620 | 1 | 2 | 1 | 57.899371 | Inf | 2.2383802 | 7 | Inf | 4.761620 |

6.712245 | 1 | 2 | 1 | 24.591241 | Inf | 0.2877548 | 7 | Inf | 6.712245 |

6.168230 | 1 | 2 | 1 | 10.793657 | Inf | 0.8317695 | 7 | Inf | 6.168230 |

5.071047 | 1 | 2 | 1 | 19.131350 | Inf | 1.9289527 | 7 | Inf | 5.071047 |

2.940920 | 0 | 2 | 1 | 2.940920 | Inf | 0.9289897 | 7 | Inf | 6.071010 |

The table above provide information on patient level, such as time (to event), treatment group, time of event (if observed), assessment timing and recruitment time. To achieve that every subject only has an individual observation time of \(1 \, month\), one can manually set the censoring indicator to \(1\) whenever no event has happened or the simulation indicates that an event has happened only after the fixed observation time.

```
# Check: if time to event (first column) is >=1, subject is censored
which(example_data_long_t1_1[, 1] > 1), 2] <- 1
example_data_long_t1_1[which(example_data_long_t2_1[, 1] > 1), 2] <- 1
example_data_long_t2_1[which(example_data_long_c_1[, 1] > 1), 2] <- 1 example_data_long_c_1[
```

After doing this, since the administrative censoring due to end of observation has manually been manipulated, the time of censoring in the last column can manually be corrected according to the changes:

```
# if censoring indicator==1, the censoring time is set to event time
# according to the previous changes
# stage 1
which(example_data_long_t1_1[, 2] == 1), 9] <-
example_data_long_t1_1[which(example_data_long_t1_1[, 2] == 1), 1]
example_data_long_t1_1[
which(example_data_long_t2_1[, 2] == 1), 9] <-
example_data_long_t2_1[which(example_data_long_t2_1[, 2] == 1), 1]
example_data_long_t2_1[
which(example_data_long_c_1[, 2] == 1), 9] <-
example_data_long_c_1[which(example_data_long_c_1[, 2] == 1), 1] example_data_long_c_1[
```

Though working on subject level data (if available) is commonly recommended, one could be interested in tables containing time, number of subjects at risk, survival probabilities and respective standard errors, henceforth referred to as life tables. After setting the assessment time to the defined \(7 \, month\), for creating the life tables, the commands `Surv()`

and `survfit()`

from the R package survival are used:

```
# needs to be run in case package has not yet been installed
library(survival)
# setting desired assessment time
<- set_assess_time(example_data_long_t1_1, maxtime)
example_data_short_t1_1 <- set_assess_time(example_data_long_t2_1, maxtime)
example_data_short_t2_1 <- set_assess_time(example_data_long_c_1, maxtime)
example_data_short_c_1
# Creating live tables for each group depending on the stage
# at first landmark time point
<- summary(survfit(Surv(
lt_t1_1 "Time"],
example_data_short_t1_1[, 1 - example_data_short_t1_1[, "Censored"]
~ 1, error = "greenwood"))
) <- summary(survfit(Surv(
lt_t2_1 "Time"],
example_data_short_t2_1[, 1 - example_data_short_t2_1[, "Censored"]
~ 1, error = "greenwood"))
) <- summary(survfit(Surv(
lt_c_1 "Time"],
example_data_short_c_1[, 1 - example_data_short_c_1[, "Censored"]
~ 1, error = "greenwood"))
)
kable(head(cbind(
"Time" = lt_t1_1$time, "n.risk" = lt_t1_1$n.risk,
"n.event" = lt_t1_1$n.event, "surv.prob." = lt_t1_1$surv,
"Greenwoods SE" = lt_t1_1$std.err
10)) ),
```

Time | n.risk | n.event | surv.prob. | Greenwoods SE |
---|---|---|---|---|

0.4050083 | 157 | 1 | 0.9936306 | 0.0063491 |

0.7430455 | 156 | 1 | 0.9872611 | 0.0089502 |

0.7453705 | 155 | 1 | 0.9808917 | 0.0109263 |

1.0411090 | 152 | 1 | 0.9744385 | 0.0126170 |

1.1852241 | 143 | 1 | 0.9676242 | 0.0142506 |

1.1887832 | 142 | 1 | 0.9608100 | 0.0156951 |

1.3882469 | 136 | 1 | 0.9537452 | 0.0170959 |

1.7934816 | 127 | 1 | 0.9462354 | 0.0185375 |

1.8221775 | 125 | 1 | 0.9386655 | 0.0198748 |

1.9769123 | 119 | 1 | 0.9307776 | 0.0212154 |

This table created above contains time, number of subjects at risk, number of events, survival probability estimation and standard error estimation, thus allowing for creating a Kaplan-Meier-Plot to visualize the development of survival probability in time. Note that choosing `error="greenwood"`

leads to the desired SE estimation. One could also discretize the time into intervals to obtain life tables with consistent time steps, if desired. Exemplary Kaplan-Meier-Plots for the different treatment groups at the different stages are created below:

While the survival probabilities from the treatment arms develop rather similarly for the independently simulated stages, the figures demonstrate that, besides few early exceptions, the survival probabilities from the control group falls below the survival probabilities of the treatment groups almost constantly, indicating that the groups might actually (i.e. potentially statistically significant) differ.

As previously mentioned, the continuous module of rpact is now used to perform the landmark analysis of the gestate-simulated data. Note here that this is a valid approximation whenever assuming the survival probability estimates to be normally distributed is reasonable. To initialize continuous datasets, estimated survival probabilities from the life tables above will be used as mean, the transformed Greenwood SE estimations will be used as standard deviation and the sample size from the earlier sample size calculations is used, making the simplifying assumption of no dropouts. The first stage continuous dataset therefore is:

```
<- 7
lm
# first stage dataset
<- getDataset(
dataset_1 # accessing the survival probability values at the timepoint closest to the desired one
means1 = c(lt_t1_1$surv[which(abs(lt_t1_1$time - lm) == min(abs(lt_t1_1$time - lm)))]),
means2 = c(lt_t2_1$surv[which(abs(lt_t2_1$time - lm) == min(abs(lt_t2_1$time - lm)))]),
means3 = c(lt_c_1$surv[which(abs(lt_c_1$time - lm) == min(abs(lt_c_1$time - lm)))]),
stDevs1 = c(lt_t1_1$std.err[which(abs(lt_t1_1$time - lm) ==
min(abs(lt_t1_1$time - lm)))] * sqrt(n)),
stDevs2 = c(lt_t2_1$std.err[which(abs(lt_t2_1$time - lm) ==
min(abs(lt_t2_1$time - lm)))] * sqrt(n)),
stDevs3 = c(lt_c_1$std.err[which(abs(lt_c_1$time - lm) ==
min(abs(lt_c_1$time - lm)))] * sqrt(n)),
n1 = c(n),
n2 = c(n),
n3 = c(n)
)|> summary() dataset_1
```

**Dataset of multi-arm means**

The dataset contains the sample sizes, means, and standard deviations of two treatment groups and one control group.

Stage | 1 | 1 | 1 |
---|---|---|---|

Group | 1 | 2 | 3 |

Sample size | 157 | 157 | 157 |

Mean | 0.734 | 0.761 | 0.543 |

Standard deviation | 0.657 | 0.661 | 0.811 |

The explicit estimates can be seen using the `summary()`

-command.

Note that the highest index (\(=3\)) represents the control group. Since the time in the life table data has not been structured into intervals of equal length, the datasets do mostly not have survival probabilities and standard error entries exactly at the given landmark time point, which is why the closest point in time is chosen here. Analysis of the data according to the group sequential design in this vignette can then be done by:

```
<- getAnalysisResults(
result_1 dataInput = dataset_1, design = d_IN,
intersectionTest = "Simes",
normalApproximation = TRUE, varianceOption = "pairwisePooled"
)
```

`|> summary() result_1 `

**Multi-arm analysis results for a continuous endpoint (2 active arms vs. control)**

Sequential analysis with 3 looks (inverse normal combination test design), one-sided overall significance level 2.5%. The results were calculated using a multi-arm t-test, Simes intersection test, normal approximation test, pairwise pooled variances option. H0: mu(i) - mu(control) = 0 against H1: mu(i) - mu(control) > 0.

Stage | 1 | 2 | 3 |
---|---|---|---|

Fixed weight | 0.577 | 0.577 | 0.577 |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Cumulative effect size (1) | 0.191 | ||

Cumulative effect size (2) | 0.217 | ||

Cumulative (pooled) standard deviation | 0.714 | ||

Stage-wise test statistic (1) | 2.287 | ||

Stage-wise test statistic (2) | 2.601 | ||

Stage-wise p-value (1) | 0.0111 | ||

Stage-wise p-value (2) | 0.0046 | ||

Adjusted stage-wise p-value (1, 2) | 0.0093 | ||

Adjusted stage-wise p-value (1) | 0.0111 | ||

Adjusted stage-wise p-value (2) | 0.0046 | ||

Overall adjusted test statistic (1, 2) | 2.354 | ||

Overall adjusted test statistic (1) | 2.287 | ||

Overall adjusted test statistic (2) | 2.601 | ||

Test action: reject (1) | FALSE | ||

Test action: reject (2) | FALSE | ||

Conditional rejection probability (1) | 0.2359 | ||

Conditional rejection probability (2) | 0.2530 | ||

95% repeated confidence interval (1) | [-0.133; 0.514] | ||

95% repeated confidence interval (2) | [-0.107; 0.542] | ||

Repeated p-value (1) | 0.1425 | ||

Repeated p-value (2) | 0.1331 |

Legend:

*(i)*: results of treatment arm i vs. control arm*(i, j, …)*: comparison of treatment arms ‘i, j, …’ vs. control arm

Since the overall adjusted test statistic for the intersection hypotheses (\(=2.354\)) does not exceed the first stage critical value of \(3.710\), the intersection hypothesis of having no effect in either one of the treatment groups cannot be rejected, while e.g. repeated p-values lead to the same results. Furthermore, since all of the applicable test statistics do not fall below the futility bound of \(0.149\), the study ordinarily proceeds to the second stage:

```
<- getDataset(
dataset_2 means1 = c(
$surv[which(abs(lt_t1_1$time - lm) == min(abs(lt_t1_1$time - lm)))],
lt_t1_1$surv[which(abs(lt_t1_2$time - lm) == min(abs(lt_t1_2$time - lm)))]
lt_t1_2
),means2 = c(
$surv[which(abs(lt_t2_1$time - lm) == min(abs(lt_t2_1$time - lm)))],
lt_t2_1$surv[which(abs(lt_t2_2$time - lm) == min(abs(lt_t2_2$time - lm)))]
lt_t2_2
),means3 = c(
$surv[which(abs(lt_c_1$time - lm) == min(abs(lt_c_1$time - lm)))],
lt_c_1$surv[which(abs(lt_c_2$time - lm) == min(abs(lt_c_2$time - lm)))]
lt_c_2
),stDevs1 = c(
$std.err[which(abs(lt_t1_1$time - lm) ==
lt_t1_1min(abs(lt_t1_1$time - lm)))] * sqrt(n),
$std.err[
lt_t1_2which(abs(lt_t1_2$time - lm) == min(abs(lt_t1_2$time - lm)))
* sqrt(n)
]
),stDevs2 = c(
$std.err[which(abs(lt_t2_1$time - lm) ==
lt_t2_1min(abs(lt_t2_1$time - lm)))] * sqrt(n),
$std.err[which(abs(lt_t2_2$time - lm) ==
lt_t2_2min(abs(lt_t2_2$time - lm)))] * sqrt(n)
),stDevs3 = c(
$std.err[which(abs(lt_c_1$time - lm) == min(abs(lt_c_1$time - lm)))]
lt_c_1* sqrt(n),
$std.err[which(abs(lt_c_2$time - lm) == min(abs(lt_c_2$time - lm)))]
lt_c_2* sqrt(n)
),n1 = c(n, n),
n2 = c(n, n),
n3 = c(n, n)
)
<- getAnalysisResults(
result_2 dataInput = dataset_2, design = d_IN,
intersectionTest = "Simes", normalApproximation = TRUE,
varianceOption = "pairwisePooled"
)
```

`|> summary() result_2 `

**Multi-arm analysis results for a continuous endpoint (2 active arms vs. control)**

Sequential analysis with 3 looks (inverse normal combination test design), one-sided overall significance level 2.5%. The results were calculated using a multi-arm t-test, Simes intersection test, normal approximation test, pairwise pooled variances option. H0: mu(i) - mu(control) = 0 against H1: mu(i) - mu(control) > 0.

Stage | 1 | 2 | 3 |
---|---|---|---|

Fixed weight | 0.577 | 0.577 | 0.577 |

Cumulative alpha spent | 0.0001 | 0.0060 | 0.0250 |

Stage levels (one-sided) | 0.0001 | 0.0060 | 0.0231 |

Efficacy boundary (z-value scale) | 3.710 | 2.511 | 1.993 |

Futility boundary (z-value scale) | 0.149 | 0.414 | |

Cumulative effect size (1) | 0.191 | 0.226 | |

Cumulative effect size (2) | 0.217 | 0.329 | |

Cumulative (pooled) standard deviation | 0.714 | 0.931 | |

Stage-wise test statistic (1) | 2.287 | 1.769 | |

Stage-wise test statistic (2) | 2.601 | 3.512 | |

Stage-wise p-value (1) | 0.0111 | 0.0384 | |

Stage-wise p-value (2) | 0.0046 | 0.0002 | |

Adjusted stage-wise p-value (1, 2) | 0.0093 | 0.0004 | |

Adjusted stage-wise p-value (1) | 0.0111 | 0.0384 | |

Adjusted stage-wise p-value (2) | 0.0046 | 0.0002 | |

Overall adjusted test statistic (1, 2) | 2.354 | 4.015 | |

Overall adjusted test statistic (1) | 2.287 | 2.868 | |

Overall adjusted test statistic (2) | 2.601 | 4.323 | |

Test action: reject (1) | FALSE | TRUE | |

Test action: reject (2) | FALSE | TRUE | |

Conditional rejection probability (1) | 0.2359 | 0.7272 | |

Conditional rejection probability (2) | 0.2530 | 0.9870 | |

95% repeated confidence interval (1) | [-0.133; 0.514] | [-0.005; 0.441] | |

95% repeated confidence interval (2) | [-0.107; 0.542] | [0.096; 0.527] | |

Repeated p-value (1) | 0.1425 | 0.0119 | |

Repeated p-value (2) | 0.1331 | 0.0007 |

Legend:

*(i)*: results of treatment arm i vs. control arm*(i, j, …)*: comparison of treatment arms ‘i, j, …’ vs. control arm

Again referring to the overall adjusted test statistics, since \(4.015 > 2.511\), the global intersection hypothesis can be rejected in the second stage. Subsequently, the hypothesis of no effect in the first treatment group can be rejected because \(2.868 > 2.511\) and the hypothesis of no effect in the second treatment group can be rejected due to \(4.323\) exceeding the applicable critical value. Another testing approach would be to compare repeated p-values listed at the end of the output (\(0.0119, \, 0.0007\)) to the full significance level \(\alpha=0.025\).

As mentioned in the context of the simulations in chapter 5, study success can only be concluded when both treatment arms have been tested significantly against control. As this appears to be the case here, the study would have successfully finished at the second stage.

# Closing remarks

This vignette demonstrated how to implement a MAMS study design given the case the futility bounds are only known on treatment effect scale, to perform simulations with special regards to treatment selection as well as how to analyze generic data using the binary module or perform landmark analysis with the support of gestate. Especially considering the final chapter about the landmark analysis, further investigations on how the results and characteristics such as \(\alpha\)-level control and/or the power behave could be done in subsequent research.

System: rpact 4.1.0, R version 4.3.3 (2024-02-29 ucrt), platform: x86_64-w64-mingw32

To cite R in publications use:

R Core Team (2024). *R: A Language and Environment for Statistical Computing*. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

To cite package ‘rpact’ in publications use:

Wassmer G, Pahlke F (2024). *rpact: Confirmatory Adaptive Clinical Trial Design and Analysis*. R package version 4.1.0, https://www.rpact.com, https://github.com/rpact-com/rpact, https://rpact-com.github.io/rpact/, https://www.rpact.org.