Visualizing Time to Event Response – smouksassi.github.io

In this post, I will continue with the exposure response theme. The previous blog post focused on Binary response outcomes. In today’s post I will illustrate how to explore and communicate time to event outcomes. First we start using a specialized geom from my ggquickeda package: geom_km(). Since this is a native ggplot2 geom it supports, aesthetics, faceting and margins, themings and anything else you expect from a ggplot2 object. Later on I cover the usage of the specialized ggkmrisktable function that support adding number at risk tables and more.

Code

library(ggplot2)
library(patchwork)
library(tidyr)
library(dplyr)
library(ggquickeda)
library(scales)

 options(ggplot2.discrete.colour =
           list(c( "#4682AC","#FDBB2F","#EE3124" ,"#336343","#7059a6", "#803333")))
  options(ggplot2.discrete.fill =
           list(c( "#4682AC","#FDBB2F","#EE3124" ,"#336343","#7059a6", "#803333"))) 
  
lung_long <-  survival::lung |>
 mutate(status = ifelse(status==1,0,1)) |>
 gather(Endpoint,DV,status) |>
 filter(!is.na(ph.karno),!is.na(pat.karno),!is.na(ph.ecog))|>
 mutate(ph.ecog = ifelse(ph.ecog>1,">1","1"))

ggplot(lung_long, aes(
                 time = time,
               status = DV,
               color  = ph.ecog,
               linetype = ph.ecog,
               fill = ph.ecog)) + 
  geom_kmband()+
  geom_km() +
  facet_grid(~ph.ecog,margin="ph.ecog",
             labeller = label_both)+
  theme_bw(base_size = 16) +
  labs(x="time of follow up",y="Survival")

This is a nice Kaplan-Meir plot that shows the K-M estimate by ecog status and both overlaid in the facet margin (all) panel. To include the effects of continuous variables, we will need to “bin” it. I will use age as a example predictor binning it by tertiles. Some nice features are outlined below:

This is a ggplot object and we can facet by any variable available in the dataset. Furthermore, I will illustrate the usage of the nice facet_nested from the ggh4x package to group facet strips that belong to the same level.
The plot is done in two ways using exptile as color and faceting variable which result in a facet margin (all) having the pooled estimate (all tertiles combined). Then, using exptile as color and faceting by an exact copy of it that we call exptile2. This tricks ggplot2 margins to overlay the three tertiles curves in the (all) panels.

Code

exposure_metric_plac_value <- 0

lung_long_metrics <- lung_long %>% 
  gather(expname,expvalue,age) |>
  group_by(Endpoint,expname) |>
  mutate( Q33 = quantile(expvalue[!expvalue %in% c(exposure_metric_plac_value)], 1/3, na.rm = TRUE), 
          Q66 = quantile(expvalue[!expvalue %in% c(exposure_metric_plac_value)], 2/3, na.rm = TRUE),
          exptile = case_when(
            expvalue == exposure_metric_plac_value ~ "zero", 
            expvalue > exposure_metric_plac_value & expvalue <= Q33 ~ "T1",
            expvalue > Q33 & expvalue <= Q66 ~  "T2",
            expvalue > Q66 ~ "T3")
  )
library(ggh4x)
ggplot(lung_long_metrics, aes(
                 time = time,
               status = DV,
               color  = exptile,
               linetype = ph.ecog,
               fill = exptile)) + 
  geom_km() +
  facet_nested( ph.ecog~ expname+ exptile,
               labeller = label_both,margin = "exptile")+
  theme_bw(base_size = 12) +
  labs(x="time of follow up",y="Survival")

Code

lung_long_metrics$exptile2 <- lung_long_metrics$exptile

ggplot(lung_long_metrics, aes(
               time = time,
               status = DV,
               color  = exptile,
               linetype = ph.ecog,
               fill = exptile)) + 
  geom_km() +
  facet_nested( ph.ecog ~ expname + exptile2,
               labeller = label_both, margins = "exptile2")+
  theme_bw(base_size = 12) +
  labs(x="time of follow up",y="Survival")

It is common for these type of plots to present a transformation of the default which is the probability of not having the event . For example, changing it to cumulative hazard, or to event probability. It is also desirable to have a number at risk table underneath the plot. ggkmrisktable was written to facilitate and automate these tasks with several flexible options. First, we try to reproduce the last two plot. Notice how exptile2 is available automatically to enable different margins.

Code

ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  linetype = "ph.ecog",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = TRUE, km_band = FALSE,
  nrisk_table_variables = c("n.risk")
)+
  facet_grid(ph.ecog~exptile,margin = "exptile")

Code

ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  linetype = "ph.ecog",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = TRUE, km_band = FALSE,
  nrisk_table_variables = c("n.risk")
)+
  facet_grid(ph.ecog~exptile2,margin = "exptile2")

The plot above is busy and need to be simplified. Next, I show how to transform the event and I facet by ecog status.

Code

ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  linetype = "ph.ecog",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  show_exptile_values = FALSE,
  color_fill = "exptile",
  xlab = "time of follow up",
  ylab = "Survival",
  nrisk_table_plot = TRUE,
  nrisk_table_variables = c("n.risk","n.censor"),
  nrisk_position_scaler = 0.2,
  nrisk_position_dodge = 0.2,
  nrisk_offset = 0,
  km_trans = c("event"),
  km_ticks = FALSE, km_band = FALSE,
  km_median_table_order = c("default"),
  facet_formula = ~ph.ecog)+
  facet_nested( expname~ph.ecog,labeller = label_both)+
  labs(x="time of follow up",y="Survival")

The use of additional options like showing the tertiles cutoffs, the median survival, controlling the order and the interval at which numbers at risk appear is shown next.

Code

ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  show_exptile_values = TRUE,
  show_exptile_values_order = "default",
  km_median ="table",
  km_median_table_pos ="right",
  km_median_table_order = "default",
  color_fill = "exptile",
  xlab = "time of follow up",
  ylab = "Survival",
  nrisk_table_plot = TRUE,
  nrisk_table_variables = c("n.risk","n.censor"),
  nrisk_position_scaler = 0.2,
  nrisk_position_dodge = - 0.2,
  nrisk_filterout0 = TRUE,
  nrisk_table_breaktimeby = 50,
  nrisk_offset = 0,
  km_ticks = FALSE, km_band = TRUE,
  km_trans = "event",
  facet_formula = ~expname)

While the function supports computing and printing p-values, I strongly, recommend that one run the statistical tests outside of the function. Make sure to document what test was done, on which data. Then we included the appropriate information using regular text/lable ggplot2 layer. An example, showing how to do an LRT on a cox fit is included.

Code

ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "exptile", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = TRUE,
  facet_formula = expname ~ Endpoint )

Code

ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "ph.ecog",
  linetype = "none", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = TRUE,
  km_logrank_pvalue_cutoff = 0.001,
  facet_formula = ~Endpoint)

Code

ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "exptile", groupvar1 = "ph.ecog",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = TRUE,
  facet_formula = ~ph.ecog)

One would expect that the p-values shown in each panel are specific for the subset of panel data comparing the curves. While the case of single split, are clear, things get out of control fast when we have multiple splits/predictors and then deciding on what particular test we really want. Is it overall on all data ar within a specific strata/subset?

Code

library(survival)
#| message: false
#| warning: false
#| paged-print: true

# overall 
survdiffexptile <- survdiff(Surv(time, DV) ~ exptile,
         data=lung_long_metrics)
pvalueexptile <- pchisq(survdiffexptile$chisq, length(survdiffexptile$n)-1, lower.tail = FALSE)
pvalueexptile

[1] 0.2132509

Code

survdiffph.ecog <- survdiff(Surv(time, DV) ~ ph.ecog,
         data=lung_long_metrics)
pvalueph.ecog <- pchisq(survdiffph.ecog$chisq, length(survdiffph.ecog$n)-1, lower.tail = FALSE)


survdiffexptileecoggt1 <- survdiff(Surv(time, DV) ~ exptile,
         data=lung_long_metrics %>% 
           filter(ph.ecog!="1"))
pvalueexptileecoggt1 <- pchisq(survdiffexptileecoggt1$chisq, length(survdiffexptileecoggt1$n)-1, lower.tail = FALSE)
pvalueexptileecoggt1

[1] 0.5051531

Code

survdiffexptileecogeq1 <- survdiff(Surv(time, DV) ~ exptile,
         data=lung_long_metrics %>% 
           filter(ph.ecog=="1"))
pvalueexptileecogeq1 <- pchisq(survdiffexptileecogeq1$chisq, length(survdiffexptileecogeq1$n)-1, lower.tail = FALSE)
pvalueexptileecogeq1

[1] 0.5783551

A global test across multiple variables that enter the survival equation, with or without interaction can be needed. For these, Idefinitely recommend to avoid using the automatic built-in p-values. The ggkmrisktable has groupvar1, groupvar2, and groupvar3 variables to give the user more control on how to “group” to test p-values but possibilities remain limited as compared to controlling the test with user-written code.

Code

ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "ph.ecog", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = TRUE,
  facet_formula = ~Endpoint)

Code

survdiffph.ecogexptile <- survdiff(Surv(time, DV) ~ ph.ecog+exptile, data=lung_long_metrics)
pvalueph.ecogexptile <- pchisq(survdiffph.ecogexptile$chisq, length(survdiffph.ecogexptile$n)-1, lower.tail = FALSE)
pvalueph.ecogexptile

[1] 0.002126178

Code

rmsmodel1 <- coxph(Surv(time, DV) ~ 1, data       = lung_long_metrics)
rmsmodel2 <- coxph(Surv(time, DV) ~ ph.ecog*exptile, data = lung_long_metrics)
#anova(rmsmodel1,rmsmodel2)

andata <- anova(rmsmodel1,rmsmodel2)


pvalueinformation = data.frame(p=andata[2,4],
                               Endpoint="status",
                               Model="Surv(time, DV) ~ ph.ecog*age(tertiles)",
                               test="LRT")


ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "ph.ecog", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = FALSE,
  facet_formula = ~Endpoint)+
  geom_text(data=pvalueinformation,
            aes(x=1000,y=0.2,label=paste0("Model: ",Model,"\n",
                                         test," test vs null model: ", scales::pvalue( p,add_p = TRUE,accuracy = 0.0001))),inherit.aes = FALSE,
            hjust=1)

Code

ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "ph.ecog", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = TRUE,nrisk_table_variables = c("n.risk"),
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = FALSE,
  facet_formula = ~Endpoint)+
  facet_grid(expname~ph.ecog)+
  geom_text(data=pvalueinformation,
            aes(x=1000,y=0.2,label=paste0("Model: ",Model,"\n",
                                         test," test vs null model: ", scales::pvalue( p,add_p = TRUE,accuracy = 0.0001))),inherit.aes = FALSE,
            hjust=1)

In this post, I introduced some of the features available in ggkmrisktable. Since this is a regular ggplot2 plot the possibilities are limitless.

If you have any questions feel free to contact me on github.