Visualizing Time to Event Response

news
visualization
Author

Samer Mouksassi

Published

August 25, 2025

In this post, I will continue with the exposure response theme. The previous blog post focused on Binary response outcomes. In today’s post I will illustrate how to explore and communicate time to event outcomes. First we start using a specialized geom from my ggquickeda package: geom_km(). Since this is a native ggplot2 geom it supports, aesthetics, faceting and margins, themings and anything else you expect from a ggplot2 object. Later on I cover the usage of the specialized ggkmrisktable function that support adding number at risk tables and more.

Code
library(ggplot2)
library(patchwork)
library(tidyr)
library(dplyr)
library(ggquickeda)
library(scales)

 options(ggplot2.discrete.colour =
           list(c( "#4682AC","#FDBB2F","#EE3124" ,"#336343","#7059a6", "#803333")))
  options(ggplot2.discrete.fill =
           list(c( "#4682AC","#FDBB2F","#EE3124" ,"#336343","#7059a6", "#803333"))) 
  
lung_long <-  survival::lung |>
 mutate(status = ifelse(status==1,0,1)) |>
 gather(Endpoint,DV,status) |>
 filter(!is.na(ph.karno),!is.na(pat.karno),!is.na(ph.ecog))|>
 mutate(ph.ecog = ifelse(ph.ecog>1,">1","1"))

ggplot(lung_long, aes(
                 time = time,
               status = DV,
               color  = ph.ecog,
               linetype = ph.ecog,
               fill = ph.ecog)) + 
  geom_kmband()+
  geom_km() +
  facet_grid(~ph.ecog,margin="ph.ecog",
             labeller = label_both)+
  theme_bw(base_size = 16) +
  labs(x="time of follow up",y="Survival")

This is a nice Kaplan-Meir plot that shows the K-M estimate by ecog status and both overlaid in the facet margin (all) panel. To include the effects of continuous variables, we will need to “bin” it. I will use age as a example predictor binning it by tertiles. Some nice features are outlined below:

Code
exposure_metric_plac_value <- 0

lung_long_metrics <- lung_long %>% 
  gather(expname,expvalue,age) |>
  group_by(Endpoint,expname) |>
  mutate( Q33 = quantile(expvalue[!expvalue %in% c(exposure_metric_plac_value)], 1/3, na.rm = TRUE), 
          Q66 = quantile(expvalue[!expvalue %in% c(exposure_metric_plac_value)], 2/3, na.rm = TRUE),
          exptile = case_when(
            expvalue == exposure_metric_plac_value ~ "zero", 
            expvalue > exposure_metric_plac_value & expvalue <= Q33 ~ "T1",
            expvalue > Q33 & expvalue <= Q66 ~  "T2",
            expvalue > Q66 ~ "T3")
  )
library(ggh4x)
ggplot(lung_long_metrics, aes(
                 time = time,
               status = DV,
               color  = exptile,
               linetype = ph.ecog,
               fill = exptile)) + 
  geom_km() +
  facet_nested( ph.ecog~ expname+ exptile,
               labeller = label_both,margin = "exptile")+
  theme_bw(base_size = 12) +
  labs(x="time of follow up",y="Survival")

Code
lung_long_metrics$exptile2 <- lung_long_metrics$exptile

ggplot(lung_long_metrics, aes(
               time = time,
               status = DV,
               color  = exptile,
               linetype = ph.ecog,
               fill = exptile)) + 
  geom_km() +
  facet_nested( ph.ecog ~ expname + exptile2,
               labeller = label_both, margins = "exptile2")+
  theme_bw(base_size = 12) +
  labs(x="time of follow up",y="Survival")

It is common for these type of plots to present a transformation of the default which is the probability of not having the event . For example, changing it to cumulative hazard, or to event probability. It is also desirable to have a number at risk table underneath the plot. ggkmrisktable was written to facilitate and automate these tasks with several flexible options. First, we try to reproduce the last two plot. Notice how exptile2 is available automatically to enable different margins.

Code
ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  linetype = "ph.ecog",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = TRUE, km_band = FALSE,
  nrisk_table_variables = c("n.risk")
)+
  facet_grid(ph.ecog~exptile,margin = "exptile")

Code
ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  linetype = "ph.ecog",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = TRUE, km_band = FALSE,
  nrisk_table_variables = c("n.risk")
)+
  facet_grid(ph.ecog~exptile2,margin = "exptile2")

The plot above is busy and need to be simplified. Next, I show how to transform the event and I facet by ecog status.

Code
ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  linetype = "ph.ecog",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  show_exptile_values = FALSE,
  color_fill = "exptile",
  xlab = "time of follow up",
  ylab = "Survival",
  nrisk_table_plot = TRUE,
  nrisk_table_variables = c("n.risk","n.censor"),
  nrisk_position_scaler = 0.2,
  nrisk_position_dodge = 0.2,
  nrisk_offset = 0,
  km_trans = c("event"),
  km_ticks = FALSE, km_band = FALSE,
  km_median_table_order = c("default"),
  facet_formula = ~ph.ecog)+
  facet_nested( expname~ph.ecog,labeller = label_both)+
  labs(x="time of follow up",y="Survival")

The use of additional options like showing the tertiles cutoffs, the median survival, controlling the order and the interval at which numbers at risk appear is shown next.

Code
ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  show_exptile_values = TRUE,
  show_exptile_values_order = "default",
  km_median ="table",
  km_median_table_pos ="right",
  km_median_table_order = "default",
  color_fill = "exptile",
  xlab = "time of follow up",
  ylab = "Survival",
  nrisk_table_plot = TRUE,
  nrisk_table_variables = c("n.risk","n.censor"),
  nrisk_position_scaler = 0.2,
  nrisk_position_dodge = - 0.2,
  nrisk_filterout0 = TRUE,
  nrisk_table_breaktimeby = 50,
  nrisk_offset = 0,
  km_ticks = FALSE, km_band = TRUE,
  km_trans = "event",
  facet_formula = ~expname)

While the function supports computing and printing p-values, I strongly, recommend that one run the statistical tests outside of the function. Make sure to document what test was done, on which data. Then we included the appropriate information using regular text/lable ggplot2 layer. An example, showing how to do an LRT on a cox fit is included.

Code
ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "exptile", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = TRUE,
  facet_formula = expname ~ Endpoint )

Code
ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "ph.ecog",
  linetype = "none", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = TRUE,
  km_logrank_pvalue_cutoff = 0.001,
  facet_formula = ~Endpoint)

Code
ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "exptile", groupvar1 = "ph.ecog",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = TRUE,
  facet_formula = ~ph.ecog)

One would expect that the p-values shown in each panel are specific for the subset of panel data comparing the curves. While the case of single split, are clear, things get out of control fast when we have multiple splits/predictors and then deciding on what particular test we really want. Is it overall on all data ar within a specific strata/subset?

Code
library(survival)
#| message: false
#| warning: false
#| paged-print: true

# overall 
survdiffexptile <- survdiff(Surv(time, DV) ~ exptile,
         data=lung_long_metrics)
pvalueexptile <- pchisq(survdiffexptile$chisq, length(survdiffexptile$n)-1, lower.tail = FALSE)
pvalueexptile
[1] 0.2132509
Code
survdiffph.ecog <- survdiff(Surv(time, DV) ~ ph.ecog,
         data=lung_long_metrics)
pvalueph.ecog <- pchisq(survdiffph.ecog$chisq, length(survdiffph.ecog$n)-1, lower.tail = FALSE)


survdiffexptileecoggt1 <- survdiff(Surv(time, DV) ~ exptile,
         data=lung_long_metrics %>% 
           filter(ph.ecog!="1"))
pvalueexptileecoggt1 <- pchisq(survdiffexptileecoggt1$chisq, length(survdiffexptileecoggt1$n)-1, lower.tail = FALSE)
pvalueexptileecoggt1
[1] 0.5051531
Code
survdiffexptileecogeq1 <- survdiff(Surv(time, DV) ~ exptile,
         data=lung_long_metrics %>% 
           filter(ph.ecog=="1"))
pvalueexptileecogeq1 <- pchisq(survdiffexptileecogeq1$chisq, length(survdiffexptileecogeq1$n)-1, lower.tail = FALSE)
pvalueexptileecogeq1
[1] 0.5783551

A global test across multiple variables that enter the survival equation, with or without interaction can be needed. For these, Idefinitely recommend to avoid using the automatic built-in p-values. The ggkmrisktable has groupvar1, groupvar2, and groupvar3 variables to give the user more control on how to “group” to test p-values but possibilities remain limited as compared to controlling the test with user-written code.

Code
ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "ph.ecog", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = TRUE,
  facet_formula = ~Endpoint)

Code
survdiffph.ecogexptile <- survdiff(Surv(time, DV) ~ ph.ecog+exptile, data=lung_long_metrics)
pvalueph.ecogexptile <- pchisq(survdiffph.ecogexptile$chisq, length(survdiffph.ecogexptile$n)-1, lower.tail = FALSE)
pvalueph.ecogexptile
[1] 0.002126178
Code
rmsmodel1 <- coxph(Surv(time, DV) ~ 1, data       = lung_long_metrics)
rmsmodel2 <- coxph(Surv(time, DV) ~ ph.ecog*exptile, data = lung_long_metrics)
#anova(rmsmodel1,rmsmodel2)

andata <- anova(rmsmodel1,rmsmodel2)


pvalueinformation = data.frame(p=andata[2,4],
                               Endpoint="status",
                               Model="Surv(time, DV) ~ ph.ecog*age(tertiles)",
                               test="LRT")


ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "ph.ecog", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = FALSE,
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = FALSE,
  facet_formula = ~Endpoint)+
  geom_text(data=pvalueinformation,
            aes(x=1000,y=0.2,label=paste0("Model: ",Model,"\n",
                                         test," test vs null model: ", scales::pvalue( p,add_p = TRUE,accuracy = 0.0001))),inherit.aes = FALSE,
            hjust=1)

Code
ggkmrisktable(
  data = lung_long,
  time = "time",
  status = "DV",
  endpoint = "Endpoint",
  color_fill = "exptile",
  linetype = "ph.ecog", groupvar1 = "none",
  groupvar2 = "none",groupvar3 = "none",
  exposure_metrics = c("age"),
  exposure_metric_split = c("tertile"),
  nrisk_table_plot = TRUE,nrisk_table_variables = c("n.risk"),
  km_ticks = FALSE, km_band = FALSE,
  km_trans = "event",
  km_logrank_pvalue = FALSE,
  facet_formula = ~Endpoint)+
  facet_grid(expname~ph.ecog)+
  geom_text(data=pvalueinformation,
            aes(x=1000,y=0.2,label=paste0("Model: ",Model,"\n",
                                         test," test vs null model: ", scales::pvalue( p,add_p = TRUE,accuracy = 0.0001))),inherit.aes = FALSE,
            hjust=1)

In this post, I introduced some of the features available in ggkmrisktable. Since this is a regular ggplot2 plot the possibilities are limitless.

If you have any questions feel free to contact me on github.