Qualification of Kidney Volume as a Drug Development Tool: Part 2

FDA
Time to Event
Drug Development Tool
Author

Samer Mouksassi

Published

April 7, 2026

This is the second part of Qualification of Kidney Volume as a Drug Development Tool: Part 1. Here, I will continue covering the analysis framework and more specifically the joint modeling of longitudinal total kidney volume (TKV) and the outcomes of interest which were 30% decline of glomerular filtration rate (eGFR) or end-stage renal disease (ESRD) to ultimately qualify the model simulations as a drug development tool (DDT) for trial enrichment in patients with autosomal dominant polycystic kidney disease (ADPKD). Since the data is not public I will be using published outputs and share relevant code that can be applied in real-life analysis. For the record, here is a quote from the FDA Biomarker Qualification Review for Total Kidney Volume: “Based upon consideration of the strengths and limitations of the data, the Biomarker Qualification Review Team (BQRT) recommends that Total Kidney Volume (TKV) determined at baseline, in combination with patient age and baseline eGFR, can be qualified as a prognostic enrichment biomarker for autosomal dominant polycystic kidney disease (ADPKD) subjects at high risk for a progressive decline in renal function, defined as a confirmed 30% decline in eGFR.”

Of note that the FDA not only replicated the findings on the submitted registry data and the data generated via cross validation, but also using clinical trial data that were only available internally to the FDA.

In a first step, linear mixed-effect models with a random intercept were used to fit ln-transformed TKV values, keeping only patients with at least 2 TKV measurements separated by at least 6 months apart. Baseline TKV was defined as the first TKV measurement for a subject, whereas baseline age was the age associated with the first TKV measurement. Correlated random effects on slope and intercept was used

require(nlme)

fitLME <- lme(logtkv ~ MPYRS,  random= ~ MPYRS | UDERID, data = alltkvdata,
  control = list(msVerbose = 1, maxIter=100, msMaxIter=1000, niterEM=1000))
summary(fitLME )

####
Linear mixed-effects model fit by REML
  Data: alltkvdatajfitmean 
       AIC      BIC    logLik
  939.2572 974.1581 -463.6286

Random effects:
 Formula: ~MPYRS | UDERID
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev     Corr  
(Intercept) 0.87413534 (Intr)
MPYRS       0.04581673 -0.478
Residual    0.13535305       

Fixed effects:  logtkv ~ MPYRS 
               Value  Std.Error   DF   t-value p-value
(Intercept) 6.669682 0.03478784 1842 191.72454       0
MPYRS       0.051151 0.00234949 1842  21.77093       0
 Correlation: 
      (Intr)
MPYRS -0.413

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-5.263898609 -0.391074859 -0.006556208  0.378224483  3.912862812 

Number of Observations: 2484
Number of Groups: 641 

In a second step, a placeholder time to event model is set with baseline eGFR (EGFRBSL) and Age (AGERFST) which were deemed important in the first part analysis. The log of Total Kidney Volume will enter the model via the joint modeling link. The JM package has a function named jointModel with default link parameterization = value. Other supported possibilities are slope or both. More advanced options can be built using the derivForm argument for example the AUC of log of TKV up to the time of the event.

The baseline risk function was chosen to be piecewise linear with equally-spaced lng.in.kn knots. The number of knots was incrementally increased till the AIC started to rise and or no improvement on the goodness of fit was noticed.

require(JM)

fitSURV <- coxph(Surv(MPYRS, EVFL) ~ 1 + EGFRBSL + AGERFST 
     ,data=e30strictendpointj, x = TRUE)

fit.tkv_e30strict_allpiecewise5 <- jointModel(fitLME, fitSURV,timeVar="MPYRS",verbose=F,
                                              method ="piecewise-PH-aGH",lng.in.kn=5)
fit.tkv_e30strict_allpiecewise10 <- jointModel(fitLME, fitSURV,timeVar="MPYRS",verbose=F,
                                               method ="piecewise-PH-aGH",lng.in.kn=10)
fit.tkv_e30strict_allpiecewise12 <- jointModel(fitLME, fitSURV,timeVar="MPYRS",verbose=F,
                                               method ="piecewise-PH-aGH",lng.in.kn=12)

anova(fit.tkv_e30strict_allpiecewise10,fit.tkv_e30strict_allpiecewise12,test=F)
anova(fit.tkv_e30strict_allpiecewise12,fit.tkv_e30strict_allpiecewise15,test=F)
###
                                     AIC     BIC  log.Lik df
fit.tkv_e30strict_allpiecewise10 2367.87 2457.13 -1163.93   
fit.tkv_e30strict_allpiecewise12 2358.03 2456.21 -1157.01  2
fit.tkv_e30strict_allpiecewise15 2363.56 2475.13 -1156.78  3

summary(fit.tkv_e30strict_allpiecewise12) 
###
Call:
jointModel(lmeObject = fitLME, survObject = fitSURV, timeVar = "MPYRS", 
    method = "piecewise-PH-aGH", verbose = F, lng.in.kn = 12)

Data Descriptives:
Longitudinal Process        Event Process
Number of Observations: 2484    Number of Events: 192 (30%)
Number of Groups: 641

Joint Model Summary:
Longitudinal Process: Linear mixed-effects model
Event Process: Relative risk model with piecewise-constant
        baseline risk function
Parameterization: Time-dependent 

   log.Lik      AIC      BIC
 -1157.014 2358.028 2456.215

Variance Components:
             StdDev    Corr
(Intercept)  0.8733  (Intr)
MPYRS        0.0457 -0.4736
Residual     0.1353        

Coefficients:
Longitudinal Process
             Value Std.Err  z-value p-value
(Intercept) 6.6684  0.0348 191.8738 <0.0001
MPYRS       0.0516  0.0024  21.7459 <0.0001

Event Process
              Value Std.Err  z-value p-value
EGFRBSL      0.0101  0.0032   3.1880  0.0014
AGERFST      0.0004  0.0077   0.0519  0.9586
Assoct       0.8457  0.1204   7.0233 <0.0001
log(xi.1)  -10.9734  0.9986 -10.9884        
log(xi.2)  -10.0710  0.9932 -10.1405        
log(xi.3)   -8.9790  0.9969  -9.0065        
log(xi.4)  -10.0879  0.9980 -10.1083        
log(xi.5)   -9.8822  0.9998  -9.8845        
log(xi.6)  -10.0449  1.0076  -9.9692        
log(xi.7)   -9.0142  0.9962  -9.0486        
log(xi.8)   -9.2201  0.9981  -9.2377        
log(xi.9)   -9.6585  1.0222  -9.4486        
log(xi.10)  -9.4987  1.0476  -9.0670        
log(xi.11)  -9.6459  1.0720  -8.9978        
log(xi.12)  -9.3686  1.0457  -8.9588        
log(xi.13) -10.4570  1.1220  -9.3198        

Integration:
method: (pseudo) adaptive Gauss-Hermite
quadrature points: 3 

Optimization:
Convergence: 0 

Notice how in the output of the summary of the jointmodel the association Assoct between the predicted “value” and hazard of the event is highly significant <0.0001. To get the hazard ratio and its 95% confidence interval we exponentiate the output of confint. We can also output automated diagnostics using an automatic plot method on JM objects.

 exp (confint(fit.tkv_e30strict_allpiecewise12 ,parm="Event"))
            2.5 %     est.   97.5 %
EGFRBSL 1.0039020 1.010162 1.016460
AGERFST 0.9855089 1.000397 1.015510

par(mfrow=c(2,2))
plot(fit.tkv_e30strict_allpiecewise12,add.KM=T,ask=F)

Next, we want to retest within the joint modeling framework whether interactions between, Age, eGFR and TKV are needed which can be done using the interFact argument for example below we are testing the interaction between baseline eGFR and log of TKV value:

fitSURV <- coxph(Surv(MPYRS, EVFL) ~ 1 + EGFRBSL + AGERFST 
     ,data=e30strictendpointj, x = TRUE)

fit.tkv_e30strict_all_interactiontkvegfr <- update(fit.tkv_e30strict_allpiecewise12, 
                             control=list(optimizer="nlminb"),  verbose=T,
                             interFact =list(value=~EGFRBSL,data=e30strictendpointj)
                                           )
summary(fit.tkv_e30strict_all_interactiontkvegfr) 

The final model included interaction in the survival sub model EGFRBSL:AGERFST:

fitLME <- lme(logtkv ~ MPYRS,  random= ~ MPYRS | UDERID, data = alltkvdatajfitmean,
              control = list(msVerbose = 1, maxIter=100, msMaxIter=1000, niterEM=1000))
fitSURV <- coxph(Surv(MPYRS, EVFL) ~ 1 + EGFRBSL + AGERFST + EGFRBSL:AGERFST,
                 data=e30strictendpointj, x = TRUE)
jointfinalmodelval <- jointModel(fitLME, fitSURV,timeVar="MPYRS",verbose=F,
                                 method ="piecewise-PH-aGH",lng.in.kn=12,
                                 control=list(optimizer="nlminb")
                                 )

The model was also internally validated using standard 5-fold cross validation and the FDA took the extra step to externally validated against data I have never seen. Below I include the figures that were part of the paper:

The most exciting application of Joint Models is the capability to simulate trajectories of time and illustrate its impact on the probability of the event:

alltkvdatajfitmean[alltkvdatajfitmean $UDERID==2250,]
# conditional survival probabilities for Patient 2250  using Monte Carlo
set.seed(123) # we set the seed for reproducibility
ND <- alltkvdatajfitmean[alltkvdatajfitmean$UDERID == 2250 , ]
survPreds <- vector("list", nrow(ND))
for (i in 1:nrow(ND)) {
    set.seed(123)
    survPreds[[i]] <- survfitJM(jointfinalmodel , newdata = ND[1:i, ],idVar="UDERID")
}
# plot of the dynamic survival probabilities
par(mfrow = c(2, 2), oma = c(0, 2, 0, 2))
for (i in c(1,2,4,5)) {
    plot(survPreds[[i]], estimator = "median", conf.int = TRUE,
        include.y = TRUE,fill.area=T, main = paste("Follow-up time:",
            round(survPreds[[i]]$last.time, 1)))
arrows(ND$LASTTIMESEEN[1], 1, ND$LASTTIMESEEN[1], 0.9, col= "red",length=0.1)
}
mtext("log Kidney Volume", side = 2, line = -1, outer = TRUE)
mtext("Survival Probability", side = 4, line = -1, outer = TRUE)

We can also simulate clinical trial inclusion criteria and see how these patients TKV trajectories and probabilities of events can be projected:

To conclude this blog I wanted to redo the dynamic predictions using ggplot2 and gganimate

Code
library(gganimate)

set.seed(123) # we set the seed for reproducibility
ND <- alltkvdatajfitmean[alltkvdatajfitmean$UDERID == 1557 , ]
e30strictendpointj[e30strictendpointj$UDERID == 1557 , ]
survPreds <- vector("list", nrow(ND))
for (i in 1:nrow(ND)) {
  set.seed(123)
  survPreds[[i]] <- survfitJM(jointfinalmodel , newdata = ND[1:i, ],idVar="UDERID")
}

survPreds1 <- as.data.frame(survfitJM(
  jointfinalmodel ,
  newdata = ND[1:1, ],idVar="UDERID")$summaries)
survPreds2 <- as.data.frame(survfitJM(
  jointfinalmodel ,
  newdata = ND[1:2, ],idVar="UDERID")$summaries)
survPreds3 <- as.data.frame(survfitJM(
  jointfinalmodel ,
  newdata = ND[1:3, ],idVar="UDERID")$summaries)



survPreds1$frame <- 1
survPreds2$frame <- 2
survPreds3$frame <- 3
survPredsall <- rbind(survPreds1,survPreds2,survPreds3)
names(survPredsall)<- c("times","Mean","Median","Lower","Upper","frame")
ggplot(survPredsall,aes(times,Median))+
  geom_line()+
  facet_grid(~frame)

survPredsall$facet <- "Survival"

NDobserved <- rbind(ND[1:1,c("MPYRS","logtkv")],
                    ND[1:2,c("MPYRS","logtkv")],
                    ND[1:3,c("MPYRS","logtkv")])

NDobserved$frame <- c(1,2,2,3,3,3)
NDobserved$facet <- "Total Kidney Volume"

fitteddata1<-  data.frame( 
  times=  as.vector(unlist(survPreds[[1]]$fitted.times))[1],
  fittedy= as.vector(unlist(survPreds[[1]]$fitted.y))[1],
  logtkv= ND[1:1,"logtkv"]
)
fitteddata1$frame<-1

fitteddata2<-  data.frame( 
  times=  as.vector(unlist(survPreds[[2]]$fitted.times))[1:2],
  fittedy= as.vector(unlist(survPreds[[2]]$fitted.y))[1:2],
  logtkv= ND[1:2,"logtkv"]
)
fitteddata2$frame<-2

fitteddata3<-  data.frame( 
  times=  as.vector(unlist(survPreds[[3]]$fitted.times))[1:3],
  fittedy= as.vector(unlist(survPreds[[3]]$fitted.y))[1:3],
  logtkv= ND[1:3,"logtkv"]
)
fitteddata3$frame<-3

fitteddata<- rbind(fitteddata1,fitteddata2,fitteddata3)
fitteddata$facet <- "Total Kidney Volume"
fitteddata$logtkv <- NDobserved$logtkv

fulljoineddf <- full_join(survPredsall,fitteddata)


ggplot(fulljoineddf , aes(times, Median)) +
  geom_point(aes(y = exp(fittedy)),size=3,shape="triangle") +
  geom_line( aes(y = exp(fittedy)) , size=1, linetype="dashed") +
  geom_point(aes(y=exp(logtkv)),size=2, color="red")+
  geom_ribbon(aes(x=times,y=Median,ymin=Lower,ymax=Upper),alpha=0.25)+
  geom_line(  aes(x=times,y=Median),size=1.5)+
  facet_grid(facet~frame,scales = "free_y",switch="y",as.table=FALSE)+
  labs(y= "", x = "Time of Follow up (years)")+
  theme_bw()+
  theme(legend.position = "none",strip.placement = "outside",
        axis.title.y.left = element_blank())



ggplot(fulljoineddf , aes(times, Median)) +
  geom_point(aes(y = exp(fittedy)),size=3,shape="triangle") +
  geom_line( aes(y = exp(fittedy)) , size=1, linetype="dashed") +
  geom_point(aes(y=exp(logtkv)),size=2, color="red")+
  geom_ribbon(aes(x=times,y=Median,ymin=Lower,ymax=Upper),alpha=0.25)+
  geom_line(  aes(x=times,y=Median),size=1.5)+
  facet_grid(facet~.,scales = "free_y",switch="y",as.table=FALSE)+
  labs(y= "", x = "Time of Follow up (years)")+
  theme_bw()+
  theme(legend.position = "none",strip.placement = "outside",
        axis.title.y.left = element_blank())+
  transition_states(frame, transition_length = 3,
                    state_length = 1,
                    wrap = FALSE)