Objectives:
Compare the efficiency and robustness of two alternative approaches to parameter estimation with the first-order conditional estimation with interaction (FOCE-I) method in nonlinear mixed-effects regression.
Methods:
In FOCE-I, the gradient is typically approximated by finite differences (FD), a numerical approximation obtained by evaluating the objective function at multiple points. Almquist et al. [1] first described using sensitivity equations (SE) to calculate the gradient in FOCE-I and reported improved speed in NONMEM. An alternative to SE is automatic differentiation using dual numbers (AD); automatic differentiation is widely used in training neural networks. We have implemented AD for parameter optimization (ADPO). We compare the effects of these two exact methods of calculating the gradient in FOCE-I. The effect of SE was evaluated by comparing NONMEM with and without the FAST option, and the effect of ADPO was evaluated by comparing NLME with and without ADPO.
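The difference between FD and dual-number AD can be illustrated with a minimal Python sketch. It is not the ADPO or NONMEM implementation: the `Dual` class, the helper `dexp`, and the toy function `conc` (a one-compartment bolus concentration standing in for an objective function) are hypothetical names, and the parameter values are arbitrary assumptions.

```python
import math


class Dual:
    """Dual number val + der*eps with eps**2 == 0; der carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        # Promote plain numbers to dual numbers with zero derivative part.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __mul__(self, other):
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a'b + ab')eps
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def dexp(x):
    """exp that propagates the derivative when given a Dual."""
    if isinstance(x, Dual):
        e = math.exp(x.val)
        return Dual(e, e * x.der)
    return math.exp(x)


def conc(k, dose=100.0, v=10.0, t=2.0):
    """Toy function (assumption): C(t) = dose/v * exp(-k*t)."""
    return dose / v * dexp(-k * t)


k0 = 0.3  # arbitrary elimination rate constant
h = 1e-6  # FD step size

# Central finite difference: two extra function evaluations, truncation error.
grad_fd = (conc(k0 + h) - conc(k0 - h)) / (2 * h)

# Dual-number AD: seed der = 1 and read the derivative from one evaluation.
grad_ad = conc(Dual(k0, 1.0)).der

# Analytic derivative for comparison: dC/dk = -t * dose/v * exp(-k*t)
grad_exact = -2.0 * (100.0 / 10.0) * math.exp(-k0 * 2.0)
```

In this sketch `grad_ad` matches the analytic derivative to floating-point precision, whereas `grad_fd` depends on the choice of `h`. For a model with many parameters, FD also needs additional function evaluations for each parameter.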
Seventy-two models were constructed using pyDarwin. The models had 1, 2, or 3 compartments with Michaelis-Menten elimination and were specified by ordinary differential equations (ODEs). Other model options were an estimated exponent on Km in the Michaelis-Menten expression, a central volume that was either independent of weight or an allometric function of weight, and various OMEGA structures, with up to 4 diagonal elements and various non-diagonal structures. All models included first-order absorption. The data set included 50 subjects with 8 samples each. The DVERK ODE solver was used for all evaluations. The data set and parameter values were designed so that the models ranged from well identified to poorly identified, in an attempt to represent the real-world model selection process.
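As an illustration of the simplest model structure described above, the following Python sketch integrates a 1-compartment model with first-order absorption and Michaelis-Menten elimination. It is a sketch under stated assumptions, not one of the 72 models: all parameter values, the allometric exponent of 1, the 70 kg reference weight, and the function names `rhs` and `rk4` are hypothetical. A fixed-step classical fourth-order Runge-Kutta integrator is substituted for DVERK to keep the example self-contained.

```python
import math


def rhs(t, y, ka, vmax, km, v):
    """ODE right-hand side: amounts in gut and central compartments."""
    a_gut, a_cen = y
    c = a_cen / v                    # central concentration
    elim = vmax * c / (km + c)       # Michaelis-Menten elimination rate
    return [-ka * a_gut, ka * a_gut - elim]


def rk4(f, y0, t_end, n_steps, *params):
    """Classical fixed-step RK4 from t = 0 to t_end (stand-in for DVERK)."""
    h = t_end / n_steps
    t, y = 0.0, list(y0)
    for _ in range(n_steps):
        k1 = f(t, y, *params)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)], *params)
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)], *params)
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)], *params)
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


# Hypothetical parameter values (assumptions, not taken from the study).
dose = 100.0   # mg, placed in the gut compartment at t = 0
ka = 1.0       # 1/h, first-order absorption rate constant
vmax = 10.0    # mg/h, maximum elimination rate
km = 2.0       # mg/L, concentration at half-maximal elimination
wt = 70.0      # kg, body weight
v = 20.0 * (wt / 70.0) ** 1.0  # L, allometric central volume (exponent assumed)

# State after 1 hour, using 100 steps of 0.01 h.
a_gut_1h, a_cen_1h = rk4(rhs, [dose, 0.0], 1.0, 100, ka, vmax, km, v)

# The gut compartment has a closed-form solution, useful as a sanity check.
gut_exact_1h = dose * math.exp(-ka * 1.0)
```

In a gradient-based fit, a system like this is solved repeatedly for every subject at every objective function evaluation, which is why the cost of computing the gradient has a large effect on total run time.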
The primary metrics of algorithm performance were the execution times of the estimation step and the covariance step. Secondary endpoints were the fraction of models that converged successfully and the fraction with a successful covariance step.
Results:
Figure 1 shows execution time for FD vs SE in NONMEM as a function of the number of THETAs. Only models that completed in < 6 hours without unrecoverable numerical problems were included.
Figure 2 shows execution time for FD vs AD in NLME as a function of the number of THETAs.
Conclusions:
In this limited sample of 1-, 2-, and 3-compartment ODE models spanning a range of model complexity, both SE and ADPO improved the execution speed of the covariance step approximately 3-fold (~66% reduction in time). ADPO also improved the execution time for estimation approximately 3-fold. SE may have improved the likelihood of successful convergence but not of a successful covariance step. ADPO had no effect on the likelihood of successful convergence or of a successful covariance step.
1. Almquist J, et al. Using sensitivity equations for computing gradients of the FOCE and FOCEI approximations to the population likelihood. J Pharmacokinet Pharmacodyn. 2015 Jun;42(3):191-209. doi: 10.1007/s10928-015-9409-1. Epub 2015 Mar 24. PMID: 25801663; PMCID: PMC4432110.