Comparison of different diagnostic criteria to evaluate nonlinear mixed-effects models

Background: When using the first-order conditional estimation (FOCE) algorithm in NONMEM, the distribution properties of conditional weighted residuals (CWRES) and post-hoc ETAs (ηph), rather than those of weighted residuals (WRES), may give a more accurate picture of if and when a model is misspecified.

Aim: A simulation study was performed to compare these methods of model diagnosis, in terms of their type I error and power.

Methods: Pharmacokinetic (PK) and pharmacodynamic (PD) data sets were generated under rich and sparser sampling conditions. Multiple simulations were then performed on these data sets 1) under the basic model (with random variability in parameters and model error centered around zero) to estimate type I error, and 2) under several alternative models (with altered parameters or variability) to estimate power. CWRES, WRES and ηph values were calculated from simulated and model-predicted values. Normality, mean and variance testing were then performed to investigate the distribution properties of each diagnostic and to compare these with the expected distribution.
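The normality, mean and variance testing described above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the abstract does not name the specific tests, so the Shapiro-Wilk test, the one-sample t-test and the chi-square variance test are assumptions, as are the function name `check_distribution` and its parameters. For CWRES the expected distribution is taken to be N(0, 1); for ηph the expected variance would be the corresponding between-subject variance from the model.

```python
import numpy as np
from scipy import stats


def check_distribution(values, expected_mean=0.0, expected_var=1.0):
    """Return p-values for three tests on a diagnostic (e.g. CWRES or post-hoc ETAs).

    Hypothetical helper: the choice of tests is an assumption, not taken
    from the abstract.
    """
    values = np.asarray(values, dtype=float)
    n = values.size

    # Shape: Shapiro-Wilk test of normality.
    _, p_normality = stats.shapiro(values)

    # Mean: one-sample t-test against the expected mean.
    _, p_mean = stats.ttest_1samp(values, popmean=expected_mean)

    # Variance: two-sided chi-square test against the expected variance.
    chi2_stat = (n - 1) * values.var(ddof=1) / expected_var
    lower = stats.chi2.cdf(chi2_stat, df=n - 1)
    upper = stats.chi2.sf(chi2_stat, df=n - 1)
    p_variance = min(1.0, 2.0 * min(lower, upper))

    return {
        "normality": float(p_normality),
        "mean": float(p_mean),
        "variance": float(p_variance),
    }


if __name__ == "__main__":
    # Toy stand-in for a diagnostic vector; real values would come from NONMEM output.
    rng = np.random.default_rng(1)
    toy_diagnostic = rng.standard_normal(200)
    print(check_distribution(toy_diagnostic))
```

A p-value below the chosen significance level (5% in this study) for any of the three tests would flag a departure from the expected distribution.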

Results: Based on 1000 simulations, type I error associated with the CWRES and ηph distributions was close to 5% when testing for the expected distribution shape, mean and variance under rich sampling situations. In contrast, WRES displayed very large type I error, indicating problems with the use of the first-order (FO) algorithm in its calculation when modeling with the FOCE method. Sparser sampling, and hence 'shrinkage', increased the type I error associated with the CWRES distribution when testing for the expected distribution mean, and increased the type I error associated with the ηph distribution when testing for the expected distribution variance. Following 100 simulations under various alternative models, distinctive patterns were evident as to when CWRES and ηph distribution testing was able to detect model misspecification. When a fixed-effect parameter was misspecified, this was usually detectable through a shift in the mean of the ηph distribution associated with the changed parameter. When between-subject variability was misspecified, this was generally detectable through a shift in the variance of the ηph distribution associated with the changed variability. When residual random error was misspecified, this was detectable through a shift in the variance of the CWRES distribution. WRES distribution testing showed no pattern, and often poor power, in its ability to detect model misspecification.
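Type I error and power in a study of this kind are rejection rates across simulation replicates: the fraction of replicates in which a test rejects when the data come from the basic model (type I error) or from an alternative model (power). The sketch below illustrates only that bookkeeping. It is a toy stand-in that draws normal samples directly rather than running NONMEM, and the function name `rejection_rate`, the sample sizes and the use of a one-sample t-test on the mean are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def rejection_rate(n_reps, n_obs, true_mean=0.0, true_sd=1.0, alpha=0.05, seed=0):
    """Fraction of replicates in which a t-test of 'mean = 0' rejects at level alpha.

    Hypothetical helper: each row stands in for the diagnostic values
    (e.g. CWRES) from one simulated data set.
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(true_mean, true_sd, size=(n_reps, n_obs))

    # One t-test per replicate (row), all against an expected mean of zero.
    _, p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1)
    return float(np.mean(p_values < alpha))


if __name__ == "__main__":
    # Null case: data generated with mean zero, so rejections are false positives.
    print("type I error:", rejection_rate(n_reps=1000, n_obs=50))
    # Alternative case: a shifted mean, so rejections are correct detections.
    print("power:", rejection_rate(n_reps=100, n_obs=50, true_mean=0.5))
```

With a well-calibrated test, the first rate should sit near the nominal 5% level; a rate well above that, as reported for WRES, indicates that the diagnostic does not follow its assumed distribution even when the model is correct.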

Conclusions: Overall, CWRES and ηph distribution testing displayed lower type I error and greater power than WRES distribution testing when evaluating model adequacy in NONMEM analyses under the FOCE algorithm. Both the CWRES and ηph distributions should be examined when the source of misspecification is unknown.

Christine Staatz