Double-Blinded External Evaluation of Predictive Performance: pyDarwin vs. Human Expert in Paediatric Tobramycin Population PK Model Development

Introduction: Paediatric population pharmacokinetic (PopPK) model development is complex because of the physiological changes that occur throughout growth and maturation. pyDarwin, an open-source Python package, integrates machine learning with NONMEM for model selection, but its ability to handle complex paediatric PopPK model selection has not been validated. A double-blinded external comparison of pyDarwin-selected and manually developed models is needed to objectively assess the utility of machine-learning-assisted model selection.

Aims: To assess the performance of pyDarwin in complex PopPK model development, using paediatric tobramycin PK as a motivating example.

Methods: The model-building dataset included 442 plasma tobramycin concentrations from 63 children with cystic fibrosis, obtained from a prospective audit at the Royal Children’s Hospital and Children’s Hospital Westmead. The Manual Group (KH and SD) developed a model by stepwise selection in NONMEM 7.5.1, while the pyDarwin Group (WY and XZ), working blinded, used pyDarwin for automated model selection according to a predefined study analysis plan designed to ensure comparable model exploration steps. A twenty-dimensional search space was defined in the token file, covering the number of compartments, three candidate residual error models, inter-individual error models, and candidate effects of the maturation function, body weight, fat-free mass, renal function, concomitant medications, and sex on clearance (CL), among other dimensions. After each run, 500 posterior predictive check simulations were performed to determine a penalty, calculated as 3 for each 1% difference between the “observed” AUC and the simulated AUC. External validation was conducted using an independent dataset of 1177 concentrations from 147 children. The manually developed and pyDarwin-selected models were compared using the Akaike Information Criterion (AIC), goodness-of-fit (GOF) plots, normalized prediction distribution errors (NPDE), visual predictive checks (VPC), and predictive performance metrics.
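The selection and evaluation criteria described in the Methods can be sketched in Python, the language pyDarwin is written in. This is a minimal illustration under stated assumptions, not the study's code or pyDarwin's API: all function names are hypothetical; the 500 simulated AUCs are assumed to be summarised by their median before the percentage difference is taken; AIC is computed as the NONMEM objective function value (OFV, -2 log-likelihood up to a constant) plus twice the number of estimated parameters; adding AIC and the AUC penalty into one score is an illustrative choice; and bias and precision are assumed to be the relative mean prediction error (rMPE) and relative root mean squared error (rRMSE), since the abstract does not name its metrics.

```python
import math
from statistics import median


def auc_penalty(observed_auc, simulated_aucs, points_per_percent=3.0):
    """Penalty of 3 per 1% difference between observed and simulated AUC.

    ASSUMPTION: the simulated AUCs are summarised by their median.
    observed_auc must be positive.
    """
    sim_auc = median(simulated_aucs)
    pct_diff = abs(sim_auc - observed_auc) / observed_auc * 100.0
    return points_per_percent * pct_diff


def aic(ofv, n_params):
    """AIC from a NONMEM OFV (-2 log-likelihood up to a constant)."""
    return ofv + 2 * n_params


def hypothetical_fitness(ofv, n_params, observed_auc, simulated_aucs):
    """HYPOTHETICAL score (lower is better): AIC plus the AUC penalty."""
    return aic(ofv, n_params) + auc_penalty(observed_auc, simulated_aucs)


def predictive_performance(observed, predicted):
    """Return (rMPE %, rRMSE %) computed from relative prediction errors.

    Negative rMPE indicates under-prediction on average.
    """
    rel_err = [(p - o) / o for o, p in zip(observed, predicted)]
    n = len(rel_err)
    rmpe = 100.0 * sum(rel_err) / n
    rrmse = 100.0 * math.sqrt(sum(e * e for e in rel_err) / n)
    return rmpe, rrmse


if __name__ == "__main__":
    # A median simulated AUC 5% above the observed AUC adds 15 to the score.
    print(auc_penalty(100.0, [104.0, 105.0, 106.0]))
    print(predictive_performance([10.0, 20.0], [11.0, 18.0]))
```

Under these assumptions, a candidate model whose simulations miss the observed AUC by 5% must lower its OFV by more than 15 points to outrank a model that reproduces the AUC, which is how such a penalty steers the search toward models that are clinically useful for AUC-guided dosing rather than merely well fitted.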

Results: Both models included a maturation function with fixed coefficients [1], eGFR as a covariate on CL, and weight as a covariate on volume (V). The manually developed model estimated the allometric scaling exponents, whereas the pyDarwin-selected model used fixed exponents. The only structural difference was that the manually developed model was a one-compartment model, whereas pyDarwin selected a two-compartment model. Internal validation showed that both models fitted the model-building dataset well, with the pyDarwin-selected model achieving a lower AIC (1392.8 vs. 1465.7 for the manually developed model). In external validation, diagnostic plots indicated that the pyDarwin-selected model had better predictive performance, with lower predictive bias and higher precision, as reflected in the predictive performance metrics (Table 1).
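The covariate structure shared by both final models can be written out as a typical-value calculation. The sketch below is illustrative only: the function and parameter names are hypothetical, and the sigmoid (Hill) form of the maturation function on postmenstrual age (PMA), the 70 kg reference weight, the fixed exponents of 0.75 for CL and 1 for V, and the power form and reference value of the eGFR effect are assumptions rather than details given in the abstract. `tm50_weeks` and `hill` are placeholders to be set to the fixed coefficients from [1]; no values are implied here.

```python
# HYPOTHETICAL typical-value model for tobramycin CL and V in a child.
# ASSUMPTIONS (not stated in the abstract): sigmoid maturation on PMA,
# 70 kg reference weight, allometric exponents 0.75 (CL) and 1 (V),
# and a power eGFR effect around a hypothetical reference value.


def maturation(pma_weeks: float, tm50_weeks: float, hill: float) -> float:
    """Fraction of mature clearance reached at a given PMA (Hill function)."""
    return pma_weeks**hill / (tm50_weeks**hill + pma_weeks**hill)


def typical_cl(tvcl: float, weight_kg: float, pma_weeks: float, egfr: float,
               *, tm50_weeks: float, hill: float, egfr_ref: float = 100.0,
               egfr_exp: float = 1.0, exp_cl: float = 0.75,
               ref_weight: float = 70.0) -> float:
    """Typical CL = adult value x size x maturation x renal function.

    tm50_weeks and hill stand in for the fixed coefficients in [1];
    egfr_ref and egfr_exp are hypothetical.
    """
    size = (weight_kg / ref_weight) ** exp_cl
    renal = (egfr / egfr_ref) ** egfr_exp
    return tvcl * size * maturation(pma_weeks, tm50_weeks, hill) * renal


def typical_v(tvv: float, weight_kg: float, exp_v: float = 1.0,
              ref_weight: float = 70.0) -> float:
    """Typical V scaled allometrically by body weight."""
    return tvv * (weight_kg / ref_weight) ** exp_v
```

Keeping the exponents as keyword arguments makes the contrast in the Results explicit: the pyDarwin-selected model corresponds to leaving `exp_cl` and `exp_v` at fixed constants, while the manually developed model corresponds to passing values estimated from the data.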

Conclusions: pyDarwin demonstrated satisfactory capability in PopPK model selection, even for complex paediatric PK model development, highlighting its potential as an efficient and effective tool for routine PopPK model selection.

References

[1] Holford N, Heo YA, Anderson B. A pharmacokinetic standard for babies and adults. J Pharm Sci. 2013;102(9):2941-2952.