A bounded integer model for rating and composite scale data

Background: The number of possible score values (N) in rating and composite scale data may range from a few to more than 100. When modeling such data, two main strategies are typically used for the baseline model: (i) estimating the probability of each score, which requires N-1 baseline parameters, or (ii) treating the data as continuous and estimating an average baseline with variability. The former becomes increasingly data-demanding as the number of categories increases; the latter, by treating the data as continuous, makes assumptions that violate the nature of the data, especially if many observations lie at the extremes of the scoring scale.

Objective: To explore a new model for rating and composite scale data that reduces the number of baseline parameters while treating the data as bounded integers.

Methods: For an N-category variable, the N-1 quantiles of the standard normal distribution (Z_{1/N} to Z_{(N-1)/N}) are identified. A function of fixed and random effects, time and covariates, f(θ,η,t,X), describing a latent variable is used together with an estimated variability, g(σ,η,t,X), and the Z-value quantiles to predict the probability of each score using probit functions. The new modeling approach, called the “bounded integer (BI) model”, was implemented for two data sets, one based on 11-category Likert pain scores (1,2) and one on 109-category UPDRS scores (3), and compared to previously published models.
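
As a rough illustration of the probability calculation described in the Methods, the following Python sketch computes score probabilities from the standard normal quantiles. It assumes that the probability of score k is the difference between two probit (normal CDF) terms evaluated at adjacent cut points, shifted by the latent-variable value f and scaled by the variability g, with the outermost cut points at minus and plus infinity. That exact form is an assumption made here, not something the abstract states, and the function names bi_cutpoints and bi_probabilities are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def bi_cutpoints(n_categories):
    """Return the N-1 standard normal quantiles Z_{1/N} ... Z_{(N-1)/N}."""
    return [_STD_NORMAL.inv_cdf(k / n_categories) for k in range(1, n_categories)]


def bi_probabilities(f, g, n_categories):
    """Probability of each of the N scores, lowest score first.

    f : value of the latent-variable function (assumed already evaluated)
    g : value of the variability function, must be positive
    Assumed form: P(k) = Phi((Z_k - f) / g) - Phi((Z_{k-1} - f) / g),
    with Z_0 = -infinity and Z_N = +infinity.
    """
    if g <= 0:
        raise ValueError("g must be positive")
    cuts = bi_cutpoints(n_categories)
    # Cumulative probabilities at each cut point, padded with 0 and 1
    # to stand in for the infinite outer boundaries.
    cumulative = [0.0] + [_STD_NORMAL.cdf((z - f) / g) for z in cuts] + [1.0]
    return [cumulative[k + 1] - cumulative[k] for k in range(n_categories)]


if __name__ == "__main__":
    # 11-category example; f = 0 and g = 1 spread probability evenly.
    for score, p in enumerate(bi_probabilities(0.0, 1.0, 11)):
        print(score, round(p, 4))
```

Under this assumed form, f = 0 and g = 1 give every score the same probability of 1/N, a positive f shifts probability toward higher scores, and a smaller g concentrates it on fewer scores. Only f and g need to be modeled, regardless of N.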

Results: For the Likert data, the final bounded integer model (OFV=47492; 14 estimated parameters) and a smaller version (53135; 9) performed better than previously published models that treated the data as ordered categorical (48902; 18) and as continuous (55080; 9). All these models included elements for serial correlation (Markov or AR1). The bounded integer model also gave the best goodness-of-fit when serial correlation was not included. For the UPDRS data, the final bounded integer model performed better (5564; 16) than the continuous data model (5681; 16) even though no data were close to a boundary. For both the Likert and UPDRS data, the bounded integer model gave reasonable results with respect to parameter values, predictions, residuals and simulations.

Discussion: The bounded integer model offers flexibility and parsimony in data description similar to the continuous model, while respecting the score boundaries and integer nature of the data as the ordered categorical model does. It is therefore not surprising that it can perform better than both standard model types.

References: (1) Plan et al. Clin Pharmacol Ther. 2012 May;91(5):820-8. (2) Schindler & Karlsson. AAPS J. 2017 Jun 20. (3) Troconiz et al. Clin Pharmacol Ther. 1998 Jul;64(1):106-16.

Mats Karlsson

  • Uppsala University