How item response theory can help you take patient insights even further
- Acaster Lloyd

- Feb 18
Updated: May 21

For many, item response theory (IRT) is a tool for selecting items for a new patient-reported outcome (PRO) measure or producing a short form of an existing tool. Whilst that view is not inaccurate, there is a whole world of other applications of IRT that may be less familiar. So, let’s look at the wider benefits of an IRT approach.
What is IRT?
IRT is a broad group of psychometric methods that can be used in a variety of ways to analyse PRO data. It is distinguished from the other major group of psychometric methods, classical test theory (CTT), by its focus on modelling item responses – specifically, the probability that a respondent will answer in a particular way, given their level of the construct the PRO measures.
IRT models differ in their level of complexity but can include parameter estimates that represent how severe an item is (difficulty), how well it tells respondents apart (discrimination), the chance of endorsement regardless of trait level (guessing, a lower asymptote), and a ceiling on the probability of endorsement (an upper asymptote). IRT models can be applied to dichotomous or polytomous response scales. When using IRT models, a first task is to understand which model best characterises patients’ responses to the measure. Once a model is selected, the parameter estimates and predicted values from the model can be used for further insights.
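To make those parameters concrete, here is a minimal sketch in Python (with made-up parameter values) of the four-parameter logistic item response function, which nests the common one-, two- and three-parameter models as special cases:

```python
import numpy as np

def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """Four-parameter logistic item response function.

    theta : latent trait level (e.g., symptom severity)
    a     : discrimination (how sharply the item separates respondents)
    b     : difficulty/severity (trait level at the curve's midpoint)
    c     : lower asymptote (guessing; c=0, d=1 reduces this to the 2PL)
    d     : upper asymptote (a ceiling on the endorsement probability)
    """
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)  # a grid of trait levels
print(irf_4pl(theta, a=1.8, b=0.5))           # steep, moderately severe item
print(irf_4pl(theta, a=0.7, b=-1.0, c=0.2))   # flatter item with a non-zero floor
```

In practice the parameters are estimated from response data with dedicated software (for example, the mirt package in R); the point here is only what each parameter does to the curve.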

Back to shortening a scale
A typical application of IRT is to create short forms of longer scales. With this goal in mind, you might run an IRT model, select items based on the item parameters – for example, selecting a range of items with varying difficulty – and then use this shortened measure in future work. PROMIS is an excellent example of a systematic application of IRT to produce shortened measures.
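As a toy illustration of that selection logic only (the item parameters below are hypothetical and the rule deliberately crude), the sketch keeps well-discriminating items and spreads the retained items across the difficulty range; real short-form development, as in PROMIS, also weighs content coverage and clinical input:

```python
import numpy as np

# Hypothetical fitted parameters: (item, discrimination a, difficulty b)
items = [("item1", 1.9, -1.5), ("item2", 0.6, -0.2), ("item3", 1.7, 0.1),
         ("item4", 1.8, 1.4), ("item5", 1.6, -0.4), ("item6", 0.5, 2.0)]

def pick_short_form(items, n_keep, min_a=1.0):
    """Keep discriminating items, spread across the difficulty range."""
    good = sorted((it for it in items if it[1] >= min_a),  # drop weak items
                  key=lambda it: it[2])                    # order by difficulty
    idx = np.unique(np.linspace(0, len(good) - 1, n_keep).round().astype(int))
    return [good[i] for i in idx]

print(pick_short_form(items, n_keep=3))  # e.g. an easy, a middling and a severe item
```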
Item parameters may change
Parameters may differ by group. For example, the item “I can do laundry” from an activities of daily living measure may perform differently across age groups. This is an example of item bias. Similarly, parameters may change over time – for example, an item might become systematically easier to endorse even though the underlying construct has not changed. This would be an example of response shift.
You can study both bias and response shift in the IRT framework using a group of methods referred to as differential item functioning (DIF) analysis. There are many insights to be gained from studying DIF. In our example above, if item parameters differ by age group, the PRO is not performing in the same way for younger and older patients at the same underlying level of the construct. When this is true, it is questionable whether scores on the PRO can be compared across those groups.
DIF analysis is a powerful tool that can be used to assess bias across groups or time, response shift, differential effects of treatment (by studying items across trial arms), and much more.
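One widely used screening approach that sits alongside fully model-based DIF methods is the logistic-regression DIF test, which conditions on an observed score and asks whether group membership still predicts the item response. A minimal sketch on simulated data (all names and effect sizes are made up; statsmodels is assumed to be available):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(seed=1)
n = 500
df = pd.DataFrame({
    "score": rng.normal(size=n),          # proxy for the trait level
    "older": rng.integers(0, 2, size=n),  # group indicator (0/1)
})
# Simulate uniform DIF: the item is "easier" for the older group
# at every trait level.
logit = 1.2 * df["score"] + 0.8 * df["older"]
df["response"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# A significant 'older' term (conditioning on score) suggests uniform DIF;
# a significant 'score:older' interaction suggests non-uniform DIF.
fit = smf.logit("response ~ score + older + score:older", data=df).fit(disp=0)
print(fit.summary().tables[1])
```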
Reliability is not constant in IRT models
In CTT, reliability is a single constant value for all respondents. In IRT, reliability (expressed through what is referred to as information) varies depending on the level of the construct being measured. In short, this means a given PRO may be more or less reliable across different ranges of scores. IRT lets you estimate the ranges of scores where a PRO is most reliable. In turn, this allows more nuanced decisions about which PROs to include in a study, as selection can be tailored to the expected severity of the sample.
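To see what varying reliability looks like, here is a minimal sketch (hypothetical 2PL item parameters) computing the information of a short dichotomous scale across trait levels; for a 2PL item, information at a given level is a² · P · (1 − P):

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

# Hypothetical three-item scale; test information is the sum over items
params = [(1.8, -1.0), (1.2, 0.0), (1.5, 1.5)]
theta = np.linspace(-3, 3, 7)

test_info = sum(item_information(theta, a, b) for a, b in params)
sem = 1 / np.sqrt(test_info)  # conditional standard error of measurement

for t, i, s in zip(theta, test_info, sem):
    print(f"theta={t:+.1f}: information={i:.2f}, SEM={s:.2f}")
```

Peaks in the information curve mark the score ranges where the PRO measures most precisely.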
How can this benefit your understanding of individual level change?
One way that varying reliability may be useful is in detecting individual level change. For individual level change, you are typically trying to identify what change in a score is greater than the change you might expect due to measurement error alone. The implication of varying reliability is that the magnitude of change required to be deemed meaningful will also vary. A respondent at a level where the PRO is highly reliable needs a smaller change over time to exceed what error alone could explain than a respondent at a level where reliability is low.
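Continuing the sketch above, one common (and here simplified) reliable-change criterion compares an observed change against 1.96 · √2 · SEM, using the conditional SEM at the respondent’s level; exact formulas and thresholds vary across the literature, so treat this as illustrative only:

```python
import numpy as np

params = [(1.8, -1.0), (1.2, 0.0), (1.5, 1.5)]  # same hypothetical scale

def conditional_sem(theta, params):
    """SEM at a trait level, from summed 2PL item information."""
    info = sum(a**2 * p * (1 - p)
               for a, b in params
               for p in [1 / (1 + np.exp(-a * (theta - b)))])
    return 1 / np.sqrt(info)

# The change needed to exceed measurement error shrinks where the
# scale is more informative (here, near the middle of the range).
for theta in (-2.5, 0.0, 2.5):
    threshold = 1.96 * np.sqrt(2) * conditional_sem(theta, params)
    print(f"theta={theta:+.1f}: reliable change > {threshold:.2f}")
```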
In summary
There are many more valuable applications of IRT, including Rasch models, score interpretations, score linkage and harmonisation, response scale analysis and adaptive testing, to name just a few.
If you find this broad topic of interest, or specific areas in particular, feel free to comment and let us know — we’d be happy to discuss and share more insights with you.