Use of locum doctors in NHS trusts in England: analysis of routinely collected workforce data 2019–2021

Objectives Temporary doctors, known as locum doctors, play an important role in the delivery of care in the National Health Service (NHS); however, little is known about the extent of locum use in NHS trusts. This study aimed to quantify and describe locum use for all NHS trusts in England in 2019–2021. Setting Descriptive analyses of data on locum shifts from all NHS trusts in England in 2019–2021. Weekly data were available for the number of shifts filled by agency and bank staff and the number of shifts requested by each trust. Negative binomial models were used to investigate the association between the proportion of medical staffing provided by locums and NHS trust characteristics. Results In 2019, on average 4.4% of total medical staffing was provided by locums, but this varied substantially across trusts (25th–75th centile=2.2%–6.2%). Over time, on average two-thirds of locum shifts were filled by locum agencies and a third by trusts’ staff banks. On average, 11.3% of shifts requested were left unfilled. In 2019–2021, the mean number of weekly shifts per trust increased by 19% (from 175.2 to 208.6) and the mean number of weekly unfilled shifts per trust increased by 54% (from 32.7 to 50.4). Trusts rated by the Care Quality Commission (CQC) as inadequate or requiring improvement (incidence rate ratio=1.495; 95% CI 1.191 to 1.877) and smaller trusts made greater use of locums. Large variability was observed across regions in the use of locums, the proportion of shifts filled by locum agencies and unfilled shifts. Conclusions There were large variations in the demand for and use of locum doctors in NHS trusts. Trusts with poor CQC ratings and smaller trusts appear to use locum doctors more intensively than other trust types. Unfilled shifts were at a 3-year high at the end of 2021, suggesting increased demand that may result from growing workforce shortages in NHS trusts.

The methods are clear, the results are clearly presented, and the discussion makes sense. Limitations are articulated.
I have a few suggestions that may help improve clarity further.
It's a bit difficult to understand the logic of the several measures without first seeing the box. The authors might introduce the outcome measures section with the statement 'a worked example of the algorithm….', then go into the description of the measures. Within the box, there is a typo (page 17, line 11); I believe it should read 'doctor FTE reported in that month.' I'm having trouble reconciling the statement in the abstract that 'over time, 2/3 of locum shifts…' with the results section, page 8, line 4: '45.3% were bank shifts and 54.7% were agency.' I see the 2/3 reference lower on that page (line 36). Unless I'm misreading, these should all be consistent, since they all represent the proportion of locum use that is agency vs. bank. Perhaps the distinction lies in 'total UNADJUSTED' on page 8, line 3. If so, the authors should clarify in the abstract and the other sections that the 2/3 is an adjusted figure. On page 7, line 44, in the parentheses, I believe the authors mean that '8.6% of data were missing.' While the authors report IRRs that have very large percentage differences, it would be helpful to also provide an estimate of the absolute differences in shift coverages, which may not be quite as dramatic.

Introduction, page 5, lines 6 to 13: The authors use various words to describe medical staff, which makes it difficult to understand who they are referring to. For instance, the same paragraph describes medical staff, NHS consultants, higher specialist trainees and middle-grade doctors. The authors should perhaps briefly explain that the medical staff group is diverse to avoid confusion. Response: Thank you, this is a valid point. We have added a statement in p.5 par.1 to clarify this. The statement reads, "In the UK, challenges in the recruitment and retention of medical staff, including doctors of all grades, consultants, registrars and other doctors in training,"

Page 5, line 41: why not in 2021 too? Response: We selected 2019 as the primary year of analyses for several reasons.
First, following the implementation of lockdown measures during the first wave of the COVID-19 pandemic, services in the NHS were disrupted, with dramatic reductions in healthcare utilisation, hospital admissions and diagnostic and imaging procedures, (1,2) and with many UK hospitals focusing solely on managing COVID-19 patients. During this period, locum working was severely affected: redeployment of permanent NHS staff, services put on hold, cancellations of annual leave and the lack of demand at the various COVID-19 hospitals reduced demand for locum doctors, some of whom saw their work disappear overnight. (3) This lack of demand for locum doctors during the first UK lockdown is evident in figure 3, which shows the variation in the mean number of locum shifts in 2019-2021, and has also been documented in our ongoing qualitative work. Second, in December 2021, when the UK was recording the highest COVID-19 case rates since the beginning of the pandemic, the demand for locum doctors in NHS trusts peaked, as shown again in figure 3, perhaps due to rising waiting times and increased workload for all NHS doctors. We therefore observed patterns of instability in the delivery of locum doctor services across NHS trusts in 2020-2021 and decided to analyse locum data for that period descriptively (additional descriptive analysis is contained in the supplementary material). Third, some of the drivers of locum use included in our negative binomial models, such as CQC ratings, trust size and trust type, may not have been relevant during the pandemic, and interpretation would have been difficult. We have added a statement to clarify the use of 2019 data.
In p.5 par.3 the statement now reads "We explore regional variations for these measures and identify NHS trusts with the highest and lowest locum usage in 2019, as this was the year before the onset of the COVID-19 pandemic, which was a period of substantial disruption in the delivery of NHS services."
Page 6, lines 23-29: The authors should be clearer as to which data and analyses are presented in their paper. It is only at this stage that you refer to "dentists", as they were not mentioned earlier in the manuscript. What is the rationale for presenting data for doctors and dentists together? To my knowledge, dentists are not doctors in the UK, so it's not clear why the results should be presented together. Finally, the literature and aims should be re-defined on the basis that your analysis includes data on dentists. Response: This is a very good point, thank you. In the UK, the majority of NHS staff (approximately 1.2 million full-time equivalents) work in 'hospital and community services' (HCHS) as direct employees of NHS trusts providing acute, ambulance, mental health and community services. (4) This group also includes 150,000 staff who are substantive trust employees (i.e. they are on the trust's payroll) and work in general practice, community pharmacies and dentistry. (5) Therefore, the NHS Improvement database provides data for all trusts' substantive employees in every staff group, including our group of interest, defined as the medical and dental staff group. High street dentists are not included in the data. We understand that this was confusing, and we have amended the text to reflect this. In p.6 par.3 the statement now reads, "The data capture a snapshot of the weekly number of shifts done by doctors within hospital and community services (HCHS) of the NHS, who are defined as all practising doctors who are registered with the General Medical Council (GMC), including some GPs and dental staff, who are employed substantively by trusts i.e. are on a trust's payroll". We would also like to highlight that dental staff make up only a very small fraction of the NHS trust workforce.
We have not changed the aims, as we think this would be likely to mislead the reader given that the dental group is very small and not our intended focus; we hope the additional text helps clarify this.
Proportion of unfilled shifts, page 8, line 24: Could the authors explain why they're only using the year 2019 to run the negative binomial model on locum intensity? Especially if you're interested in the "between region variation". Using the years 2020 and 2021 may shed more light on such "between variation". Using a mixed-effects negative binomial model may also be of interest, with the region variable as the "random effect", if you suspect some clustering at the region level. The authors should also briefly explain the rationale for using the offsets. They are simply quoted without providing a justification for their choice. Response: Thank you for your suggestions. For the reasons outlined in our response to your fifth comment, we ran the negative binomial model for all outcomes using only the year 2019. Of course, if the reviewer would like to see the results from a model using all years, we can add them in the next revision. However, we feel that the instability observed during 2020 and 2021 across NHS services would make the interpretation of results problematic. We have fitted a mixed-effects negative binomial model with region as a random effect, and the results from the mixed-effects and primary negative binomial models are provided below. Please note, the model below (Table 1) uses the total FTE of all clinically qualified staff who are substantive employees of NHS trusts, following your next suggestion to use this as a measure of trust size in place of permanent doctor FTE. We can supply results from the model using permanent doctor FTE as a measure of trust size, if needed. The results from the two models were almost identical; however, the likelihood ratio (LR) test favoured the negative binomial model over the mixed-effects model with a region-specific random effect (LR test for mixed-effects model vs negative binomial model = 0.68 [Pr>= chibar2 =0.2050]), indicating that there was no significant variation in the number of locum shifts between regions.
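The reported p-values for this boundary test can be approximated by hand. This is a minimal sketch, not the authors' code, assuming the LR statistic is referred to a 50:50 mixture of a point mass at zero and a χ² distribution with 1 degree of freedom, the usual reference distribution when testing whether a single random-effect variance is zero (the "chibar2" label in the output points to this). The function name `chibar2_pvalue` is hypothetical.

```python
import math


def chibar2_pvalue(lr_stat: float) -> float:
    """P-value for an LR test of one variance component on the boundary.

    Under H0 (random-effect variance = 0) the LR statistic follows a 50:50
    mixture of a point mass at 0 and chi-squared(1), so the p-value is half
    the usual chi-squared(1) upper-tail probability.
    """
    if lr_stat <= 0:
        return 1.0
    # Upper tail of chi-squared(1): P(X >= x) = erfc(sqrt(x / 2))
    return 0.5 * math.erfc(math.sqrt(lr_stat / 2.0))


# LR statistics quoted in the response (locum shifts, agency shifts)
for label, lr in [("locum shifts", 0.68), ("agency shifts", 0.90)]:
    print(f"{label}: LR={lr:.2f}, p={chibar2_pvalue(lr):.4f}")
```

Halving the χ²(1) tail reflects that a variance cannot be negative, so a naive χ²(1) test would be conservative; with either version, statistics below 1 are far from significance.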
We therefore decided to keep the negative binomial model as the primary model in our analyses. Similar results were observed for the number of agency shifts (LR test for mixed-effects model vs negative binomial model = 0.90 [Pr>= chibar2 =0.1717]) and the number of unfilled shifts (LR test for mixed-effects model vs negative binomial model = 0.78 [Pr>= chibar2 =0.1937]). Regarding the choice of offsets, we added a statement to the text to justify their use. In p.9 par.3 the statement reads "Our dependent variables were: the mean number of locum shifts (offset: natural logarithm of mean permanent doctor FTE to model the rate of locum shifts per permanent doctor FTE); the mean number of agency shifts (offset: natural logarithm of the mean total shifts to model the rate at which a shift is filled by agency staff); and the mean number of unfilled shifts (offset: natural logarithm of mean shifts requested to model the rate at which a requested shift is left unfilled). Offsets are used as each dependent variable is derived from count data, where the value of the count is determined by the size of the workforce or exposure to locums."
b Coefficients can be interpreted as proportionate changes; for example, trusts that were rated as inadequate and requiring improvement had 45.5% higher locum intensity than trusts that were rated as having good and outstanding services.
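The effect of a log offset can be seen in the mean structure shared by Poisson and negative binomial models: log(μ) = log(exposure) + β0 + β1·x, with the offset's coefficient fixed at 1, so the model describes a rate per unit of exposure. The sketch below uses hypothetical coefficients and exposures and evaluates the mean function directly rather than fitting a model; it shows that exp(β1) is the same rate ratio for a small and a large trust, while the expected counts scale with exposure.

```python
import math


def expected_count(exposure: float, beta0: float, beta1: float, x: int) -> float:
    """Expected count from a log-link count model with a log-exposure offset.

    log(mu) = log(exposure) + beta0 + beta1 * x, with the offset's
    coefficient fixed at 1, so mu / exposure is the modelled rate.
    """
    return math.exp(math.log(exposure) + beta0 + beta1 * x)


# Hypothetical values, for illustration only
beta0 = math.log(0.2)  # baseline rate: 0.2 events per unit of exposure
beta1 = math.log(1.5)  # rate ratio (IRR) of 1.5 for the exposed group

for fte in (50.0, 500.0):  # a small and a large hypothetical trust
    unexposed = expected_count(fte, beta0, beta1, x=0)
    exposed = expected_count(fte, beta0, beta1, x=1)
    print(f"exposure={fte:>6}: counts {unexposed:.1f} vs {exposed:.1f}, "
          f"ratio {exposed / unexposed:.2f}")
```

Without the offset, a larger trust would show more locum shifts simply because it is larger; with it, the coefficients compare rates rather than raw counts.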
Page 8, line 38: I wonder if only using trust permanent doctor FTE is enough to "measure" the trust size. Would it be worth considering the number of permanent nursing staff as well, as they represent the largest health workforce group? Response: Thank you for your suggestion. We had initially tested our models with quintiles of all qualified clinical staff, including all HCHS doctors, nurses and health visitors, midwives, ambulance staff, and scientific, therapeutic and technical staff, as a measure of trust size. We include the results from this analysis in our responses (Table 2). However, we chose not to provide these in the original submission because the model with quintiles of doctor FTE as the measure of trust size performed slightly better according to the Akaike (AIC) and Bayesian (BIC) information criteria, which are typically used to inform model selection in these cases. The negative binomial model with doctor FTE as a measure of trust size minimised the AIC (2,584.11) and BIC (2,648.59) compared with the negative binomial model with all clinically qualified staff FTE as a measure of trust size (AIC=2,585.46; BIC=2,649.93).
b Coefficients can be interpreted as proportionate changes; for example, trusts that were rated as inadequate and requiring improvement had 45% higher locum intensity than trusts that were rated as having good and outstanding services.
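The model comparison can be restated in a few lines of Python using only the AIC and BIC values quoted above. The second loop is a consistency check that backs out the number of estimated parameters from BIC − AIC = k(ln n − 2), assuming n = 220 trusts (the sample size given for model A in a later table note); `N_TRUSTS` and the function names are hypothetical. Both specifications come out at about 19 parameters, as expected when one set of size quintiles is swapped for another, and the differences of about 1.3 on either criterion are small by common rules of thumb (a gap under about 2 is often treated as weak evidence for either model).

```python
import math

# AIC and BIC as reported for the two trust-size specifications
MODELS = {
    "doctor FTE quintiles": {"aic": 2584.11, "bic": 2648.59},
    "clinical staff FTE quintiles": {"aic": 2585.46, "bic": 2649.93},
}
N_TRUSTS = 220  # assumption: observations in the model, from the "Model A" table note


def preferred(criterion: str) -> str:
    """Name of the model with the smallest value of the given criterion."""
    return min(MODELS, key=lambda name: MODELS[name][criterion])


def implied_parameters(aic: float, bic: float, n: int) -> float:
    """Parameter count k implied by AIC = 2k - 2lnL and BIC = k*ln(n) - 2lnL."""
    return (bic - aic) / (math.log(n) - 2.0)


for criterion in ("aic", "bic"):
    low, high = sorted(m[criterion] for m in MODELS.values())
    print(f"{criterion.upper()} prefers {preferred(criterion)} (difference {high - low:.2f})")

for name, m in MODELS.items():
    k = implied_parameters(m["aic"], m["bic"], N_TRUSTS)
    print(f"{name}: implied k = {k:.2f}")
```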
Results, page 9, line 3: Could the authors specify the share of locum shifts in comparison to all the shifts worked by medical staff who are not locums? This could help contextualise the findings and help further understand the extent of locum shift use. For instance, do such shifts represent 5, 10 or 15% of shifts worked by medical staff overall? Response: Thank you for your comment. The number of shifts worked by permanent medical staff is not available, and this is the reason our measure of locum intensity adjusted for permanent doctor FTE instead. NHS Improvement only collects data on the number of shifts filled by locums. NHS Digital collects data on FTE and headcount for all permanent doctors working in NHS trusts but does not collect any information on locum doctors. Therefore, the two organisations report their data in different units (shifts vs FTE), and this did not allow us to express the share of locum shifts in comparison to all shifts worked by all other medical staff. However, using permanent doctor FTE, we adjusted the estimates of locum use and expressed rates of use in comparison to the work done by permanent doctors. For example, in p.9 par.3 (locum intensity results) we found that on average 4.4% of medical staffing in 2019 (i.e. locum intensity=0.22) was provided by locum doctors across NHS trusts. This is also acknowledged in the limitations (p.16, last paragraph), and we added additional information to explain this. The new statement reads, "Second, although NHS Improvement collects data on the number of locum shifts, it does not collect the shift duration, locum FTE or the number of shifts filled by permanent doctors, which would allow a more straightforward comparison with permanent doctor FTE. Therefore, we had to assume that shift lengths for permanent and locum doctors were broadly equivalent in order to estimate the proportion of medical staffing provided by locum doctors.
Should data on the number of shifts filled by permanent doctors or data on locum FTE become available, this limitation could be addressed".
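The two figures quoted in this response (locum intensity 0.22 and 4.4% of medical staffing) are linked by the equal-shift-length assumption. A minimal sketch, assuming locum intensity means weekly locum shifts per permanent doctor FTE (an assumption; the paper's box gives the exact definition): dividing 0.22 by 0.044 implies about 5 shifts per permanent FTE per week, and the share of staffing is then intensity divided by that figure. The trust in the example and all names are hypothetical.

```python
# Hypothetical reading of the reported figures: locum intensity = weekly
# locum shifts per permanent doctor FTE (assumption, not the paper's box).
REPORTED_INTENSITY = 0.22  # 2019 mean, as stated in the response
REPORTED_SHARE = 0.044     # 4.4% of medical staffing, as stated in the response

# Shifts per permanent FTE per week implied by the two figures (about 5)
IMPLIED_SHIFTS_PER_FTE = REPORTED_INTENSITY / REPORTED_SHARE


def locum_intensity(weekly_locum_shifts: float, permanent_fte: float) -> float:
    """Weekly locum shifts per permanent doctor FTE."""
    return weekly_locum_shifts / permanent_fte


def staffing_share(intensity: float, shifts_per_fte: float = IMPLIED_SHIFTS_PER_FTE) -> float:
    """Locum shifts as a fraction of permanent-doctor shift capacity.

    Relies on the stated assumption that locum and permanent shifts are of
    broadly equal length, so one permanent FTE can be expressed in shifts.
    """
    return intensity / shifts_per_fte


# A hypothetical trust with 400 permanent doctor FTE and 88 locum shifts a week
intensity = locum_intensity(88, 400)
print(f"implied shifts per FTE per week: {IMPLIED_SHIFTS_PER_FTE:.1f}")
print(f"intensity={intensity:.2f}, share of medical staffing={staffing_share(intensity):.1%}")
```

If shift lengths differed between locum and permanent doctors, the implied conversion factor would change and so would the staffing share, which is why the limitation matters.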
Regional variation in locum use: The descriptive statistics were nicely and concisely written. Well done. Again, why not use the data from the years 2020 and 2021? Response: Here we aimed to provide descriptive statistics and explore regional variation for the main year of analyses (i.e. 2019). This was before any impacts of the COVID-19 pandemic, which we did not want to be the focus of all analyses. Regional variation was broadly similar in all years, as shown in the updated figure 2 in the supplement, and we feel that the descriptive statistics table should focus solely on the year 2019, for reasons of space and the readability of the manuscript. Following your suggestion, we now provide data on regional variation in locum use, agency shifts and unfilled shifts for the years 2020 and 2021 (Tables 2 and 3 in the supplement).
Results from regression analyses: I realise there are many results to present, but Table 3 could benefit from being re-arranged to be more reader-friendly, perhaps by using square brackets for the robust standard errors and stars for the significance level. Also, the authors should briefly add the interpretation of IRRs (e.g. what it means when the IRRs are below or above 0). Note "b" could perhaps be moved up into the first paragraph. As for the interpretation of the results, should it not be as follows: trusts in the North West had on average 4.5% higher locum intensity than trusts in London, as the IRR=1.045? Response: Thank you for these helpful suggestions. We have added square brackets for standard errors and asterisks to denote significant results. We added note "b" from the footnote to the main text (p.12 par.1), and we also provide a statement about the interpretation of IRRs, which now reads "The results are reported as incidence rate ratios (IRRs) for the coefficients of interest followed by P values and standard errors in square brackets and 95% confidence intervals in brackets. IRRs are defined as the number of exposed events (e.g. number of locum shifts) divided by the number of unexposed events (offset, e.g. permanent doctor FTE) in each time period and are essentially a ratio of two incidence rates. An IRR with a value greater than 1 indicates that the incidence rate is higher in an exposed group compared to an unexposed group and the opposite is true for an IRR value less than 1". We also corrected the typo in the footnote, which now reads "Coefficients can be interpreted as proportionate changes, for example, trusts in the North West had on average 4.5% higher locum intensity than trusts in London." We believe these changes have substantially improved Table 2.

Table 2, locum intensity: Can the authors explain why the CI for ambulance trusts is so wide (IRR=55.43; 95% CI 20.56 to 149)? I'm sceptical as to whether such results are reliable.
It may be of interest to present the confidence intervals in square brackets in the results section (e.g. 0.496 [0.299 -0.258 95% CI]) to make it more reader-friendly. Response: There were only 9 ambulance trusts in the data, with high variability between them in terms of permanent doctor FTE and locum shifts. Ambulance trusts employed very few doctors (permanent doctor FTE mean=1.77 FTE [sd=1.40; median=1.29; 25th-75th percentile: 0.73-3.34]) and had few locum shifts (mean locum shifts=12.1 [sd=26.5; median=0; 25th-75th percentile: 0-3.95]) in 2019. In p.16 par.1 we acknowledge that these findings are a statistical artefact due to the very low number of permanent doctors and the very low number of locum shifts in these trusts. We amended the previous statement to make this more prominent in the manuscript. The statement now reads "However, this result is an artefact of the very low numbers of permanent doctors employed by ambulance trusts and the very small number of locum shifts filled when compared to other trust types."

Table 2, agency shifts: It would be worth mentioning that the results for trust size are not significant. Response: We have added a statement at the end of p.15 par.1, which reads, "The effects of trust size on the proportion of agency shifts were not statistically significant".
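The IRR interpretation discussed above can be written out directly. The sketch assumes the usual construction of IRR confidence intervals, exponentiating a Wald interval that is symmetric on the log scale; it converts an IRR to a percentage difference, (IRR − 1) × 100, and backs out the log-scale standard error implied by the ambulance trust interval. Function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def percent_difference(irr: float) -> float:
    """Percentage difference in the rate implied by an IRR; negative if IRR < 1."""
    return (irr - 1.0) * 100.0


def log_scale_se(lower: float, upper: float) -> float:
    """SE of log(IRR) implied by a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def ci_from_log_se(irr: float, se: float) -> tuple[float, float]:
    """95% CI for an IRR from the SE of its logarithm."""
    return irr * math.exp(-Z_95 * se), irr * math.exp(Z_95 * se)


print(f"IRR 1.045 -> {percent_difference(1.045):+.1f}%")  # North West vs London
print(f"IRR 1.495 -> {percent_difference(1.495):+.1f}%")  # abstract, CQC rating

se = log_scale_se(20.56, 149.0)  # ambulance trusts, locum intensity
low, high = ci_from_log_se(55.43, se)
print(f"ambulance trusts: SE(log IRR)={se:.3f}, rebuilt CI {low:.1f} to {high:.1f}")
```

For the ambulance trusts the implied log-scale standard error is roughly 0.5, so the upper limit is about seven times the lower one, consistent with an estimate resting on very few doctors and shifts. An IRR of 1 corresponds to no difference, so values below 1 give negative percentages.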

Summary
The finding "Our findings show that on average 4.4% of medical staffing in NHS trusts in 2019 was provided by locum medical staffing" appears only in the summary and should be added to the results section as well. Can the authors elaborate on what they mean when they state "[…] and can provide important information about the effective planning of the NHS workforce"? What do the authors mean by "effectiveness" (e.g. effective in terms of staff-per-patient ratio, cost savings or productivity)? As these aspects are not addressed in the paper, I would suggest rephrasing to be more specific. Response: Thank you, this is a very good suggestion. We have added a statement in the results section, p. […]

Could the authors estimate (if data or available research permits) the extra financial costs of using agency locums to fill shifts, based on their regression results? For instance, trusts with worse CQC ratings use more locums: what would this represent in pounds sterling for that trust (e.g. extra £ spent)? Response: Because the NHS Improvement database is relatively new, cost information for bank or agency locums is not available, and we therefore cannot estimate the extra financial costs of using agency locums to fill shifts. We could provide a crude estimate of the cost, but given the large fluctuations in locum costs, our estimate would not be accurate. We also added a statement in p.16 par.2 (strengths and limitations) about the non-availability of cost data. This statement reads, "Fifth, the data do not contain any information on costs for locum doctors and we were therefore unable to estimate the extra financial costs of using locums to fill shifts."

Strengths and limitations, line 34: I would suggest rephrasing the following sentence: "We reveal the impact of COVID-19 on locum use in NHS Trust." Considering you provided only descriptive statistics for the year 2020, your findings are not measuring the impact per se. Rather, you're providing evidence on the extent to which locum staff were used during the pandemic. Response: This is a very good point. We deleted the previous statement and added a new statement, which reads "We provide evidence on the extent of locum use across NHS Trusts during the COVID-19 pandemic."
I would suggest using the same font for the last sections (contributors, ethics approval, etc.). Response: Thank you. We have now applied the same font and font size to the last sections of the manuscript.

5. While the authors report IRRs that have very large percentage differences, it would be helpful to also provide an estimate of the absolute differences in shift coverages, which may not be quite as dramatic. Response: Thank you, this is a very good suggestion. We estimated the absolute differences in shift coverages for the statistically significant coefficients across all models, and the results are provided below. To do this we used the margins command in Stata, which estimates absolute differences in shift coverages for all three outcomes, adjusted for all predictors in the models, for the strata of interest. The table with the estimates of absolute differences in shift coverages is now provided in the supplement (table S5).
a Model A included data on 220 trusts (observations), while models B and C included data on 214 trusts, with robust standard errors.
b Coefficients can be interpreted as absolute changes; for example, trusts that were rated as inadequate and requiring improvement on average had 84.5 more weekly locum shifts than trusts that were rated as having good and outstanding services.
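To illustrate why absolute differences can look far less dramatic than IRRs, the sketch below computes an average predictive contrast by hand: for each unit, predict the count with the factor switched on and off, then average the gaps. This is only a simplified analogue of what Stata's `margins` produces after a count model (Stata also averages over the other covariates and supplies delta-method standard errors); the trusts, coefficients and exposures are hypothetical.

```python
import math


def predicted_count(exposure: float, beta0: float, beta_x: float, x: int) -> float:
    """Expected count from a log-link model with a log-exposure offset."""
    return exposure * math.exp(beta0 + beta_x * x)


def average_absolute_difference(exposures, beta0: float, beta_x: float) -> float:
    """Average over units of the predicted count with x=1 minus with x=0.

    Each unit keeps its own exposure; only the binary factor is switched.
    """
    diffs = [
        predicted_count(e, beta0, beta_x, 1) - predicted_count(e, beta0, beta_x, 0)
        for e in exposures
    ]
    return sum(diffs) / len(diffs)


# Hypothetical trusts (permanent doctor FTE) and coefficients, for illustration only
exposures = [50.0, 100.0, 300.0]
beta0 = math.log(0.2)   # baseline: 0.2 weekly locum shifts per FTE
beta_x = math.log(1.5)  # IRR of 1.5 for the factor of interest

print(f"IRR: {math.exp(beta_x):.2f}")
for e in exposures:
    gap = predicted_count(e, beta0, beta_x, 1) - predicted_count(e, beta0, beta_x, 0)
    print(f"FTE {e:.0f}: {gap:.1f} extra shifts per week")
print(f"average absolute difference: {average_absolute_difference(exposures, beta0, beta_x):.1f}")
```

With the same IRR of 1.5 in every trust, the absolute gap in this toy example ranges from 5 to 30 shifts a week depending on trust size, which is why relative and absolute presentations can read so differently.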