The new study of UK nuclear test veterans

Nations that aspire to be great powers must have the atomic bomb. So, at least, believed British politicians of both parties in the aftermath of the Second World War. Despite austerity and a shortage of resources, a programme was set up to develop a British atomic bomb and, when it became clear that this was no longer the ultimate deterrent, a hydrogen bomb (Gowing and Arnold 1974a, 1974b).
The development of these weapons needed a test programme, which eventually involved over 20 000 men from all three Armed Forces as well as civilian scientists. During the 1950s, fission devices were tested in Australia, with the agreement of the Australian Government, and more powerful fusion devices around Christmas Island in the Pacific. Minor trials to investigate various aspects of weapons design and safety were also conducted in Australia. A full account of the British weapons programme has been given by the nuclear historian Lorna Arnold (2001, 2006), and a summary has been published in the Journal (Kendall et al 2004, Muirhead et al 2004). The first suggestions of ill health in test participants were made in 1970 (Hansard 1970). But was this ill health a consequence of test involvement, or was it the natural result of a population of fit young men passing into middle age? Concerns continued and, in response to a British Broadcasting Corporation (BBC) TV programme, preliminary investigations were undertaken (Knox et al 1983a, 1983b). These were inconclusive, however, and a full epidemiological study was set up to investigate the question. The (then) National Radiological Protection Board was commissioned to undertake it, with Dr Sarah Darby leading the project. In 1984 she moved to Oxford’s Clinical Trials Support Unit, which had the great benefit of bringing Sir Richard Doll into the study. Doll was, of course, the epidemiologist most responsible for changing the public view of smoking from a harmless pleasure with great benefits to the exchequer to a public health disaster responsible for the premature deaths of half of those who indulged. Understandably, this was not a reappraisal readily embraced by the many vested interests (Keating 2009), and Doll’s involvement was a guarantee, if one were needed, that the study would go wherever the evidence led.
Establishing as complete a cohort of test participants as possible was an essential and huge task. It required a clear definition of what constituted ‘test participation’ and a way of enrolling participants that did not depend on their subsequent health, which meant using contemporary documents. But a cohort of test participants alone was not enough. Because test participants were selected for foreign service, they were fitter than the general population and would be expected to have better health (McLaughlin et al 2008). To detect any effects of test participation, comparisons had to be made with a control group of servicemen and civilians who resembled the test participants in as many respects as possible except that they had not taken part in the tests. Comparisons could then be made both with the general population (using standardised mortality ratios (SMRs) or, for cancer incidence, standardised incidence ratios (SIRs)) and directly with the control group (using relative risks (RRs)).
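For readers less familiar with these measures, a minimal sketch of how an SMR (or SIR) and a crude RR can be computed may help. It assumes indirect standardisation by age band only, expresses SMRs as percentages (100 meaning rates equal to national rates, as in this editorial), and uses entirely hypothetical person-years, counts and reference rates. The study itself used more elaborate stratification and modelling, so this is an illustration of the concepts, not the authors' method.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class AgeBand:
    person_years: float    # person-years at risk in this age band (hypothetical)
    observed: int          # deaths (or incident cancers) observed in the cohort
    reference_rate: float  # national rate per person-year for the same band


def standardised_ratio(bands: list[AgeBand]) -> float:
    """SMR (or SIR) as a percentage: 100 x observed / expected.

    Expected cases apply national age-specific rates to the cohort's own
    person-years (indirect standardisation)."""
    observed = sum(b.observed for b in bands)
    expected = sum(b.person_years * b.reference_rate for b in bands)
    return 100.0 * observed / expected


def crude_rate_ratio(cases_a: int, py_a: float, cases_b: int, py_b: float) -> float:
    """Ratio of crude rates in group A (participants) to group B (controls).

    Only a fair comparison if both groups have similar age structure, which
    is what matching aims to achieve. Returns infinity when group B has no
    cases but group A does."""
    rate_a = cases_a / py_a
    rate_b = cases_b / py_b
    if rate_b == 0:
        return math.inf if rate_a > 0 else math.nan
    return rate_a / rate_b


# Hypothetical cohort: two age bands with national reference rates.
cohort = [
    AgeBand(person_years=10_000, observed=8, reference_rate=0.001),
    AgeBand(person_years=5_000, observed=12, reference_rate=0.003),
]
print(standardised_ratio(cohort))                # below 100: fewer cases than national rates predict
print(crude_rate_ratio(30, 20_000, 20, 20_000))  # participants' rate 1.5 times the controls'
print(crude_rate_ratio(3, 20_000, 0, 20_000))    # no control cases, so the ratio is infinite
```

An SMR below 100 in both groups is what a healthy, selected population would show; the participant-versus-control RR is the comparison that sets this selection effect aside.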

The first three epidemiological analyses
The first epidemiological analysis of nuclear test participants and controls, with follow-up to the end of 1983, was published in 1988 as a paper (Darby et al 1988b) and a detailed report (Darby et al 1988a). Contrary to press reports (Daily Mirror 2022a), the latter was freely available, for example through Her Majesty's Stationery Office (HMSO). As expected, overall mortality in test participants was lower than that in the general population. It was also similar to that in controls, both overall and for malignant diseases taken together.
However, when specific types of cancer were examined, a different picture emerged. The SMRs for leukaemia and multiple myeloma in test participants seemed unremarkable, at 113 and 111 respectively, but the rates in controls were very low: SMRs of 32 and 0 respectively, the latter because there were no multiple myeloma deaths in controls. For both diseases the RRs in test participants relative to controls were statistically significantly elevated, at 3.45 and ∞ respectively. Such large differences were unlikely to be due to the play of chance, but no other reason for them could be found. Moreover, the leukaemia cases in test participants tended to be of the types known from other evidence to be associated with radiation exposure, rather than chronic lymphocytic leukaemia (CLL) (Armstrong et al 2012). The excess cases did not appear to be concentrated in those with recorded radiation doses, nor in groups thought to be most at risk of radiation exposure. Nevertheless, the investigators could not exclude the possibility that rates of these diseases in servicemen were naturally very low and that involvement in the test programme had raised them to a level that was, coincidentally, consistent with national rates.
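Because participants and controls were matched, a rough check on these RRs is to divide the participants' SMR by the controls' SMR: both are standardised to the same national rates, so if the two groups have similar age structure the ratio of SMRs approximates the ratio of their rates. This is only an approximation (the published RRs come from a direct comparison of the two groups, not from this ratio), but it reproduces the figures above, including the infinite RR for multiple myeloma.

```python
import math


def approx_rr_from_smrs(smr_participants: float, smr_controls: float) -> float:
    """Approximate participant/control RR as a ratio of SMRs.

    Assumes both groups have similar age structure, so their expected
    counts per person-year are comparable. An SMR of 0 in controls
    (no observed cases) gives an infinite ratio."""
    if smr_controls == 0:
        return math.inf
    return smr_participants / smr_controls


leukaemia = approx_rr_from_smrs(113, 32)  # about 3.5, against a reported RR of 3.45
myeloma = approx_rr_from_smrs(111, 0)     # infinite, as reported
print(round(leukaemia, 2), myeloma)
```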
In order to clarify what was going on, a second analysis was carried out (Darby et al 1993a, 1993b) with seven more years of follow-up. As before, SMRs for all causes and for all cancers taken together were similar in participants and controls, and rates in both groups were below those in the general population. However, in sharp contrast to the first analysis, the rates of leukaemia and of multiple myeloma in the additional years of follow-up were slightly (non-significantly) lower in test participants than in controls, and the controls had rates similar to those of the general population. This made it seem more plausible that the low rates of these diseases in controls seen in the first analysis were due to chance. Again, it seemed possible that there was a small risk of leukaemia in the early years after the tests.
A third analysis was published in 2003 (Muirhead et al 2003a, 2003b), with a further eight years of follow-up compared with its predecessor. This analysis was conducted under an oversight committee and, because of concerns expressed by Veterans' Associations, focussed especially on multiple myeloma. As with the second analysis, there was no evidence for higher risks of multiple myeloma in participants during the extended follow-up period. However, leukaemia rates, excluding CLL, were somewhat higher. There was also a suggestion of higher rates of liver cancer in participants.
We should also note a study of Australian participants in the British nuclear weapons tests, men who worked alongside the British at the Australian tests (Carter et al 2006, Gun et al 2006, Crouch et al 2009). This study did not involve a control group but instead made comparisons with the general Australian population and between groups with different degrees of radiation exposure. It was of particular importance from a UK perspective because it included a detailed retrospective review of the doses incurred, including doses from internal emitters. Few Australian test participants had assessed doses of 20 mSv or above, and it was concluded that radiation exposure at the test sites was not a cause of increased cancer risks.

The latest epidemiological analysis
A fourth analysis of test participants and controls, by Gillies and Haylock, is published in this issue of the Journal (Gillies and Haylock 2022). It extends the mortality follow-up to the end of 2017, a further 19 years. The number of deaths available for analysis is now more than two and a half times the previous total, and the number of incident cancers more than three times the previous number. In the cold-hearted world of epidemiology this counts as great progress. With so much extra data the picture must now be much clearer, must it not?
Well, up to a point. So far as multiple myeloma is concerned, the extra 19 years of follow-up again show no sign of an excess in test participants compared with controls. In these latest 19 years, the SMRs for leukaemia excluding CLL in participants and controls were compatible with national rates and very similar to each other. For incidence, however, the SIRs were 114 and 86 respectively and the RR was 1.32 (0.99, 1.78), not far from statistical significance (p = 0.058). Given this, and an RR for incidence in participants relative to controls over the whole of the follow-up of 1.38 (p = 0.01), it is hard to disagree with Gillies and Haylock's conclusion: '… there is still evidence that non-CLL risks are higher in participants than the matched control group'.
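A lower confidence limit just below 1 and a p-value just above 0.05 are two views of the same result. The sketch below assumes the quoted interval is a 95% interval that is symmetric on the log scale (a Wald interval); under that assumption the interval's width gives the standard error of log RR, and from that a two-sided p-value. The published interval may have been computed differently, so the back-calculated value (roughly 0.06) should land near, but not necessarily on, the reported 0.058.

```python
import math
from statistics import NormalDist


def p_from_ci(rr: float, lower: float, upper: float, level: float = 0.95) -> float:
    """Approximate two-sided p-value for H0: RR = 1 from a published CI.

    Assumes the CI is symmetric on the log scale (a Wald interval), so its
    width in log units equals 2 * z_crit * SE(log RR)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se_log_rr = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(rr) / se_log_rr
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = p_from_ci(1.32, 0.99, 1.78)
print(f"approximate p = {p:.3f}")
```

Under this approximation, any interval whose lower limit sits just under 1 implies a p-value just over 0.05, which is why the result is described as not far from statistical significance.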
Compared to national rates, both participants and controls show highly significantly reduced mortality over the whole follow-up period and over the most recent 19 years. From supplementary tables S17(b) and (c) one can see that this reduction shows no clear sign of disappearing, even fifty years after first test participation. A 'Healthy Soldier Effect' is commonly seen, in which those selected for service in the armed forces show lower mortality than members of the general population of the same age (McLaughlin et al 2008). It may seem remarkable that the effect should be so long-lasting, but a similar observation was made in Norwegian servicemen (Strand et al 2020). The Healthy Soldier Effect is less pronounced for cancer incidence.
Over the whole follow-up period, Gillies and Haylock report significantly raised risks in participants relative to controls for all causes of death combined (p = 0.035) and suggestively raised mortality for all cancers combined (p = 0.081) and for all non-cancer causes of death combined (p = 0.070). Nevertheless, the SMRs in test participants remain significantly or suggestively low compared with national rates. Among individual types of solid cancer, mortality in participants relative to controls is significantly or suggestively (taken as p < 0.1) raised for stomach cancer, cancers of the respiratory system (including lung), prostate cancer and bladder cancer. A significantly or suggestively reduced RR of mortality in participants was reported only for cancer of the kidney and ureter. The excess mortality at these four solid cancer sites was broadly supported by the incidence data, with the striking exception of prostate cancer, where there was no suggestion of an excess in participants. Against that, there was a significant (p = 0.004) and nearly twofold raised incidence of benign and unspecified brain and CNS tumours in participants.
As well as looking at test participants as a whole, Gillies and Haylock also examined 'Special Exposure Groups' A and B, those thought most likely to have received radiation exposure from external radiation and from intakes of radionuclides respectively. Table S8 shows raised SIRs for brain and prostate cancer in Special Exposure Groups A and B; table S9 shows raised RRs for participants relative to controls for these two diseases. The overlap between groups A and B is not large enough to drive these observations. There are no significantly low RRs for participants relative to controls in these tables. Gun et al (2008) reported excesses of cancers of the head and neck and of the prostate in Australian participants in the Australian tests.
What are we to make of these findings: increased RRs of mortality from four types of solid cancer, as well as leukaemia, in test participants as a whole, against just one cancer type (kidney and ureter) with a raised risk in controls; and excesses of brain and prostate cancer in both Special Exposure Groups? Should we consider an explanation other than chance? The Daily Mirror was in no doubt: any higher rates in participants were due to their test involvement (Daily Mirror 2022b), although it offered no explanation for the lower rates.
The recorded radiation exposures are much too low to account for any detectable increase in cancers, accounting for perhaps one induced death overall if current risk estimates are approximately correct. Might exposures have been grossly underestimated? This seems hard to believe: the radiation hazards at the major tests were predictable and not unfamiliar and should have been managed without great difficulty. Lorna Arnold, who was not inclined to support a bland official line where the evidence pointed elsewhere (see her book on the Windscale Fire (Arnold 2007)), has a careful discussion which rings true: reasonable radiation safety standards were in place, together with rules, organisations and procedures to convert them into practice (Arnold 2006). It is improbable that these rules were always followed perfectly, but implausible that exposures many times the total actually recorded might have been missed.
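The logic of this paragraph can be put in rough numbers. Under a linear dose-response, predicted radiation-induced deaths are proportional to collective dose, so if recorded doses imply about one induced death, explaining N excess deaths would require true doses about N times the recorded total. The sketch assumes a nominal fatal cancer risk of about 5% per sievert (an illustrative round figure of the kind used in radiological protection, not a value taken from the study), a round 20 000 participants, and a hypothetical N; it is arithmetic, not a dose reconstruction.

```python
RISK_PER_SV = 0.05  # assumed nominal fatal cancer risk per sievert (illustrative)
MEN = 20_000        # round number of test participants ("over 20 000 men")


def collective_dose_for_deaths(deaths: float, risk_per_sv: float = RISK_PER_SV) -> float:
    """Collective dose (person-Sv) that a linear risk model links to `deaths`."""
    return deaths / risk_per_sv


def underestimation_factor(excess_deaths: float, deaths_from_recorded: float = 1.0) -> float:
    """How many times larger true doses would need to be than recorded doses.

    With a linear model, predicted deaths are proportional to collective dose,
    so the factor is the ratio of deaths to be explained to deaths predicted
    from the recorded doses (about one, according to the text)."""
    return excess_deaths / deaths_from_recorded


dose_one_death = collective_dose_for_deaths(1.0)  # person-Sv implied by one induced death
mean_dose_msv = 1000 * dose_one_death / MEN       # the same dose averaged per man, in mSv
hypothetical_excess = 50                          # hypothetical number of excess deaths to explain
print(dose_one_death, mean_dose_msv, underestimation_factor(hypothetical_excess))
```

Under these assumptions, one induced death corresponds to about 20 person-Sv, or roughly 1 mSv per man; explaining 50 excess deaths by radiation would need true doses some 50 times the recorded total, which is the scale of oversight the paragraph above judges implausible.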
It is also relevant that the study of Australian participants in the British nuclear weapons tests in Australia (Gun et al 2008) involved retrospective assessment of doses from both external and internal radiation, whereas the UK study considered only external (film badge) doses. Internal doses are those delivered by inhaled or ingested radionuclides. The Australian dose assessment was conservative, i.e. it aimed for estimates towards the upper end of the plausible range. The investigators found that external doses were substantially larger than internal doses and that the Australian dose estimates were broadly similar to those used by the UK investigators. There was very little scope for internal exposures at the Pacific tests, and table S10 does not suggest that SIRs for participants in the Australian tests were higher than those for participants in the Pacific tests. Nor are there suggestions of substantial internal doses in other groups of people exposed to atomic weapons tests (…, 2019, Till et al 2014). Nevertheless, the question of internal exposures in test participants has caused concern (Hansard 1991). It would not be too late to arrange urine monitoring for long-lived radionuclides in a sample of test veterans, which would probably settle this question readily (Dai et al 2016, Giussani et al 2020).
Might test participants have been exposed to some other risk factor, either specifically because of the tests or because of their employment generally? As Gillies and Haylock point out, there is evidence of exposure to solar UV in test participants and controls generally, and of exposure to asbestos and high alcohol consumption in both participants and controls from the Navy. They also note that benzene is a known leukaemogen, but that there is no real suggestion that study subjects were exposed to it. It is known that large quantities of dichlorodiphenyltrichloroethane (DDT) were sprayed from the air at Christmas Island to control flies (Hansard 1997).
The International Agency for Research on Cancer (IARC) has classified DDT as probably carcinogenic to humans (International Agency for Research on Cancer 2018). However, the strongest human evidence relates to cancers of the liver and testis and to non-Hodgkin lymphoma, none of which shows a significantly increased risk in test veterans. Perhaps test veterans were exposed to other carcinogens, but we know of no plausible suggestions as to what they might have been.
The control group were selected to be as similar as possible to the test participants, except in the matter of involvement in the nuclear tests. It is abundantly clear from the SMR and SIR analyses over all four studies that this matching was successful in that mortality and incidence of cancer in test participants were much more similar to those in controls than to those in the general population. However, this matching will not have been perfect, and it is likely that there are small differences between test participants and controls in lifestyle factors and that these might lead to slightly different patterns of disease. For example, diet is a factor affecting several types of cancer including that of the stomach (Key et al 2020). Differences in smoking would have had a particularly potent effect. Higher levels of smoking than in the general population were identified as a probable contributor to the patterns of disease seen in the Australian study (Gun et al 2008).
In 1988, at the time of the first analysis, it appeared that test participants smoked less than controls (although the study team formed the impression that the test veterans smoked quite enough (Keating 2009)). There is now suggestive evidence (supplementary tables S20(a) and (b)) for slightly higher levels of smoking-related non-cancer diseases in participants (RR = 1.03, p = 0.069), though the evidence for smoking-related cancers is weaker (RR = 1.03, p = 0.138). However, the elevation of mortality from circulatory disease in participants vs controls is also minimal (RR = 1.03, p = 0.078), and there is a similarly slight elevation (RR = 1.03, p = 0.325) of chronic obstructive pulmonary disease among participants vs controls. Lung cancer is probably the most sensitive indicator of the effects of smoking, with an RR of ∼20 expected among current smokers compared with non-smokers. The lung cancer data of Table 14b suggest that officers (and civilians, Table 12b) smoked dramatically less than the general population, while other ranks smoked rather more. However, rates were similar in participants and controls, and lung cancer is only very marginally increased among participants vs controls (RR = 1.04 for incidence, RR = 1.06 for mortality). In summary, participants may have smoked a little more than controls, but there is no evidence that the differences were great. It is highly unlikely that any such small difference in smoking habits could drive the observed significant elevations of leukaemia, stomach, bladder or prostate cancer in test participants.
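The argument that small smoking differences cannot explain the non-lung excesses can be made roughly quantitative with a simple confounding calculation. If a fraction p of a group smokes and smokers have relative risk R for a disease, the group's rate relative to non-smokers is 1 + p(R - 1), and the participant/control RR produced by smoking alone is the ratio of two such terms. The sketch ignores ex-smokers and dose-response, takes a hypothetical control smoking prevalence of 50%, and uses the lung cancer RR of ∼20 and the participant/control lung cancer mortality RR of 1.06 quoted above to back out the participants' implied prevalence. It then applies illustrative smoker RRs of 3 for bladder cancer and 1.5 for leukaemia, which are assumptions, not figures from the study.

```python
def group_risk(prevalence: float, smoker_rr: float) -> float:
    """Disease rate of a group relative to non-smokers: 1 + p(R - 1)."""
    return 1 + prevalence * (smoker_rr - 1)


def smoking_induced_rr(p_part: float, p_ctrl: float, smoker_rr: float) -> float:
    """Participant/control RR produced by a difference in smoking alone."""
    return group_risk(p_part, smoker_rr) / group_risk(p_ctrl, smoker_rr)


def implied_prevalence(observed_rr: float, p_ctrl: float, smoker_rr: float) -> float:
    """Participant prevalence that would produce `observed_rr` by smoking alone."""
    return (observed_rr * group_risk(p_ctrl, smoker_rr) - 1) / (smoker_rr - 1)


P_CTRL = 0.5  # hypothetical smoking prevalence among controls
p_part = implied_prevalence(1.06, P_CTRL, smoker_rr=20)        # lung cancer, smoker RR about 20
bladder = smoking_induced_rr(p_part, P_CTRL, smoker_rr=3.0)    # assumed smoker RR for bladder cancer
leukaemia = smoking_induced_rr(p_part, P_CTRL, smoker_rr=1.5)  # assumed smoker RR for leukaemia
print(f"{p_part:.3f} {bladder:.3f} {leukaemia:.3f}")
```

Under these assumptions, a smoking prevalence gap of about three percentage points accounts for the lung cancer RR but raises the bladder cancer and leukaemia RRs by only about 3% and 1% respectively, far short of the leukaemia incidence RR of 1.38. Other plausible choices change the details but leave the smoking-induced RRs well below the observed elevations.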
Last, but possibly not least, the observations may be due to chance. Chance is by its nature unpredictable and capricious, and it certainly cannot be ruled out as an explanation for differences between participants and controls. There may be a lesson in the apparent excess of liver cancer observed in the third UK analysis. Elevated RRs were seen in test participants, both overall and in each of the previous and new periods of follow-up, based on a total of 33 liver cancers in participants. Elevated risks of liver cancer had also been reported in the high-dose group of participants in the US 'Hardtack I' test (Watanabe et al 1995). But, while risks of liver cancer overall were increased in test participants, risks of primary liver cancer were not, and Muirhead et al (2004) concluded that the observation was likely to be due to chance. This finely balanced judgement has been vindicated in the current analysis where, based on almost five times as many cases (154), there is no sign of a significantly elevated risk of liver cancer. Perhaps this illustrates the danger of assuming that all statistically significant associations are causal.
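One way to see why chance deserves weight is to count comparisons. With k independent comparisons each judged at a two-sided threshold α, about kα 'significant or suggestive' results are expected by chance alone, and the probability of at least one is 1 - (1 - α)^k. The sketch uses a hypothetical k = 20 cancer sites with α = 0.1 (the 'suggestive' threshold used above). The number of comparisons actually made in the study differs, and the comparisons are not independent, so this indicates scale only.

```python
def expected_false_positives(k: int, alpha: float) -> float:
    """Expected number of results crossing the threshold when no real effects exist."""
    return k * alpha


def prob_at_least_one(k: int, alpha: float) -> float:
    """Probability that at least one of k independent null comparisons crosses alpha."""
    return 1 - (1 - alpha) ** k


K = 20       # hypothetical number of cancer sites compared
ALPHA = 0.1  # two-sided 'suggestive' threshold used in the text
print(expected_false_positives(K, ALPHA), round(prob_at_least_one(K, ALPHA), 3))
```

With no real effects, about half of such chance findings would show raised RRs in participants and about half reduced RRs.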
Nevertheless, the findings raise enough questions to merit further investigation. Assembling the cohort of test participants and controls was very difficult and expensive; maintaining it by updating the follow-up information is relatively cheap. It is hard not to conclude that one more analysis is needed to help clarify the remaining uncertainties.