A geospatiotemporal and causal inference epidemiological exploration of substance and cannabinoid exposure as drivers of rising US pediatric cancer rates

Age-adjusted US total pediatric cancer incidence rates (TPCIR) rose 49% 1975–2015 for unknown reasons. Prenatal cannabis exposure has been linked with several pediatric cancers which together comprise the majority of pediatric cancer types. We investigated whether cannabis use was related spatiotemporally and causally to TPCIR. State-based age-adjusted TPCIR data was taken from the CDC Surveillance, Epidemiology and End Results cancer database 2003–2017. Drug exposure was taken from the nationally-representative National Survey of Drug Use and Health, response rate 74.1%. Drugs included were: tobacco, alcohol, cannabis, opioid analgesics and cocaine. This was supplemented by cannabinoid concentration data from the Drug Enforcement Agency and ethnicity and median household income data from US Census. TPCIR rose while all drug use nationally fell, except for cannabis which rose. TPCIR in the highest cannabis use quintile was greater than in the lowest (β-estimate = 1.31 (95%C.I. 0.82, 1.80), P = 1.80 × 10− 7) and the time:highest two quintiles interaction was significant (β-estimate = 0.1395 (0.82, 1.80), P = 1.00 × 10− 14). In robust inverse probability weighted additive regression models cannabis was independently associated with TPCIR (β-estimate = 9.55 (3.95, 15.15), P = 0.0016). In interactive geospatiotemporal models including all drug, ethnic and income variables cannabis use was independently significant (β-estimate = 45.67 (18.77, 72.56), P = 0.0009). In geospatial models temporally lagged to 1,2,4 and 6 years interactive terms including cannabis were significant. Cannabis interactive terms at one and two degrees of spatial lagging were significant (from β-estimate = 3954.04 (1565.01, 6343.09), P = 0.0012). The interaction between the cannabinoids THC and cannabigerol was significant at zero, 2 and 6 years lag (from β-estimate = 46.22 (30.06, 62.38), P = 2.10 × 10− 8). Cannabis legalization was associated with higher TPCIR (β-estimate = 1.51 (0.68, 2.35), P = 0.0004) and cannabis-liberal regimes were associated with higher time:TPCIR interaction (β-estimate = 1.87 × 10− 4, (2.9 × 10− 5, 2.45 × 10− 4), P = 0.0208). 33/56 minimum e-Values were > 5 and 6 were infinite. Data confirm a close relationship across space and lagged time between cannabis and TPCIR which was robust to adjustment, supported by inverse probability weighting procedures and accompanied by high e-Values making confounding unlikely and establishing the causal relationship. Cannabis-liberal jurisdictions were associated with higher rates of TPCIR and a faster rate of TPCIR increase. Data inform the broader general consideration of cannabinoid-induced genotoxicity.


(Continued from previous page)
Conclusion: Data confirm a close relationship across space and lagged time between cannabis and TPCIR which was robust to adjustment, supported by inverse probability weighting procedures and accompanied by high e-Values making confounding unlikely and establishing the causal relationship. Cannabis-liberal jurisdictions were associated with higher rates of TPCIR and a faster rate of TPCIR increase. Data inform the broader general consideration of cannabinoid-induced genotoxicity.
Keywords: Cannabis, Cannabinoid, Δ9-tetrahydrocannabinol, Cannabigerol, Genotoxicity, Acute leukaemia, Pediatric cancer Background CDC Surveillance, Epidemiology and End Results (SEER) data from 9 US cancer registries indicates that the ageadjusted total Pediatric (age less than 20 years) cancer incidence rate (TPCIR) has risen 49.0% from 12.96 to 19.32 / 100,000 from 1975 to 2015 [1]. Cancer incidence is U-shaped across the pediatric age range being higher in the under 5 years and over 14 years age groups [2]. Leukaemias, brain and nervous system, neuroblastoma, soft tissue sarcoma, lymphoma and testicular cancer are amongst the commonest pediatric cancers [2,3].
Notwithstanding a generally falling mortality rate from childhood cancer, the TPCIR incidence is acknowledged to be rising since the records of collated cancer registries were first published in 1975 [2]. The cause of this unprecedented increase is at present unclear. Moreover major ethnic differentials are evident for tumours such as All Childhood Cancer (ACC), acute lymphatic leukaemia (ALL) and brain and testicular cancers where the rates in African-American patients vary from 20 to 70% of those in the Caucasian-American community [2]. Again the reasons for such large ethnic disparities are unknown. It therefore appears that several of the major questions relating to the aetiopathogenesis of pediatric cancer are outstanding.
Whilst in adult populations the relationship between cannabis use and cancer incidence is controversial with both positive and negative reports in existence [4,5], amongst pediatric populations the situation is much clearer. It was noted by the California Environmental Protection Agency in a very detailed literature review that five of six studies reported a positive relationship [6][7][8][9][10][11]. Parental cannabis use has been linked with acute lymphatic leukaemia, acute myeloid leukaemia, childhood astrocytoma, rhabdomyosarcoma and neuroblastoma [2, [7][8][9][10][11][12]. Together these comprise 60-70% of the total cancers seen in children younger than 14 years and those between 15 and 20 years [2]. In such a context it becomes plausible that the rise in cannabis use since the 1960's may be a primary driver of total pediatric cancer.
Testicular cancer is a particularly interesting case. It is well established that testicular cancer occurs mainly in younger men with an age peak at 30-34 years and 20% of cases occur in the pediatric age range [1]. The testes houses the germ cells and cannabinoids are known to have myriad direct effects on the reproductive tract in both sexes [13][14][15][16][17]. There is great uniformity in studies of the cannabis-testicular cancer link as all four studies found a risk elevation of over two-fold [18][19][20][21] with an overall risk for current, weekly and chronic smokers of non-seminomatous germ cell tumours estimated in meta-analysis of 2.59 (95%C.I. 1. 60-4.19) [22]. Since pediatric cancer often results from inherited genetic errors [23,24] this implies that major genetic errors in germ cells are induced by parental cannabis exposure.
Adding to concerns related to the potentially genotoxic actions of prenatal cannabinoid exposure (PCE) is an increasing interest in elevation of many birth defects following PCE in Hawaii, Colorado, Canada and Australia [25][26][27][28]. A recent report noted a three-fold rise in total congenital defects in the northern Territories of Canada where more cannabis is smoked [28]. Downs syndrome, due to a major genetic trisomic error, has also been found to be elevated following PCE in Hawaii, Colorado and Australia [25][26][27] and this syndrome has an established link with childhood ALL with 6-10% of Downs syndrome children being affected by this malignancy [29,30].
As discussed below the physiology and pathophysiology of both the endocannabinoid system and the impacts of diverse exogenous phytocannabinoids is presently being studied in great detail and major impacts on reproductive health, genetic and physiological quality of gametes, epigenetic effects on both DNA methylation and histone synthesis and signalling, immunomodulatory and mitochondriopathic effects, and transgenerational inheritable epigenetic effects in both man and mouse are well established and have been demonstrated by a number of investigators [15,17,[31][32][33][34][35][36][37][38].
Concerns are heightened by the recent demonstration that 69% of cannabis dispensaries in Colorado recommended cannabis use to pregnant patients for various symptoms in a recent telephone survey [39] and that in 2017 an estimated 161,000 women used cannabis whilst pregnant across USA [40,41].
Taken together these data suggest that an improved understanding of cannabis-related carcinogenesis in the closely defined pediatric context might well lead to important insights into cannabis-related genotoxicity more generally [42,43]. Moreover the advent of sophisticated geospatial analysis together with some of the formal techniques of causal inference analysis implies that sophisticated and modern analytical procedures could be brought to bear on these important and increasingly topical issues. Techniques such as inverse probability weighting and e-Values are designed to formally investigate causal, as opposed to merely associational, relationships.
The objective of this study was to determine if the rise in pediatric cancers across USA paralleled the recent rise in the use of cannabis when considered formally across space and time, and if the relationship met the criteria for causal inference when assessed by strict quantitative criteria.

Data
Annual data on age-adjusted rates of pediatric cancer cases occurring in patients less than 20 years old was accessed from the publicly available SEER*Explorer website [1]. Data on state-based pediatric cancer rates was accessed via the SEER*Stat software from the SEER / NCI database [44]. Drug use data was accessed from the nationally representative National Survey of Drug Use and Health (NSDUH) conducted by the Substance Abuse and Mental Health Services Administration (SAMHSA) [45]. This survey reports a 74.1% response rate [46]. Data on the following drug variables was collated: monthly cigarette use; annual alcohol use disorder, monthly cannabis use, annual analgesic abuse and annual cocaine use. Data on ethnic composition and median household income by state and year was accessed via the tidycensus package in R from the US Census Bureau. The ethnicities for which data was collected were: Caucasian American, African American, Hispanic American, Asian American, American Indian / Alaskan Native American, Native Hawaiian / Pacific Islander American. Data on national cannabinoid concentrations for Δ9-tetrahydrocannabinol (THC), cannabinol, cannabigerol and cannabichromene was obtained from various published reports [47][48][49]. Data on cannabis legal status was adduced from an internet search [50].

Derived data
Given the clear differences in drug use by ethnicity it was considered important to formally take ethnic cannabis use into account in regression modelling. Data on the frequency of cannabis use by ethnicity was available at the national level from the SAMHSA Substance Abuse and Mental Health Data Archive (SAMHDA) Restricted Use Data Analysis System (RDAS) [45]. For each ethnicity and for each year the percentage of the ethnicity using cannabis at the midpoint of the indicated frequency were multiplied together and summed to gain an ethnic cannabis use index. Hence if fraction x of an ethnicity used cannabis from 20 to 30 days per month then x would be multiplied by 25. This was repeated and summed across all use frequencies to obtain a specific ethnic cannabis use index for that year. This index was multiplied by the state cannabis use rate and the THC concentration in that year to derive an estimate of the ethnic exposure to THC in each state. Similarly the concentration of selected cannabinoids was multiplied by the state cannabis use rate to derive a state based exposure to that cannabinoid. Cannabis use quintiles were defined in each year and concatenated to form strata across all years.

Missing data
The total pediatric cancer rate for Wyoming 2008 was absent. This was imputed as the mean of its rate in 2007 and 2009. The rate of analgesic use was missing for all states in 2015. This was imputed as the mean of the state rates for 2014 and 2016.
Statistics R version 4.0.2 (2020-06-22) from CRAN was used for data analysis and accessed via the RStudio 1.2.5042 (2009-2020) GUI. Data analysis was performed in September 2020. Graphs and map-graphs were drawn using packages ggplot, albersusa and sf. Covariates were logtransformed to approximate normality based on the Shapiro-Wilks test. Linear, mixed effects, panel, robust marginal structural models and spatial models were studied using packages base, nlme, plm, survey and splm (spatial panel linear models) respectively [51][52][53]. In each case model reduction was performed by the classical technique of serial deletion of the least significant term. A variety of modelling procedures was employed for the following reasons. Mixed effects regression was useful for state-wise study of data, for inverse probability weighted corrections, and for generation of standard deviations which can be input to eValue calculations. Panel regression modelling was well suited to the time series sequential nature of the dataset, can be inverse probability weighted and allowed the use of both lagging and instrumental variables. Robust regression was conducted to examine the robust effects after inverse probability weighting. Spatiotemporal regression was performed as the data are inherently distributed across space and time and there was good evidence from the models for both spatial and temporal autocorrelation (see Results). As the models also produce a variance estimate their output is well suited to the calculation of e-Values. Inverse probability weighting was conducted with the ipw package and e-Values for regression models were calculated with the package EValue. Tests for trend were conducted with the chi squared test in Base. T-tests were conducted for parametric group comparisons and were two tailed. P < 0.05 was considered significant throughout.
Panel analysis utilized the pooling technique, a time effect, the random method of Swarmy, the instrumental method of Amemiya and were inverse probability weighted. Robust structural models were conducted by state and were inverse probability weighted.

Spatial analysis
Interstate geospatial linkages were made on the "queen" basis of shared edges or corners and compiled with the poly2nb function from package spdep. They were edited as described so that no state, such as Alaska or Hawaii, was left geospatially isolated (as shown in Results). Model specification of spatial models was undertaken from the general full model to the specific [54]. That is to say the standard spatiotemporal regression model was conducted using the splm function spreml (spatial panel random effects maximum likelihood) including spatial autocorrelation after Kapoor, Kelejian and Prucha [55], random effects, serial correlation in the residual errors and spatial autocorrelation, coded as sem2srre in spreml models [52]. Significance of the final model parameters phi, psi, rho and lambda which quantify random error, serial correlation in the residuals, spatial error correlation and spatial autocorrelation respectively, confirmed that this maximal structure was appropriate (see Results tables). The spatial error adjustment of Kapoor, Kelejian and Prucha takes into account spatial correlation in both the exposure and the outcome and this was considered to be reflective of the real world situation in this case [54]. spreml models do allow the use of both spatial and temporal lagging which has been utilized as described. At the time of writing splm and spreml spatial models do not allow the use of instrumental variables or inverse probability weighting which implies the need for supplementary techniques.

Causal inference
Two techniques of causal inference were employed. Inverse probability weights were constructed for the exposure of interest, monthly cannabis exposure, as a function of the other drug variables which were our primary variables of interest. These weights were used to weight mixed effects, panel and robust regression models appropriately. The effect of this procedure is to equalize exposure across study groups and has also been validated for continuous exposures as considered here.
Such techniques are said to create pseudo-randomized groups from which causal inferences can properly be made. We also calculated e-Values which are a measure of the association required of any unmeasured potential confounder variable with both the exposure and the outcome to discount the reported results. In the literature minimum (of the two) e-Values above 1.25 are commonly considered of relevance [56].

Data availability
All data, including R code, inverse probability weights, geospatial weights, and source datasets, has been made publicly available through the Mendeley data base repository and may be accessed at this URL: https://doi. org/10.17632/cnwv9hdspd.1.

Ethics
The datasets used were all publicly available and deidentified. No reference has been made at any point to individually identifiable data. The present work was approved by the University of Western Australia Human Research Ethics Committee on June 7th 2019 (No. RA/ 4/20/4724).

Results
Inspection of the SEER*Explorer website shows that at the national level that age-adjusted rates of several cancers in the pediatric age group (younger than 20 years) are rising including all cancer and acute lymphatic leukaemia which is the commonest tumour. The annotation on the SEER website is made from the JoinPoint program which also comes from NCI and CDC. These tumours are listed in Table 1 and illustrated in Fig. 1 using data based on 9 US cancer registries 1975-2017. Supplementary Figure 1 shows other cancers which are mostly rising utilizing data from 21 US cancer registries 2000-2017. Figure 2 shows national drug exposure data from NSDUH 2003-2017 and US Census bureau median household income data. It is important to note that exposure to most classes of drugs is dropping with the notable exception of cannabis. Since SAMHSA NSDUH data could be temporally matched to the CDC SEER cancer database for the years 2003-2017, this became the period of analysis. Figure 3 shows the concentration of various cannabinoids found in federal cannabis seizures 1980-2017 [47][48][49]. Figure 4 shows the age-adjusted state-based TPCIR plotted as a function of exposure to the various substances listed. The regression line for cannabis is noted to be weakly and non-significantly positive. Figure 5 shows plots of the TPCIR rate against selected cannabinoids. The regression lines for THC and cannabigerol appear to be strongly positive. Figure 6 shows the TPCIR as a function of ethnic cannabis exposure. In each case the regression line appears to be strongly positive and up-sloping. Table 2 lists applicable results from linear regression against time, cannabis, THC, various substances, cannabinoids and ethnicity. Many results are significant with the notable exception of cannabis. Figure 7 shows the result of assessing the TPCIR as a function of cannabis use quintiles both cross-sectionally (boxplots) and over time (scatterplots). Panel A appears to show a rising trend with cannabis use quintile. One notes in particular that the notches of the fourth and fifth quintiles do not overlap those of Quintiles 1 and 2 which indicates significance. In Panel B the highest two quintiles seem to be above the lower ones over time. Panel C and D look at the data dichotomized into the two highest quintiles compared to the three lower ones. When comparing the highest and lowest quintile of cannabis use the TPCIR in the highest quintiles is significantly greater than that in the lowest quintile (t = 5.038, df = 299.6, P = 8.15 × 10 − 7 ). Comparing the two dichotomized cannabis quintile groups they are   Table 3 are found. Table 4 presents results from increasingly complex robust inverse probability weighted marginal structural models. Results for additive, interactive with drugs only, interactive including drugs, race and income and interactive including cannabinoids, drugs, race and income models are shown. It is particularly noteworthy that in a simple additive robust model (listed first in the table) cannabis is independently highly significant (β-estimate = 9.55 95%C.I. (3.95, 15.15), P = 0.0016).
Since these robust models are not accompanied by a model variance it is necessary to also use a mixed effects model system in order to be able to calculate e-Values subsequently. Mixed effects modelling was also conducted after inverse probability weighting (Table 5). Again a series of increasingly complex models is shown progressing through additive, drug-interactive, full models including drugs, income and ethnicity, and a full model including the two cannabinoids THC and cannabigerol. Importantly in the first three models cannabis is independently highly statistically significant (from βestimate = 79.27 (56.77, 101.78), P = 1.2 × 10 − 11 ).
Since the data are gridded in space and time they are well suited for panel linear modelling, a technique which, in addition to inverse probability weighting, allows the added refinements of instrumental variables and temporal lagging. Temporal lagging is pathophysiologically important in such studies as it is likely that any procarcinogenic or environmental exposure takes some time to work before the clinical and epidemiological impact of genotoxicity becomes evident. Again a series of increasingly complex models is presented at increasing lags ( Table 6). Cannabis is again highly significant in many terms, including being independently significant in additive models (from βestimate = 5.31 (1. 68, 8.95), P = 0.0042).
Data is also evidently oriented in space and time and is thus eminently suited for formal spatiotemporal analysis. Map-graphs of the data over the 16 years 2002-2017 are shown in Fig. 8. Fig. 9 shows the geospatial relationships between the contiguous American states and the manner in which links to Hawaii and Alaska have been edited in to define the final spatial neighbourhood network based on "queen" (edge and corner) contiguity. This neighbourhood sparse weights matrix is utilized in all the spatial regressions which follow. Table 7 shows the initial results from a series of additive and increasingly complex unlagged interactive spatiotemporal models. The table includes the log of the maximum likelihood ratio (Log.Lik.) at model optimization, and the specifically geospatial model coefficients phi, psi, rho and lambda (see Methods). Since all four of these parameters are generally highly significant this confirms that the full model specification (denoted 'sem2srre' in splm::spreml) is appropriate. The Table also lists the standard deviation of each model which is a required input for E-Value calculation. Again cannabis is noted to be independently highly significant in each model. Table 8 shows the results of models lagged first just with cannabis and then for all drugs. Interactive terms including cannabis continue to be highly significant.
Interactive terms including cannabis are significant from β-estimate = 658.72 (396.60, 920.84), P = 8.40 × 10 − 7 for cigarettes: cannabis: alcohol interaction at 2 years of lag. Table 9 presents results of models lagged in space for cannabis and in time for the other drugs. Table 10 presents the results of temporally lagged interactive space-time models including the two cannabinoids THC and cannabigerol. Cannabigerol is independently significant at 2 lags, and the THC:cannabigerol interaction is significant at zero, two and six lags.
As mentioned in Methods, well described ethnic disparities exist for many tumours including total cancer. However it is important to consider to what extent such drug use disparities might account for the known epidemiology of TPCIR . Table 11 presents an interactive geospatial regression of the TPCIR against THC exposure of five races as indicated with highly significant results. E-Values are an important way of quantitating the magnitude of co-association required of any unmeasured confounder with both the exposure and outcome variables to explain away the observed effects. Table 12 presents selected E-Value calculations from linear, mixed effects and geospatial models presented in preceding Tables. The key variable to observe is the final number at the right hand side representing the minimum E-Value, and should be read in the light of the observation by one of its originators that E-Values in the literature over approximately 1.25 are considered noteworthy [56]. In general terms the E-Values fall in the sequence geospatial models > mixed effects models > linear models, related partly to the much smaller model variance of more complex models. Table 12 lists 56 E-Values related to cannabis or cannabinoids of which 24 are larger than 1000. Of the 33 E-Values originating from geospatial models, 20 are larger than 1000. The table lists six minimum e-Values of infinity, three deriving from mixed effects models and three from geospatial models.
Given the above compelling data demonstrating a link between rising rates of cannabis exposure and rising TPCIR an obvious extension of this study was whether the increasing use, availability and concentration of cannabis associated with more liberal legal paradigms [57] was associated with elevated TPCIR . One important caveat on such an investigation is that since the data only run to 2017 and many populous states had not yet been affected by the cannabis legalization movement, it may be considered that the data is premature for a full determination of this potential effect. Fig. 10a shows the rate of TPCIR under various legal paradigms. Whilst the few states involved with full cannabis legalization at that time were associated with broad confidence interval bands there is a clear impression in this Figure that the rate under decriminalization appeared to be at a higher levels than others. Fig. 10b dichotomizes the data into liberal paradigms vs. traditional policies of cannabis being considered illegal. Separation of the two regression lines towards the right hand side of the graph gives a clear impression for a significant interaction between time and dichotomized legal status.
These differences are formally assessed in Table 13 by linear regression. Decriminalized and legal status are both  , P = 0.0208). Table 12 lists the minimum E-Values associated with these changes as 1.60 and 1.98 for cannabis decriminalization and full cannabis legalization respectively (at the bottom of the Linear Regression part of Table 12).

Main results
The main results of this study confirmed that total Pediatric cancer rates have risen significantly nationally across USA and this trend holds for the commonest pediatric malignancies the leukaemias, Non-Hodgkins lymphoma, localized and distant sarcoma and testicular cancer. It was important to note across this period that the use of tobacco, alcohol use disorders, cocaine and analgesic abuse declined as measured in major national surveys whilst cannabis use alone was rising. The level of cannabinoids identified in Federal seizure data also rose for most cannabinoid analytes. TPCIR rose strongly and significantly as a function of cannabinoid exposure, but only weakly and non-significantly in bivariate analysis in relation to cannabis itself. TPCIR was significantly higher in the two highest cannabis use quintiles both overall and across time. Inverse probability weighting was used to equilibrate cannabis exposure across the cohort. Indices of ethnic cannabinoid exposure and seizure cannabinoid concentrations were variously used as instrumental variables to adjust panel models. Cannabis use was independently associated with TPCI R in additive robust marginal structural, mixed effects, panel and geospatiotemporal models. Cannabis use was independently associated with TPCIR in interactive mixed effects and geospatial models. Cannabis use was linked with TPCIR in various interactions in linear models, robust marginal, mixed effects, panel and geospatial models. Cannabis was independently linked with TPCIR in geospatial models lagged to zero, 1 and 6 years and featured in interactions lagged to 1,2,4 and 6 years. When the cannabinoids THC and cannabigerol were studied they were also linked with TPCIR at high levels of statistical significance at zero, 2, 4 and 6 years of lag.
On sensitivity analysis 49 of 56 minimum e-Values were above 1.25 which is a quoted threshold for likely causal relationships. Similarly 31 of 33 geospatial e-Values were above this threshold. The highest finite minimum e-Value was 4.14 × 10 89 . Six minimum e-Values were infinity.
The recent trend to cannabis liberalization was associated with elevated TPCIR both as a group and as an acceleration of the time-dependent trend in cannabisliberal states.
Our interpretation of these highly consistent and concordant findings obtained by several methodologies with instrumental variables, controlling for ethnic cannabinoid exposure, utilizing robust regression techniques, inverse probability weighting with high levels of association across both space and time together with very high e-Values is that the relationship of cannabinoid exposure to total pediatric cancer incidence fulfills the criteria of causality and explains the increasing rates of pediatric cancer under cannabis-liberal legislative paradigms, and that this statement is especially true for THC and cannabigerol the two cannabinoids which show the most consistent rises over time. Hence our study is closely concordant with other published series on the link between pediatric cancer and cannabis use [7][8][9][10][11].

Statistical comments and causal assignment
It is worth considering briefly the incisive logical power of space-time regression and commenting concisely on the theoretical underpinning of formal causal inferential techniques. To say that two variables are statistically associated carries a certain weight. To say that two variables are closely associated when their distribution is considered across both space and time simultaneously is strongly suggestive of a presumptively causal relationship.      Nineteen spatiotemporal models were presented. In seventeen the spatial error coefficient rho was significant. In eighteen the spatial error autocorrelation coefficient lambda was significant. And spatial errors adjusted in the manner of Kapoor, Kelejian and Prucha consistently had higher precision than those adjusted by the algorithm of Baltagi. The Kapoor, Kelejian and Prucha adjustment accounts for correlation between spatially correlated outcomes in addition to spatially correlated exposures. The presence of clear evidence for spatiotemporal progression of the risk factors (cannabis exposure) together with the many associations shared by the four US regions make spatiotemporal autocorrelation in both the exposure and the outcome a reasonable analytical presumption. Together this is indisputable evidence of effects operating in a spatially distributed manner, and represents in the data analytical environment a reflection of the orchestrated campaign across USA to legalize cannabis which operated in a coordinated manner from the west coast eastwards.
Some comments in relation to casual inference and causal assignment are pertinent. Inverse probability weighting is a method which is well established to correct for inconsistent exposures amongst groups. It is enjoys a strong theoretical and epidemiological evidence base [58]. One of the most serious and common limitations of observational studies is where differential exposure to the risk factor occurs differs across experimental groups. In the typical experimental scenario if the exposure of interest occurs differently between the control and treatment groups then the effect of treatment is necessarily confounded by the non-random risk exposure. The established technique of inverse probability weighting overcomes this major obstacle by having the effect of evening out the exposure of interest across all the groups therefore transforming a merely observational and potentially biased study into a pseudo-randomized trial design where causal inferences can more properly be drawn from group comparisons. The techniques of inverse probability weighting can also be extended to studies where the exposure occurs along a continuum as in the present study. One notes that all of our mixed effects, robust regression and panel models were inverse probability weighted so that they all enjoyed the advantage of this powerful modern innovation. Analysis of such models therefore allows truly causal conclusions to properly be drawn.
Similarly E-Values were recently introduced in a formal way to quantitate extraneous uncontrolled confounding from unmeasured covariates and provides a quantitative magnitude to the level of association required of unknown factors with both the exposure and the outcome to remove the impact of any described association [59]. The E-Value is expressed on the risk ratio scale. It is a classical criticism of multivariable studies that potentially the inclusion of other covariates beyond those which were measured might account for the observed effect. By quantitating the magnitude of the association required with both the exposure of interest and the outcome observed E-Values provide a quantitative measure of the magnitude of the effect which would be required. Very large E-Values necessarily imply that in the absence of some known major confounder uncontrolled confounding becomes exceedingly unlikely and the reported effect becomes more likely to be truly causal in nature. That is large E-Values are associated with truly causal effects [56,59]. The published literature on E-Values reports that levels in excess of 1.25 are generally taken in the literature as implying causal relationships [56]. By comparison the E-value for the relationship between tobacco smoking and lung cancer is 9.0 [60]. This is considered a very large effect [56,59]. Hence in our study where 49/53 E-Values were > 1.25, 33 were > 5 and six were infinite discussion of truly causal effects is also entirely appropriate. One notes further that data on some covariates, such as environmental pollution, dietary habits, education, parental age and prematurity rates was not available to the present analytical team. However in view of the very large size of the E-values presently reported it is felt to be highly unlikely that inclusion of further covariates would substantially alter the major conclusions of the present investigation. Naturally we would however be keen to see our studies extended by other groups who have access to more comprehensive datasets defined across space and time.
Our argument for causality relies not just upon the strength of the individual components of the cumulative case but upon their synergistic and syllogistic supporting and reinforcing relationship with each other.

Pathways and mechanisms
Of pivotal importance in linking associational findings with causal pathways is the issue of biological plausibility and the cellular and molecular pathways which might connect the exposure of interest with the outcome of concern. The subject of the pro-oncogenic activities and potential of cannabis, cannabis smoke and cannabinoids is complex major papers have addressed this issue [14, 26, 28, 32, 34-36, 42, 61-67]. In this paper we will provide a brief and concise overview of what presently seem to be some of the most important pathways which are likely to be implicated. They will be described under nine headings of: gametotoxicity, genotoxicity, epigenotoxicity, mitochondriopathy, immunomodulation, pro-aging, endovascular ischaemiahypoxia, sympathetically mediated effects on stem cell niches and non-linearity of the dose-response genotoxic effect curve. These domains are not independent but are themselves interdependent and intricately intertwined. Whilst most of the following observations have been experimentally defined the logical sequence has been filled out where this seems reasonable and concordant with the evidence base.
Cannabinoids have been detected in seminal fluid and have been linked with DNA nicking and fragmentation, abnormal sperm nuclear size, gross abnormalities of sperm morphology including sperm fragmentation, disordered DNA packing and re-packing, disorders of protamine synthesis, histone-protamine substitution and major disruption of sperm DNA methylation [15-17, 31, 37, 62, 68, 69]. Cannabinoids have been found in Graafian follicle and oviduct fluid and have been linked with oocyte nuclear blebbing, nuclear bridging, chromosomal fragmentation and large scale oocyte loss after the second meiotic cell division [14,15,17]. Cannabis smoke is known to contain all of the carcinogens of tobacco smoke including many tars and carcinogens including aromatic amines, polycyclic hydrocarbons, and tars [70]. Cannabinoid exposure has been linked with nuclear bleb and chromosomal bridge formation, chromosomal missegregation at the anaphase separation, micronucleus formation [71], transposon activation and chain and ring chromosome formation [14,32,34]. Cannabidiol, Cannabinol and THC have been implicated in in chromosomal translocation formation to the same level seen with cytotoxic drugs [13]. Cannabidiol and cannabidivarin have been shown to cause double stranded DNA breaks, micronucleus formation and nuclear buds and bridges in human cells which is worse under oxidative stress [67]. Cannabinoid-induced micronucleus formation is very important as it has been identified as a major engine of catastrophic damage to the genetic material and one-step chromothripsis, chromoanagensis and oncogenic transformation [61,72,73]. Cannabinoid exposure has been linked with large scale perturbation of DNA methylation, gross defects in histone synthesiswhich necessarily leave DNA more open and available for transcription which is a pro-oncogenic statealtered histone signalling, and an inhibition of ATP supply to genetic and epigenetic processesmost of which are energy dependentand an inhibition of epigenetic substrate supply [31,33,35,37,62,74]. Together these changes may be expected to advance the "epigenetic clock" which is believed to be one of the key determinants of cellular aging [75,76]. The profound implications of major epigenetic reprogramming were highlighted by a recent paper noting that despite the short half life of immune cells in the circulationjust a few days -the cellular basis for long lasting immunity is actually epigenetic changes in long lived myeloid precursor cells which record metabolic and immune activation responses in the coordinated patterns of their enhancers, promoters, long non-coding RNA's, DNA methylation and histone codes which determine chromatin conformation and the assembly of topologically transcriptionally active domains which functionally facilitate secondary responses to infection and vaccines [77,78]. The outer mitochondrial membrane not only possess CB1R's, but indeed the whole of the cannabinoid signalling transduction machinery found in the plasmalemma also resides in the inner and outer mitochondrial membrane and within the intermembrane space so that cannabinoids are an important direct modulator of metabolic state [79][80][81][82][83]. Several adverse mitochondrial processes are well described including a reduction in the transmembrane potential across the inner mitochondrial membrane, a reduced synthesis of key oxidative phosphorylation substrates including the F1-ATPase, increased electron shunting via uncoupling protein 2 activation, gross mitochondrial damage and swelling and impairment of mitonuclear cross-talk and mitonuclear genomic coordination [17,[84][85][86][87][88].
There is a rich literature describing both the pro-and anti-inflammatory actions of cannabinoids. In this context the proinflammatory CB1R-mediated activities seem to be especially important [89] as chronic inflammation is a well established cause of cancers in many tissue beds and occurs by many mechanisms. One pathway of particular interest is that cytoplasmic inflammation stimulates the transposons or "jumping genes" of the genome, to start "jumping" mobile segments and creating genomic havoc. Micronucleus disruption releases double stranded DNA into the cytoplasm where it potently stimulates the cytoplasmic GMP-AMP -STimulator of INterferon Gamma (cGAS-STING) pathway which further intracytoplasmically stimulates inflammation via interferon-γ and innate immune signalling and destabilizes the genome [90][91][92]. The immunosuppressive activities of cannabinoids may depress the immune response to the developing field change and nascent tumours. This cycle could potentially explain the many case reports of cancers occurring in adults at a younger age than usual and with increased aggressiveness in heavily cannabis exposed patients [93][94][95][96].
Cannabis exposure has been found to accelerate organismal cardiovascular aging clinically [97]. Cannabinoids are known to inhibit stem cell division [34,98]. This combination of impaired stem cell activity, reduction of mitochondrial energy generation and a proinflammatory milieu are all hallmarks of cellular ageing and the senescence-associated secretory phenotype [99][100][101] of growth factors and cytokines which is presumably stimulated and a key hallmark of aging. Aging of course is the leading risk factor for most adult tumours. In the light of the foregoing cellular changes it would seem that the quality of cannabinoid-exposed gametes may be broadly seen as defective and they may thus be said in general terms to likely be "aged" in metabolic, epigenetic and genetic terms. Cannabinoids are known to have important effects on the microvasculature and can induce tissue ischaemia [102][103][104][105] which is an important determinant of the hypoxic microenvironment which stimulates genomic instability and oncogenesis and promotes nascent and mature tumour growth.
Cannabis addiction is known to feature periods of cannabinoid withdrawal marked by agitation and manifest sympathetic hyperstimulation [106]. Sympathetic stimulation has been shown to have direct adverse activities on the stem cell niche of the hair follicle [107] and likely acts similarly in other stem cell niches.
Arguably the most concerning feature of this literature is the apparent threshold effect beyond which genotoxic and mitochondriopathic changes emerge relatively abruptly. This implies that the exponential dose-effect curve seen in many genotoxic assays for cannabinoids [35,63,108,109] can appear to be functionally an abrupt discontinuity in the dose-response curve at the epidemiological level. At the community level this implies that a doubling of daily cannabis use, as has been documented in USA in recent years [110], might reasonably be linked with a disproportionate response in genotoxic downstream sequaelae such as congenital anomalies including transgenerationally transmissible carcinogenesis.
From this brief overview it is apparent that a plethora of cellular oncogenic mechanisms exist linking exposure to cannabis smoke, cannabis and cannabinoids to the processes of carcinogenesis.
In 1965 Hill described nine criteria as being required of any association in order to assign causality to the relationship. Strength of association, consistency amongst studies, specificity, temporal sequence, coherence with known data, biological plausibility a biological response or dose-response curve, analogy with similar situations elsewhere and experimental confirmation were key features [111]. It will be noted that the above analysis, including the published literature and the cited experimentally demonstrated mechanistic links, fulfill all of these criteria for the relationship between cannabis exposure and TPCIR .

Generalizability
Our data are population level data derived from publicly available datasets from one of the world's most technologically advanced nations. The underlying population is also substantial. Given that our findings are robust to various different methods, fulfill criteria for causality and are consistent with the majority of the published work in the area we believe that our findings are robust and widely generalizable. However as it is clear that cannabis use is in a state of flux worldwide at the present with rises in the prevalence of use, intensity of use, and concentration of product we feel that it is important that on-going studies be conducted in this area to monitor the situation at higher levels of geospatial resolution.

Future directions
Further extensions of this work might include detailed dissection of the molecular and cellular level of the pathways mentioned particularly relating to mitochondrial cannabinoid signalling, mitochondrial electron leaks and shunts, free oxyradical flux, perturbation of mitonuclear cross-talk, cannabinoid induced disruption of metabolic supply of epigenetic substrates, cannabinoid-related disruption of histone synthesis and signalling and the histone code generally, cannabinoid epigenotoxicity generally and heritable and transgenerational epigenotoxicity specifically, proinflammatory cannabinoid actions, microvascular-disrupting and hypoxiainducing actions, chromosomal mis-segregation and anaphase disruption and the interaction of cannabinoids with the cGAS-STING cytoplasmic signalling pathway. Research into cannabinoid interactions with the germ cells, oocytes and sperm, is clearly of primary and foundational importance to these concerns and should be up-prioritized on research agendas. Analytically higher resolution space-time modelling based on more detailed datasets from CDC and SAMHSA is an obvious task for the near future. The incorporation of instrumental variables and inverse probability weights into the space-time and spatiotemporally lagged models of plm, splm and similar software would allow all the questions of interest to be addressed in a single modelling framework without the need for multiple model types as was necessitated in the present report and would likely only require minimal resources to enable the required programming code to be written for this very impressive, sophisticated and highly flexible software to be further optimized.

Strengths and limitations
Our study has several strengths including using data from a very populous nation, the use of publicly available datasets, the use of different statistical techniques, the application of inverse probability weighting and e-Values, two mechanisms well established in the causal epidemiological literature, the use of geospatiotemporal regression techniques with complex random error structures, the use of models lagged both spatially and temporally, the use of a variety of covariates, consideration of substance-exposure indices which is often absent from many studies, the use of various instrumental variables, the availability of a relatively lengthy panel data series for 15 years, and correction for ethnic cannabis exposure as a major underlying confounding factor. The absence of geospatial techniques from much of cancer epidemiology appears to be a major knowledge gap which the present study begins to redress. It may also be argued that for research enterprises to consume significant public resources but never be able to provide actual causal advice to their host community at once stretches public credulity and tests their patience, particularly when well established methodologies are available which can be used to fill this major knowledge gap. The deliberate application of the techniques of formal causal inference in this study thus comprises a major strength. The study's major limitation relates to the unavailability of individual patient-level data which is a common limitation amongst epidemiological studies. Due to the complexity of the present analysis we have not considered further subgroup analyses, either of individual tumours, or by fascinating sex or ethnic incidence differences. All of this remains to be done at higher geospatial resolution by subsequent investigators. Data relating to other covariates such as levels of environmental pollutants, parental age at childrens' birth, dietary changes and prematurity rates was not available to the present study. These areas remain to be addressed by subsequent researchers. On a methodological note one notes that since cannabis use an effect both educational attainment and occupational achievement one must be careful to avoid over-controlling or the use of "collider variables" in such future regression studies [60,112].

Conclusion
In summary our study confirms previous reports in the literature linking cannabis exposure with pediatric and testicular cancer [7][8][9][10][11][18][19][20][21][22] and answers both our opening hypotheses affirmatively. We extend and amplify earlier reports in many ways including with the use of national cancer census data and widely cited nationally representative drug use surveys, the application of geospatial techniques and the formal techniques of causal inference to the data series and various technical refinements including the use of several sets of instrumental variables and various forms of inverse probability-weighted and spatially weighted regression matrices and robust, panel and linear multivariable techniques. After including socioeconomic, ethnic and drug use variables we find robust associations across space and time for cannabis use and TPCIR and that cannabis, and particularly the cannabinoids THC and cannabigerol, are independently and interactively associated with TPCIR both in de novo space-time grids and in spatially and temporally lagged models. Moreover very high e-Values clearly indicate that the relationship cannot be explained away by unmeasured, unknown or hypothetical confounding variables. This analysis is consistent with five previously reported series comprising the majority of the published literature in the field [7][8][9][10][11], dozens of potential experimentally described mechanistic pathways and fulfill the paradigmatic Hill criteria of causality [111]. Findings are also consistent with reports of elevated rates of congenital anomalies following prenatal cannabis exposure [25-28, 42, 43] and thus are broadly concordant conceptually with wide ranging and far reaching heritable cannabinoid-related genotoxicity. Our analysis also begins to provide insights into the previously mysterious major differences in cancer incidence between various ethnicities by indicating that varying ethnic exposures to cannabinoids are of particular concern. It is important that this thread be further explored in the future. Such formal demonstration of strong evidence of a presumptively genotoxic cannabiscancer causal link is highly relevant for the ongoing and currently controversial story of the relationship of cannabis use with malignant tumourigenesis in adults. Strong evidence of a robust causal relationship of cannabis exposure to pediatric and thus transgenerational inheritable genotoxicity carries far reaching implications for the ongoing public debate relating to the most appropriate forms of regulation of cannabis and cannabinoids. Moreover the present analysis powerfully informs the broader discussion regarding cannabis-related genotoxicity as it relates to adult tumourigenesis and many congenital anomalies encountered at birth [25-28, 42, 61, 62]. required to access the data which was used and collated in this study, e.g. NSDUH study. Data including shapefiles and R programming script is made publicly available on the Mendeley Data Archive at this URL: URL: https://doi. org/10.17632/cnwv9hdspd.1.

Ethics approval and consent to participate
The Human Research Ethics Committee of the University of Western Australia provided ethical approval for the study to be undertaken 7th June 2019 (No. RA/4/20/4724). Ethics approval was not required to access the data in the first instance. However Ethical approval provided permission to access, analyze and publish all the data obtained.

Consent for publication
Not applicable.