Factors associated with Anaplasma spp. seroprevalence among dogs in the United States

Dogs in the United States are hosts to a diverse range of ticks and tick-borne pathogens, including A. phagocytophilum, an important emerging canine and human pathogen. Previously, a Companion Animal Parasite Council (CAPC)-sponsored workshop proposed factors purported to be associated with the risk of infection with tick-transmitted pathogens in dogs in the United States, including climate conditions, socioeconomic characteristics, local topography, and vector distribution. Approximately four million results from routine veterinary diagnostic tests performed in 2011–2013, collected at the county level across the contiguous United States, were statistically analyzed against the proposed factors using logistic regression and generalized estimating equations. Spatial maps of baseline Anaplasma spp. prevalence were constructed using kriging and head-banging smoothing methods. All of the examined factors except surface water coverage were significantly associated with Anaplasma spp. prevalence. Overall, Anaplasma spp. prevalence increased with increasing precipitation and forestation coverage and decreased with increasing temperature, population density, relative humidity, and elevation. Interestingly, socioeconomic status and deer/vehicle collisions were positively and negatively correlated with canine Anaplasma seroprevalence, respectively. A spatial map of the canine Anaplasma hazard is an auxiliary product of the analysis. Anaplasma spp. prevalence is highest in New England and the Upper Midwest. The results from the two posited statistical models (one restricted to areas assumed to be endemic and one using data from the entire contiguous United States) are in general agreement, the major difference being that the endemic areas model estimates a larger prevalence in western Texas, New Mexico, and Colorado. Because A. phagocytophilum is zoonotic, the results of this analysis could also help predict areas of high risk for human exposure to this pathogen.


Background
Dogs are susceptible to infection with numerous tick-borne rickettsial pathogens, including Anaplasma phagocytophilum, the etiologic agent of granulocytic anaplasmosis in people, dogs, horses, sheep, and other animals [1]. A closely related pathogen, A. platys, causes infectious cyclic thrombocytopenia in dogs and cross-reacts with antibodies to A. phagocytophilum. Clinical signs of canine granulocytic anaplasmosis vary in severity but commonly include fever, thrombocytopenia, lethargy, and polyarthritis, whereas infectious cyclic thrombocytopenia is generally considered a mild disease except when co-infection exacerbates other diseases such as ehrlichiosis [2]. People with A. phagocytophilum infections may have flu-like symptoms, but, unlike other tick-borne zoonoses such as Lyme disease or Rocky Mountain spotted fever, rashes are rare [3]. Although A. platys is considered a low risk for human infection, a recent case report suggested that it might also be zoonotic [4].
In the United States, Ixodes scapularis (the blacklegged tick) and Ixodes pacificus (the western blacklegged tick) are considered the primary vectors of A. phagocytophilum. Ixodes scapularis occurs in at least 32 eastern and central states, and I. pacificus appears to be limited to five western states [5]; both species also range northward into Canada. However, autochthonous transmission of pathogenic strains of A. phagocytophilum to people and dogs has been documented only in the Northeast, the Upper Midwest, and limited parts of the western United States [6]. In contrast, Rhipicephalus sanguineus (the brown dog tick) is thought to transmit A. platys, although this cycle has not been confirmed in North America. The distribution of R. sanguineus is described as cosmopolitan, as these ticks can infest buildings in otherwise inhospitable climes [7]. Brown dog ticks also thrive in arid areas with high temperatures. Accordingly, populations of this tick are most abundant, and infestations of premises most common, in the southern United States.
Transmission by tick vectors is considered the primary means of canine exposure to Anaplasma spp., so regional variation in risk is tied to the presence and abundance of competent tick vectors and vertebrate reservoirs. Factors associated with the presence of tick vectors include vector amplification hosts, pathogen reservoir host population densities, climate, and topography [8,9]. Advances in testing and recording technologies have produced large county-level datasets of diagnostic test results for canine exposure to Anaplasma spp. [6,10]. With support from a veterinary diagnostic company (IDEXX Laboratories, Inc., Westbrook, ME), the Companion Animal Parasite Council (CAPC) has compiled a dataset of diagnostic test results reported by veterinary practitioners and a network of reference laboratories within the contiguous United States. This database allowed us to conduct the first comprehensive risk factor study of canine Anaplasma spp. in North America. The CAPC also convened a workshop to identify factors putatively associated with canine seroprevalence of tick-borne pathogens, focusing on risk factors for which data are available so that these factors could be quantitatively evaluated for their power to predict spatiotemporal seroprevalence patterns [11]. The objectives of this investigation were to identify risk factors associated with canine seroprevalence of Anaplasma spp. and to incorporate these factors into a refined spatiotemporal analysis. The resulting data allow the creation of maps that indicate the risk of Anaplasma infection for people, dogs, horses, and other animals.

Data collection
To spatially analyze the canine seroprevalence of Anaplasma spp., the CAPC acquired from IDEXX Laboratories the qualitative (positive/negative) results of 3,950,852 diagnostic tests performed during 2011–2013, reported for each county in the contiguous United States. Test results were generated using SNAP® 4Dx® and SNAP® 4Dx® Plus Test kits (IDEXX Laboratories, Inc.), which are point-of-care ELISAs that detect antigen from, or antibodies to, several vector-borne pathogens. The tests were performed both in clinics and at reference laboratories. The performance of these test kits has been reported elsewhere [12,13]. The Anaplasma portion of these tests uses a synthetic peptide from a major surface protein of A. phagocytophilum (MSP2/P44) and detects antibodies to both A. phagocytophilum and A. platys [13].

Data analysis
Spatial structure of canine exposure to Anaplasma spp. in the United States
Two statistical smoothing techniques were applied to the data to generate a spatial prevalence map of canine exposure to Anaplasma spp. in the United States. A weighted head-banging algorithm was first used to reveal patterns in the data [14,15]. To account for counties not reporting data, kriging, an interpolation method, was subsequently used to construct a spatially complete map [16].
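To make the infilling step concrete, the sketch below performs ordinary kriging of one unobserved location from nearby county values. It is a minimal pure-Python illustration, not the ArcGIS implementation used in this study: the spherical semivariogram and its nugget, sill, and range values are illustrative assumptions (in practice they would be fitted to the empirical semivariogram), county centroids are treated as planar (x, y) coordinates, and the function names are hypothetical.

```python
import math


def spherical_variogram(h, nugget=0.0, sill=1.0, range_=1.0):
    """Spherical semivariogram gamma(h); parameter values are illustrative, not fitted."""
    if h == 0.0:
        return 0.0
    if h >= range_:
        return nugget + sill
    r = h / range_
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, target, gamma=spherical_variogram):
    """Predict the value at `target` as a weighted sum of the observed `values`."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    n = len(points)
    # Kriging system: sum_j w_j * gamma(d_ij) + mu = gamma(d_i0) for each observed i,
    # plus the unbiasedness constraint sum_j w_j = 1 (mu is a Lagrange multiplier).
    a = [[gamma(dist(points[i], points[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * v for w, v in zip(weights, values))
```

Because the weights are constrained to sum to one, a constant field is reproduced exactly, and with a zero nugget the predictor returns the observed value at any observed location.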

Risk factors
Fifteen putative risk factors were previously proposed for canine exposure to pathogens transmitted by I. scapularis, I. pacificus, or R. sanguineus [11]. Of these, nine were analyzed for their power to explain the observed regional canine seroprevalence. To be considered, a factor had to be quantifiable with currently available data; this limited the factors to climate (annual temperature, precipitation, and relative humidity), socioeconomic characteristics (human population density and household income), and local topography (surface water, forestation coverage, and elevation) [11]. In addition, because nationwide county-level deer densities were not available, a state-by-state estimate of the annual probability of deer/vehicle collisions was used as a surrogate risk factor [17]. Counties within a state were assigned the collision probability for the entire state (Additional file 1: Figure S1). The premise was that regions with more deer/vehicle collision reports support larger deer populations. A list of the considered factors and their sources is provided in Table 1.

Statistical methods
To assess the significance of the putative risk factors, let $Y_{i,j}$ denote the number of positive tests in the $i$th county during the $j$th year and $n_{i,j}$ the corresponding total number of tests performed. An estimate of the $i$th county's prevalence over the three study years is
$$\hat{p}_i = \frac{\sum_{j=1}^{3} Y_{i,j}}{\sum_{j=1}^{3} n_{i,j}}.$$
Generalized linear models (GLMs) are used here under the assumptions that the observed data (1) are independent and (2) follow a distribution belonging to an exponential family; for further details, see [18]. Here, the tests in each county-year are assumed to constitute a true random sample, so that the number of positive results follows a binomial distribution (an exponential family member), $Y_{i,j} \sim \mathrm{Binomial}(n_{i,j}, p_{i,j})$, where $p_{i,j}$ is the probability that a tested dog is seropositive. Possible departures from this assumption are discussed in the Conclusions. Consequently, a GLM can be formulated as
$$g(p_{i,j}) = \mathbf{X}_{i,j}'\boldsymbol{\beta},$$
where $g$ is an invertible link function, $\mathbf{X}_{i,j} = (1, X_{i,j,1}, \ldots, X_{i,j,p})'$ is a vector of risk factors for the $i$th county during the $j$th year, and $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_p)'$ is a vector of regression coefficients. Herein, $g$ is the logit (log-odds) link, the inverse of the logistic function:
$$g(p_{i,j}) = \log\left(\frac{p_{i,j}}{1 - p_{i,j}}\right).$$
Models of this form are easily fit using standard statistical software. For a fixed county, however, it is unreasonable to assume that seroprevalence estimates are statistically independent over time. In endemic areas, infections persist in reservoir host populations; consequently, the numbers of positive test results in a given county may be highly positively correlated from year to year.
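The logistic GLM above can be fit by maximum likelihood. The following minimal pure-Python sketch fits the grouped-binomial model by Newton–Raphson (equivalent to iteratively reweighted least squares for the canonical logit link) and converts a coefficient and its standard error into an odds ratio with a Wald 95% confidence interval, the kind of quantity summarized in Table 2. It treats every county-year as independent, so it is the plain GLM, not the GEE fit used in this study; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def expit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def score_and_information(beta, X, y, n):
    """Binomial log-likelihood score vector and Fisher information at beta."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi, ni in zip(X, y, n):
        mu = expit(sum(b * x for b, x in zip(beta, xi)))
        w = ni * mu * (1.0 - mu)  # binomial variance, i.e. the IRLS weight
        for a in range(k):
            score[a] += xi[a] * (yi - ni * mu)
            for c in range(k):
                info[a][c] += w * xi[a] * xi[c]
    return score, info


def fit_logistic(X, y, n, max_iter=50, tol=1e-10):
    """Fit logit(p_ij) = X_ij' beta to y_ij positives out of n_ij tests.

    Each row of X must start with 1.0 (intercept). Returns (beta, standard errors).
    """
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = score_and_information(beta, X, y, n)
        cov = invert(info)
        step = [sum(row[c] * score[c] for c in range(len(beta))) for row in cov]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors from the inverse Fisher information at the final estimate.
    _, info = score_and_information(beta, X, y, n)
    cov = invert(info)
    return beta, [math.sqrt(cov[a][a]) for a in range(len(beta))]


def odds_ratio_ci(b, se, level=0.95):
    """Odds ratio exp(b) with a Wald confidence interval exp(b -/+ z * se)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)
```

As in Table 2, a factor is significant at the 0.05 level when its 95% interval for the odds ratio excludes unity.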
To allow for temporal correlation, generalized estimating equations (GEEs) were used to estimate the regression coefficients [19,20]. GEEs are similar in form to GLMs but account for the correlation between observations within a particular county over time by weighting the estimating equations with the inverse of a working covariance matrix (loosely, minimizing a "weighted" rather than an "unweighted" sum of squares) [19,20]. To apply the GEE method, a working correlation matrix must be specified (e.g., independent, exchangeable, or autoregressive); this matrix describes the temporal correlation within a given county. To avoid imposing a possibly misspecified structure, an unstructured working correlation matrix was used, and its components were estimated along with the regression parameters. GEE models can be fitted using standard statistical software (e.g., SAS, Stata, S-PLUS, and R) [21,22].
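The unstructured working correlation described above is typically estimated from Pearson residuals of the current fit. The sketch below shows one common moment estimator of this kind; it is an illustrative assumption, not necessarily the exact estimator (or degrees-of-freedom correction) used by SAS, and it assumes every county reports all three years. The function names are hypothetical.

```python
import math


def pearson_residual(y, n, p):
    """(observed - expected) / binomial standard deviation for one county-year."""
    return (y - n * p) / math.sqrt(n * p * (1.0 - p))


def unstructured_working_correlation(resid):
    """Moment estimate of a T x T unstructured working correlation matrix.

    resid[i][j] is the Pearson residual of county i in year j; every county is
    assumed to report all T years. No degrees-of-freedom correction is applied.
    """
    n_counties, t = len(resid), len(resid[0])
    # Scale (dispersion) parameter: mean squared Pearson residual.
    phi = sum(r * r for row in resid for r in row) / (n_counties * t)
    corr = [[1.0] * t for _ in range(t)]
    for j in range(t):
        for k in range(t):
            if j != k:
                cross = sum(row[j] * row[k] for row in resid)
                corr[j][k] = cross / (n_counties * phi)
    return corr
```

Each off-diagonal entry is estimated freely, which is what distinguishes the unstructured choice from exchangeable (one shared correlation) or autoregressive (correlation decaying with lag) structures.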
While GEE techniques account for temporal dependence within a county, they assume that observations from different counties are independent. Consequently, the weighted head-banging and kriging algorithms [23,24], which implicitly account for spatial dependence, were used to display prevalence estimates graphically. The weighted head-banging algorithm, using 20 triples, was first applied to smooth the county-level prevalence estimates. Weights were set to the reciprocal of the estimated standard deviation of each county's prevalence estimate, so counties with more observations carried more weight in the smoothing. Kriging was then applied to the head-banging estimates to infill counties not reporting data and to generate spatially complete prevalence maps; it was implemented using the default settings in ArcGIS. Two main-effects models, described below, were considered.
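The head-banging step can be sketched as follows. This is a simplified, unweighted version written in pure Python for illustration: for each county it forms triples from pairs of nearby neighbors lying roughly on opposite sides of it (here, an angle of at least 135 degrees at the county), takes the medians of the lower and of the higher neighbor values as low and high screens, and pulls the county's value into that range. The neighborhood size `k`, the angle threshold, and leaving edge counties unchanged when no triple qualifies are assumptions; the reciprocal-standard-deviation weighting of the variant used in this study [14,15] is not reproduced. `ntrip=20` mirrors the 20 triples mentioned above.

```python
import math
from itertools import combinations
from statistics import median


def head_bang(points, values, k=8, ntrip=20, min_angle_deg=135.0, iterations=1):
    """Simplified, unweighted head-banging smoother.

    points: distinct (x, y) county centroids; values: prevalence estimates.
    """
    vals = list(values)
    # k nearest neighbours of each county by Euclidean distance.
    neighbours = []
    for i, (xi, yi) in enumerate(points):
        ranked = sorted(
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        neighbours.append([j for _, j in ranked[:k]])

    for _ in range(iterations):
        new_vals = list(vals)
        for i, (xi, yi) in enumerate(points):
            lows, highs = [], []
            for a, b in combinations(neighbours[i], 2):
                ax, ay = points[a][0] - xi, points[a][1] - yi
                bx, by = points[b][0] - xi, points[b][1] - yi
                cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
                angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
                # Keep the pair only if county i lies roughly between a and b.
                if angle >= min_angle_deg:
                    lows.append(min(vals[a], vals[b]))
                    highs.append(max(vals[a], vals[b]))
                    if len(lows) == ntrip:
                        break
            if not lows:
                continue  # no qualifying triple (e.g. an edge county): unchanged
            low_screen, high_screen = median(lows), median(highs)
            # median(low, v, high) leaves v unchanged inside the screens, else clips it.
            new_vals[i] = median([low_screen, vals[i], high_screen])
        vals = new_vals
    return vals
```

An isolated spike surrounded by similar values is replaced by the screen value, while values already lying between their neighbors are left untouched; this is what lets the smoother preserve genuine spatial gradients.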
Estimated regression coefficients and their standard errors were obtained by fitting the proposed models in SAS, and 95% confidence intervals were constructed from these estimates. To retain interpretability, only first-order (main-effects) models were considered. Model selection used backward elimination with a cutoff of 0.05; i.e., at each step the factor with the highest p-value greater than 0.05 was removed from the model. Variance inflation factors indicated that multicollinearity was not a substantial issue. To assess the quality of the model fit, a coefficient of determination, $R^2$, is reported [25].
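The backward-elimination rule described above can be written as a small driver loop. In this sketch, `fit_pvalues` is a hypothetical callback standing in for a model refit (for example, a GEE fit in SAS or elsewhere) that returns a p-value for each factor still in the model; the loop removes the factor with the largest p-value while that p-value exceeds 0.05.

```python
def backward_elimination(factors, fit_pvalues, alpha=0.05):
    """Return the factors retained by backward elimination at level alpha.

    fit_pvalues(list_of_factors) -> {factor: p_value} is a hypothetical
    callback that refits the model with the given factors.
    """
    current = list(factors)
    while current:
        pvalues = fit_pvalues(current)
        worst = max(current, key=lambda f: pvalues[f])
        if pvalues[worst] <= alpha:
            break  # every remaining factor is significant at level alpha
        current.remove(worst)  # drop the least significant factor and refit
    return current
```

The model is refit after every removal because a factor's p-value can change once a correlated factor is dropped.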

Endemic region and contiguous US models
Two models were posited. The first, an "Endemic Regions" model, used only data from regions where A. phagocytophilum was considered potentially endemic based on published reports and expert opinion (Additional file 2: Figure S2). Although the data used to designate a region as endemic are imprecise, we show below that the conclusions do not depend heavily on this definition. The second, a "Contiguous US" model, used all available data and added an indicator factor denoting whether a county lies within the A. phagocytophilum-endemic area (Additional file 2: Figure S2). Because prevalence was highly variable and data were missing for many counties, the estimates were statistically smoothed using head-banging and kriging algorithms to improve map utility. The expected prevalence of canine exposure to Anaplasma spp. by county during a typical year is shown in Fig. 2. Canine exposure to Anaplasma spp. was most prevalent in the Northeast, the upper Midwest, northern California, western Texas, and eastern New Mexico.
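The two models differ mainly in which counties enter the fit and whether an endemic-area indicator is included. The following sketch assembles design rows for each model from county-year records; the record layout and the `build_design` name are illustrative assumptions, and its output could be passed to a GLM or GEE fitting routine.

```python
def build_design(records, model):
    """Assemble (X, y, n) for the "endemic" or "contiguous" model.

    Each record is a dict with keys "factors" (list of floats), "endemic" (bool),
    "y" (positive tests) and "n" (total tests) for one county-year.
    """
    X, y, n = [], [], []
    for rec in records:
        if model == "endemic":
            if not rec["endemic"]:
                continue  # Endemic Regions model: endemic counties only
            row = [1.0] + rec["factors"]
        elif model == "contiguous":
            # Contiguous US model: all counties, plus a 0/1 endemic indicator
            row = [1.0] + rec["factors"] + [1.0 if rec["endemic"] else 0.0]
        else:
            raise ValueError(f"unknown model: {model!r}")
        X.append(row)
        y.append(rec["y"])
        n.append(rec["n"])
    return X, y, n
```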

Risk factor data
Several factors were significantly associated with the prevalence of Anaplasma-positive dogs, although the significant factors differed slightly between the Endemic Regions and Contiguous US models (Table 2). All factors except water coverage were significant at the 95% confidence level in the Contiguous US model. When only the endemic regions were considered, all factors except water coverage and elevation were significant at the 95% confidence level. Temperature, population density, relative humidity, elevation, and deer/vehicle collisions were negatively correlated with Anaplasma prevalence, whereas precipitation, forestation coverage, and household income were positively correlated with it.
There was a significant correlation in the prevalence of Anaplasma spp. in dogs between years, regardless of the model (Table 3). These strong positive correlations imply that regions experiencing high or low canine seroprevalence are likely to experience similarly high or low seroprevalence in the near future. Correlations between years two years apart were lower than those between consecutive years.

Regional prevalence based on contiguous US and endemic regions models
Based on the Endemic Regions model, the highest prevalence estimates were in the Northeast, followed by the upper Midwest, western Texas, and central coastal California (Fig. 3). The Contiguous US model estimated higher prevalence in the upper Midwest but lower prevalence in Texas (Fig. 4). The model fits are summarized in Table 2. For the Endemic Regions model, prevalence estimates for counties in the endemic region were obtained from the fitted GEE model, which uses only the data and factors for those counties. Counties in non-endemic regions were instead assigned the crude estimates depicted in Fig. 1, consistent with the usual notion of prevalence (sporadic cases occur in non-endemic regions, and some dogs travel). The two fitted models were similar and explain considerable structure: $R^2$ was 0.72 for the Endemic Regions model and 0.71 for the Contiguous US model.

Conclusions
As with other tick-borne diseases in the United States, the incidence of human anaplasmosis has been increasing [26,27]. Although canine anaplasmosis is not reportable, the incidence of seropositive canine cases also appears to be increasing. Similar to Bowman et al. [6], we found the highest prevalence of Anaplasma antibodies in dogs from the upper Midwest and eastern New England. These areas also correspond to those with the highest reported incidence of human anaplasmosis, supporting the suggestion that dogs can serve as useful sentinels for human risk [26,27]. Given their general distribution and their concordance with antibodies to Borrelia burgdorferi in dogs and with human Lyme disease cases, many of the Anaplasma-reactive antibodies detected in dogs likely result from infection with A. phagocytophilum [6,26,28]. Further support comes from Qurollo et al. [29], who used A. platys- and A. phagocytophilum-specific assays and found similarly low seroprevalence of both pathogens in the Southeast and West. In contrast, the prevalence of antibodies to A. phagocytophilum was significantly higher in other regions. Notably, however, there were isolated areas with unexpectedly high prevalence estimates for Anaplasma (e.g., Texas, New Mexico, and Oklahoma) where neither A. phagocytophilum nor its known tick vectors are common. Possible explanations include (1) exposure to A. platys or a novel Anaplasma species, (2) an unrecognized A. phagocytophilum vector-reservoir transmission cycle in that region, or (3) a relatively high proportion of tested dogs that had previously traveled to endemic regions [6]. These data, while sometimes enigmatic, should not be ignored, as demonstrated by similar unexplained foci in the upper Midwest, where a novel E. muris-like agent was ultimately found in association with an unexpectedly high seroprevalence of Ehrlichia spp. among dogs [6,30,31].
Estimates from the Endemic Regions and Contiguous US models agreed well with each other and with the original serologic data. However, minor differences between the two models resulted in higher or lower estimated prevalence in some regions. For example, the Contiguous US model produced higher prevalence estimates than the Endemic Regions model in parts of the upper Midwest (e.g., Wisconsin, Minnesota, and Illinois), where granulocytic anaplasmosis is considered endemic, and in other parts of the Midwest (e.g., Indiana, Kentucky, and Ohio), where it is currently considered rare. The Contiguous US model also estimated a lower prevalence for Maine, where granulocytic anaplasmosis is common. Lastly, the Contiguous US model estimated a lower prevalence in western Texas, a difference arguably influenced by smaller sample sizes.
The estimated regression coefficient for the endemic-area indicator in the Contiguous US model is positive and significant, implying higher prevalence among dogs living in areas where human granulocytic anaplasmosis is endemic.
Table 2 Estimates, standard errors, and odds ratios for the parameters corresponding to the factors found to be significantly associated with prevalence of canine exposure to Anaplasma spp. See Table 1. The CI column gives a 95% confidence interval for the odds ratios. Intervals not containing unity imply that the factor is significant at the 0.05 level.
Numerous factors were useful predictors of Anaplasma seroprevalence in dogs. Because rodents and white-tailed deer are important in the maintenance of A. phagocytophilum in nature, the associations with increased forest coverage and decreased human population density are likely tied to suitable habitat for these critical wildlife species. Forest cover was also associated with higher prevalence of another tick-borne pathogen, E. chaffeensis, in white-tailed deer [32]. Forest fragmentation is strongly associated with increasing Lyme disease incidence, so fragmented habitats will likely also be important areas for A. phagocytophilum; however, the scale of this study was not fine enough to investigate such edge effects [33].
Climatic variables such as temperature, precipitation, and relative humidity have been associated with the prevalence of ticks and tick-borne pathogens [34–36]. In both of our models, precipitation was positively and temperature negatively associated with Anaplasma seroprevalence in dogs. Although one previous study found no effect of precipitation on the density of I. scapularis, a more recent long-term study found that increased regional winter precipitation was associated with higher tick densities [37]. Ixodid tick survival and activity are tied to temperature, and a recent study found that I. scapularis survived better at temperatures representative of northern states than at those of southern states [38]. Relative humidity is important for ixodid ticks to maintain moisture while off the host, yet both of our models found that increasing relative humidity was negatively associated with Anaplasma seroprevalence in dogs. One plausible explanation is that increased humidity is related to decreased tick densities: higher humidity favors mold and fungal growth, to which ticks are fatally susceptible as eggs and during molting. For example, I. ricinus densities on rodents were reported to decrease with increasing relative humidity [39,40].
The seroprevalence of Anaplasma spp. in dogs decreased as deer/vehicle collision reports increased, contrary to our initial hypothesis given the importance of deer to the life cycle of I. scapularis [41]. Unfortunately, this factor does not account for the rural/urban nature of the habitats or the road types (e.g., secondary or tertiary) where collisions take place; see [42] for a more in-depth discussion of these issues. While further investigation is warranted to understand this negative association, other authors have also found "deer density associations" counterintuitive; see [32,40,43–46] for some of this discussion and related literature. Another puzzling finding was the positive association of Anaplasma seroprevalence in dogs with household income. High Anaplasma spp. prevalence areas may coincide with some of the wealthier areas of the United States, thus confounding the factor. While people in these wealthier areas may engage in behaviors, such as outdoor recreation, that increase the likelihood of ticks feeding on their dogs, wealthier owners may also tend to keep their pets predominantly indoors, thus minimizing their risk of acquiring ticks [47]. However, even dogs that spend only short periods outdoors can acquire vector-borne infections; thus, the use of tick preventives is recommended for all dogs. Dogs in poorer regions, on the other hand, may never be taken to a veterinarian; they may clear the infection on their own or be treated with antibiotics without being tested. Overall, the confounding role of socioeconomic status merits further study.
The fitted models explain much of the variation in the data, but better fits could be achieved by including additional factors. One difficulty is that these data may not constitute a true random sample; some tests conducted at the same location may be correlated. A more problematic issue is sampling bias: dogs in different parts of the country may be tested for exposure to Anaplasma for different reasons. For example, veterinarians in the Upper Midwest and Northeast, where Lyme disease is highly prevalent, may be more likely to screen all dogs using this rapid test. In areas where canine anaplasmosis or Lyme disease is uncommon, however, perhaps only dogs with clinical signs or with travel histories to endemic regions are tested. Other dogs could be tested incidentally when screened for other vector-borne pathogens (e.g., heartworm), as the SNAP 4Dx Plus Test simultaneously tests for four distinct pathogen genera. Diagnostic tests specific for exposure to A. platys, together with travel histories of seropositive dogs, could help resolve these questions in areas where granulocytic anaplasmosis is not considered endemic. Unfortunately, such data were unavailable at the time of this study. Because of these issues, caution should be used when comparing prevalence between different areas of the United States.
The spatial prevalence maps presented here should not be interpreted at too fine a spatial scale; they are intended as rough guidance. A county's estimated prevalence is influenced both by factor conditions in that county and by those in adjacent counties. For example, ticks are not expected to be numerous within New York City (e.g., Manhattan), even though our model does not predict zero prevalence for Manhattan. Due to the zoonotic nature of anaplasmosis, one may compare the