Data on Leptospira interrogans sv Pomona infection in Meat Workers in New Zealand

The data presented in this article are related to the research article entitled “Comparison between Generalized Linear Modelling and Additive Bayesian Network; Identification of Factors associated with the Incidence of Antibodies against Leptospira interrogans sv Pomona in Meat Workers in New Zealand” (Pittavino et al., 2017) [5]. A prospective cohort study was conducted in four sheep slaughtering abattoirs in New Zealand (NZ) (Dreyfus et al., 2015) [1]. Sera were collected twice a year from 384 meat workers and tested by the microscopic agglutination test for infection with Leptospira interrogans sv Pomona (Pomona), one of the most common Leptospira serovars in humans in NZ. This article provides an extended analysis of the data. It illustrates the successive steps of a multivariable analysis (a generalized linear model) and, in particular, of a multivariate approach based on additive Bayesian network (ABN) modelling.


Specifications
The data provide an extended analysis of the use of protective equipment when working in meat abattoirs.
The data summarize the main steps of an innovative multivariate approach, additive Bayesian network (ABN) modelling.
The data show the advantages of working with graphical models, which visually represent the interconnections and correlations among all the variables analysed.

Data
The data were collected from 384 voluntarily participating meat workers from four purposively selected sheep abattoirs in the North Island of NZ. The outcome was "Pomona" infection in meat workers. The main exposure variables of interest were "work position", the use of personal protective equipment (PPE), "hunting", "home slaughter" and "farming". The full data set comprised 17 variables (15 binary and 2 continuous). They are listed in the descriptive Table 1, together with the abbreviations used in the graphical model. The correlations between all these variables are shown in Fig. 1. Further variable names and their descriptions can be found in Table 1 in [5], where the variables "Lepto" and "Sex" used here are named "Pomona" and "Gender", respectively. A detailed description of the protective equipment worn by sheep abattoir workers in each work position category is given in Table 2. Table 3 shows the number of "Pomona" infected workers, stratified by work position and by the number of protective items worn. The data were analysed extensively with multivariable (GLM and GLMM) and multivariate (additive Bayesian network, ABN) techniques.

Table 1. Data on new infection with Leptospira interrogans sv Pomona in abattoir workers processing sheep in New Zealand: variable names and categories, with the abbreviations used in the graphical model.

Variables and categories in the GLM and ABN models (node label)
Work position 0 (Work0)¹
    0 = Not working in boning, chillers, office
    1 = Working in boning, chillers, office
Work position 1 (Work1)
    0 = Not working in offal removal, pet food
    1 = Working in offal removal, pet food
Work position 2 (Work2)
    0 = Not removing intestines or kidneys, not inspecting meat
    1 = Intestines or kidney removal, meat inspection
Work position 3 (Work3)
    0 = Not working in yards, not stunning or pelting
    1 = Working in yards, stunning or pelting
Abattoir 1 (A1)¹
    0 = Not working in Abattoir 1
    1 = Working in Abattoir 1
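Table 3, described in the Data section, cross-tabulates "Pomona" cases by work position and by the number of protective items worn. The following is a minimal Python sketch of that stratification. It assumes that each worker has a single work-position label, and that the number of items worn is the sum of the binary "Glov", "Glass" and "Mask" indicators listed for Fig. 1. Both are assumptions; Table 2 may define the protective gear differently. The records and the helper name `stratify_cases` are hypothetical: they are neither study data nor the authors' code.

```python
from collections import Counter

# HYPOTHETICAL worker records for illustration only (not study data).
# "work" is a work-position label; "Glov", "Glass", "Mask" are assumed 0/1 PPE
# indicators; "Pomona" is the 0/1 outcome (new infection).
workers = [
    {"work": "Work3", "Glov": 1, "Glass": 0, "Mask": 0, "Pomona": 1},
    {"work": "Work3", "Glov": 1, "Glass": 1, "Mask": 0, "Pomona": 0},
    {"work": "Work2", "Glov": 0, "Glass": 0, "Mask": 0, "Pomona": 1},
    {"work": "Work0", "Glov": 1, "Glass": 1, "Mask": 1, "Pomona": 0},
]

# Assumption: the number of protective items worn is the sum of these indicators.
PPE_ITEMS = ("Glov", "Glass", "Mask")


def stratify_cases(records):
    """Return {(work position, n PPE items): (n infected, n workers)}."""
    cases = Counter()
    totals = Counter()
    for r in records:
        key = (r["work"], sum(r[item] for item in PPE_ITEMS))
        totals[key] += 1
        cases[key] += r["Pomona"]
    return {key: (cases[key], totals[key]) for key in totals}


for (work, n_items), (n_cases, n_total) in sorted(stratify_cases(workers).items()):
    print(f"{work}, {n_items} PPE item(s): {n_cases}/{n_total} infected")
```

Marginal totals, such as the overall counts shown in bold in Table 3, follow by summing the strata over one of the two keys.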

Experimental design
A prospective cohort study was conducted amongst 384 voluntarily participating meat workers from four purposively selected sheep abattoirs in the North Island of NZ. Study methods were described in detail by Dreyfus et al. [1,5]. Participants were blood sampled and interviewed with a questionnaire at the same visit [5]. Sera were collected twice a year and tested by the microscopic agglutination test for Leptospira interrogans sv Pomona infection. A new infection was recorded when a worker seroconverted or showed an anamnestic response [1,2,3].

Fig. 1. Spearman correlation matrix of all 17 variables in the dataset. The variables are ordered "Lepto", "Sex", "Hunt", "Farm", "Kill", "Glov", "Glass", "Mask", "Age", "Time", "Work1", "Work2", "Work3", "Plant1", "Plant2", "Plant3" and "Plant4". This is a first exploratory analysis. The first column and last row of the matrix show that "Lepto" (Pomona infection) is mainly correlated with "Work3" and "Plant2" (higher correlation, 0.3) and, more weakly, with "Glass" (correlation 0.1).
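Spearman's coefficient, as used in Fig. 1, is the Pearson correlation computed on ranks, with tied values given the mean of the ranks they share. The following is a plain-Python sketch of that definition. The column values are hypothetical and the function names are illustrative; the source does not state which R function produced the figure.

```python
# Spearman's rank correlation written out in plain Python (illustrative sketch).

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position of the current block of ties.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(columns):
    """All pairwise Spearman correlations for a dict of equally long columns."""
    return {(a, b): spearman(columns[a], columns[b]) for a in columns for b in columns}


# HYPOTHETICAL values for three of the 17 variables (not study data).
data = {
    "Lepto": [0, 0, 1, 1, 0, 1],
    "Work3": [0, 0, 1, 1, 0, 0],
    "Age": [25, 40, 33, 51, 29, 46],
}
matrix = spearman_matrix(data)
for name in data:
    print(f"Lepto vs {name}: {matrix[('Lepto', name)]:.2f}")
```

Because 15 of the 17 variables are binary, ties are the norm here. For two 0/1 variables, Spearman's rho equals the Pearson (phi) correlation of the raw 0/1 codes, since ranking only rescales the two levels.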

Data on GLM and GLMM
Data were analysed using the software R, version 3.1.2 [4]. Crude associations between the risk of infection with Pomona and potential risk, protective or confounding factors (listed in Table 1 in [5]) were assessed by univariable analysis. A multivariable generalized linear model (GLM) was then used to identify significant risk factors for new Pomona infection, with each factor adjusted for the effects of the others (Table 2 in [5]). A multilevel generalized linear mixed model (GLMM) with abattoir as a random effect was also fitted, to evaluate the effect of clustering by abattoir on the model results (Table 4).
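With a binary outcome, the GLM is a logistic regression. Odds ratios are typically reported as exp(β), with Wald 95% limits exp(β ± 1.96·SE). The source does not state which interval method was used for the tables. The following is a pure-Python sketch that fits a logistic regression by Newton-Raphson (iteratively reweighted least squares) and reports OR, 95% CI and two-sided Wald p-values. The 2×2 data and all function names are hypothetical. The random abattoir effect of the GLMM is not modelled here.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def information_and_score(X, y, beta):
    """Fisher information X'WX and score X'(y - p) of the logistic log-likelihood."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    score = [0.0] * k
    for row, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
        w = p * (1.0 - p)
        for a in range(k):
            score[a] += row[a] * (yi - p)
            for c in range(k):
                info[a][c] += w * row[a] * row[c]
    return info, score


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson (IRLS) fit of logit P(y = 1) = X beta; X includes an intercept column.

    Returns the coefficients and their covariance matrix (inverse information).
    Does not converge under complete separation.
    """
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        info, score = information_and_score(X, y, beta)
        cov = invert(info)
        step = [sum(c * s for c, s in zip(cov_row, score)) for cov_row in cov]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    info, _ = information_and_score(X, y, beta)
    return beta, invert(info)


def odds_ratio_table(beta, cov, names, z=1.959963984540054):
    """Wald odds ratio, 95% confidence limits and two-sided p-value per coefficient."""
    table = {}
    for j, name in enumerate(names):
        se = math.sqrt(cov[j][j])
        wald = beta[j] / se
        table[name] = (
            math.exp(beta[j]),
            math.exp(beta[j] - z * se),
            math.exp(beta[j] + z * se),
            math.erfc(abs(wald) / math.sqrt(2.0)),  # equals 2 * (1 - Phi(|wald|))
        )
    return table


# HYPOTHETICAL 2x2 data (not study data): column 1 = intercept, column 2 = exposure.
X = [[1, 0]] * 4 + [[1, 1]] * 4
y = [0, 0, 0, 1, 1, 1, 1, 0]
beta, cov = fit_logistic(X, y)
for name, (odds, lower, upper, p) in odds_ratio_table(beta, cov, ["(Intercept)", "exposure"]).items():
    print(f"{name}: OR = {odds:.2f} (95% CI {lower:.2f} to {upper:.2f}), p = {p:.3f}")
```

For this 2×2 example, the fitted exposure OR equals the cross-product ratio (3×3)/(1×1) = 9, which makes a convenient check. Adding the random abattoir effect requires a mixed-model fitter. In R, lme4's `glmer` with `family = binomial` and a `(1 | abattoir)` term is one common choice, although the source does not name the function that was used.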

Data on ABN model
All analyses were conducted using the R package "abn" [6]. A three-step procedure was used:
(1) The first step was to find an optimal model (Fig. 2) [7,8,9] using an order-based exact search method [9]. The marginal likelihood was used as the goodness-of-fit measure to identify the best-fitting model (Fig. 3) [6].
(2) In the second step, the model was checked and adjusted for over-fitting [10,11] using Markov chain Monte Carlo (MCMC) simulation implemented in JAGS ('just another Gibbs sampler') [10,11]. The marginal densities estimated from the initial ABN model (Fig. 2) were first checked visually, to verify that the posterior densities integrate to one (Fig. 4). Datasets of the same size as the original were then simulated with MCMC from the optimal model found in step one, and this was repeated 2560 times (Fig. 5). Arcs recovered in fewer than 50% of the simulated datasets (dashed lines in Fig. 2) were removed from the final globally optimal ABN.
(3) In the third step, the marginal posterior log odds ratios and their 95% credible intervals were estimated for each parameter. They were derived from the posterior distribution (Fig. 6) of the ABN model identified in the second step (Fig. 1 in [5]).

Table 3. Number of "Pomona" infected workers, stratified by work position and protective gear worn, for the data on new infection with Leptospira interrogans sv Pomona in abattoir workers processing sheep in New Zealand. Bold values are "Pomona" cases for the overall population, not for subsets defined by specific conditions.

Table 4. Odds ratios (OR), 95% confidence intervals (95% CI) and p-values of the multivariable mixed-effects logistic regression (GLMM with abattoir as a random effect). The model assesses the association in meat workers between new infection with "Pomona" and the risk factors of the best-fitting GLM, for the data on new infection with Leptospira interrogans sv Pomona in abattoir workers processing sheep in New Zealand.
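The first two ABN steps can be illustrated on a toy three-node network. In the Python sketch below, the local scores are invented log marginal likelihoods for each node given a parent set; in practice abn computes such scores from the data. The search enumerates every node ordering and lets each node take its best-scoring parent set among its predecessors. Arcs are then kept only if they appear in at least 50% of a hypothetical set of networks re-learned from simulated datasets. This brute-force version is a simplification for illustration, not the order-based algorithm implemented in abn [9]. Counting arcs with their direction is also an assumption.

```python
from collections import Counter
from itertools import combinations, permutations

# Step 1 input: HYPOTHETICAL local scores (log marginal likelihood of a node
# given its parent set). Invented for illustration; not results from the study.
LOCAL_SCORES = {
    ("Pomona", frozenset()): -120.0,
    ("Pomona", frozenset({"Work3"})): -112.0,
    ("Pomona", frozenset({"Glass"})): -118.5,
    ("Pomona", frozenset({"Work3", "Glass"})): -111.5,
    ("Work3", frozenset()): -200.0,
    ("Work3", frozenset({"Glass"})): -195.0,
    ("Work3", frozenset({"Pomona"})): -199.0,
    ("Work3", frozenset({"Pomona", "Glass"})): -194.8,
    ("Glass", frozenset()): -150.0,
    ("Glass", frozenset({"Work3"})): -146.0,
    ("Glass", frozenset({"Pomona"})): -149.5,
    ("Glass", frozenset({"Pomona", "Work3"})): -145.9,
}


def best_parents(node, allowed, max_parents):
    """Best-scoring parent set for `node` among subsets of the `allowed` nodes."""
    best_set, best_score = None, float("-inf")
    for size in range(min(max_parents, len(allowed)) + 1):
        for subset in combinations(sorted(allowed), size):
            score = LOCAL_SCORES[(node, frozenset(subset))]
            if score > best_score:
                best_set, best_score = frozenset(subset), score
    return best_set, best_score


def exact_order_search(nodes, max_parents=2):
    """Brute force over all orderings; each node picks parents among its predecessors.

    The network score is a sum of local scores and every DAG is consistent with
    at least one ordering, so the best ordering yields the highest-scoring DAG.
    """
    best_total, best_dag = float("-inf"), None
    for order in permutations(nodes):
        total, parents = 0.0, {}
        for i, node in enumerate(order):
            ps, score = best_parents(node, order[:i], max_parents)
            parents[node] = ps
            total += score
        if total > best_total:
            best_total, best_dag = total, parents
    return best_total, best_dag


def dag_arcs(parents):
    """Directed arcs (parent, child) of a DAG given as {child: parent set}."""
    return {(p, child) for child, ps in parents.items() for p in ps}


def arc_frequencies(bootstrap_dags):
    """Fraction of bootstrap DAGs that contain each directed arc."""
    counts = Counter(arc for dag in bootstrap_dags for arc in set(dag))
    return {arc: c / len(bootstrap_dags) for arc, c in counts.items()}


def prune_arcs(arcs, bootstrap_dags, threshold=0.5):
    """Step 2: keep only arcs found in at least `threshold` of the bootstrap DAGs."""
    freq = arc_frequencies(bootstrap_dags)
    return {arc for arc in arcs if freq.get(arc, 0.0) >= threshold}


# Step 2 input: HYPOTHETICAL DAGs re-learned from simulated datasets.
BOOTSTRAP_DAGS = [
    {("Work3", "Pomona"), ("Glass", "Work3")},
    {("Work3", "Pomona")},
    {("Work3", "Pomona"), ("Glass", "Pomona")},
    {("Glass", "Work3"), ("Work3", "Pomona")},
]

score, parents = exact_order_search(["Pomona", "Work3", "Glass"])
print("best score:", score)
print("arcs kept:", sorted(prune_arcs(dag_arcs(parents), BOOTSTRAP_DAGS)))
```

With these invented scores, the search returns Glass → Work3, Glass → Pomona and Work3 → Pomona. The Glass → Pomona arc appears in only one of the four hypothetical bootstrap networks, so it is dropped.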