Dataset on the spatial distribution of groundwater quality for pH, Electrical Conductivity (EC), Total Hardness (TH), Ca2+, Mg2+, HCO3−, F−, and NO3− in Dodoma, Singida, and Tabora regions located in central Tanzania

Groundwater is an important source of water for drinking and irrigation in semi-arid regions like central Tanzania. Groundwater quality is degraded by anthropogenic and geogenic pollution. Anthropogenic pollution depends on the disposal of contaminants from human activities into the environment, which can leach and pollute groundwater. Geogenic pollution depends on the presence and dissolution of mineral rocks. High geogenic pollution is observed in aquifers that are rich in carbonates, feldspars, and mineral rocks. Consumption of polluted groundwater has negative health effects. Therefore, protection of public health necessitates the evaluation of groundwater in order to identify the general pattern and spatial distribution of groundwater pollution. A literature search uncovered no publications that describe the spatial distribution of hydrochemical parameters across central Tanzania. Central Tanzania is located within the East Africa Rift Valley, Tanzania craton, and is made up of the Dodoma, Singida, and Tabora regions. To fill this gap, this article presents a dataset for pH, Electrical Conductivity (EC), Total Hardness (TH), Ca2+, Mg2+, HCO3−, F−, and NO3− from 64 groundwater samples collected from the Dodoma region (22 samples), the Singida region (22 samples), and the Tabora region (20 samples). Data collection covered a total distance of 1344 km, which was divided into east-west along the B129, B6, and B143 roads and north-south along the A104, B141, and B6 roads. The present dataset can be used to model the geochemistry and spatial variation of physiochemical parameters across these three regions.



Value of the Data
• The dataset provided allows the use of pH, Electrical Conductivity (EC), Total Hardness (TH), Ca2+, Mg2+, and HCO3− to elucidate the spatial distribution and dissolution of carbonate rocks within each region and across central Tanzania.
• The dataset can be reused to model the regulation of F− from TH, HCO3−, and pH within each region and across central Tanzania.
• The dataset can be used as a tool to model the geochemistry of the underlying groundwater aquifer in central Tanzania.
• The dataset can be used as a tool to assess NO3− pollution in central Tanzania.

Objective
A literature search shows that, within central Tanzania, the Dodoma region is the most studied, followed by Singida, and very little data was found for the Tabora region. Dodoma and Singida regions were reported to have elevated levels of Ca2+, Mg2+, Na+, HCO3−, Cl−, SO42−, NO3−, and F− [2][3][4]. F− levels were found to be lower than levels reported in Arusha and Manyara regions located northeast of central Tanzania [5]. The sources of NO3− were reported to be anthropogenic [6][7][8][9][10]. In the Dodoma region, mineralization was found to follow a gradient and to be concentrated toward the south-southeast [3]. No data were found that assessed the spatial distribution of physiochemical parameters across Dodoma, Singida, and Tabora regions from north to south and from east to west. This lack of data makes it impossible to elucidate the spatial distribution of mineralization and to identify which areas contain the highest and lowest levels of groundwater pollution across central Tanzania. The datasets presented in this work cover all three regions and were collected along the B129, B6, and B143 roads and the A104, B141, and B6 roads, which cross these three regions from east to west and from north to south, respectively. The datasets generated were for pH, EC, TH, Ca2+, Mg2+, HCO3−, F−, and NO3−.

Data Description
The datasheet that contains the raw data presented in this work is deposited in the Mendeley Data repository under the file name "Raw data submitted" [1]. The first sheet, named "Calibration Curves," shows the calibration curves that were used to determine the levels of electrical conductivity, NO3−, and F−. The second sheet, named "physiochemical parameters," shows the GPS coordinates, the names of the roads and areas where the samples were collected, and the values of the physiochemical parameters. The third sheet, named "Pearson correlation," shows Pearson correlation tables from data obtained in Dodoma, Singida, and Tabora regions. This sheet can also be used to see the rows, columns, and formulas (=PEARSON(Array1, Array2)) that were used to determine the correlations.
The fourth sheet, named "Figures," presents the raw data in figure form. This sheet shows how the figures were constructed from the raw data, which are also shown in the same sheet.
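For readers who want to reproduce the correlation tables outside Excel, the following is a minimal Python sketch of the calculation that the =PEARSON(Array1, Array2) formula performs. The function name `pearson` and the example values are illustrative assumptions and are not taken from the dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for two parameters (NOT measurements from the dataset)
ec_values = [450.0, 820.0, 1310.0, 1990.0]
th_values = [120.0, 210.0, 300.0, 480.0]
print(round(pearson(ec_values, th_values), 3))
```

Applying the function to each pair of columns in the "physiochemical parameters" sheet, region by region, should give tables comparable to those in the "Pearson correlation" sheet.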
The statistical summary, which shows the mean, standard deviation, maximum, minimum, WHO guidelines, and the percentage of samples that are above the WHO guideline, is presented in Table 1. Fig. 1a and b show the distribution of EC. Fig. 2a and b show the distribution of TH. Fig. 3a and b show the distribution of HCO3−. Fig. 4a and b show the distribution of F−. Fig. 5a and b show the distribution of pH. Fig. 6a and b show the distribution of NO3−. Tables 2-4 show Pearson correlation tables from Dodoma, Singida, and Tabora regions, respectively.
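The quantities listed for the statistical summary can be recomputed from the raw data with a few lines of Python. The sketch below is illustrative only: the function name `summarize`, the use of the sample standard deviation, and the example values are assumptions, and the guideline is passed in as a parameter rather than taken from Table 1.

```python
import statistics

def summarize(values, guideline):
    """Summary statistics for one parameter against a guideline value."""
    n_above = sum(1 for v in values if v > guideline)
    return {
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator); this choice is an
        # assumption, since the text does not say which form was used.
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "pct_above": 100.0 * n_above / len(values),
    }

# Hypothetical F- concentrations in mg/l (NOT values from the dataset),
# compared against an assumed guideline of 1.5 mg/l.
fluoride = [0.5, 1.0, 2.0, 2.5]
summary = summarize(fluoride, guideline=1.5)
print(summary)
```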

Study Area Description
Central Tanzania is made up of three regions: Dodoma, Singida, and Tabora (Fig. 7). Major roads cross the area. In the east-west direction, the major roads are B129, B6, and B143, and in the north-south direction, the major roads are A104, B141, and B6. The map that shows these major roads and the sample location sites that border each region is shown in Fig. 8. According to the 2012 population and housing census, these regions make up 12.8% of the population of Tanzania (44.9 million) and have annual growth rates of 2.1%, 2.3%, and 2.9%, respectively. The populations of Dodoma, Singida, and Tabora were reported to be 2,083,588, 1,370,637, and 2,291,623, respectively. The region is mostly a plateau, 1100 m above sea level, and its geolocation is between the East-     [11][12][13]. This region is tectonically active and contains fractured crystalline basement rocks [8]. Rocks that are found in this region include granodiorite, basalt, metavolcanic, granitoids, and granitic gneisses [14]. In the Dodoma region, where data are available, Ca2+ + Mg2+ was reported to exceed Na+ + K+, and the dominant hydrochemical facies is reported to be a mixed Ca-Mg-Cl-SO4 type [15].

Analytical Procedures
pH, EC, NO3−, and F− were analysed once per sample. Ca2+, Mg2+, and HCO3− were determined using titration methods. Three titrations were done for each sample, and the average value is reported in the dataset. All volumes obtained from the titrations were within an error margin of 2.0% of each other. pH and EC measurements were done using a High Range pH, Conductivity, and TDS Tester from Hanna Instruments (HI98130). pH measurements were done after the instrument had been calibrated with 4.0 ± 0.05, 7.00 ± 0.05, and 10.00 ± 0.05 buffer solutions. The instrument was washed with deionized (DI) water between calibration and sample testing in order to minimize cross-contamination. Before EC measurements and sample analysis, the instrument was calibrated using a certified conductivity standard solution of 1413 μS/cm, also from Hanna Instruments (HANNA HI7031), followed by standard KCl solutions at concentrations of 0.5 mM, 1.0 mM, 5.0 mM, 10 mM, 20 mM, and 30 mM.
The phenol disulfonic acid method was used to determine the concentration of NO3− ions [16]. Standard solutions with NO3− concentrations of 0, 10, 20, 30, and 40 mg/l were used to prepare the standard curve. Both standards and samples were read at 410 nm. Measurements were done using a Cary 60 UV-Vis spectrophotometer from Agilent Technologies. In a conical flask, 10 mL of water sample was placed, and 25 mL of nitrate-extracting solution was added and shaken for 10 min. The mixture was then allowed to settle for 2 min and filtered through No. 42 filter paper. A clear solution (10 mL) was pipetted into a conical flask and evaporated to dryness in an oven for 24 h at a temperature of 95 °C. The residue was cooled, and then 2 mL of phenol disulphonic acid was added rapidly; the flask was covered and shaken gently so that the reagent came into contact with the residue. After 10 min, 17 mL of cool water was added and the flask was rotated to dissolve the residue. While the flask was still cool, NH4OH (16 mL) was added dropwise until a yellow colour was observed. After that, the volume was adjusted to 50 mL, and the solution was mixed well by gentle shaking before the concentration of NO3− was measured.
The SPADNS method was used to determine the concentration of F− ions. The standards used to prepare the standard curve for F− were 0.0, 0.5, 1.0, 1.5, 2.0, and 2.5 mg/l, obtained from commercially available F− standard solutions. Measurements were taken using a Hach DR900 handheld colorimeter (DR/890 colorimeter).
A complexometric titration was used to determine TH using standard EDTA (0.01 N), 1 ml of buffer solution (1.179 g of EDTA and 0.78 g of MgSO4·7H2O), 143 mL of ammonium hydroxide, diluted to make a volume of 250 mL, and 6 drops of Eriochrome Black T (EBT). The end point was determined by the colour change from pink to blue. The volume used for the calculation was the mean value of three titrations. TH was expressed as CaCO3 in mg/l and was determined using the formula below.
TH as CaCO3 (mg/l) = (volume of EDTA × molarity of EDTA × 50 × 1000) / (volume in ml of sample taken for titration)
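The TH calculation above can be sketched in Python as follows. This is a minimal illustration rather than the authors' worksheet: the function names, the reading of "within 2.0% of each other" as a spread (maximum minus minimum) no larger than 2% of the mean, the titre values, and the 50 mL sample volume are all assumptions.

```python
def replicates_agree(volumes_ml, tolerance=0.02):
    """True if the spread (max - min) is within `tolerance` of the mean.

    This interpretation of the 2.0% criterion is an assumption.
    """
    mean_volume = sum(volumes_ml) / len(volumes_ml)
    return (max(volumes_ml) - min(volumes_ml)) <= tolerance * mean_volume

def total_hardness(edta_volumes_ml, edta_conc, sample_volume_ml):
    """TH as CaCO3 (mg/l) from the mean EDTA titre, using the stated formula."""
    mean_volume = sum(edta_volumes_ml) / len(edta_volumes_ml)
    return mean_volume * edta_conc * 50.0 * 1000.0 / sample_volume_ml

# Hypothetical triplicate titres in ml and sample volume (NOT from the dataset)
titres = [5.00, 5.04, 4.96]
print(replicates_agree(titres))
print(total_hardness(titres, edta_conc=0.01, sample_volume_ml=50.0))
```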
Alkalinity was determined using the formula below.
Alkalinity (HCO3−) mg/l = (volume of H2SO4 × molarity of H2SO4 × molar mass of CaCO3 × 1000) / (volume of sample)
A 10 mL water sample was titrated against 0.05 M sulphuric acid. Three drops of POP indicator were added first, and the solution remained colourless; MO indicator was then added, and the colour changed from colourless to yellow. When titrated with 0.05 M sulphuric acid, the solution turned reddish pink, indicating the endpoint. Each sample was titrated three times, and the mean volume of sulphuric acid was used to determine HCO3−. At pH less than 8.3, which covers all samples presented in Table 1, the predominant alkaline ion is bicarbonate (HCO3−), while carbonate (CO32−) and hydroxide (OH−) are present in insignificant amounts.
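As a worked illustration of the alkalinity formula, the short Python sketch below applies it to the mean of three titres. The function name, the titre values, and the molar mass constant of 100.09 g/mol for CaCO3 are assumptions made for the example; the 0.05 M acid concentration and 10 mL sample volume follow the text.

```python
CACO3_MOLAR_MASS = 100.09  # g/mol, assumed value for this sketch

def alkalinity_mg_per_l(acid_volumes_ml, acid_molarity, sample_volume_ml):
    """Alkalinity (mg/l) from the mean H2SO4 titre, using the stated formula."""
    mean_volume = sum(acid_volumes_ml) / len(acid_volumes_ml)
    return (mean_volume * acid_molarity * CACO3_MOLAR_MASS * 1000.0
            / sample_volume_ml)

# Hypothetical triplicate titres in ml (NOT from the dataset)
titres = [1.00, 1.01, 0.99]
print(alkalinity_mg_per_l(titres, acid_molarity=0.05, sample_volume_ml=10.0))
```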
Data processing, preparation of graphs, and calculation of correlation coefficients were done using MS Excel 2010.
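The standard curves described above (for example, the NO3− standards of 0, 10, 20, 30, and 40 mg/l read at 410 nm) can also be processed outside Excel. The sketch below assumes a straight-line calibration fitted by ordinary least squares, which the text does not state explicitly; the function names and the absorbance readings are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def concentration_from_reading(reading, slope, intercept):
    """Invert the calibration line to estimate a sample concentration."""
    return (reading - intercept) / slope

# Standard concentrations from the text (mg/l) with HYPOTHETICAL absorbances
standards = [0.0, 10.0, 20.0, 30.0, 40.0]
absorbances = [0.00, 0.10, 0.20, 0.30, 0.40]
slope, intercept = fit_line(standards, absorbances)
print(concentration_from_reading(0.25, slope, intercept))
```

Readings that fall outside the range of the standards are extrapolations and should be treated with caution.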

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Dataset on the spatial distribution of groundwater quality for pH, EC, TH, Ca2+, Mg2+, HCO3−, F−, and NO3− in Dodoma, Singida, and Tabora regions located in central Tanzania

Ethics Statement
This work did not involve any human subjects or animal experiments. The dataset presented in this work was not collected from any social media platforms.