Distributional Characteristics of Selected Chemical and Environmental Variables: Data from NHANES 2003-2004

Objective: Log-transformations are commonly used to normalize chemical data. However, log-transformations do not always normalize the data. Thus, the objective of this study was to recursively use Tukey’s exploratory techniques to erect fences towards the data extremes until normality or near normality was achieved for the data lying within these fences.
Design: Data from the National Health and Nutrition Examination Survey (NHANES) for the period 2003–2004 for 27 variables were used to conduct this study. The 27 variables included serum folate, serum transferrin receptor, urinary perchlorate, serum polychlorobiphenyl (PCB) 44, PCB-28, PCB-87, and PCB-52. Tukey’s exploratory techniques were recursively used to erect fences towards the data extremes until normality or near normality was achieved for the data lying within these fences. Following this, robust techniques were used to estimate statistical parameters for the reduced data lying within these fences. The statistical properties of the reduced data so obtained were evaluated and compared with the original log-transformed data.
Setting: Cross-sectional data from NHANES for the period 2003–2004 for 27 variables.
Subjects: Between 1790 and 8363 NHANES 2003–2004 participants, depending upon the variable of interest.
Results: The use of non-normal data for statistical analysis can lead to under- or overestimation of the measures of central tendency (means and geometric means) depending upon the comparative mix and magnitude of the observations that are identified as potential outliers and trimmed from the lower and upper tails of the original distributions to achieve normality. The standard deviations are always overestimated, and the widths of the confidence intervals around the means are overestimated as well. Additional insights into the demographic characteristics of those who were trimmed from the extreme tails can be very valuable.
Conclusion: To obtain correct descriptive estimates, it is worthwhile to temporarily trim a small percentage of the data (probably < 5%) to achieve normality or near normality. An evaluation of the trimmed data can provide insight into the characteristics of the persons who have very low or very high concentrations of the chemical of interest.


Introduction
The distributions of most, if not all, chemical and environmental variables are characterized by a few relatively large measurements, or in other words, distributions of chemical and environmental variables are positively skewed. For this reason, data for chemical and environmental variables are assumed to be log-normally distributed even though not all positively skewed distributions can be considered to be mathematically log-normal. Since most statistical techniques including t-test, analysis of variance, and regression analysis assume normality of the distribution, it is necessary to transform log-normally distributed variables to normality by taking logs of the original measurements.
To the best of our knowledge, inferences based on the log-transformed variable, x, cannot always be converted back to the original variable, y. For example, it is unclear how to convert the mean of x back to the mean, or a comparable parameter, of the original variable y. The mean of y cannot simply be obtained by back-transforming the mean of x.
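The back-transformation problem can be illustrated with a minimal sketch. The three values below are hypothetical and are not NHANES data; the point is only that back-transforming the mean of the log10 values recovers the geometric mean of y, not its arithmetic mean.

```python
import math
import statistics

# Hypothetical right-skewed measurements (illustrative only, not NHANES data).
y = [1.0, 10.0, 100.0]
x = [math.log10(v) for v in y]              # log10-transformed values

mean_x = statistics.mean(x)                 # mean on the log10 scale
back_transformed = 10 ** mean_x             # naive back-transform of that mean
arithmetic_mean = statistics.mean(y)        # mean on the original scale
geometric_mean = statistics.geometric_mean(y)

print("back-transformed mean of x:", back_transformed)
print("arithmetic mean of y:      ", arithmetic_mean)
print("geometric mean of y:       ", geometric_mean)
```

In this sketch the back-transformed value agrees with the geometric mean and is well below the arithmetic mean, which is why a mean estimated on the log scale is reported as a geometric mean rather than as a mean of y.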
The issue of converting inferences based on x in a regression or any other modeling situation may even be more complex. If the differences between x for males and females are statistically significant, there is no clear way to determine if the same occurs for y and if so, what is the magnitude of the differences in the original scale. Some of the often used transformations to normalize non-normal data are special cases of Box-Cox transformations. For example, when λ=-1, it is equivalent to reciprocal transformation, when λ= 0.5, it is equivalent to square root transformation, and when λ=1/3, it is equivalent to cube root transformation.
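For reference, the Box-Cox family mentioned above is usually written as (y^λ - 1)/λ for λ ≠ 0 and log(y) for λ = 0. The definition is not spelled out in this paper, so the sketch below uses that standard form as an assumption; the function name `box_cox` is hypothetical. Up to a linear rescaling, λ = -1, 0.5, and 1/3 correspond to the reciprocal, square root, and cube root transformations.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam (lambda).

    Standard textbook form, assumed here: (y**lam - 1) / lam for lam != 0,
    and the natural log of y in the limiting case lam == 0.
    """
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

# The special cases named in the text, applied to one illustrative value.
for lam in (-1, 0, 1 / 3, 0.5):
    print(lam, box_cox(4.0, lam))
```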
Mateu [12] used Box-Cox transformations to normalize three environmental datasets, namely, for wind direction, SO2, and particle concentrations. Normalization of data was achieved for wind direction when λ=2, for SO2 when λ=0.5, and for particle concentrations when λ=0.5. According to Mateu [13], if a transformation can achieve symmetry, it is sufficient for practical purposes. Mateu [12] recommended logit transformations for percents and proportions.
If the presence of outliers or extreme values is an issue, then log transformation is a better choice than square root transformation (http://www.unm.edu/~marcusj/datatransforms.pdf). However, square root transformation was shown to perform better in achieving constant variance of the residuals and normality of the distribution for percent data [14].
Square root transformation has also been shown to stabilize variance for count data [15]. However, log transformation was found to be more effective than square root transformation in reducing the skew and leptokurtosis that characterize the untransformed inter-individual EEG amplitude distributions [16].
Estimates of statistical parameters for data that are not normally distributed can be biased. In order to obtain estimates that are as unbiased as possible, robust statistical techniques to estimate location and scale parameters have been proposed. Computations of trimmed and Winsorized means (http://www.statisticalanalysisconsulting.com/measures-of-central-tendency-the-trimmed-mean-and-median/) are two of the many techniques that have been proposed to obtain robust estimates of location parameters. Trimmed means are computed by trimming X% of the observations from each tail of the ordered data. X can vary from 0.1% to as much as 25%. If X=25%, the trimmed mean is based on the middle 50% of the data.
If an ordered dataset of size 20 is written as Y1, Y2, Y3, …, Y18, Y19, Y20, and if X=10%, then the trimmed mean is based on observations Y3, …, Y18. To compute the Winsorized mean, on the other hand, observations Y1 and Y2 are first set equal to Y3 and observations Y19 and Y20 are set equal to Y18; the Winsorized mean is then computed from all 20 observations. However, depending upon the value of X, the modified distribution used to compute the Winsorized mean may become fat-tailed and, as such, computing Winsorized means may not always be a good idea.
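The two estimators can be sketched as follows for the n=20, X=10% case described above. The function names and the example data (the integers 1 to 19 plus one extreme value of 200) are hypothetical and chosen only for illustration.

```python
import math

def tail_count(n, fraction):
    """Number of observations affected in each tail for a given fraction."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    # The small epsilon guards against floating-point results such as 1.9999999.
    return math.floor(n * fraction + 1e-9)

def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the ordered data from each tail."""
    ordered = sorted(values)
    k = tail_count(len(ordered), fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, fraction):
    """Mean after replacing each tail with the nearest retained value."""
    ordered = sorted(values)
    n = len(ordered)
    k = tail_count(n, fraction)
    low, high = ordered[k], ordered[n - k - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return sum(clipped) / n

# Hypothetical ordered sample of size 20 with one extreme value.
data = list(range(1, 20)) + [200]
print("plain mean:     ", sum(data) / len(data))
print("trimmed mean:   ", trimmed_mean(data, 0.10))     # uses Y3..Y18
print("Winsorized mean:", winsorized_mean(data, 0.10))  # Y1,Y2 -> Y3; Y19,Y20 -> Y18
```

Both functions use the same X for the two tails, which is the conventional definition; the approach in this paper differs in allowing the lower and upper tails to be trimmed by different amounts.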
In this paper, if normality is not achieved for the log-transformed data, we approach the task of achieving normality or near normality as an outlier detection problem followed by robust estimation using trimmed means. However, instead of using the same value of X for both the lower and upper tails, the value of X is allowed to differ between the two tails depending upon the results of the outlier analyses, as described later.
In other words, we try to achieve normality by temporarily trimming a certain number of the lowest and highest observations from the data. Estimates of statistical parameters are then based on the data that remain after these observations have been trimmed from the tails. For the purpose of this communication, the dataset that remains after certain observations have been trimmed from the tails of the original dataset is called the reduced dataset.
The dataset containing the observations that are trimmed from the original dataset is called the trimmed dataset. Robust estimation procedures are used for the reduced dataset. For the purpose of this study, a modified trimmed mean is computed. The advantages and drawbacks of this technique are discussed. Recommendations are made about the applicability of this technique under specific circumstances. Additional insight into the data that can be achieved using this technique is also discussed.

Material and Methods
We downloaded publicly available data for about 100 chemical variables from the National Health and Nutrition Examination Survey (NHANES) for the years 2003-2004 (www.cdc.gov/nchs/nhanes/nhanes2003-2004/lab03_04.htm). Data were downloaded for persistent organic pollutants (POPs), nutritional variables, and urinary and blood metals. Since the percentage of values below the limit of detection (LOD), which are imputed as LOD/√2, can affect the computation of skewness and also the log-transformation process, we used only those variables that had less than one percent of observations below the LOD.
This selection process provided data for 27 variables for analysis purposes. A majority of these variables, 17, were measured in serum; seven were measured in urine; two were measured in plasma; and one was measured in the whole blood.
The sample sizes, skewnesses, and p-values for the Shapiro-Wilk test of normality [17] for the log10-transformed data for these variables are given in Table 1. The skewnesses prior to log10-transformation are also given. Since, for all 27 variables used in the study, the Shapiro-Wilk test statistic W was statistically significant (p ≤ 0.01) for the log10-transformed data, we used Tukey's exploratory techniques [18] to identify potential outliers. It should be noted that the W test is sensitive to asymmetry, so a sample skewness that differs appreciably from zero generally leads to rejection of normality.
In order to use Tukey's exploratory techniques, we computed Q1, the first quartile; Q3, the third quartile; IQR, the interquartile range computed as IQR=Q3-Q1; K=M*IQR, where M is arbitrarily called the fence multiplier; the lower fence, FL=Q1-K; and the upper fence, FU=Q3+K. All observations below FL or above FU were considered potential outliers.
As an example, when M=1.5, in a sample S={1, 12, 13, 15, 16, 18, 21, 24, 29, 71} of size 10, Q1=12.5; Q3=26.5; IQR=Q3-Q1=14; FL=12.5-1.5*14=-8.5; FU=26.5+1.5*14=47.5. Thus, assuming M=1.5, observations below -8.5 and above 47.5 were considered potential outliers. For this sample, since there was no observation below -8.5, there was no potential outlier on the lower side of the sample. However, there was one observation, 71, above 47.5, which was considered a potential outlier. When M=0.5, FL=12.5-0.5*14=5.5 and FU=26.5+0.5*14=33.5. Since there was one observation below 5.5 and one observation above 33.5 in the sample S, there were a total of two potential outliers in the data. It should be noted that as M decreases, the number of observations identified as potential outliers increases: when M=1.5, there was only one potential outlier in sample S, and when M=0.5, there were two. Higher values of M lead to a smaller number of observations being identified as potential outliers. Thus, a higher value of M, for example 1, will likely leave the reduced dataset with larger variability than will a relatively smaller value of M, for example 0.5.
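The worked example can be reproduced with a short sketch. Quartile conventions differ between software packages, so the sketch takes Q1=12.5 and Q3=26.5 as given in the text rather than recomputing them; the function names are hypothetical.

```python
def tukey_fences(q1, q3, m):
    """Return (FL, FU) for quartiles q1, q3 and fence multiplier m."""
    k = m * (q3 - q1)          # K = M * IQR
    return q1 - k, q3 + k

def potential_outliers(values, q1, q3, m):
    """Observations lying below the lower fence or above the upper fence."""
    lower, upper = tukey_fences(q1, q3, m)
    return [v for v in values if v < lower or v > upper]

S = [1, 12, 13, 15, 16, 18, 21, 24, 29, 71]
Q1, Q3 = 12.5, 26.5            # quartiles as stated in the text

for m in (1.5, 0.5):
    print(m, tukey_fences(Q1, Q3, m), potential_outliers(S, Q1, Q3, m))
```

With M=1.5 only the observation 71 falls outside the fences, and with M=0.5 both 1 and 71 do, in agreement with the counts given above.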
In the procedure proposed here, a specific value of M was used for the original log10-transformed dataset. Potential outliers below FL and above FU were trimmed from the original dataset, and the reduced dataset was tested for normality by using the W test. If the reduced dataset was found to be normally distributed, that dataset was accepted. If not, a different value of M was used for the original log10-transformed dataset. This process continued until a reduced dataset was found to be normally distributed or near normally distributed, or a decision was made to discontinue testing for normality as described below.
The value of M we initially used varied from 3.0 to 0.5 in decrements of 0.5. The p-values for the W test for each of the 27 datasets before and after applying M are given in Table 1. The reduced dataset for which normality or near normality was achieved was evaluated further for its distributional characteristics. The subsets of the data that were below FL and above FU were also evaluated for their demographic characteristics. SAS 9.3 (www.sas.com) was used for the statistical analysis.
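The search over fence multipliers described above might be sketched as follows. This is not the SAS code used in the study: the function names are hypothetical, the quartiles come from Python's `statistics.quantiles` (whose convention may differ from the one used for Table 1), and the normality check is passed in as a caller-supplied function `is_normal` so that any test can be plugged in. The sketch also does not capture the judgment calls about "near normality" or about when to stop.

```python
import statistics

def fences_from_data(values, m):
    """Tukey fences (FL, FU) computed from the data for fence multiplier m."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    k = m * (q3 - q1)
    return q1 - k, q3 + k

def search_multiplier(values, is_normal,
                      multipliers=(3.0, 2.5, 2.0, 1.5, 1.0, 0.5)):
    """Apply each multiplier to the ORIGINAL data, largest first.

    Returns (m, reduced) for the first reduced dataset accepted by
    is_normal, or (None, original data) if no multiplier is accepted.
    """
    for m in multipliers:
        lower, upper = fences_from_data(values, m)
        reduced = [v for v in values if lower <= v <= upper]
        if is_normal(reduced):
            return m, reduced
    return None, list(values)

# Stand-in acceptance rule for demonstration only. In practice is_normal
# would wrap a normality test, e.g. a Shapiro-Wilk p-value above a threshold.
sample = [1, 12, 13, 15, 16, 18, 21, 24, 29, 71]
m, reduced = search_multiplier(sample, lambda r: len(r) <= 8)
print(m, reduced)
```

Each multiplier is applied to the original data rather than to the previously reduced data, which matches the description that "a different value of M was used for the original log10-transformed dataset".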

Results
In the results presented below and throughout the manuscript, percent observations trimmed refers to the percent observations trimmed from the original dataset. For example, if there were 100 observations in the original dataset, and five observations were identified as potential outliers and trimmed from the lower tail, and 10 observations were identified as potential outliers and trimmed from the upper tail; then it will be said that a total of "15% observations were trimmed, 10% were trimmed from the upper tail and 5% were trimmed from the lower tail". The use of the words "lower" and "upper" tail always refers to the tails of the original log10-transformed data.
Distributions were not normal (Table 1) for any of the 27 variables even after the log10-transformations (p ≤ 0.01). We could not find any observable pattern in terms of the size of skewness before or after log10-transformations that could be attributed to the matrices in which these variables were measured. Log10-transformations did substantially reduce skewness for all variables. For example, the skewness of serum Vitamin B12 was reduced from 27.8 to 0.43 (Table 1). But for six variables, namely urinary creatinine, serum Vitamin C, urinary enterolactone, PCB 49, PCB 87, and PCB 180, the distributions became negatively skewed after the log10-transformations. As the value of the fence multiplier, M, decreased, the absolute values of the skewnesses also decreased. However, because of relatively large sample sizes, even the smallest departures from a skewness of zero caused the p-values for the Shapiro-Wilk test to remain below 0.05. For example, when M=0.5, for serum PCB 153, the sample skewness was 0.03 but the p-value for the Shapiro-Wilk test of normality was still <0.01 (Table 1). Values of M below 0.5 were not considered because of the possible trimming of a substantial amount of data, probably as much as 25% or more.
For urinary equol, normality was achieved when M=1.5, and for serum PCB 49 when M=2.0 (Table 1). For serum folate and serum PCB 44, the distribution became negatively skewed as M was reduced from 1.0 to 0.5 (Table 1). For serum transferrin receptor, the distribution became negatively skewed as M was reduced from 1.5 to 1.0 (Table 1). For urinary perchlorate, the distribution became negatively skewed as M was reduced from 3.0 to 2.5 (Table 1). For serum PCB 28 and PCB 87, the distributions changed from positively skewed to negatively skewed or vice versa as M was reduced from 2.0 to 1.5 (Table 1).
For each of the variables for which skewness switched signs from positive to negative or vice versa, further attempts were made to find a value of M for which normality or near normality could be achieved. For example, for serum folate and serum PCB 44, the values of M were explored between 1.0 and 0.5 in decrements of 0.1. In addition, while the value of skewness remained negative for PCB 52 both at M=3.0 and M=2.5, the skewness increased as M was decreased further. As such, for PCB 52, a value of M between 3.0 and 2.5 was considered in decrements of 0.1. The results are given in Table 2 for these variables. The values of M for the urinary equol and serum PCB 49 were accepted as given in Table 1 (1.5 for urinary equol and 2.0 for PCB 49). For the other 18 variables, near normality was considered to be achieved when M=0.5 or when M=1.0. Even though the absolute skewness was lowest at 0.5, the value of 1.0 was preferable because too much data may be trimmed before robust estimations when M=0.5.

For the variables given in Table 2 and for urinary equol and PCB 49, the weighted means with their confidence intervals and standard deviations before and after M was applied, as well as the number and percent of observations trimmed due to the application of fence multipliers, are given in Table 3. The means of the reduced data, i.e., the data remaining after observations identified as potential outliers were trimmed by use of fence multipliers, were higher or lower than those of the original log-transformed data depending upon the mix of observations trimmed from the lower and upper tails of the original data. For PCB 87, for example, the final mean was 0.797 ng/g compared to the original mean of 0.602 ng/g because, of the total of 16.4% of observations that were trimmed from the original data, 16% were from the lower tail and only 0.4% were from the upper tail. As would be expected, the standard deviations of the reduced data were always lower than those of the original data. For example, the standard deviation of the reduced data for log PCB 87 was 0.249, 49.6% lower than that of the original data, which was 0.494. For this reason, the confidence intervals of the means for the reduced data were always narrower than those for the original data.
The percent of observations trimmed to obtain the reduced sample varied from 0.1% for urinary perchlorate to 16.4% for PCB 87 (Table 3). A relatively large percent of observations, 9.5%, were also trimmed for serum folate.

Table 3: Weighted means with 95% confidence intervals and standard deviations with and without application of fence multipliers for selected variables.

Table 4 shows the weighted means, 95% confidence intervals of the weighted means, and standard deviations before and after the fence multipliers were used for the 18 variables not included in Tables 2 and 3. The number and percent of observations that were trimmed to obtain the reduced samples are given in Table 5. Whether the means of the reduced data were higher or lower than those of the original log10-transformed data depended on the mix of observations trimmed from the lower and upper tails. For example, for blood lead (Table 4), while the mean of the original log10-transformed data was 0.179 µg/dL, the mean of the reduced data when M=0.5 was 0.150 µg/dL, since 9.6% of the observations were trimmed from the upper tail (Table 5). On the other hand, for PCB 180 (Table 4), while the mean of the original log10-transformed data was 1.827 ng/g, the mean of the reduced data when M=1.0 was 1.842 ng/g, since a majority of the observations removed were from the lower tail. In general, standard deviations of the original log10-transformed data were greater than or equal to standard deviations of the reduced data. The standard deviations were higher when M=1 than when M=0.5.
The widths of the confidence intervals for the means of the original log10-transformed data were greater than or equal to the widths of the confidence intervals of the reduced data. The widths were higher when M=1 than when M=0.5. When M=1.0, the percent of observations trimmed to obtain the reduced samples varied from a low of 0.3% for PCB 153 to a high of 11.8% for serum Vitamin C (Table 5). The percent of observations trimmed for PCB congeners was much smaller than for non-PCB variables. When M=0.5, the percent of observations trimmed to obtain reduced samples varied from 4.1% for PCB 180 to a high of 21.5% for serum Vitamin C (Table 4).
Plasma methylmalonic acid was selected for a detailed demographic evaluation of subjects in the lower and upper tails because, for this variable, a majority of the trimmed observations were in the upper tail (6.1% vs. 1% when M=1, Table 5).

Vitamin C was also selected for a detailed demographic evaluation of subjects in the lower and upper tails, because, for this variable, a majority of the trimmed observations were in the lower tail (10.9% vs. 0.9% when M=1, Table 5). The results are given in Table 6.
The distinction between the middle of the distribution and the lower tail was much sharper when M=1 than when M=0.5. This might influence the choice between M = 0.5 and M=1. On the other hand, those who were in the upper tail were predominantly non-Hispanic whites (64.8% when M=0.5, 68.3% when M=1), males (54.7% when M=0.5, 57.1% when M=1), and aged 50+ years (63% when M=0.5, 68.9% when M=1).
More specifically (data not shown), 63% (when M=0.5) of those who were in the lower tail were non-Hispanic black and Mexican American males and females aged ≤ 29 years. However, when M=1, 26.6% of those who were in the lower tail were Mexican American males aged 50+years (data not shown).
Thus, selection of M could bias the interpretation of the results. In the upper tail, 46% were non-Hispanic white males and females aged 50+years when M = 0.5, and 51.6% when M=1.
For serum Vitamin C (Table 6), for both M=0.5 and 1.0, the subjects who were in the lower tail were predominantly males (55.9% when M=0.5, 57% when M=1), non-Hispanic whites (52.5% when M=0.5, 55.8% when M=1), and aged ≤ 29 years or 50+ years (71.5% when M=0.5, 62% when M=1). When M=1, the age group distribution in the lower tail was similar: 31.6% for those aged ≤ 29 years, 30.4% for those aged 30-49 years, and 38% for those aged 50+ years. This is another instance where the selection of M could lead to different interpretations of the results. Those who were in the upper tail (Table 5) were predominantly female (62.3% when M=0.5, 60.3% when M=1), non-Hispanic white (63.2% when M=0.5, 85.7% when M=1), and aged ≤ 29 years or 50+ years (90% when M=0.5, 85.8% when M=1). More specifically (data not shown), more than 53% of those who were in the lower tail were male and female non-Hispanic whites in all three age groups for M=0.5 as well as M=1. Among those who were in the upper tail, 31.1% were non-Hispanic white females aged ≥ 30 and 11.4% were non-Hispanic black males aged ≥ 50 when M=0.5. When M=1, Mexican American males aged ≥ 50 and non-Hispanic black males and females aged 30-49 years formed 67.5% of all those who were in the upper tail.
From the variables in Table 2, we selected PCB 87 and serum folate for a detailed study of the demographic characteristics of those who were in the lower and upper tails. For PCB 87, 16% of the subjects who were trimmed were in the lower tail and 0.4% were in the upper tail. For serum folate, 4.1% of the subjects trimmed were in the lower tail and 5.4% were in the upper tail. The results are given in Table 7. For serum folate (M=0.8), the subjects in the lower tail were predominantly males (54.5%), non-Hispanic whites and blacks (71.5%), and those aged ≤ 29 years (45.5%). The subjects in the upper tail were predominantly females (59.2%), non-Hispanic whites (72.9%), and those aged 50+ years (63.9%). Specifically (data not shown), those who were in the lower tail were non-Hispanic whites and non-Hispanic males and females aged ≤ 29 years (32.8%). Those who were in the upper tail were predominantly non-Hispanic white males and females aged 50+ years (51.7%).
For serum PCB 87, while males and females (Table 7) were almost equally distributed in the lower and upper tails, there were 52.5% males as compared to 47.5% females in the middle of the distribution. While non-Hispanic whites were predominant in the lower tail (53.8%), non-Hispanic blacks were predominant in the upper tail (50%). The distribution of age groups was not substantially different in the lower tail than in the middle of the distribution. Non-Hispanic white males and females aged 50+ years accounted for 23.8% of the subjects in the lower tail.

Discussion
We have described a simple method based on Tukey's fences to achieve normality or near normality when log10-transformations of chemical data do not achieve normality. This method involves identifying and trimming observations from the lower and upper tails of the distribution that may hinder achieving normality after log10-transformations. The reduced dataset obtained after trimming certain observations from the lower and upper tails had means which could be smaller or larger than those of the original log10-transformed data depending upon the percent mix and magnitude of the observations that were trimmed from the lower and upper tails. For example, when a large majority of observations were trimmed from the lower tail as compared to the upper tail (16% vs. 0.4%), the mean of the log10-transformed reduced data for PCB 87 (Table 3) was higher (mean=0.797 ng/g, geometric mean (GM)=6.3 ng/g) than for the original log10-transformed dataset (mean=0.602 ng/g, GM=4.0 ng/g); the GM for the reduced dataset was more than 50% higher. On the other hand, when a majority of the observations were trimmed from the upper tail (Table 4, M=0.5) as compared to the lower tail (11.1% vs. 2.2% for PCB 99), the mean for the reduced dataset was lower than for the original log10-transformed dataset. For example, the mean for PCB 99 for the reduced dataset was 1.418 ng/g (GM=26.2 ng/g) as compared to the mean for the original data which was 1.357 ng/g (GM=22.8 ng/g); the GM of the reduced dataset was about 20% lower than for the original dataset. Thus, statistics computed from data which are not normal can lead to under- or over-estimation of the measures of central tendency such as means or geometric means.
Conversely, as expected, we found that the estimates of dispersion, for example standard deviations, were always lower for the reduced dataset than for the original non-normal dataset. For example, for PCB 87 (Table 3), the standard deviation for the reduced dataset was 0.249 (geometric standard deviation=1.77) while for the original dataset it was 0.494 (geometric standard deviation=3.12); that is, the geometric standard deviation of the reduced dataset was about 43% lower than that of the original dataset.
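The geometric means and geometric standard deviations quoted for PCB 87 follow from the log10-scale statistics by raising 10 to the mean and to the standard deviation, respectively. A minimal sketch of that arithmetic, using the values reported in the text (the function name is hypothetical):

```python
def geometric_from_log10(mean_log10, sd_log10):
    """Convert a log10-scale mean and SD to a geometric mean and geometric SD."""
    return 10 ** mean_log10, 10 ** sd_log10

# PCB 87 values reported in the text (log10 scale).
gm_orig, gsd_orig = geometric_from_log10(0.602, 0.494)   # original dataset
gm_red, gsd_red = geometric_from_log10(0.797, 0.249)     # reduced dataset

gm_change = 100 * (gm_red / gm_orig - 1)      # percent change in GM
gsd_change = 100 * (gsd_red / gsd_orig - 1)   # percent change in GSD

print(round(gm_orig, 1), round(gm_red, 1), round(gm_change, 1))
print(round(gsd_orig, 2), round(gsd_red, 2), round(gsd_change, 1))
```

Note that a 49.6% reduction in the log10-scale standard deviation corresponds to a reduction of about 43% in the geometric standard deviation, because the two are measured on different scales.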
The principal issues with the approach we proposed to achieve normality or near normality are (i) what percent of data from the original dataset can be ignored/trimmed to obtain robust estimates, (ii) what is the cost if no further analysis can be done on the data that are trimmed from the lower and upper tails, (iii) what should be done with the data trimmed from the lower and upper tails, and (iv) what additional information or insight can be achieved by studying the observations that form the trimmed data. It is not simple to determine the percent of the original dataset that can be ignored/trimmed to achieve normality or near normality. Individual researchers must use additional clinical insight and input to decide, for each variable, the percent of data that can be trimmed without unacceptably altering the dataset. For example, for the variables in Table 3, there were only a few observations that needed to be trimmed from PCB 49, PCB 52, serum transferrin receptor, urinary perchlorate, and PCB 28 to achieve near normality; in these cases, the outliers can be ignored. In general, loss of ≤ 5% of the original dataset may not be substantial, but this depends upon the individual research issues involved. However, if a large majority of observations are trimmed from one tail compared to the other, there may be some information contained in the trimmed data that should not be ignored. This was probably the case with PCB 87 (Table 3), for which 16% out of the total of 16.4% of the observations that were trimmed were from the lower tail.
It may be important to understand the demographics, residential location, dietary habits, and risky behaviors of the subjects trimmed from the lower tail to understand which of these factors might lower the concentration of the variable under consideration. While we did not evaluate the residential conditions (for example, industrialized vs. non-industrialized areas), dietary habits (for example, consumption of fatty fish that may have exposed them to excessive PCB levels), or behavior (for example, smoking and/or drinking) of these 290 subjects, we did use 24 demographic groups (2 gender × 4 race/ethnicity × 3 age groups) to more accurately identify them. By doing this, we found that 69 (23.8%) of them were non-Hispanic white males and females aged 50+ years, and 87 (30%) were non-Hispanic white males and females aged ≤ 49 years; 44.9% of the total population were non-Hispanic whites in the middle of the distribution. It would be informative to investigate the differences in residential, dietary, and behavioral factors between those non-Hispanic whites in the middle of the distribution and those who are in the lower tail. Similar evaluations could be useful if a substantial number of subjects are trimmed from the upper tail.
Another question concerns the differences in the outcome of statistical analysis when the non-normality of the log-transformed data is ignored and analyses are carried out as if the log-transformed data were normal. We have already shown that the GM of the data may be under- or over-estimated and that the geometric standard deviation will always be over-estimated. Statistically significant differences found for the original non-normal log10-transformed data may become statistically insignificant for the normal or nearly normal reduced data, and vice versa. It is not impossible for the direction of statistically significant differences to differ between the original and reduced datasets. Also, it is difficult to generalize what will happen to the regression coefficients when non-normal log-transformed data are used in model fitting. The results will depend upon the degree of non-normality of the dependent variable, the number of covariates, the total number of cells in the data, the cell sizes, and other independent variables in the model and their distributions.
In this study, we implicitly assumed that a single distribution is sufficient to describe all demographic groups. There may be circumstances in which this is not true. Different demographic groups, for example non-Hispanic blacks and Mexican Americans, may follow different distributions; in other words, the total population may be a mixture of several distributions. If that is the case, each individual distribution should be analyzed separately by using the methodology proposed here.
While we described the demographic characteristics of the distributional tails of methylmalonic acid, Vitamin C, PCB 87, and serum folate, other characteristics of tails should also be looked into, for example, how their dietary habits are different from those who are in the middle of the distribution.
The outlier detection methodology we proposed to normalize or nearly normalize the data is simple, but it is also crude. However, it affords us an opportunity to convert inferences that are based on normalized log-transformed reduced data back to the original scale.
Better methods to normalize data, for example Box-Cox transformations, have been proposed. However, until the issue of converting parameters estimated for Box-Cox-transformed data back to the original scale can be resolved, data that remain non-normal after log-transformation can be normalized or nearly normalized using Tukey's exploratory procedures as defined here. This may, in fact, lead to additional insight into the data regarding the subjects at the tails of the distributions. Such insight may not be possible using Box-Cox transformations.
The use of alternative non-parametric methods has been recommended when the distribution of the data to be analyzed is not normal (http://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test). However, before succumbing to this temptation, what non-parametric methods actually do needs to be understood. Essentially, non-parametric methods rank order the data before attempting to do any analysis. For example, let us say we need to compare the means of two datasets, say, X with observations {3, 12, 21, 34, 231} and Y with observations {2, 9, 23, 39, 49}. With the use of a non-parametric method, they will be ranked XR={2, 4, 5, 7, 10} and YR={1, 3, 6, 8, 9}. The sums of ranks for the two datasets will be ∑XR=28 and ∑YR=27. Then, by one or another non-parametric method, for example the Wilcoxon rank-sum test, ∑XR will be compared with ∑YR and a p-value for the test of statistical significance will be computed, which will indicate whether or not the "means" of the two datasets are statistically significantly different. To the best of my knowledge, there is no way to indicate by what magnitude X and Y differ in the original scale. In the opinion of this author, drawing conclusions based on ranks rather than the original scale is a serious drawback. In the clinical sciences, it is essential to know the differences in the original scale rather than in a scale based on ranks. The comparative power of parametric vs. non-parametric tests is not an issue that should solely be used to make a judgment about the appropriateness of a statistical test of significance. Consequently, this author prefers to use parametric tests.
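The ranking step in this example can be reproduced with a short sketch. The function name is hypothetical, and the sketch assumes there are no tied values; it computes only the rank sums and does not carry out the significance test itself.

```python
def rank_sums(x, y):
    """Rank the pooled observations (1 = smallest) and sum the ranks per group.

    Assumes no tied values across the pooled data.
    """
    pooled = sorted(x + y)
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    return sum(rank[v] for v in x), sum(rank[v] for v in y)

X = [3, 12, 21, 34, 231]
Y = [2, 9, 23, 39, 49]
print(rank_sums(X, Y))

# Replacing the largest value of X with a much larger one leaves the ranks,
# and therefore the rank sums, unchanged.
print(rank_sums([3, 12, 21, 34, 2310], Y))
```

The second call illustrates the concern raised above: because only the ordering enters the calculation, the rank sums carry no information about how far apart the observations are in the original scale.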
Transformations other than log transformation have been proposed to reduce right skewness of the data. It should be noted that the main issue with chemical and environmental data is the skewness of the data, and only those transformations that reduce the skewness should be considered. Certain transformations like 1/X change the skewness of the data from right skewed to left skewed and vice versa and, as such, are not of use for analyzing chemical and environmental data. Log transformations are not always capable of normalizing data, but the use of Tukey's fences along with log transformations, as described in this communication, can achieve near normality, if not normality, of the data.