An approach for quantifying western blots: the case of signal intensity and the statistical analysis

In the biological sciences, western blotting is widely used to quantify the expression of proteins in a given sample. However, there is no unified method for quantifying protein expression. As a consequence, quantitative analysis of protein expression through western blotting often suffers from data inconsistency. At the same time, the small sample sizes commonly used (e.g. n = 3, 5 or 7) render such analyses non-Gaussian and less robust to statistical errors. In the present study, we propose a novel approach to analyzing a western blot image, using a Gaussian blur filter to generate data on which meaningful statistical analysis can be performed. The differences among the blots that correspond to the expressed target proteins are then tested using appropriate statistical tools. This procedure for quantifying western blots is comprehensive and simple, and can be applied to collect data in compliance with statistical norms. Furthermore, repeating western blotting on a particular set of proteins may improve the analysis as well.


Introduction
Western blotting is a widely practised laboratory-based analytical technique, especially in molecular biology. It is used to detect proteins of interest in a given sample. Since its publication in 1979 (Towbin et al., 1979), this detection technique has become the lifeline of laboratories dealing with protein expression, detection and isolation. It uses SDS-polyacrylamide gel electrophoresis (SDS-PAGE) to separate the various proteins contained in a given sample. The separated proteins are then blotted onto a nitrocellulose or PVDF membrane, where they are treated with appropriate antibodies that bind the target protein. The protein-antibody reactions appear as dark bands (often called 'blots') and are used for further interpretation.
The primary analytical approach to a 'blot' is qualitative in nature. It shows, through visualization of the blots, whether a protein is expressed. A faint blot indicates negligible expression of the protein, whereas a dark blot represents strong expression. The next step is to interpret 'high' or 'low' expression of the target protein on a quantitative scale. This is done through densitometry analysis relative to the expression of 'housekeeping proteins' (e.g. glyceraldehyde 3-phosphate dehydrogenase (GAPDH), β-actin or α-tubulin), which serve as standard blots for normalization. In densitometry, the darkness of the blot is captured through chemiluminescence or fluorescence detection methods. The signal intensity of the blots (commonly measured with ImageJ, https://imagej.net) is considered equivalent to the expression of the protein of interest. The result is reported as the ratio of the normalized signal of the sample to that of the control (termed fold change or percentage change). Taylor et al. (2013) highlighted several challenges faced at every step of the western blotting procedure, from sample preparation, normalization, SDS-PAGE gel loading and protein transfer to primary and secondary antibody selection, incubations and washes. Even under the assumption of flawless handling techniques, the quantitative analytical process raises questions regarding the accuracy, reliability and reproducibility of outcomes (Graham, 2016; Butler et al., 2019).
We propose a simple approach to overcome the problems mentioned above. In outlining this approach, data acquisition and processing are kept as simple as possible so that any laboratory may follow it without technical hindrance.

Western blotting
The western blots are taken from the Aquatic Ecology and Fish Biology Laboratory, Department of Zoology, Visva-Bharati University. The image shows three lanes and four proteins, labelled A, B, C and D. The housekeeping protein used here is α-tubulin.

Image capture software and image formats
Images are captured and saved in .tiff format. All densitometry analyses are performed on images saved in this format.

Image analysis software
The image analysis software used is Fiji (https://imagej.net/Fiji/) (Figure 1a). We assume that the ImageJ analysis follows the protocol described at http://www.yorku.ca/yisheng/Internal/Protocols/ImageJ.pdf. The images are first adjusted for background using the Gaussian blur filter available in the software. Once the background has been adjusted to remove noise as far as possible, an image calculation is performed in which the filtered image is subtracted from the original image. Because the images are in 8-bit format, the background is first adjusted to a pixel density of 255, and the intensity of the foreground (here, the blots) is recorded using the Region of Interest (ROI) option. At least 20 random plot areas are selected from each blot. The complete stepwise procedure of image processing in Fiji is explained in Figure 1b. The mean deviation of pixel density within each blot area is kept to a minimum.
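The workflow above can be illustrated outside Fiji with a minimal pure-Python sketch. It is not the Fiji implementation: the function names, the blur radius `sigma`, the ROI size and the random seed are all hypothetical choices made for illustration, and the step that re-offsets the subtracted image so that a flat background sits at 255 is an assumption about how the background adjustment is realized. The image is taken to be an 8-bit grayscale array stored as a list of rows.

```python
import math
import random

def gaussian_kernel(sigma):
    """1-D Gaussian weights, truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: blur rows, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def subtract_background(image, sigma):
    """Original minus blurred copy.
    ASSUMPTION: the difference is re-offset so a flat background maps to 255."""
    blurred = gaussian_blur(image, sigma)
    return [
        [min(255, max(0, round(255 + p - b))) for p, b in zip(row, brow)]
        for row, brow in zip(image, blurred)
    ]

def sample_roi_means(image, box, n_rois=20, roi_size=3, seed=0):
    """Mean pixel value of n_rois random square ROIs inside one blot.
    box = (left, top, right, bottom), with right and bottom exclusive."""
    rng = random.Random(seed)
    left, top, right, bottom = box
    means = []
    for _ in range(n_rois):
        x0 = rng.randint(left, right - roi_size)
        y0 = rng.randint(top, bottom - roi_size)
        pixels = [image[y][x]
                  for y in range(y0, y0 + roi_size)
                  for x in range(x0, x0 + roi_size)]
        means.append(sum(pixels) / len(pixels))
    return means
```

Calling `sample_roi_means(subtract_background(image, sigma), box)` once per blot would give the 20 raw readings per blot that the later steps work on; the bounding box of each blot has to be supplied by the user, as with manual ROI selection.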

Normalization of pixel intensity
Each mean pixel intensity (MPI) obtained is subtracted from 255 so that denser blots yield higher MPI values. In other words, dense blots represent high expression of proteins. The mean MPI of the housekeeping protein (here α-tubulin) is compared across lanes for any statistical difference (α = 0.05). The null hypothesis is that the expression of the housekeeping protein does not differ between lanes. If a difference is found, the whole blot is rejected. If we fail to reject the null hypothesis, the maximum mean MPI of the housekeeping protein is taken for normalizing the MPIs of each blot across the lanes. The normalization is executed as follows:
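A minimal sketch of this step is given here under a stated assumption. The inversion (255 minus the reading) and the choice of the maximum mean housekeeping MPI as reference follow the text, but the form of the normalization itself, dividing each inverted target MPI by that reference, is an assumption made purely for illustration and may differ from the authors' expression. All function names are hypothetical, and the sketch presumes the housekeeping lanes have already passed the test of no difference.

```python
from statistics import mean

def invert_mpi(readings, white=255):
    """Subtract each raw reading from 255 so denser (darker) blots score higher."""
    return [white - r for r in readings]

def reference_housekeeping_mpi(housekeeping_by_lane):
    """Largest per-lane mean of the inverted housekeeping readings."""
    return max(mean(invert_mpi(lane)) for lane in housekeeping_by_lane.values())

def normalise(target_readings, reference):
    """ASSUMPTION: normalised value = inverted target MPI / reference MPI.
    The source states only that the maximum mean housekeeping MPI is used."""
    return [v / reference for v in invert_mpi(target_readings)]
```

For example, with raw housekeeping readings `{"lane1": [235, 233], "lane2": [231, 233]}` the inverted lane means are 21 and 23, so the reference is 23 and a raw target reading of 209 (inverted: 46) normalises to 2.0 under this assumed form.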

Results and discussion
The Gaussian blur filter produces a corrected image, relative to the original, with a background pixel density of 255 (Figure 2a and 2b). The differences between Figure 2a and Figure 2b are clearly visible.
As stated earlier, at least 20 (n = 20) random plot areas are selected from each blot. The pixel value recorded for each plot is subtracted from 255 to obtain the actual pixel density of the reading. The data thus generated are normalized by square root transformation. The normalized data set is presented in the supplementary file. The statistical parameters verifying the quality of the normalized data are presented in Table 1. Notably, all transformed data satisfy the normality assumption as per the Kolmogorov-Smirnov statistic. Further, compliance with the assumption of homogeneity of variances is checked using Levene's test, and the result is reported in Table 2.
Where homogeneity of variances holds, we next perform an independent t-test to check the statistical differences between Lane 1 and Lane 2 for Protein A and for Protein D. The differences among all three lanes of α-tubulin are tested by one-way ANOVA. Notably, the housekeeping protein α-tubulin, used as the loading control across these three lanes, does not differ significantly between lanes (ANOVA, F = 0.400, p = 0.673, Figure 3). Hence the maximum mean MPI of the housekeeping protein (i.e. 22.13 ± 1.64) is used to normalize all MPIs across lanes. The differences among lanes of Protein B and C are verified using the non-parametric Kruskal-Wallis test, since these datasets do not meet the criterion of homogeneity of variances as examined through Levene's statistic (Table 2).
For Protein A, Lane 1 and Lane 2 exhibit a statistically significant difference (t = 34.343, df = 38, p = 0.00, Figure 4a). Similarly, the difference between the lanes (1 and 2) of Protein D is also statistically significant (t = 20.149, df = 38, p = 0.00, Figure 4c). The Kruskal-Wallis tests for Protein B and C show statistically significant differences among the groups (Protein B: χ² = 41.839, df = 2, p = 0.00, Figure 4b; Protein C: χ² = 41.513, df = 2, p = 0.00, Figure 4d). Notably, the relative positions or 'ranks' of the blots are widely separated too. In all the above cases, the expression of Protein A, B, C and D is explicitly interpretable and statistically justifiable.

Figure 3
Mean MPI of α-tubulin of all the three lanes

Densitometry analysis is essentially a pixel-based analysis of the blots as acquired through image capture. During image capture, the blots produced by the protein-antibody interactions are converted into the pixels of a digital image. Each pixel is thereby assigned an intensity value corresponding to the expression of the target protein. However, when multiple blots are created for comparison, which is a common experimental demand in the laboratory, the corresponding 'housekeeping proteins' are compared as well. Graham (2016) pointed out that loading controls do not necessarily represent the same underlying expression, although they might look the same in simple image analysis tools. In cases of excess control loading, the corresponding expression of the target protein may show a significant difference from the rest. Such a phenomenon would be interpreted as a 'highly' expressed target protein, whereas there may be no difference in expression at all. This phenomenon is termed 'unadjusted signal intensity'.
In the proposed method, the expression of α-tubulin as the loading control is consistent across lanes, since the differences across the lanes (Lane 1, 2 and 3) are found to be statistically insignificant. In animal science research, inconsistency in the expression of the loading control during western blotting is posed as a major roadblock (Eaton et al., 2013). In general, western blot studies normalize the expression levels of the target protein to those of a housekeeping protein (which do not change much among the samples compared) (Welinder & Ekblad, 2011; Aldridge et al., 2011; Ghosh et al., 2014). Under the influence of 'unadjusted signal intensity', such an assumption cannot be ensured unless foolproof, proven support is available. Further, any inconsistency in the process of quantification may lead to the acceptance of false positive results. However, the present method affirms the consistency of expression of the loading protein (α-tubulin), which can be established statistically across the lanes. This technique therefore supports the reliability of the outcomes of western blot studies. What is more, the uniformity of loading of α-tubulin or any other housekeeping protein may be verified by applying this procedure.
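The loading-control check described above can be sketched as a one-way ANOVA over the per-lane housekeeping readings. The sketch below is a plain-Python illustration with hypothetical function names, not the software used in the study. It assumes the readings have already been inverted and square-root transformed, and it uses a closed-form upper-tail probability that is exact only when there are exactly three groups (two between-group degrees of freedom).

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of readings."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def f_pvalue_three_groups(f_stat, df_within):
    """Upper-tail probability of F(2, df_within).
    Closed form valid ONLY when the between-group df equals 2."""
    return (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)

def loading_is_consistent(lanes, alpha=0.05):
    """True when the three housekeeping lanes do not differ at level alpha."""
    f_stat, df_between, df_within = one_way_anova(lanes)
    if df_between != 2:
        raise ValueError("this sketch handles exactly three lanes")
    return f_pvalue_three_groups(f_stat, df_within) > alpha
```

If three lanes of 20 readings each are assumed (57 within-group degrees of freedom, an inference from the stated design rather than a reported figure), `f_pvalue_three_groups(0.400, 57)` evaluates to about 0.672, in line with the reported p = 0.673 given that F is rounded. For other numbers of lanes a statistics package would be needed for the p-value.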
Another major challenge in western blot analysis is that results from densitometry analysis are statistically less convincing (Kreutz et al., 2007). In fact, the methods usually applied, often blindly, to compare means of densitometry readings are the t-test and ANOVA. A quick search for the keyword 'densitometry' in the Google search engine shows that the sample sizes for densitometry analysis mostly range from 3 to 7 (i.e. n = 3-7). Unfortunately, such a sample size is too meagre to support any parametric test that relies on the normality assumption, such as the t-test or ANOVA. Conformity to normality as well as to the homogeneity of variances is therefore essential before such parametric tests are applied. Otherwise, a justification should be given for performing either a parametric or a non-parametric test on the densitometric data. This may be difficult to decide if the sample size is considerably small, and repeated preparation of 'blots' may further compromise the technical feasibility of the experiment. Thus, reproducibility, data linearity and consistency in proportionate measures are important criteria for deciding statistically whether to accept or reject the differences among the blots. The method presented here follows a simple normalization procedure for the blots representing the proteins of interest. The differences recorded this way are therefore reliable and convincing. The conventional way of analysing gel electrophoresis data with 3-7 replicates may also be handled by this process. The total number of readings generated here for each blot of a protein is 20 (of which Protein A and D were analysed with a parametric test and Protein B and C with a non-parametric test). The number of readings may be increased further to meet sample size requirements or to increase the statistical power (P) needed for parametric tests.
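The choice between a parametric and a non-parametric test discussed above rests on two computations that can be sketched in plain Python: Levene's statistic for homogeneity of variances and the Kruskal-Wallis H statistic. The function names are hypothetical. The Levene variant shown centres on group means, which is an assumption since the source does not state which variant was used. The p-value helper is exact only for three groups (df = 2), where the chi-square upper tail reduces to exp(-H/2).

```python
import math
from statistics import mean

def levene_w(groups):
    """Levene's W using absolute deviations from each group mean
    (ASSUMPTION: mean-centred variant). Compare W with F(k - 1, N - k)."""
    z = [[abs(v - mean(g)) for v in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    z_means = [mean(g) for g in z]
    z_grand = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return ((n_total - k) / (k - 1)) * between / within

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with average ranks for ties and the tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_part = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_part - 3 * (n_total + 1)
    return h / (1.0 - tie_term / (n_total ** 3 - n_total))

def chi2_pvalue_df2(h):
    """Upper-tail chi-square probability; exact ONLY for df = 2 (three groups)."""
    return math.exp(-h / 2.0)
```

In practice, a large Levene's W (judged against the F distribution) would direct the analysis of a three-lane protein to `kruskal_wallis_h`, while a small W would allow a t-test or ANOVA on the same readings.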
Studies of gene (or protein) expression often face a common problem of non-reproducibility due to small sample sizes (Maleki et al., 2019). Thus, the procedure described here for densitometric data analysis could serve as a statistical substitute for such method-dependent analysis.

Conclusions
Accuracy and reproducibility are the best criteria for any quantitative scientific data. The challenge in interpreting western blots, or similar scientific experiments, arises chiefly from small sample sizes. This challenge is compounded by the erroneous application of parametric inferential analysis to non-normal data sets. The method detailed here therefore offers a way to deal with the problem of small sample size. However, skilful use of the Gaussian blur to set a uniform background plays a significant role here. Another limitation in background setting is the difficulty of distinguishing a very faint blot from the background, since its signal may eventually drop to the background pixel intensity. Nevertheless, the proposed method would help in carrying out statistical analyses of western blot data from blots with small sample sizes. Even so, the accuracy of outcomes obtained from large sample sizes is always preferred.