Evaluation of image texture recognition techniques in application to wastewater coagulation

Floc formation and growth are important characteristics of the wastewater coagulation process. The shape and size of flocs strongly affect the subsequent separation processes and therefore the resulting treatment efficiency. Observed images of flocs tend to show strong relations to the coagulation parameters: dose and coagulation time. In this article, three texture recognition techniques were evaluated for their ability to mathematically describe the relationship between images of flocs and coagulant dosages. The easily computable texture analysis methods were found to be potential techniques for the characterization of particle images. Ten out of eleven grey level co-occurrence matrix (GLCM) texture features were found to be significant for dosage prediction by a principal component regression model with only one principal component. Two features (inverse difference moment and variance) were selected for the multiple linear regression model. Test set prediction accuracy varied from 83 to 96% depending on the texture analysis method and multivariate model. The best dosage prediction and image classification results were achieved by GLCM and the angle measure technique. The results of image texture analysis coupled with multivariate modelling techniques indicate that it is possible to characterize and relate images of flocs, captured during coagulation, to different coagulant dosages, as well as to predict those dosages.
*Corresponding author: Nataliia Sivchenko, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003-IMT, Aas 1432, Norway. E-mails: nataliia.sivchenko@gmail.com, nataliia.sivchenko@nmbu.no


ABOUT THE AUTHORS
The authors are members of Water, Environment, Sanitation and Health (WESH) research group at the Norwegian University of Life Sciences (NMBU), Department of Mathematical Sciences and Technology (IMT). WESH group focuses on water and wastewater related issues and is heavily involved in teaching and supervision of MSc and PhD programmes in water engineering and technology. The group also has one of the largest externally funded research and educational project portfolios at IMT, and collaborates with partners from EU, North America, Eurasia, Asia and Africa. It also has a number of Research, Development and Innovation projects with Norwegian partners. The main research and development areas of the WESH group are: process control and optimization of coagulation and biological treatment processes; membrane fouling and filtration processes; microbial water quality and risk assessment; decentralized wastewater systems; modelling of sewer systems.

PUBLIC INTEREST STATEMENT
The performance of the coagulation process is crucial in wastewater treatment. By observing changes in floc aggregation, it is potentially possible to predict treatment efficiency and to optimize the coagulation process through on-line dosage control. In this article, we present a new approach to the description of floc images. It is based on the characterization of the whole image (texture image analysis) instead of characterizing each floc separately. Texture image analysis methods applied to images of flocs and coupled with multivariate modelling seem to be a potential tool for on-line prediction of the optimal coagulant dosage and the resulting treatment efficiency.

Introduction
Coagulation is a well-known and essential purification process in drinking water, wastewater, and industrial water treatment. When added to a colloidal suspension, an organic or inorganic coagulant destabilizes the system, which leads to particle aggregation and subsequent separation. Intelligent dosage control systems are required to optimize and control the removal of particles and phosphates, minimize the use of coagulant, achieve better sludge management (by reducing sludge volumes) and enhance the plant availability of phosphates. Currently, optimization and process control of coagulation are mainly based on a simple flow-proportional dosing concept or on data from jar tests (Ratnaweera & Fettig, 2015). With the recent development of a wide range of on-line sensors, various coagulation control systems have appeared (Annadurai, Sung, & Lee, 2004; Juntunen, Liukkonen, Lehtola, & Hiltunen, 2013; Ratnaweera, Lei, & Lindholm, 2002). However, the initial costs of such control systems are usually quite high. Furthermore, control systems based on outlet water parameters have a perceptible time lag between the dosing point and the effluent, varying from 30 min to several hours depending on the separation method. This lag makes such systems unsuitable for wastewater coagulation dosing control, where the influent water parameters and flow rate change dramatically within a short time. Hence, the water and wastewater treatment industry is seeking simple, cheap, robust and precise real-time dosing control systems.
Researchers started to model the mechanisms of flocculation a relatively long time ago (Vold, 1963). The developments in this area are summarized in the review by Thomas, Judd, and Fawcett (1999). Visualization, modelling, and understanding of floc formation mechanisms make it possible to determine relationships between particle properties and coagulation efficiency, and thus to optimize the coagulation process. Many researchers have contributed to the understanding of particle aggregation, breakage and regrowth mechanisms. Jarvis, Jefferson, Gregory, and Parsons (2005) summarized the development of floc strength and breakage mechanisms in their review. The influence of coagulation mixing conditions (e.g. shear rates, impeller construction) on floc formation, breakage and regrowth has also been studied (Cao et al., 2011; He, Nan, Li, & S., 2012). However, due to the complicated nature of flocs and the huge number of parameters influencing the coagulation process, there is still no comprehensive or universally accepted mathematical description of floc formation mechanisms.
The modelling of drinking water treatment process, which included coagulation stage, had been recently performed by Juntunen, Liukkonen, Pelo, Lehtola, and Hiltunen (2012). The authors investigated both laboratory and process data. They found aluminium dose to be one of the most important factors affecting the treated water quality parameters (turbidity and residual aluminium).
Recently, computer vision and image processing techniques have been employed more frequently to study and characterize the complex size, shape and other features of particles. Photographic image acquisition techniques make it possible to capture and analyse images of flocs in situ (Chakraborti, Gardner, Atkinson, & Van Benschoten, 2003). Our hypothesis is that it should be possible to correlate certain floc properties with coagulant dosages and treatment efficiencies using images of flocs. A distinct effect of coagulant dosage on the features of aggregates was detected by Lin, Huang, Chin, and Pan (2008) in a kaolin water suspension. The authors used wet scanning electron microscopy to obtain the morphology of flocs. Jin (2005) used a high-resolution digital camera to study the influence of temperature on floc properties under different coagulant dosages for river water coagulation. A relation between the projected area of particles and coagulant dosage was found. Wang et al. (2011) investigated the changes in floc characteristics due to different dosages and coagulation pH in a humic acid suspension, obtaining images with a digital CCD camera. Our aim in this article is to show that images of model wastewater flocs captured in situ by a nonintrusive photographic method are descriptive in terms of coagulant dosage prediction. Thus, the technique could potentially be used for on-line coagulation dosage control. Advanced dosage control would then be possible by determining the lowest coagulant dosage that leads to the required solid-liquid and phosphorus separation.
Texture analysis is a well-known method of image recognition and especially classification in numerous computer vision applications. However, as far as we know, it has not been previously reported to be used in application to images of coagulated particles. Texture is a loosely defined term without an accepted or universal quantitative meaning. Texture can be defined as a measure of the image's surface roughness in a meaning of brightness, colour, shape and size variations within some region and its repetitiveness. Texture is a pattern which can be completely distinct or completely random. Furthermore, texture could be isotropic (without any preferred orientation) or anisotropic (has definite pattern structure). Images of flocs presented in this study were described by texture recognition techniques.
There have been many attempts in combining the digital image analysis with coagulation/flocculation process. The clear correlations between floc properties and the flocculation process parameters were found by Juntunen, Liukkonen, Lehtola, and Hiltunen (2014). The authors documented that, for instance, the surface area of floc and the number of floc particles highly correlate with the process data, such as lime feed, pH, turbidity etc. However, in contrast to our research, the authors performed the analysis for drinking water treatment plant with the characterization of flocs by object recognition methods in image analysis. In this paper, we evaluate the texture image analysis methods to be used for coagulant dose prediction by the images of flocs. It seems to be a promising tool for further coagulation-flocculation dosage control studies. Potentially, the texture analysis methods in application to particles separation processes may simplify the image characterization part.

Raw water
Model wastewater was used for all experiments in this research. In order to better represent typical domestic wastewater characteristics, the synthetic wastewater was prepared according to previous studies (Ratnaweera, 1991). The model wastewater contains both organic and inorganic components. Dried milk, potato starch, and humic acid represent the organic part of the suspension. Different inorganic salts are used to provide well-defined amounts of orthophosphate and ammonium. Bentonite is used to imitate the inorganic particles. For this particular research, the model wastewater was prepared with a medium concentration level of salts and particles and represents typical soft Norwegian wastewater. The components and their concentrations were as follows: dried milk (Nestle, Norway), 300 mg/l; potato starch (Hoff, Norway), 60 mg/l; bentonite (Alfa Aesar, USA), 80 mg/l; NaCl (Merck, Germany), 400 mg/l; K₂HPO₄ (Merck, Germany), 50 mg/l; NH₄Cl (Kebo, Germany), 100 mg/l; humic acid sodium salt (Merck, Germany), 5 mg/l; NaHCO₃ (Merck, Germany), 60 mg/l. Ås city tap water, which is typical soft Norwegian lake water with low concentrations of most components, was used as the solvent. The model wastewater was prepared for each experimental series and used within 1.5-4 h after preparation. The average model wastewater parameters were as follows: pH 8.03 ± 0.13, suspended solids 187 ± 10.7 mg/l, turbidity 256 ± 12 FNU, total phosphorus 11.2 ± 0.24 mg/l, orthophosphates 9.5 ± 0.07 mg/l. The initial model wastewater particles had a size distribution of 0.52-150 μm with two peaks at 1.88 and 14.5 μm, determined with a Malvern Mastersizer 3000 (Malvern Instruments Ltd, UK).

Coagulation procedures
All investigations were performed at jar-test scale using a Flocculator 2000 (Kemira, Sweden) with programmable mixer units and 1 l beakers. The mixing conditions during coagulation were: 1 min of rapid mixing (400 revolutions per minute, RPM), 10 min of slow mixing (30 RPM) and 20 min of sedimentation without mixing. The prepolymerized aluminium chloride coagulant PAX XL61 (Kemira, Sweden) was used in this research. All tests were performed at a wastewater temperature of 16-17 °C and at a constant pH of 7.5, adjusted by adding acid or base as required. Coagulant doses varied from 0.29 to 1.08 mmol Al/l, with 11 dosages in total. After the sedimentation stage, samples of approximately 200 ml of treated water were taken from a depth of 5 cm by a peristaltic pump.
Each sample of treated water taken after sedimentation step was analysed for turbidity, total suspended solids, orthophosphates and total phosphorous concentrations.

Image acquisition and pre-processing
A nonintrusive optical sampling technique was used to capture images of the forming aggregates. The principal scheme is shown in Figure 1. Images of flocs were obtained during the whole slow mixing period of coagulation at a rate of 1 image per 20 s, controlled by the free remote camera control software DigiCamControl 1.2.0. The image capturing equipment was as follows: a digital single lens reflex (DSLR) Nikon D600 camera, a 105 mm Nikkor AF-S Micro 1:2.8 G ED lens (Nikkor, China) and a SpeedLite YN460 flash (Yongnuo, China). The size of the image-capturing zone in the beaker was 4.8 × 3.3 cm. In order to keep the flocs within the proper depth of field, the mixer unit was modified by attaching a 4.5 cm wide curved black plastic strip, which also served as the background for the images of flocs. The background colour was chosen because the particles present in the model wastewater are white. A contrasting background thus makes the further image analysis easier.
Images obtained during coagulation have a resolution of 24.3 megapixels each. They were processed in the open source image analysis software ImageJ v.1.49 (Rasband, 1997-2016), which is based on plug-ins and macros. For each image, an area of 3,690 × 3,690 pixels (3 × 3 cm) was selected manually and cropped. Because of slight changes in lighting conditions during image acquisition for different dosages, all images were pre-processed to have the same brightness intensity. An ImageJ plug-in written by Knut Kvaal was used to mean-centre the grey-tone values of the images.
In total, coagulation jar tests were performed using 11 coagulant dosages. Ten images from each experiment, taken after 6 min of coagulation, were selected as representative samples, resulting in 110 samples. The coagulant dose was set as the response (Y) parameter in the multivariate models.
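The brightness equalization step can be illustrated with a minimal sketch. The in-house plug-in is not described in detail here, so the code below only assumes one plausible form of mean-centring: each image is shifted additively so that its mean grey value equals a common target. The function name `mean_centre` and the target value of 128 are hypothetical and not taken from the study.

```python
import numpy as np

def mean_centre(image, target_mean=128.0):
    """Shift grey values so that the image mean equals target_mean.

    Assumption: a plain additive shift followed by clipping to the 8-bit
    range; the plug-in used in the study may work differently.
    """
    img = np.asarray(image, dtype=np.float64)
    shifted = img - img.mean() + target_mean
    return np.clip(shifted, 0.0, 255.0)

# Two synthetic "images" with the same texture but different brightness
rng = np.random.default_rng(0)
texture = rng.integers(0, 40, size=(64, 64))
dark, bright = texture + 60, texture + 100

dark_c = mean_centre(dark)
bright_c = mean_centre(bright)
print(dark_c.mean(), bright_c.mean())
```

After centring, the two synthetic images differ only through their texture, which is the property the subsequent texture analysis is meant to capture.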

Fractal dimension of aggregates
Fractal dimension is a widely used parameter for characterizing the geometrical features of fractal particles such as flocs. For a 2-dimensional (2D) projected particle image, the fractal dimension $D_{pf}$ defines how the projected area of the particles rises with the perimeter (He et al., 2012):
$$A \propto P^{2/D_{pf}} \qquad (1)$$
where $A$ is the projected area and $P$ is the perimeter of the particles. For the 2D projection of an image, the value of the fractal dimension varies from $D_{pf} = 1$ for a circular floc to $D_{pf} = 2$ for a chain of particles (a line).
The size of an irregularly shaped particle can be determined as the equivalent diameter, $d_p$ (He et al., 2012):
$$d_p = \sqrt{4A/\pi} \qquad (2)$$
Flocs' morphological characteristics were obtained using ImageJ and the plug-in "Analyse particles". First, all images collected during the slow mixing period of coagulation were gathered in a stack. Then an area of 3,690 × 3,690 pixels was selected manually and cropped. The stack of images was converted to 8-bit greyscale. The plug-in "Subtract Background" was applied to the images in order to equalize the background pixel values for easier thresholding. If needed, brightness and contrast were adjusted. Next, thresholding was used to obtain binary images, with a threshold value estimated manually or by the Otsu method (Otsu, 1979). After this transformation, flocs had the greyscale value 0 (true black), while the background was set to 255 (true white). Finally, the plug-in "Analyse particles" was applied to the binary images of flocs. Many geometrical and statistical parameters can be extracted from the images of aggregates by this plug-in. For this research, the significant parameters of flocs were the number of particles in the image and the mean area and perimeter of flocs. Knowing that 1,230 pixels in the image equal 1 cm, we were able to convert floc features from pixel values to values in centimetres.
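The thresholding step can be sketched as follows. This is a plain NumPy version of Otsu's between-class variance criterion, not the ImageJ implementation; it assumes an 8-bit image with bright flocs on a dark background, and the helper names `otsu_threshold` and `binarize` are hypothetical.

```python
import numpy as np

def otsu_threshold(image):
    """Return the grey level t that maximizes the between-class variance,
    with class 0 holding pixels <= t and class 1 holding pixels > t."""
    grey = np.asarray(image, dtype=np.uint8).ravel()
    prob = np.bincount(grey, minlength=256) / grey.size
    levels = np.arange(256)
    w0 = np.cumsum(prob)                 # weight of class 0 for each t
    w1 = 1.0 - w0                        # weight of class 1 for each t
    cum_mean = np.cumsum(prob * levels)  # cumulative first moment
    total_mean = cum_mean[-1]
    between = np.zeros(256)
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (
        w0[valid] * w1[valid])
    return int(np.argmax(between))

def binarize(image, threshold):
    """Flocs (brighter than the threshold) become 0, background becomes 255,
    matching the convention described in the text."""
    img = np.asarray(image)
    return np.where(img > threshold, 0, 255).astype(np.uint8)

# Synthetic example: dark background (10) with a bright band (200)
demo = np.full((20, 20), 10, dtype=np.uint8)
demo[5:15, :] = 200
t = otsu_threshold(demo)
binary = binarize(demo, t)
```

The binary image produced this way is the kind of input a particle analysis step would then use to measure the area and perimeter of each floc.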
The mean fractal dimension of flocs was calculated by the equation:
$$D_{pf} = \frac{2 \ln P_{pix}}{\ln A_{pix}} \qquad (3)$$
where $A_{pix}$ is the mean area of particles, in pixels; $P_{pix}$ is the perimeter, a count of pixel edges; and 2 is a constant (Yu et al., 2009).
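The floc descriptors above can be computed with a few lines of code. The sketch below assumes the commonly used relations $D_{pf} = 2 \ln P_{pix} / \ln A_{pix}$ for the perimeter-based fractal dimension and $d_p = \sqrt{4A/\pi}$ for the equivalent diameter, together with the scale of 1,230 pixels per centimetre given in the text. The function names and the example area and perimeter are hypothetical.

```python
import math

PIXELS_PER_CM = 1230  # image scale reported in the text

def equivalent_diameter(area):
    """Diameter of the circle with the same projected area (assumed formula)."""
    return math.sqrt(4.0 * area / math.pi)

def fractal_dimension(area_pix, perimeter_pix):
    """Perimeter-based fractal dimension, 2 ln(P) / ln(A), in pixel units."""
    if area_pix <= 1 or perimeter_pix <= 0:
        raise ValueError("area must exceed 1 pixel and perimeter must be positive")
    return 2.0 * math.log(perimeter_pix) / math.log(area_pix)

def pixels_to_cm(length_pix):
    """Convert a length from pixels to centimetres."""
    return length_pix / PIXELS_PER_CM

# Hypothetical floc: 5,000 pixels of area, 400 pixel edges of perimeter
area_pix, perimeter_pix = 5000, 400
d_cm = pixels_to_cm(equivalent_diameter(area_pix))
d_pf = fractal_dimension(area_pix, perimeter_pix)
print(f"equivalent diameter: {d_cm:.4f} cm, fractal dimension: {d_pf:.3f}")
```

Note that areas must be converted with the square of the scale factor; here the conversion is applied to the diameter, which is a length.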

Image analysis by histogram
The histogram is a frequency distribution of the intensity values that occur in an image. A typical 8-bit greyscale image has 256 intensity values (grey shades). The histogram of an 8-bit greyscale image is a one-dimensional (1D) vector of length 256, which contains the number of pixels of each intensity in the image. However, histograms do not contain spatial information about the pixels in the image. The histogram is a fast and simple way to illustrate statistical information about the image, namely the distribution of intensity values; thus, it is a popular tool for real-time image processing. Histogram analysis has also been demonstrated as a tool for eliminating poor quality images (Liukkonen, Hiltunen, & Hiltunen, 2015). The information from histograms could be sufficient for comparing images.
In order to decrease the number of variables (intensity values), three histogram sizes were tried in the further multivariate analysis: 256, 128 and 64 bins. That is, the number of bins in the initial histogram was reduced by factors of 2 and 4 using the ImageJ plug-in "Compute hist stack bins", created by Knut Kvaal. The resulting outputs were vectors of 256, 128 and 64 values for each image in the stack. Hence, the resulting matrices were 110 × 256, 110 × 128 and 110 × 64.
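A minimal sketch of building such histogram matrices is given below. It assumes that reducing the number of bins means summing the counts of adjacent grey levels; the plug-in used in the study may bin differently. The function name `grey_histogram` and the random test stack are hypothetical.

```python
import numpy as np

def grey_histogram(image, bins=256):
    """Histogram of an 8-bit image with 256, 128, 64, ... equal-width bins.

    Assumption: coarser bins are obtained by summing adjacent grey levels.
    """
    if 256 % bins != 0:
        raise ValueError("bins must divide 256")
    counts = np.bincount(np.asarray(image, dtype=np.uint8).ravel(), minlength=256)
    return counts.reshape(bins, 256 // bins).sum(axis=1)

# Synthetic stack of three 32 x 32 images standing in for the floc images
rng = np.random.default_rng(1)
stack = rng.integers(0, 256, size=(3, 32, 32))

# One row per image, one column per bin: the X matrix for multivariate analysis
X64 = np.array([grey_histogram(img, bins=64) for img in stack])
print(X64.shape)
```

Each row keeps the total pixel count of its image, so the coarser histograms describe the same distribution with fewer, less noisy variables.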

Image analysis by GLCM
GLCM is a typical statistical method of measuring texture. Haralick, Shanmugam, and Dinstein (1973) introduced this method, which is based on grey level spatial-dependence (co-occurrence) matrices of pixels and estimates image features using second-order statistics.
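As an illustration of the co-occurrence idea, the following sketch builds a symmetric, normalized GLCM for one pixel distance and direction and derives a few texture features from it. It is a plain NumPy illustration rather than the ImageJ plug-in used in this study; the feature definitions follow common textbook forms and may differ in detail from the plug-in, and the names `glcm` and `glcm_features` are hypothetical.

```python
import numpy as np

# (row, column) unit offsets for the four usual directions, in degrees
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, distance=1, angle=0, levels=256):
    """Symmetric, normalized grey level co-occurrence matrix."""
    img = np.asarray(image, dtype=np.intp)
    dr, dc = (distance * k for k in OFFSETS[angle])
    rows, cols = img.shape
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = img[r0:r1, c0:c1]                      # reference pixels
    nbr = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]  # neighbours at the offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts += counts.T                           # count each pair both ways
    return counts / counts.sum()

def glcm_features(p):
    """A few second-order texture features of a normalized GLCM."""
    i, j = np.indices(p.shape)
    diff2 = (i - j) ** 2
    nonzero = p[p > 0]
    return {
        "asm": float(np.sum(p ** 2)),
        "contrast": float(np.sum(diff2 * p)),
        "idm": float(np.sum(p / (1.0 + diff2))),
        "entropy": float(-np.sum(nonzero * np.log(nonzero))),
    }

# Two-level checkerboard: every horizontal neighbour pair differs
checker = np.indices((4, 4)).sum(axis=0) % 2
features = glcm_features(glcm(checker, distance=1, angle=0, levels=2))
print(features)
```

Smooth images concentrate the matrix near its diagonal, giving a high inverse difference moment (IDM) and low contrast, while fine, busy textures spread it away from the diagonal.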
ImageJ has a plug-in "GLCM Texture" v.0.4, created by Julio E. Cabrera and further updated to "GLCM Texture Too" v. 0.008 by Toby C. Cornish. Knut Kvaal made an in-house revision of the plug-in, called version 0.009, which was used in this analysis. The GLCM algorithm was applied with distances between the pixel pairs of 1, 2, 3, 5 and 10 pixels and angles of 0°, 45°, 90° and 135°. The resulting output was a vector of the following 11 parameters for each image in the stack: angular second moment (ASM), contrast, correlation, inverse difference moment (IDM), entropy, energy, inertia, homogeneity, prominence, variance, and shade. Hence, 20 matrices were obtained, each of size 110 × 11. Detailed descriptions, explanations, and equations for the above GLCM texture features are given elsewhere (Conners, Trivedi, & Harlow, 1984; Haralick et al., 1973; Zheng, Sun, & Zheng, 2006).

Image analysis by AMT
Andrle (1994) introduced the angle measure technique (AMT) as a method to characterize the complexity of geomorphic lines. The purpose was to detect changes in the complexity of coastlines as a function of scale, so it was initially an approach for analysing 1D data. Later, Esbensen, Hjelmen, and Kvaal (1996) introduced AMT into chemometrics for the textural analysis of generic "measurement series" and showed that this technique is also applicable to images (2D data). Further successful implementations of AMT (Dahl & Esbensen, 2007; Fongaro & Kvaal, 2013; Fongaro, Lin Ho, Kvaal, Mayer, & Rondinella, 2016; Huang & Esbensen, 2000; Kucheryavski, 2007; Kvaal, Wold, Indahl, Baardseth, & Naes, 1998) proved that the method can be used for the description of image texture. AMT spectra can also be used for interpretative purposes. The AMT is implemented as an in-house ImageJ plug-in (Kvaal, 2014).
AMT is a transform-based texture recognition method, which in effect creates a completely new domain, the scale domain. The AMT algorithm, as applied to image analysis, can be described in five steps.
(1) Unfold the image (Kvaal et al., 2008;Mortensen & Esbensen, 2005) into a 1D vector of grey level values, also called generic "measurement series". Row by row or spiral unfold are most commonly used.
(2) Generate a set of random initial starting points, the number of these points should be chosen according to the image resolution.
(3) Develop the AMT complexity measures depending on a current value of the local scale (radius). The average value for a given number of starting points is calculated.
(4) Increase the local scale by 1 and repeat the previous step a specified number of times, up to a manually selected maximum scale.
(5) Transform the image into 1D spectrum, plotted as scale (radius) vs. measured mean angle (MA). Thus, the resulting complexity spectrum contains all the angle measures pertaining to all scales of the measurement series (unfolded image).
In the further analysis, the AMT spectra were used as the X-input for multivariate statistical analysis. The resulting matrix consisted of 110 samples and 200 variables (scale values).
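The five steps can be sketched in a simplified form. The version below is an assumption-laden illustration, not the in-house plug-in: the points B and C are taken at index distance `scale` to the left and right of each random starting point A (rather than by intersecting a circle with the curve), and the angle recorded is the supplement of the angle CAB, so a straight line gives 0°. The function name `amt_spectrum` and all parameter values are hypothetical.

```python
import numpy as np

def amt_spectrum(series, max_scale=200, n_points=10_000, seed=0):
    """Mean angle (degrees) for scales 1..max_scale of a 1D measurement series.

    Note: the angle depends on the relative scaling of the index axis and
    the grey value axis; no rescaling is applied here.
    """
    y = np.asarray(series, dtype=np.float64).ravel()
    if y.size < 2 * max_scale + 1:
        raise ValueError("series too short for the requested maximum scale")
    rng = np.random.default_rng(seed)
    # Step 2: random starting points, kept away from both ends of the series
    centres = rng.integers(max_scale, y.size - max_scale, size=n_points)
    spectrum = np.empty(max_scale)
    for scale in range(1, max_scale + 1):       # steps 3 and 4
        dy_left = y[centres - scale] - y[centres]
        dy_right = y[centres + scale] - y[centres]
        # Angle between vectors A->B = (-scale, dy_left), A->C = (scale, dy_right)
        cos_angle = (dy_left * dy_right - scale ** 2) / (
            np.hypot(scale, dy_left) * np.hypot(scale, dy_right))
        inner = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
        spectrum[scale - 1] = np.mean(180.0 - inner)  # supplement of angle CAB
    return spectrum                              # step 5: mean angle vs. scale

# Step 1: row-by-row unfolding of a synthetic image into a 1D series
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(100, 100))
spectrum = amt_spectrum(image.ravel(), max_scale=200, n_points=2000)
```

Stacking one such spectrum per image row-wise would give a samples-by-scales matrix of the kind used as X-input in the multivariate analysis.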

Mathematical modelling
Advanced mathematical and statistical tools are often used to explore, evaluate, model, calibrate and predict data from both laboratory-scale and full-scale experiments. A process can be optimized with the help of mathematical modelling. Multivariate regression analysis is a powerful statistical tool, which makes it possible to derive relationships between several input parameters and one or several output parameters. It is especially helpful when the initial data are broad, i.e. the number of influencing X-parameters is large and it is not easy to establish the relationships between different parameters by simple linear regression. Furthermore, in many cases, the input parameters are highly correlated.
The Unscrambler ® X 10.3 (CAMO Software AS, Norway) was used for the multivariate data analysis of 110 image samples.
Principal component analysis (PCA) was primarily used to explore the data, while principal component regression (PCR) and partial least squares regression (PLSR) were used to model the relations between input X parameters and corresponding Y responses. Models were first validated by cross-validation and then using a half of the data for calibration and half of the data for test set validation. The number of principal components (PCs)/factors was chosen according to the explained variance. Multiple linear regression (MLR) was used to predict the dosages by some GLCM feature vectors.
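For readers unfamiliar with PCR, the following sketch shows the idea with NumPy only: the centred X matrix is projected onto its first principal components and the response is regressed on the resulting scores. It is not the Unscrambler implementation; details such as variable scaling are assumptions, the data are synthetic, and the names `fit_pcr` and `r_squared` are hypothetical.

```python
import numpy as np

def fit_pcr(X, y, n_components=1):
    """Principal component regression; returns (coefficients, intercept)
    expressed in the original X variables. X is centred but not scaled."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Principal directions are the right singular vectors of the centred data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:n_components].T
    scores = Xc @ loadings
    q, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    coef = loadings @ q
    return coef, y_mean - x_mean @ coef

def r_squared(y_true, y_pred):
    """Fraction of explained Y variance."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data: 11 correlated X variables driven by one latent factor
rng = np.random.default_rng(0)
latent = rng.normal(size=60)
X = np.outer(latent, np.linspace(1.0, 2.0, 11)) + 0.01 * rng.normal(size=(60, 11))
y = 0.7 + 0.2 * latent

# Half of the data for calibration, half for test set validation
cal, test = slice(0, 60, 2), slice(1, 60, 2)
coef, intercept = fit_pcr(X[cal], y[cal], n_components=1)
r2_test = r_squared(y[test], X[test] @ coef + intercept)
print(f"test set R2: {r2_test:.4f}")
```

When the X variables are highly correlated, as in this synthetic example, a single component can carry almost all of the information relevant to the response.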
Linear discriminant analysis (LDA) with PCs was applied to obtain confusion matrices for the classification of the histogram, GLCM, and AMT spectral data. Each data-set was divided into 3 classes according to total P removal efficiency (coded L, M, and H): L (low), coagulant dosages which lead to low (20-50%) treatment efficiency after sedimentation; M (medium), 50-85% total P removal; H (high), above 85% total P removal (Table 1).

Images of flocs
The presented imaging technique makes it possible to investigate images of flocs during the process of coagulation-flocculation. Capturing images every 20 s gives an opportunity to observe and describe changes in aggregate formation. Figure 2 shows the evolution of flocs during coagulation. Even to the human eye, there is a very noticeable difference between the images. The first image was taken at the very first moment after fast mixing, when the mixing unit had finished changing its rotation speed, and was designated as "zero seconds" of coagulation. At this moment the particles in the water, the initial flocs, are quite small. However, within the next 20 s these initial particles rapidly form larger aggregates. The growth of flocs is distinctly visible until approximately 3-5 min of coagulation. After that, the geometrical characteristics of the flocs visually seem to be quite stable. However, according to the results of further image analysis, the flocs continue to grow slightly with time (Table 2). Figure 3 shows the differences in floc structure due to coagulant dose (given in mmol Al/l). The figure describes the relationship between dose and total P treatment efficiency. The images on the graph show how the flocs looked at a given time during coagulation (400 s after the start of slow mixing). Figure 3 shows that an increase in coagulant dose results in a decrease in floc size, which is consistent with the findings of Wang et al. (2011).

Coagulant doses and images of flocs
It is noticeable from the images that detection of the physical characteristics of aggregates (and thereby the fractal dimension) at this particular image scale is possible when coagulant dosages are quite low, but not for dosages with adequate treatment efficiencies. Closer to the optimal coagulant dosages, flocs appear to be smaller and overlap in the image, so accurate floc feature detection is not feasible at this scale.
The results of coagulation are presented in Table 1.

Fractal dimension
Flocs' features as characteristics of particles were detected from the images of flocs for low coagulant dosages.
The results of one time series of experiments, dose-0.29 mmol Al/l, are shown in Table 2. Expectedly, the size (equivalent diameter) of particles is growing with coagulation time. The mean fractal dimension of initial particles is close to 2. As particles aggregate, fractal dimension rapidly decreases during the first minute of coagulation and then continues slowly decreasing. These results are consistent with findings of Chakraborti et al. (2003). Figure 4 shows the examples of 4 flocs (from 23,000 particles analysed for one dose) with different fractal dimensions.

Histogram analysis
Histograms provide the grey level distribution of pixels in the image. Images of flocs for each coagulant dose had their own typical histograms, so it was possible to distinguish which image's histogram corresponds to which coagulant dose. The results of classification are shown in Table 3.
PCA showed a poor explanation of X variance of 256 grey level histogram data. With 2 PCs 21.59% of calibration X variance was explained and 18.11% of cross-validation. Each next PC increased the explained variance by 1-3%.
The number of grey level values in the histogram was reduced by factors of 2 and 4, resulting in a better prediction of coagulant dosage. The regression analysis of the 64-bin histograms gave the best PCR and PLSR models, listed in Table 4.

GLCM analysis
PCR analysis was applied to all 20 GLCMs (5 different pixel steps and 4 directions) to find the best combination of pixel step and direction. X was set as the 11 GLCM variables for each image and Y as the coagulant dose. However, it was found that neither the pixel step nor the direction gives a significant change in coagulant dosage prediction by PCR. Cross-validation $R^2$ fluctuated between 95 and 96%, meaning that, for these images of flocs, the prediction is practically independent of the pixel step and direction. Hence, all further analyses were done for the GLCM obtained with pixel step 1 and direction 0°.
The results of the PCR model are shown in Figure 5. Coagulant dosages were quite well separated in the score plot of the first two PCs, where the total explained X variance equals 87.69% (PC1 = 73.82%, PC2 = 13.87%) for calibration and 84.17% (PC1 = 70.33%, PC2 = 13.84%) for cross-validation. The validation method was based on systematic block cross-validation, which created 11 submodels where 5 samples (replicates of one dose) had been left out in each of the submodels. The total explained Y variance, and hence the accuracy of prediction, equals 95.50% (calibration) and 94.98% (cross-validation) with only one PC.
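The idea of block cross-validation, where all replicates of one dose are left out together, can be sketched as follows. The example uses ordinary least squares instead of PCR to stay short, and the data are synthetic; the function name `block_cross_val_r2`, the two feature columns and the noise levels are hypothetical and only loosely imitate the structure of the data-set.

```python
import numpy as np

def block_cross_val_r2(X, y, groups):
    """Leave-one-group-out cross-validated R2 of an ordinary least squares model.

    Each submodel is fitted without any replicate of the held-out group,
    so replicates of the same dose never appear in both parts.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=np.float64)])
    y = np.asarray(y, dtype=np.float64)
    y_pred = np.empty_like(y)
    for g in np.unique(groups):
        held_out = groups == g
        beta, *_ = np.linalg.lstsq(X[~held_out], y[~held_out], rcond=None)
        y_pred[held_out] = X[held_out] @ beta
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data: 11 doses with 10 replicate images each
rng = np.random.default_rng(0)
dose = np.repeat(np.linspace(0.29, 1.08, 11), 10)
groups = np.repeat(np.arange(11), 10)
feature_a = 1.0 - 0.5 * dose + 0.02 * rng.normal(size=dose.size)
feature_b = 300.0 - 200.0 * dose + 5.0 * rng.normal(size=dose.size)
X = np.column_stack([feature_a, feature_b])

r2_cv = block_cross_val_r2(X, dose, groups)
print(f"block cross-validated R2: {r2_cv:.3f}")
```

Leaving out whole dose groups gives a more conservative estimate than leaving out single images, because the model never sees a replicate of the dose it has to predict.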
All variables in the correlation loadings plot (Figure 5) fall within the circles, which depict 50-100% of explained variance. The uncertainties in the model parameters were estimated by jack-knifing for the model with 1 PC. The variables found significant are marked with circles in the correlation loadings plot and with a striped pattern in the regression coefficients plot. Ten out of eleven X variables were found to be significant at the 95% confidence level. The variable "Entropy" has uncertainty limits crossing the zero line, so it is not significant at the 5% level for the model with 1 PC.
For real on-line applications, a linear regression model is preferred as it is simpler to implement. The selection of variables and the estimation of a relevant number of model inputs are important parts of the data analysis. In order to perform the variable selection, the R console with the "NMBU" plug-in (Liland & Saebø, 2016) was used. Two results of the best subsets for different numbers of variables are shown in Appendix A. For instance, the best model detected with two variables, "IDM" and "Variance", has a prediction $R^2$ = 0.967 and $R^2_{adj}$ = 0.965. Since $R^2$ and $R^2_{adj}$ are close to 1, the regression model shows high agreement with the experimental results. With an increase in the number of variables, $R^2$ and $R^2_{adj}$ tend to increase slightly. However, the loadings plot (Figure 5) shows that the X variables are highly correlated, so it is worth keeping the number of variables as low as possible. In this particular case, we would suggest choosing the model with two variables. Nevertheless, the best way of choosing the correct number of variables would be to perform a prediction analysis with a completely new test data-set.
The linear equation for the model with two variables is:
$$\text{Dose} = \beta_0 + \beta_1 \cdot \text{IDM} + \beta_2 \cdot \text{Variance} + \varepsilon \qquad (4)$$
where $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are coefficients which determine the contribution of each independent variable to the estimation of the dependent variable (dose), and $\varepsilon$ is the random error. The coefficients were found by MLR with R = 0.983, $R^2$(cal) = 0.967 and $R^2$(cross-val) = 0.947, p < 0.001, and the final model equation was as follows:
$$\widehat{\text{Dose}} = 3.09 - 2.44 \cdot \text{IDM} - 0.0009 \cdot \text{Variance} \qquad (5)$$
It should be noted that the large difference in the values of the parameters $\hat{\beta}_1$ and $\hat{\beta}_2$ is caused by the difference in scale of the variables "IDM" and "Variance" (Figure 6(a)). The comparable significance of both variables can be seen from the plot of weighted regression coefficients (Figure 5).
It is not easy to interpret the physical meaning of this linear equation because the GLCM feature vectors are second-order statistical parameters of the original images. When IDM = Variance = 0, the Dose is 3.09 mmol Al/l. However, the value $\hat{\beta}_0$ does not have a practical interpretation, since under normal conditions images of flocs have grey-tone pixel variations and thus IDM and Variance differ from 0. When IDM increases by 1 while Variance remains constant, the estimated mean Dose decreases by 2.44 mmol Al/l. When Variance increases by 1,000 while IDM remains constant, the estimated mean Dose decreases by 0.9 mmol Al/l. Typical images with high IDM and Variance correspond to low dosages (Figure 6(a)), which can also be seen visually in Figure 3.
The results of MLR prediction are shown in Figure 6(b). The total explained Y variance equals 96.66% (calibration) and 94.74% (cross-validation) with 2 GLCM feature vectors, "IDM" and "Variance". It should be emphasized that this research aims to show that texture analysis methods can be applied for determining the coagulant dose. The good repeatability of the results obtained from images of the dynamic coagulation process should also be noted. Despite the high prediction results of PCR and MLR, the models have to be further verified and tested on bigger data-sets, ideally on data from a wastewater treatment plant.
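The two-variable model can be applied with a one-line function. The sketch below uses the rounded values quoted in the text (intercept 3.09, a decrease of 2.44 per unit of IDM and a decrease of 0.9 per 1,000 units of Variance), so its output is only as precise as those rounded figures. The function name `predict_dose` and the example feature values are hypothetical.

```python
# Rounded coefficients as quoted in the text (dose in mmol Al/l)
INTERCEPT = 3.09
COEF_IDM = -2.44
COEF_VARIANCE = -0.9 / 1000.0

def predict_dose(idm, variance):
    """Estimated coagulant dose from two GLCM features of a floc image."""
    return INTERCEPT + COEF_IDM * idm + COEF_VARIANCE * variance

# Hypothetical feature values, chosen only to illustrate the arithmetic
example = predict_dose(idm=0.5, variance=1000.0)
print(f"predicted dose: {example:.2f} mmol Al/l")
```

Because the model was calibrated on doses between 0.29 and 1.08 mmol Al/l, predictions outside that range would be extrapolations.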

AMT
AMT is a potential tool for observing and describing images of flocs both for high and low coagulant dosages. It describes the texture complexity of an image, converting information of grey level values to descriptive spectrums. The spectrum shows the dependence of the MA (calculated for given number of random sample points, in our research it was set to 10,000 samples) in an unfolded image from the scale (radius). Figure 7(a) shows the AMT spectra (complexity) results of flocs images captured at the same coagulation time (400 s) for eleven coagulant dosages from 0.29 to 1.08 mmol Al/l. In a range of low coagulant dosages, the flocs tend to be bigger and detached from each other (see example images in Figure 3). AMT spectrums are not always easy to interpret. However, in application to coagulation, there are visible trends in images' complexities. For the flocs images related to low coagulant dosages the picks of AMT spectrums are lying within larger scale values and have lower measured angles (Figure 7(a)). In contrast, images of flocs associated with higher coagulant dosages have AMT spectrums picks lying in lower scale range and having higher measured angles. This trend-spectrums' vertexes shift to the left, will be easier to explain if we look into the graphs of images' unfold. Figure 8(a) illustrates the unfolding of the first row of an image related to low coagulant dose (0.29 mmol/l). The neighbourhood pixel grey level variations are not dramatic and the pictures of grey levels (which associate with the flocs on the image) are not very frequent. The biggest grey level pixel variations, thus high measured angles, between the neighbouring areas within the whole image occurs in the distance (scale) of approximately 90 pixels, which is visible from Figure 7(a). In contrast, the unfolding of the image associated with higher coagulant dose shows huge and frequent neighbourhood grey-tone variations (Figure 8(b)). 
Thus, the biggest grey level changes in the image occur at a distance (scale) of approximately 50 pixels, resulting in the highest measured angles. Figure 7(b) shows the AMT spectra for a time series of flocs images for one coagulant dose, 0.3 mmol Al/l. During the first seconds of coagulation, the AMT spectra of flocs images are quite sharp, with the maximum angle values at low scales. This can be explained by significant grey level changes between neighbouring pixel areas in the images, caused by the large number of small particles. With time, as the aggregates merge and grow, the differences in grey-tone values between nearby areas decrease, which shifts the peak of the AMT spectrum to the right, towards higher scales (bigger distances). After some time, changes in the textures of the flocs images become almost invisible. The aggregates and their spatial locations change from image to image because coagulation continues, but the overall textural characteristics, in other words the complexities of the images, remain quite stable, which is also visible in the AMT spectra. Spectra for coagulation times of 160-1,180 s overlap, indicating stable image textures and thus a certain stability of the geometrical characteristics of the flocs. The tendencies described above were observed for all coagulant doses. In summary, the AMT spectra of flocs images can be interpreted as follows: the bigger the flocs and the better they are separated from each other, the gentler the slope of the AMT spectrum; the smaller the flocs and the more of them there are in the image (overlapping), the sharper the resulting AMT spectrum, which is located at lower scales.
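The unfolding and mean-angle computation described above can be sketched as follows. This is a simplified illustration and not the implementation used in this study: grey levels are used unscaled, the circle of radius `scale` is intersected with the signal by walking to the first sample at or beyond that distance (no interpolation), points too close to the ends of the signal are skipped, and all function names are hypothetical.

```python
import math
import random


def unfold(image):
    """Concatenate the rows of a 2-D grey-level image into one 1-D signal."""
    return [value for row in image for value in row]


def _first_point_at_radius(signal, i, scale, step):
    """Walk from index i in direction `step` (+1 or -1) until the Euclidean
    distance from (i, signal[i]) reaches `scale`; return (dx, dy) or None."""
    j = i + step
    while 0 <= j < len(signal):
        dx, dy = j - i, signal[j] - signal[i]
        if math.hypot(dx, dy) >= scale:
            return dx, dy
        j += step
    return None


def mean_angle(signal, scale, n_samples=10_000, rng=None):
    """Mean supplementary angle (degrees) over random points A of the signal."""
    rng = rng or random.Random(0)
    angles = []
    for _ in range(n_samples):
        i = rng.randrange(len(signal))
        b = _first_point_at_radius(signal, i, scale, +1)
        c = _first_point_at_radius(signal, i, scale, -1)
        if b is None or c is None:
            continue  # too close to an end of the signal
        cos_bac = (b[0] * c[0] + b[1] * c[1]) / (math.hypot(*b) * math.hypot(*c))
        cos_bac = max(-1.0, min(1.0, cos_bac))  # guard against rounding
        angles.append(180.0 - math.degrees(math.acos(cos_bac)))
    return sum(angles) / len(angles) if angles else float("nan")


def amt_spectrum(signal, scales, n_samples=10_000, seed=0):
    """Mean angle as a function of scale, i.e. one AMT spectrum."""
    rng = random.Random(seed)
    return [mean_angle(signal, s, n_samples, rng) for s in scales]


# Synthetic stand-ins for unfolded images: many small vs. few large features.
fine = ([0] * 2 + [50] * 2) * 100
coarse = ([0] * 20 + [50] * 20) * 10
scales = [1, 5, 10, 20]
print(amt_spectrum(fine, scales, n_samples=1000))
print(amt_spectrum(coarse, scales, n_samples=1000))
```

A flat signal gives a mean angle of zero at every scale, whereas frequent grey level changes between neighbouring pixels raise the mean angle already at the smallest scales, in line with the interpretation given above for images containing many small particles.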
The spectra data from the time-series images were processed by PCA in The Unscrambler®. The PCA results showed a good separation of "stable" images from those taken during the first seconds of coagulation. Three principal components explained 97.5% of the calibration variance and 96.3% of the cross-validation variance. Figure 9 shows the scores on PC1 of the AMT spectra data for the time series of flocs images. The results indicate that after approximately 200 s of coagulation the images of flocs become quite stable.

Mathematical modelling
LDA showed that all three texture image processing methods give quite good classification results. Information about the coagulant dosages and treatment efficiencies was not revealed during LDA; only information from the images was used for classification. LDA was performed on the PCA score vectors, with estimation of prior probabilities. Confusion matrix results for the three image analysis techniques are shown in Table 3.
Data based on histogram analysis of flocs images were classified with 91.82% accuracy (Table 3(a)) using three PCs. One out of ten samples of dose 0.51 mmol Al/l was classified as low tot. P efficiency (class L) instead of medium efficiency (class M). Also, 8 samples of dose 0.70 mmol Al/l, which belongs to class H (high tot. P efficiency), were classified as M.
Data from GLCM image analysis gave a classification accuracy of 90.00% using 3 PCs (Table 3(b)). The classification results are very similar to those described above for histogram analysis; however, two more sample images of dose 0.51 mmol Al/l were misclassified.
The highest PCA-LDA classification accuracy, 96.36%, was obtained for the AMT spectra data of images of flocs using 2 PCs (Table 3(c)). Only 4 samples of dose 0.51 mmol Al/l from class M were misclassified as L. All 40 samples of high treatment efficiency (class H) were classified correctly. This is a very good result, showing that AMT is a promising tool for image analysis of flocs: AMT spectra contain the data necessary for modelling and precise classification. AMT has the potential to be used for further image analysis of flocs and for the development of a coagulation dosage control system.
PCR and PLSR were used for mathematical modelling (Table 4). The response Y is the coagulant dosage in mmol Al/l, and the X matrices are the histogram data, the GLCM feature vectors and the AMT spectra. Data-sets were divided into two equal parts for calibration and validation of the model. Hence, the calibration and validation sets each consisted of 11 coagulant dosages with 5 replicate images of flocs per dose, giving a total of 55 samples per set. The matrices obtained from histogram image analysis consisted of 256, 128 and 64 variables (bins). GLCM data was tested with 4 (ASM, contrast, correlation, and entropy), 6 (ASM, contrast, variance, correlation, IDM, entropy) and all 11 variables. AMT data consisted of spectra with 200 points, which were set as variables. R² of the calibration and validation data-sets was used to compare the obtained regression models.
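As a cross-check of the PCA-LDA accuracies reported for Table 3, the short calculation below recomputes them from the misclassification counts given in the text, assuming 10 images for each of the 11 doses (110 samples in total). The GLCM count is inferred from the statement that its results match the histogram case plus two further errors; the variable names are illustrative only.

```python
SAMPLES_PER_DOSE = 10   # "one out of ten samples of dose 0.51 mmol Al/l"
N_DOSES = 11
TOTAL = SAMPLES_PER_DOSE * N_DOSES   # 110 classified images

# Misclassified images per method, as described in the text.
misclassified = {
    "histogram": 1 + 8,   # 1 of dose 0.51 (M as L), 8 of dose 0.70 (H as M)
    "GLCM": 1 + 8 + 2,    # inferred: histogram errors plus two more of dose 0.51
    "AMT": 4,             # 4 of dose 0.51 (M as L)
}


def accuracy_percent(errors, total=TOTAL):
    """Share of correctly classified images, in percent, to two decimals."""
    return round(100.0 * (total - errors) / total, 2)


accuracies = {name: accuracy_percent(n) for name, n in misclassified.items()}
print(accuracies)  # {'histogram': 91.82, 'GLCM': 90.0, 'AMT': 96.36}
```

The recomputed values agree with the reported 91.82%, 90.00% and 96.36%.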
The PLSR model based on AMT spectra showed the best result: 96.19% of variance explained by 2 factors for the validation data. PLSR of GLCM data with 11 variables gave almost the same result, 96.09% of Y variance explained. A slightly lower validation R² (93.80%) was obtained for PLSR based on the 64-bin histogram of flocs images. All PCR models showed 1 to 10% lower R² than the PLSR models. This can be explained by the model calculation procedure: PLSR models the relation between X and Y while taking Y into account during modelling, whereas PCR builds its components from the X matrix only.
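The difference between the two procedures can be illustrated with a deliberately extreme toy example, which is not based on the data of this study. X has two centred variables: one with large variance that is unrelated to Y, and one with small variance that equals Y. A one-component PCR follows the high-variance direction and explains almost nothing, while a one-component PLSR takes its weights from X'y and recovers Y. All names and numbers below are assumptions chosen for the illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(col):
    m = sum(col) / len(col)
    return [c - m for c in col]


def normalise(w):
    norm = math.sqrt(dot(w, w))
    return [x / norm for x in w]


def project(cols, w):
    """Scores t = X w, with X stored as a list of centred columns."""
    n = len(cols[0])
    return [sum(w[j] * cols[j][i] for j in range(len(cols))) for i in range(n)]


def first_pc(cols, iterations=200):
    """Leading eigenvector of X'X by power iteration (uses X only)."""
    w = normalise([1.0] * len(cols))
    for _ in range(iterations):
        t = project(cols, w)
        w = normalise([dot(c, t) for c in cols])
    return w


def first_pls_weight(cols, y):
    """First PLS1 weight vector, proportional to X'y (uses X and Y)."""
    return normalise([dot(c, y) for c in cols])


def r2_one_component(cols, y, w):
    """R^2 of regressing centred y on the single score vector t = X w."""
    t = project(cols, w)
    b = dot(t, y) / dot(t, t)
    residual = [yi - b * ti for yi, ti in zip(y, t)]
    return 1.0 - dot(residual, residual) / dot(y, y)


# Hypothetical data: y is a response, x_noise a high-variance nuisance variable.
y = center([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x_noise = center([10.0 * s for s in (1, -1, -1, 1, 1, -1, -1, 1)])
x_signal = list(y)
X = [x_noise, x_signal]

r2_pcr = r2_one_component(X, y, first_pc(X))
r2_pls = r2_one_component(X, y, first_pls_weight(X, y))
print(r2_pcr, r2_pls)
```

With real data the gap is usually far smaller, as in the 1 to 10% differences reported above, and PCR can catch up once more components are included.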
All validated models showed quite good results, with a minimum of 83% of variance explained. This means that images of flocs correlate with coagulant dosages and that the dosages could be predicted by several statistical methods. Images of flocs for a particular dosage are unique and thus could potentially be used for dosage control. However, this should be further tested and confirmed on bigger data-sets with a wider range of coagulant dosages and more sample points. Further analysis should also be performed for other initial parameters of model wastewater and/or industrial wastewater. Among the limitations of the texture recognition methods, the dependence on lighting conditions should be stated. The described texture analysis methods depend on the grey-tone pixel values in the images; thus, to be able to compare images, it is important to have stable lighting conditions or to perform mean centring of the images during the pre-processing stage. Although lighting conditions might vary during laboratory-scale experiments, in a full-scale installation conditions are most likely to be held steady, so this problem should not arise.
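A minimal sketch of the mean centring step mentioned above is given below, assuming the image is available as a list of rows of grey level values; the function name `mean_centre` is hypothetical. Subtracting the mean grey level of each image removes a uniform brightness offset between images. It does not correct for changes in contrast or for uneven illumination across the image.

```python
def mean_centre(image):
    """Subtract the image's own mean grey level from every pixel."""
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    return [[v - mean for v in row] for row in image]


# The same scene under two uniform brightness levels (illustrative values).
image = [[10, 20], [30, 40]]
brighter = [[v + 25 for v in row] for row in image]

centred = mean_centre(image)
centred_brighter = mean_centre(brighter)
print(centred)
print(centred_brighter)
```

After centring, both versions of the scene yield the same pixel values, so texture descriptors computed from them are no longer affected by the brightness offset.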

Conclusions
Changes in the structure of flocs depending on coagulant dosage and treatment efficiency were observed.
According to the treatment efficiency, the 11 coagulant dosages were categorized into three classes for further PCA-LDA. Image analysis by AMT showed the best classification results.
The image analysis method based on the fractal dimension of flocs has been widely studied. We have observed that there are some difficulties in applying this method to highly contaminated water such as wastewater.
Various texture image analysis methods applied in other scientific fields were tested for wastewater coagulation. GLCM and AMT were found to be the most promising methods for identifying flocs images and correlating them with coagulant dosages.
Ten out of eleven co-occurrence matrix-based texture features were found to be significant for dosage prediction. The linear regression model was built based on two of these feature vectors. IDM and Variance were found to adequately describe the flocs' images and their respective dosages.
Texture image analysis methods are easily computable and could potentially be used together with, or instead of, conventional object-recognition image analysis techniques in order to simplify the image analysis methodology, especially in cases where particle detection is complicated by the initial environment and could lead to under- or overestimation of the parameters of the flocs.
In conclusion, texture image analysis methods combined with regression models offer a way to correlate images of flocs with dosages and treatment efficiencies, which could enable optimal coagulant dosage control.