Perspectives of optical colourimetric sensors for anaerobic digestion

Although biogas is not a new approach to producing renewable fuel, it could be developed further to improve its potential as an alternative energy source. To achieve this, vast improvements in the efficiency and cost of biogas production are essential. These enhancements require detailed systematic monitoring to attain a near-optimal biogas production process. To date, there is a striking imbalance between the inherent biological complexity of anaerobic digestion (AD) and the minimal information currently measured on-line. The objective of this review is to discuss how improvements in the availability and cost of sensor technology used for determining the key compounds and their dynamics within the biogas processing plant will facilitate further understanding of the biogas production process and help to prevent failure of the biological process. In particular, colourimetric assays (sensor assays based on coloured dyes) for variable detection in anaerobic digestion provide a stable, multivariate system for the detection of volatile fatty acids (VFAs), and also provide a much deeper insight into the process by assessing other parameters which, to date, have never been measured on-line. These sensor improvements will allow biogas production, even on a small scale, to be guided in the optimum direction, preventing the biological process from collapsing. This will result in improved efficiency at a reduced operational cost. The potential of colourimetric assay methods for use in anaerobic digestion as a sensor technology with associated data analysis methodologies has not previously been assessed. Here, a 23-dye colourimetric sensor array was assessed experimentally and shown to differentiate 10 mM acetic acid, 5 mM propionic acid and 0.3 mM butyric acid. This review aims to assess the feasibility of on-line, cost-effective, rapid and efficient detection of VFAs, together with other key parameters, by these colourimetric sensor arrays in order to advocate their usage in AD.


Introduction
One route to production of renewable, clean energy is through biogas production, where biogas is derived from biological waste materials. Biogas is produced through anaerobic digestion (AD) of organic waste materials by a variety of microorganisms [1]. The inherent complexity of this ecosystem resulting from the dynamical interaction of several hundreds of bacterial and archaeal species [2], developing on a mixture of substrates, results in significant difficulty in determining the state of the digestion process [1].
The most straightforward information is provided by monitoring of gases (H₂, CH₄ and CO₂) measured in the gas phase of the anaerobic digester. Their fluctuations can provide information about the digester's productivity [3]. Measurement of pH is also important. A pH outside the range of 6-8 often indicates process deterioration, which limits methane production and can eventually lead to the collapse of the entire biological process [4].
Concentrations of volatile fatty acids (VFAs; mainly acetic, butyric and propionic acid) have been suggested as useful control parameters, as these acids are indirectly indicative of the activity of the methanogenic consortia [3,4]. VFA accumulation can be interpreted as organic overload or inhibition of the methanogenic microbial communities [5]. Acidogenic microorganisms transform hydrolysis products into VFAs, while acetogenic microorganisms then convert VFAs into acetate, H₂ and CO₂. Methane is then produced by the methanogens [6], with the majority being produced by acetotrophic methanogens, which transform acetate into methane and CO₂ [7]. VFA concentrations are a required control parameter for biogas production monitoring [8,9], and it is essential that VFAs are monitored to understand the biological process. Additionally, the ratio of VFAs to total alkalinity is often suggested as an indicator of the process's stability, with the objective of keeping this ratio below 0.3 [10]. Typical VFA concentrations are around 10, 5 and 0.3 mM for acetic, propionic and butyric acid, respectively. Furthermore, imbalances within the reactor can result in spikes up to 30, 20 and 3 mM; however, the specific VFA concentrations that suggest an imbalance differ between systems and feedstocks. It has also been suggested that propionate is the main VFA indicating the stress status of the AD process [11]. Therefore, discrimination between individual VFAs is essential, and developments in sensor systems that can differentiate VFAs without substantial complexity and cost are required. Cost-effective sensor technologies for monitoring different VFA species within the reactor are crucial for improving the efficiency, productivity and cost of biogas production. There are many methods for monitoring a variety of parameters within bioreactors [12][13][14][15].
These mainly rely on sample extraction, but in some cases, the sensor can be interfaced directly with the internal environment of the bioreactor. Despite this, there are still improvements required to make cheaper sensors with higher accuracy and specificity. Ideally, a monitoring system must have a set of sensors coupled to a treatment phase within a software program, where the measurements are carried out automatically with limited human intervention or expertise. The data can then be combined with numerical models to update an algorithm diagnosing the state of the digester and detecting erroneous working modes [10].
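As an illustration of how the stability indicators mentioned above could be evaluated automatically, the following minimal Python sketch converts VFA concentrations from mM to mg/L and checks the VFA-to-alkalinity ratio against the limit of 0.3. The function names are hypothetical, the molar masses are standard values, and the sketch assumes that total VFA and total alkalinity are supplied in the same equivalent units; the appropriate limit may differ between plants and feedstocks.

```python
# Molar masses in g/mol (standard values for the three main VFAs).
MOLAR_MASS = {"acetic": 60.05, "propionic": 74.08, "butyric": 88.11}

# Illustrative limit quoted in the text; a real plant may need another value.
MAX_VFA_ALKALINITY_RATIO = 0.3


def mm_to_mg_per_l(acid, conc_mm):
    """Convert a concentration in mM to mg/L (mmol/L * g/mol = mg/L)."""
    return conc_mm * MOLAR_MASS[acid]


def vfa_alkalinity_ratio(total_vfa, total_alkalinity):
    """Ratio of total VFAs to total alkalinity (same equivalent units assumed)."""
    if total_alkalinity <= 0:
        raise ValueError("total alkalinity must be positive")
    return total_vfa / total_alkalinity


def is_ratio_acceptable(total_vfa, total_alkalinity,
                        limit=MAX_VFA_ALKALINITY_RATIO):
    """True while the ratio stays below the chosen stability limit."""
    return vfa_alkalinity_ratio(total_vfa, total_alkalinity) < limit
```

For example, `mm_to_mg_per_l("acetic", 10)` expresses the typical acetic acid level quoted above in mg/L, which makes it comparable with ranges reported for chromatographic methods.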
Optical-based chemical sensors (colourimetric sensors) appear to have the potential of providing such additional crucial information. They are low cost, require relatively simple instrumentation and straightforward sample preparation, and can be integrated within existing control systems. In general, there can frequently be a trade-off between sensitivity and robustness, and their use in bioreactors may be severely challenged by limited selectivity, repeatability, robustness and stability [16]. One path around this is to apply the artificial noses and tongues concepts [16,17]. Each sensor in the array is only partially selective but has distinctive responses to the various chemical entities of interest. Adding a multivariate chemometric tool results in quantitative responses for each entity. A few attempts have been made to apply artificial tongues, based on electrical or electrochemical sensors, for detection during AD [18,19]. However, the limited success of these technologies is due to the complex and poorly reproducible composition of process media [19], resulting in sensor contamination and biofouling.
The development of artificial tongues based on optical sensor arrays is another option [17]. For optical sensors, the analyte interacts with the sensor material, changing the material's optical properties. The sensor array is a combination of a variety of dyes, each with different specificities for different analytes (Fig. 1 and Table 1). Light probes the sensing material. A wide range of optical techniques are available for probing (e.g., refraction, scattering, reflection). Probing of multiple properties can be used to enhance sensor performance. Successful applications of optical tongues to metal ions, food and beverages, amino acids, proteins, bacteria, and cancer and disease diagnostics have been reviewed [17]. Despite the inherent advantages, optical tongues have so far not been sufficiently considered for AD monitoring.
In this review, a contemporary sensor technology that can combine AD with on-line data to accurately picture the state of the complex anaerobic ecosystem is proven, discussed and compared to current industry methods. Overall, the contemporary technologies linked with the analysis methods presented have the potential for improving the efficiency, economic costs and productivity of biogas plants.

General principles of anaerobic digestion sensing
There are different ways that sensors can be incorporated into the biogas production plant to fulfil the on-line monitoring requirement. The sensors used for measurements of variables within an anaerobic digester can require the extraction of samples from the bioreactor, filtration, and external measuring (at-line sensor), the sensor to be within the digester (in-line sensor), or for the sample to be extracted from the digester and analysed in a laboratory (off-line sensor) [20]. Moreover, at-line sensors are those that are connected to external sampling loops or extraction piping, where the digestate will, in theory, represent the internal environment of the digester, but is technically no longer in the internal environment of the digester. The alternative is to utilise an in-line sensor, that is, a sensor which is interfaced directly with the internal environment of the digester, giving the most accurate, up-to-date sensing of the internal environment of the digester. Furthermore, measurements can be either on-line or off-line. This refers to the time taken for a measurement process, and whether it requires extraction and measurement outside of the digester/sample-loop environment. A measurement that can be rapidly and continuously detected in either an at-line or in-line configuration is considered on-line, as it provides 'live' data. A measurement that requires extraction of a sample, external processing of the sample and finally measurement is considered an off-line measurement. This is because there is a significant time delay between sampling and measurement.
Fig. 1. Schematic representation of a colourimetric sensor array. This representation shows a 36-dye colourimetric sensor array before exposure (left), after exposure (middle) to a specific analyte, and a difference map (right). The difference map is achieved by subtracting the two images (before and after exposure), resulting in a difference vector in 108 dimensions (changes in 36 dye colours in the red, green and blue colour channels).
The in-line sensors must be resistant to pressure (slightly above atmospheric) and temperature (between 35°C and 55°C) fluctuations, as well as being able to be cleaned (in some cases this means autoclaving, or γ-irradiating the sensor). Furthermore, they should be substantially resistant to interference by fouling. Sensors should provide appropriate quality, having high accuracy, specificity, sensitivity, and quantification. The data provided by sensors requires an appropriate data analysis method that can be automated as much as possible to reduce the requirement for expertise. After data analysis, the computed measurements should produce great consistency, and be robust, stable and linear.
For live (on-line) monitoring of a biogas plant to be useful, the response time of the sensors is an important characteristic [20]. Although the delay for measurements is specific to each sensor and application, the time required to obtain a result must be small (< 10 min) relative to the essential process dynamics. More specifically, the time required to obtain a result will depend significantly on the retention time of the AD reactor, analysis duration and the dead volume of the filtration system used. Moreover, it must be much faster than the time required for the accumulation of VFAs to occur in order to detect a stress event. Therefore, the sensor system must be tailored to the process with adequate sampling locations. Otherwise, significantly low efficiency of the process control will occur [21].
The location within the digester is essential, as the digester can be either homogeneous or inhomogeneous. This depends significantly on the feedstock type, process size and AD technology. Moreover, there are AD technologies that are intended to have inhomogeneous chemical gradients within the reactor (e.g., up-flow anaerobic sludge blanket (UASB) & fluidised bed reactors), and this is exacerbated with the increase of reactor size. If the reactor is homogeneous, the sensor can be placed anywhere within the bioreactor, and the signal will represent the actual state; however, if it is inhomogeneous, many sensor systems and arrays may be needed to interpret the condition of the digester fully.
There are three main phases (solid, liquid and gas), inside the anaerobic digester. These phases have different properties, variables and requirements when considering in-line sensor technologies. The liquid phase of the anaerobic digester is a complex mix of various organisms, substrates, nutrients, products, metabolites, and dissolved gases. Due to the diffusion rate of components from liquid to gas, it is desirable to make measurements directly from the liquid phase of the digester. This can drastically reduce any lag time between changes in the digester dynamics and detection of these changes.

Traditional VFA sensing
The most common techniques used to assess the digester state are the measurement of the bulk VFA content by titrimetry (off-line), various gas chromatography techniques (off-line), and near-infrared (NIR) and infrared (IR) spectrometry [5].

Titrimetry
The cheapest method for VFA detection is through automated titrimetric units [22]. Titrimetry is a standard laboratory method of quantitative chemical analysis that is used to determine the unknown concentration of an identified analyte. Titrimetric methods of analysis allow the quantification of VFAs without the need for extensive sample filtration (depending on the consistency of the digestate). These units can take a sample from the anaerobic digester, perform a few straightforward sample treatments (typically filtering or centrifugation), then use titrimetric techniques to determine the acid concentration of the sample. A computer is used to control the titration analysis, allowing automatic VFA quantification, as well as automatic adjustment of the protocol based on expected VFA concentrations. The robustness and simplicity of the automatic titration procedure allow it to be readily applied to any AD process to perform total VFA analysis. Furthermore, off-the-shelf units for measuring the VFA concentration in AD are readily available. Despite this, the accuracy of titration methods, when compared to analytical laboratory measurements, is still debatable [23][24][25], and individual VFAs cannot be quantified by this method. There are many interfering components present in large quantities in AD, including acids or bases (e.g., lactate and phosphate), that can probably explain the reduced accuracy of titrimetric methods [23][24][25].

Gas chromatography
In analytical chemistry, gas chromatography (GC) is used as a technique for the separation and analysis of compounds. Analytical measurements using GC methods produce highly accurate results, and any method of VFA detection should always be calibrated using a GC method. GC instruments are generally operated off-line in a controlled laboratory; however, there have been examples of their integration with AD reactors for VFA analysis. A gas chromatograph connected to an anaerobic digester has been shown to allow automatic quantification of all the short-chain VFAs every 15 min in the range from 0 to 3000 mg L⁻¹ [26]. This form of measurement requires in situ filtration of the digestate before analysis, and although the method was automated, intervention from technicians is needed daily to maintain operation. Attempts have been made to remove the requirement for filtration methods that are susceptible to biofouling by replacing the filtration step with a sample pre-treatment cell [27]. This pre-treatment adjusts the temperature, pH and ionic strength of a sample to lower VFA solubility, obtaining a gaseous sample of VFAs for analysis. This system has been shown to deliver on-line determinations of short-chain VFAs from a laboratory-scale AD reactor fed with cow manure for over six months [28]. It is now the basis for many non-filtration automatic GC sampling methods used to date for detection of VFAs in anaerobic digesters, but due to the requirement for a GC machine (high cost associated with machine purchase, maintenance and expertise), this approach is not practical for smaller-scale biogas plants.

Infrared and near-infrared spectroscopy
Near-infrared and infrared spectroscopy (NIR and IR spectroscopy, respectively) involves the transmittance of infrared electromagnetic radiation through a sample. This form of spectroscopy can use a range of techniques (mostly based on absorption spectroscopy), and it can be used to identify and study many chemicals within the digestate. Both organic and inorganic compounds have specific vibrational signatures within the IR spectrum, and the more of these vibrational modes that can be excited, the higher the specificity of analyte detection [29]. IR spectroscopy, therefore, can offer rapid, sensitive, and robust multi-analyte data from the anaerobic digester. Furthermore, IR sensors are straightforward to incorporate into an in-line measurement system that is also on-line, using mostly non-invasive technologies via direct beam or optical fibre methods in a recirculation loop [30]. NIR spectroscopic methods are based on the combinations and overtones of vibrational modes. Targets of interest are alcohols (O-H bonds), aliphatic and aromatic carbon compounds (C-H bonds) and proteins (N-H bonds). Glucose, biomass and various biological process products are therefore suited for monitoring by NIR [29,31]; however, the spectral bands are much broader in NIR and end up overlapping, therefore reducing the specificity [32]. The signal-to-noise ratio in NIR can be improved by increasing the path length [32], but mid-IR can give a more quantitative and precise measurement of molecules in biological processes.
Non-invasive spectroscopic methods using NIR have been shown to give an overview of the total VFA content within the AD process [33,34]. Furthermore, NIR spectroscopy has been demonstrated to determine VFA concentrations within industrial AD reactors [35,36]. Attempts have been made to reduce the cost of this methodology by removing the requirement for ultrafiltration by use of sample sedimentation and 150 μm filtration [33]. This method resulted in high uncertainties in the detected VFAs due to their low concentrations in the samples, and therefore low absorption in the spectra [33]. Furthermore, due to the complexity of the digestate, an extensive calibration of the spectroscopic procedure is required for the specific AD process [33]. The calibration issue hinders the tremendous advantage of these spectroscopic approaches as the nature of the feedstock generally varies over time, and expensive re-calibrations must be carried out frequently. Nevertheless, using this sensing technology, process upsets and failures were able to be documented by the analysis of the spectra of a 6-month AD process [35].

Remarks
These methods suffer from some disadvantages concerning sample preparation and biofouling [5]. Near-infrared and infrared spectrometry have been proven to be a real alternative, but they require an advanced system for automated sampling, sample transfer, and filtering to enable sufficiently frequent sampling (< 2 h) [34]. Ultrafiltration of the digestate samples is required to produce a clear liquid free of particles for GC, NIR and IR spectroscopy. In the long term, this ultrafiltration will result in the biofouling of the filtration unit, a property that needs to be considered. Table 2 gives an overview of some VFA detection technologies in AD and assesses their benefits and drawbacks. The criteria used for this assessment include the accuracy of the technology, differentiation between different VFAs, required pre-processing of the sample, requirement for human expertise during the measurement, post-sensor data computation and analysis, overall analysis duration, initial cost of the technology, usage cost over a one-year period, extendability to detect other variables and the technology readiness level (TRL) with respect to the technology's use for VFA detection in AD.

Software sensors
The information provided by some of the on-line sensors can be expanded using mathematical approaches. There are two types of strategy for this: methods based on data analysis, and methods relying on a mathematical model representing the plant dynamics.

Observers
Software sensors are used to provide an on-line state estimation for accurately determining the state of the process, especially by estimating non-measured yet crucial variables. They are also vital in supporting a closed-loop control strategy. This approach (also named state estimators or state observers), is supported by a great deal of theoretical background. The software sensor is a way to assimilate the real-time data from the plant on-line, combine them with the theoretical process knowledge embedded into a dynamical mathematical model and eventually predict variables that are not measured or only available at a low sampling frequency. Different strategies for designing a software sensor exist, which depend on model accuracy, sensor information, reliability and sampling frequency.
The most popular approaches are extensions of linear frameworks. The extended Kalman filter (EKF) and the extended Luenberger observer (ELO) are the best-known algorithms [37]. However, since these estimators require a local linear approximation of the process model, stability and convergence properties are generally not guaranteed over a wide operating range. Moreover, these algorithms assume perfect knowledge of the model and its parameters, and they are often sensitive to inaccuracies in the model parameters. Different strategies have been developed to reduce the dependency on parameter uncertainty. They consist of basing the estimation scheme on the mass balance information. Mass balance partially represents the process, while dealing with the missing information in a robust way. In general, the gaseous flow rates support an on-line estimate of the kinetics of some of the model processes using the so-called asymptotic observer [38]. Asymptotic observers rely on a change of variables that cancels the nonlinear terms of the system, ending up with a linear observer [39]. The main drawback of the asymptotic observer is that its convergence rate is dictated by the dilution rate. Adaptive observers simultaneously estimate model parameters and process states [40], whereas nonlinear observers are tailored to the process model using its parameters, with the algorithm accounting for the full process nonlinearity [41]. The complexity of the resulting observer algorithms is a limitation for their implementation and calibration.
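The principle of the asymptotic observer can be illustrated with a deliberately simple example. The Python sketch below is not taken from the cited works: it assumes a hypothetical single-reaction chemostat with substrate S, biomass X, yield coefficient k, dilution rate D and an assumed Monod growth law, and it assumes that S is measured on-line. The auxiliary variable z = S + kX obeys dz/dt = D(S_in - z), which no longer contains the kinetics, so biomass can be estimated without knowing the growth rate, and the estimation error decays at a rate set by D alone.

```python
def monod(s, mu_max=0.4, k_s=2.0):
    """Assumed Monod growth rate; only the simulated plant uses it."""
    return mu_max * s / (k_s + s)


def simulate(d=0.1, s_in=10.0, k=5.0, dt=0.01, n_steps=6000):
    """Euler simulation of a toy chemostat and its asymptotic observer.

    Plant (hypothetical):  dS/dt = D*(S_in - S) - k*mu(S)*X
                           dX/dt = mu(S)*X - D*X
    Observer:              dz/dt = D*(S_in - z),  X_hat = (z - S)/k
    Returns a list of (time, true X, estimated X).
    """
    s, x = 5.0, 0.5      # true states of the simulated plant
    z_hat = 0.0          # deliberately wrong initial guess of z = S + k*X
    history = []
    for step in range(n_steps + 1):
        x_hat = (z_hat - s) / k              # estimate from the measured S
        history.append((step * dt, x, x_hat))
        rate = monod(s) * x                  # reaction rate, unknown to the observer
        s_next = s + dt * (d * (s_in - s) - k * rate)
        x_next = x + dt * (rate - d * x)
        z_hat += dt * d * (s_in - z_hat)     # kinetics-free observer update
        s, x = s_next, x_next
    return history
```

Calling `simulate()` and comparing the second and third entries of each tuple shows the estimate approaching the true biomass over a few residence times, which is the slow, dilution-rate-limited convergence noted above.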
The influent concentration is inherently difficult to determine, which makes it hard to estimate the process state accurately; however, Jauregui-Medina et al. (2009) proposed a simultaneous input-and-state observer to estimate the influent concentrations [46]. Moreover, some of these observer designs have been extended to deal with spatial distributions within the reactor [47,48].

Multivariate process analysis
Multivariate data analysis uses a variety of methods from a range of scientific fields (e.g., computer science, statistics, and applied mathematics). These methods, in turn, allow interpretation of the massive amounts of data obtained from the bioprocess and support the detection of unusual working modes and their diagnosis.
Classification using principal component analysis (PCA) is a powerful way to determine the status of a bioreactor [49]. PCA allows the reduction of high-dimensional data to fewer, linearly independent components [20,50]. The principal components can then be assessed through the variance that is induced by the changes occurring within the bioreactor. Generally, PCA is used to analyse the difference, distribution, or structure of data sets to identify outliers and abnormal working modes. One typically seeks a set of new orthogonal dimensions sufficient to encompass at least 95% of the variance. Plots using the resulting set of principal components are often easier to visualise than the original dataset, but only if the original dataset has a low dimensionality in a statistical sense. Support vector machines (SVMs) can also be used to classify the on-line data, separating the classes in the data by as wide a margin as possible [51,52]. Descriptions of the main changes within the bioreactor process that can be seen in the data can be made using these SVMs.
Like hierarchical cluster analysis (HCA), PCA is an unbiased method that is best suited for evaluation of data sets rather than prediction; however, PCA can make basic prediction methods possible, primarily if the data set has low dimensionality and a significant separation among sample classes. The constraints of a PCA process also allow possible monitoring of the courses of various unspecific variables, as represented by the most relevant spectral changes [53]. Furthermore, the process trajectory can eventually be identified from similar 'ideal' bioprocesses. If the data set does not have a significant separation, PCA may not adequately be able to predict the identity of an experimental sample.
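To make the PCA procedure described above concrete, a minimal sketch in plain Python is given below. It centres the data, builds the sample covariance matrix and extracts the leading components by power iteration with deflation. This is only one simple way of obtaining principal components and was chosen here to keep the example dependency-free; in practice an established numerical library would normally be used. All function names are illustrative.

```python
def centre(data):
    """Subtract the column mean from every column."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[v - m for v, m in zip(row, means)] for row in data]


def covariance(centred):
    """Sample covariance matrix of already-centred rows."""
    n, p = len(centred), len(centred[0])
    return [[sum(row[i] * row[j] for row in centred) / (n - 1)
             for j in range(p)] for i in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    p = len(matrix)
    vec = [1.0 / (i + 1) for i in range(p)]   # fixed, non-degenerate start
    value = 0.0
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(c * c for c in prod) ** 0.5
        if norm < 1e-12:                      # no variance left to extract
            return 0.0, vec
        vec = [c / norm for c in prod]
        value = norm
    return value, vec


def pca(data, n_components):
    """Return (eigenvalues, loadings, scores) for the first n_components."""
    centred = centre(data)
    cov = covariance(centred)
    p = len(cov)
    eigenvalues, loadings = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        eigenvalues.append(value)
        loadings.append(vec)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(r * l for r, l in zip(row, vec)) for vec in loadings]
              for row in centred]
    return eigenvalues, loadings, scores
```

The returned scores are the coordinates that would be drawn in a score plot, while the eigenvalues give the variance carried by each component.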
In addition to the methods outlined here, there are many more analysis methods that can be applied to multivariate process analysis. Gomes et al. [54] provide an in-depth assessment of multivariate methods for process analysis and how to incorporate them with bioprocess technology to achieve process analysis and control.
It is difficult to compare different theoretical models since they are all tailored to different situations and supported by different on-line information. As a rule, in all cases, software sensor algorithms should utilise all the available information and be tailored to the specific task that is desired to be achieved. For further discussion on mathematical model selection, control and instrumentation in anaerobic digestion, reviews by Jimenez et al. [55] and Donoso-Bravo et al. [56] provide highly detailed overviews.

Emerging colourimetric sensor technologies
Colourimetric sensors (optical chemosensory nose or tongue systems) consist of a matrix-embedded indicator that reacts with a specific analyte [57]. The matrix immobilises the indicator, with the immobilised indicator near a light source (usually a light emitting diode), which is directed to illuminate the sensor matrix, while an imaging unit is positioned above the matrix (Fig. 2) [14,58]. The interaction of the indicator with the analyte can cause a change in the optical properties of the indicator (e.g., changes in absorption, reflection, fluorescence or photoluminescence), which can be correlated to the identity and concentration of the analyte. Disposable sensor patches can be used as the chemical indicator-containing matrix, allowing a straightforward modular setup of the sensor. These can be placed directly into the anaerobic digester (in-line) or in an external closed-loop (at-line) configuration. A small window can be incorporated into such a setup, allowing an external optical component to illuminate and image the chemical indicators.
The interaction between the indicator and the analyte should ideally be of a chemical nature rather than of a physical nature (e.g., temperature). This is because chemical interactions provide a higher dimensionality than physical interactions, leading to higher sensitivity to analytes. Interactions of interest for colourimetric and fluorometric arrays can be grouped into five classes: Lewis acid-base dyes; Brønsted acid-base dyes (i.e., pH indicator dyes); dyes with large permanent dipoles for the detection of local polarity and hydrogen bonding (i.e., solvatochromic, vapochromic or zwitterionic dyes); redox-responsive dyes; and chromogenic aggregative colourants [17]. Table 1 displays some of these groups of dyes.
During sensing, the illuminated colourimetric sensor array pad is imaged digitally on a regular basis. A difference map is generated comparing the original image before the AD process began, with the current image (Fig. 1). The difference map is achieved by straightforward digital subtraction of the red, blue, and green colour channels of the digital image, allowing the sensing of complex mixtures. The dye concentration and spot intensity have the potential to create discrepancies between individual sensor pad arrays; however, the use of the differences in RGB colours when analysing reduces these differences significantly [59]. Furthermore, the ease of visualisation of the colour changes in colourimetric sensor arrays is advantageous for analyte identification.
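The difference map described above amounts to a channel-wise subtraction followed by flattening. The short Python sketch below illustrates this under the assumption that the mean colour of each dye spot has already been extracted from the two images as an (R, G, B) tuple; the spot extraction itself and the function name are outside the source and purely illustrative.

```python
def difference_vector(before, after):
    """Flatten per-dye RGB differences (after - before) into one feature vector.

    `before` and `after` are lists of (R, G, B) tuples, one tuple per dye
    spot, listed in the same spot order for both images.
    """
    if len(before) != len(after):
        raise ValueError("both images must contain the same number of dye spots")
    return [a - b
            for spot_before, spot_after in zip(before, after)
            for b, a in zip(spot_before, spot_after)]
```

For a 36-dye array this yields a vector of 108 values, which is the input to the multivariate analysis.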
Recent developments in optical noses and tongues suggest that it may be possible also to determine and quantify individual VFAs using a colourimetric sensor [17,[59][60][61][62][63][64]. No previous observations have been made assessing the ability of colourimetric sensor arrays for the detection of VFAs present during AD. This was assessed experimentally using a 23-dye colourimetric sensor array with the digestate from AD. The digestate was split into three separate anaerobic vessels and spiked with acetic, propionic and butyric acid to concentrations of 10, 5 and 0.3 mM, respectively. The colourimetric sensor arrays were exposed directly to the raw (unfiltered) digestate from each vessel for 10 s, then imaged. After computational processing of the images, PCA of the data was performed to determine if the colourimetric sensor array could differentiate between the three VFAs (Figs. 3 and 4). It was observed that a clear differentiation between the three VFAs in raw digestate from an AD process could be achieved in lab conditions at realistic VFA concentrations. Furthermore, this would allow the detection of VFAs before they accumulate to concentrations that may have a negative impact on the production of biogas, preventing the collapse of the biological process.
Ammonia can also be measured by colourimetric sensor analysis, with detection below 50 ppb [65,66]. Colourimetric assays have also recently been designed to sense H₂S [67]. Dissolved CO₂ has been detected using optical tongue methods over the range 0-20% [68,69]. The measurement of pH using coloured dyes is a well-established technique and has been successfully established in bioprocess monitoring [58]. Colourimetric sensing has also been reported for hydrocarbons, suggesting there may also be potential for sensing methane [17].

Calibration of colourimetric sensors
Reconstructing the concentrations of different species combining the signals extracted from the colourimetric sensors is one of the main challenges. The idea consists of using a learning data set, consisting of measurement data of compounds with known concentrations and chemical or biological species.

Data consistency checking
Before these methods can be undertaken, the consistency of the data set must be confirmed. This means that the comparability of the different data sets must be established for successful data analysis. This process usually consists of a variety of pre-processing methods, including deviation [70], filtering [71], normalisation, standardisation, centring, weighing and scaling [72]. The pre-processing of data is a powerful, yet sensitive component in data analysis [73], where the chosen method depends on the data used. Retaining only the relevant information is the next target of the pre-processing. The exclusion of unnecessary information within the data set (e.g., overlay of bands within the baseline spectra) is essential. The exclusion can be achieved by using methods of normalisation or baseline subtraction/correction, eliminating unnecessary data. Multivariate data analysis can then be performed after the pre-processing.
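As one example of the centring and scaling steps listed above, the following Python sketch standardises each variable (column) of a data matrix to zero mean and unit sample standard deviation. This particular combination is only one of several possible pre-processing choices, and the handling of constant columns (set to zero) is an assumption made for the sketch.

```python
from statistics import mean, stdev


def autoscale(data):
    """Centre each column to zero mean and scale it to unit sample std."""
    columns = list(zip(*data))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    scaled = []
    for row in data:
        # A constant column carries no information and is mapped to 0.0.
        scaled.append([(v - m) / s if s > 0 else 0.0
                       for v, m, s in zip(row, means, stds)])
    return scaled
```

Scaling of this kind prevents dyes with large absolute colour changes from dominating the subsequent multivariate analysis, although it can also amplify noise in channels that barely respond.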

Advanced calibration
Advanced calibration is then essential to be able to predict, for unknown samples, the concentration of the different constituents it contains. This calibration stage for colourimetric sensors is very close to the requirement for IR spectrometry calibration, both of which can produce multivariate data that needs to be analysed for interpretation.
Colourimetric sensors have high dimensionality due to the large number of different chemical properties that the different dyes can sense [17]. To differentiate among all compounds and possible mixtures requires highly multidimensional data. There are a variety of statistical methods available to deal with high dimensional data well beyond the scope of this review [74,75]. In general, for chemometric data, there are two distinct statistical approaches: clustering vs classification [74][75][76]. Cluster analysis essentially tells one what resembles what, e.g., how close the vectors representing data are to one another in a high dimensional space. Classification analysis, on the other hand, attempts to predict to which category (among a fixed number of known categories) any particular (new) datum belongs.
Statistical methods can be either biased (supervised) or unbiased (unsupervised, or model-free). Unbiased methods follow simple, straightforward algorithms and are typically used to evaluate a data set, providing a semi-quantitative idea of its quality. Biased methods, on the other hand, can provide significantly more power and utility with a concomitant increase in complexity, but at the cost of demanding data sets for which one already knows the answers. Biased methods can also be predictive, allowing for class assignment of new experimental cases by using a training set. The three most common approaches are hierarchical cluster analysis (HCA), principal component analysis (PCA) and linear discriminant analysis (LDA).

Calibration with principal component analysis
Colourimetric sensor data often requires only two principal components to express a natural variability among the data, regardless of the number of different sensor dyes in the physical array. High reproducibility and safety of the bioprocess can be provided using this type of process control [77].
When dealing with a broad range of analyte classes, a sensor array designed with high dimensionality is highly desirable. When examining a narrow class of analytes, however, an apparently high dimensionality of the sensor array indicates significant amounts of noise relative to the total variance. Consequently, the dimensionality of the data is not determined directly by the number of different sensor dyes in the colourimetric array. Using a colourimetric array containing 16 redox-sensitive dye formulations to detect powerful oxidants and peroxy-based explosives, a PCA revealed that only two dimensions, as opposed to 16, were required to reach 95% variance [78].
Fig. 3. Scree plot. Typical scree plot showing the eigenvalue of each component using data from a 23-dye colourimetric array. Each dye has three colour channels (red, green and blue), resulting in 69 components. The eigenvalue displays the variance observed in each component: the larger the magnitude, the higher the variance.
Fig. 4. Score plot. The score plot of a PCA of a 23-dye colourimetric array in response to butyric, propionic and acetic acid in anaerobic digestate. The plot uses the first two principal components determined by the PCA to visualise the dye-colour changes of the colourimetric array in response to the three acids assayed.
The number of dimensions needed to capture 95% of the variance within the sample provides information about the range of analytes being probed, as probes for one analyte may need only one dimension to reach 95% variance, whereas probes for multiple analytes will require multiple dimensions [17]. PCA is thus a powerful tool for calibrating sensor arrays, especially those with multiple disparate components, as it allows some insight into the sensor's chemical reactivity. A scree plot (Fig. 3), showing the contribution of each principal component, provides a quantitative measure of the contributions of different orthogonal reactivities to the variance of the array response. A score plot (Fig. 4) of the first two principal components can then display each measured sample and reveal how the samples group.
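The quantities behind a scree plot and a score plot can be computed as in the following minimal Python sketch. The data are synthetic (two hidden factors plus noise across 69 colour features, an assumption chosen so that few components dominate), and the function names `pca` and `n_components_for` are hypothetical.

```python
import numpy as np

def pca(X):
    """PCA by singular value decomposition of the mean-centred data.

    Rows of X are samples; columns are colour features.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s**2 / (X.shape[0] - 1)        # variance per component (scree plot)
    explained = eigenvalues / eigenvalues.sum()  # fraction of total variance
    scores = U * s                               # sample coordinates (score plot)
    return eigenvalues, explained, scores, Vt

def n_components_for(explained, threshold=0.95):
    """Smallest number of components whose cumulative variance reaches threshold."""
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)

# Synthetic array data: 40 samples, 69 features driven by two hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
loadings = rng.normal(size=(2, 69))
X = latent @ loadings + 0.05 * rng.normal(size=(40, 69))

eigenvalues, explained, scores, components = pca(X)
n_needed = n_components_for(explained, 0.95)
score_plot_xy = scores[:, :2]  # coordinates on the first two components
print(n_needed)
```

Plotting `eigenvalues` against the component index would give the scree plot, and a scatter plot of `score_plot_xy` would give the score plot.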

Calibration with hierarchical cluster analysis
The agglomerative clustering technique of HCA determines clusters from the Euclidean distance between experimental data. The 'nearest neighbours' are paired into a single cluster that is then paired with other 'nearest neighbours' until all variables are connected [74,75]. The resulting dendrogram shows the connectivity and some measure of the distance between each of the pairs. In the context of chemical analyses, these two pieces of information answer two questions: connectivity explains similarity of relationship, and distance explains magnitude. This analysis technique is commonly used in evolutionary genetics, where different species can be related to one another through their connectivity and closeness. There are three primary limitations to the HCA technique. The first is a fundamental limitation of all unbiased methods: HCA is not readily capable of predictive analysis. Second, dendrograms created using HCA must be re-created with each addition of a new analyte, so comparing dendrograms is typically only useful for rough qualitative purposes. The third limitation is the interpretation of noisy data: dendrograms can have rotations around clustering axes that do not represent meaningful differences.
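The nearest-neighbour pairing described above can be sketched as a naive agglomerative procedure in Python. This is an illustration under stated assumptions: single linkage is chosen here for simplicity (the source does not specify a linkage rule), the data are synthetic, and the function name `agglomerative_single_linkage` is hypothetical.

```python
import numpy as np

def agglomerative_single_linkage(X, n_clusters):
    """Naive agglomerative clustering on Euclidean distances (single linkage).

    Returns a label per sample and the list of merges (the information a
    dendrogram displays: which groups were joined, and at what distance).
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    merges = []
    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels, merges

# Synthetic example: three well-separated groups of five samples each.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
X = np.repeat(centres, 5, axis=0) + rng.normal(scale=0.3, size=(15, 3))
labels, merges = agglomerative_single_linkage(X, n_clusters=3)
```

For real data sets, SciPy's `scipy.cluster.hierarchy` module offers `linkage` and `dendrogram` functions that are considerably more efficient than this loop.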

Calibration with linear discriminant analysis
Like PCA, LDA is a dimensional reduction technique that constructs a set of orthogonal dimensions used to describe the data; however, LDA seeks to find a set of dimensions that best separates data into already known classes, rather than merely describing the total variance. Unlike HCA or PCA, LDA is a biased method. Statistical analysis using LDA requires inputting a class label for each sample. Components of each dimension are ranked to maximise the ratio of between-class variance to within-class variance. LDA can be used to predict the identity of unknown samples by using a training set, similar to PCA; however, because the dimensional components are optimised to maximise differentiability, LDA will show better ability to differentiate among sample classes. The primary weakness of LDA is related to the sample size. The covariance matrix tends to be unstable when the sample size is not significantly larger than the number of sample classes being analysed, and this is more problematic for high dimensional data. Consequently, LDA can give drastically fluctuating results with small sample sizes (compared to PCA or HCA, which can be unreliable with small sample sizes, but not unstable).
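A minimal Python sketch of this idea follows: directions are chosen to maximise between-class scatter relative to within-class scatter, and a new sample is assigned to the nearest class centroid in the reduced space. The data are synthetic, the small ridge term `reg` is an assumption added to keep the within-class matrix invertible, and `fit_lda` and `predict_lda` are hypothetical names.

```python
import numpy as np

def fit_lda(X, y, n_components=2, reg=1e-6):
    """Fisher discriminant directions from labelled training data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    Sw = reg * np.eye(n_features)          # within-class scatter (ridge-stabilised)
    Sb = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten with Sw^(-1/2), then take the leading eigenvectors of the
    # whitened between-class scatter.
    w_vals, w_vecs = np.linalg.eigh(Sw)
    Sw_inv_sqrt = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.T
    evals, evecs = np.linalg.eigh(Sw_inv_sqrt @ Sb @ Sw_inv_sqrt)
    order = np.argsort(evals)[::-1][:n_components]
    W = Sw_inv_sqrt @ evecs[:, order]
    centroids = {str(c): X[y == c].mean(axis=0) @ W for c in classes}
    return W, centroids

def predict_lda(x, W, centroids):
    """Assign x to the class with the nearest centroid in discriminant space."""
    z = np.asarray(x, dtype=float) @ W
    return min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))

# Synthetic training set: three acid classes, six colour features, 20 samples each.
rng = np.random.default_rng(2)
means = {"acetic": [5, 0, 0, 1, 1, 1],
         "propionic": [0, 5, 0, 1, 1, 1],
         "butyric": [0, 0, 5, 1, 1, 1]}
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(20, 6)) for m in means.values()])
y = np.repeat(list(means), 20)
W, centroids = fit_lda(X, y)
predicted = predict_lda([5, 0, 0, 1, 1, 1], W, centroids)
```

With far fewer training samples than features, the within-class matrix becomes poorly conditioned and the result depends strongly on `reg`, which reflects the sample-size weakness noted above.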
Tensor discriminant analysis (TDA) is an improvement on LDA [79]. TDA is an array generalisation of LDA that can take advantage of high dimensionality. It is used to classify multi-way array measurements, rather than one-way vector measurements [79]. Data collected using a colourimetric sensor array can be viewed as a 3-way tensor (the first way corresponding to the choice of the dye, the second way corresponding to the effects of the colour changes and the third way corresponding to time progression) [80]. Tensor discriminant analysis can significantly improve the sensitivity, specificity, and computational efficiency of discriminant analysis methods because of the dimensionality reduction.
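The 3-way layout can be made concrete with a short Python sketch. It only shows how one sample's data might be arranged and unfolded; it does not implement TDA. The sizes are assumptions (23 dyes, three colour channels taken as the second way, ten time points), and the values are random placeholders.

```python
import numpy as np

n_dyes, n_channels, n_times = 23, 3, 10  # assumed sizes for illustration

# One sample's response as a 3-way tensor: dye x colour channel x time.
rng = np.random.default_rng(0)
tensor = rng.normal(size=(n_dyes, n_channels, n_times))

# A one-way (vector) method such as LDA needs the tensor flattened,
# which discards the dye/channel/time structure.
vector = tensor.reshape(-1)

# Unfolding along the dye mode keeps one row per dye instead.
dye_unfolding = tensor.reshape(n_dyes, n_channels * n_times)
```

Flattening yields 690 features per sample in this example, which illustrates why methods that exploit the array structure can be attractive when samples are scarce.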

Microorganism strain detection
Strain screening can be advantageous for determining the direction in which the AD process is heading. Specific microorganisms produce specific chemical profiles; therefore, it may be possible to determine the dynamics of each microorganism group within an AD reactor by sensing the metabolites produced. Colourimetric sensor arrays have been shown to differentiate and identify a variety of microorganism species and strains based solely on the metabolites of their growth [81][82][83]. Although the environment within an anaerobic digester is very complex, it may be possible to target particular metabolites that are produced by the microorganisms of interest, allowing monitoring of the microbial environment.

Other compounds of interest in anaerobic digestion
Methanethiol is a strong and toxic odorant present in biogas produced by AD. It has been suggested that it could also be detected using colourimetric analysis [17]. Glucose and fructose are common carbon storage molecules involved in biological processes. Both have been detected using colourimetric analysis, and these methods could be adapted to detect glucose and fructose during AD [84][85][86]. Although there is a large variety of analytes that colourimetric sensors can detect [17], this sensor technology still requires evaluation in biogas production monitoring. Table 3 summarises some of the possible analytes that could be detected in AD.

Perspectives and challenges for the development of colourimetric sensors
The main limitation of a colourimetric sensor in AD is that the sensor provides a composite response to a complex mixture. Moreover, owing to the cross-reactive nature of the colourimetric sensor array, a component-by-component analysis of the AD process is not obtained. The natural assumption with complex mixtures, characteristic of AD, is that a complete quantitative analysis of each component must be achieved. Contrary to this, many of the analytical goals desired for such complex mixtures are of a different kind. These generally concern comparisons: whether AD reactors are the same, whether a few specific components within the mixture have changed irrespective of the continuous complex mixture, or whether the material was processed correctly before measurement to achieve an accurate result. Accordingly, the main strength of a colourimetric sensor in AD is that the composite response of the sensor array to a complex mixture simplifies the fingerprinting of the AD mixture. As outlined previously, many recent advances in such technologies have enabled the differentiation of largely similar complex mixtures by distinct colourimetric fingerprints.
Table 3 (fragment). Strain screening. A colourimetric array with a variety of dyes was shown to detect metabolites produced by ten different bacterial strains. N/A [81][82][83]
Colourimetric sensor arrays also provide non-uniform intrinsic responses to analytes by probing their chemical properties rather than their physical properties. This makes such sensor arrays ideal for detecting volatile and toxic components, whose high reactivity makes them straightforward to detect, even at extremely low concentrations. In the AD process, one such component is the H2S content of the biogas produced. H2S is a highly toxic and reactive compound whose measurement at low concentrations remains challenging. This reactivity can be exploited by colourimetric sensors, allowing easy chemical detection. This enhanced detection of volatile and toxic compounds eliminates many issues seen with traditional electronic noses and tongues and with solid-state chemical sensors, such as false-positive identifications. A disadvantage arises when the detection of less-reactive components is desired. Although these components are less reactive in their natural state in the AD process, a suggested way to overcome this is to pre-react them into more reactive forms; for example, partial oxidation of the components yields products with a higher reactivity, allowing straightforward detection by colourimetric sensor arrays [87]. Improving the selectivity of such methods would significantly improve the capabilities of colourimetric sensor arrays, especially when dealing with complex mixtures with a high level of interferents, as seen in AD.
When applying colourimetric sensor arrays to the liquid phase of AD, the potential solubility of the dye probes becomes a significant problem. Although the complex AD mixture requires direct access to the sensor dyes, these dyes cannot be allowed to dissolve into the AD mixture. This can be overcome by improving the immobilisation of the sensor dyes in sol-gel formulations; however, this can also lead to an unfavourable distribution of the dye within the sol-gel matrix. Another way to avoid this problem is to analyse the AD liquid mixture in parallel to the process in an at-line or off-line protocol. This could be time-consuming and undesirable, although automation of the method could yield sufficient results in some cases. Despite this, improved attachment of the sensor dyes to a substrate, immobilising them in a liquid environment, is suggested as the preferred route to realising an on-line colourimetric sensor for AD.
The high turbidity of the liquid phase of the digestate is a challenge for optical detection of the colourimetric sensor arrays. If not compensated for, this would result in an ineffective sensor system. To avoid this, the sensor array can be designed to have the coloured dyes on a glass array. This would allow the dyes to be accessed through the back of the glass array, avoiding the problem related to high turbidity. This would also avoid any disruption caused by solid particles and fibres to the imaging of the sensor array.
The integration of colourimetric sensor arrays into an AD reactor would allow direct detection of the chemical compounds within the gas and liquid phases. This direct integration (whether directly placed within the reactor, or near a port to the internal space of the reactor), and the limited time required to achieve analyte-sensor dye interactions, will significantly improve the ability to obtain near-instantaneous measurements of the AD state. It is realistic to foresee the development of an on-line sensor system based on colourimetric sensor arrays for use in AD, coupled with automated post-treatment of the sensor data.
The lifetime of colourimetric sensors is significantly dependent on the strength of the interactions between the sensor dyes and the analyte components interacting with them. Moreover, with weak analyte-sensor interactions, the reversibility of the sensor is achievable; however, strong interactions may have limited reversibility. By designing a colourimetric sensor array with the intention of weak analyte-sensor interactions, the reversibility may be improved; however, this will come at a cost to the dimensionality of the data as less variation in the colours of the dyes will be observed. Alternatively, by designing a sensor array for strong analyte-sensor interactions, the dimensionality of the data will be higher. Although the reversibility may be compromised by doing this, the dye arrays are rather straight-forward to produce and could be used in a semi-disposable manner, with a short lifetime.

Future perspectives
Future research on colourimetric sensor arrays for detection of VFAs in AD should focus on field trials in industrial AD plants, and on increasing the lifetime of the sensor arrays. The increase in lifetime could be achieved by preferentially utilising dyes that react with the liquid phase of the AD in a reversible manner, or by developing further analysis software that can utilise data from dyes that show limited colour change due to irreversible reactions. Furthermore, more analysis of the sensor arrays' response to different VFAs at different concentrations in different AD environments is needed. This will in turn allow machine learning approaches to provide a highly detailed assessment of the sensor arrays' behaviour in relation to the AD environment and VFA concentrations.

Concluding remarks
Modern AD processes require cost-effective sensor technologies that allow the on-line measurement of key variables beyond VFAs, even for the smallest biogas production plants. Colourimetric sensor array technologies can fulfil the requirements for future on-line biogas sensing. They will give the operators of biogas plants the potential not only to gain a greater understanding of the complex environment within the production plant but also to predict the direction of the reactions within the bioreactor. The low cost of colourimetric sensors will also allow for small, cost-effective devices able to detect and discriminate between various VFAs at low concentrations. With this low cost and the reduced requirement for technical expertise, colourimetric sensors could be implemented in a variety of biogas plants, including small, low-budget plants. The short time from the sensor's interaction with the sample to the computed result will facilitate on-line detection of process deviations, significantly reducing the detection delay observed with current technologies. Furthermore, the flexibility of colourimetric sensor systems will allow adaptation to detect a variety of other analytes within the digestate, potentially facilitating a multivariate, on-line, cost-effective and robust sensor system. By providing a high-fidelity depiction of AD, such a system is also likely to improve knowledge of this complex process.
Norwegian Research Council (project 295912). Olivier Bernard gratefully acknowledges ENERSENSE for supporting his sabbatical stay at NTNU.