Integration of in situ Imaging and Chord Length Distribution Measurements for Estimation of Particle Size and Shape

Efficient processing of particulate products across various manufacturing steps requires that particles possess desired attributes such as size and shape. Controlling the particle production process to obtain required attributes will be greatly facilitated using robust algorithms providing the size and shape information of the particles from in situ measurements. However, obtaining particle size and shape information in situ during manufacturing has been a big challenge. This is because the problem of estimating particle size and shape (aspect ratio) from signals provided by in-line measuring tools is often ill posed, and therefore it calls for appropriate constraints to be imposed on the problem. One way to constrain uncertainty in estimation of particle size and shape from in-line measurements is to combine data from different measurements such as chord length distribution (CLD) and imaging. This paper presents two different methods for combining imaging and CLD data obtained with in-line tools in order to get reliable estimates of particle size distribution and aspect ratio, where the imaging data is used to constrain the search space for an aspect ratio from the CLD data.


Introduction
One of key steps in the manufacture of particulate products in the pharmaceuticals and fine chemicals industry is crystallisation, which is widely used for separation and purification of intermediates, fine chemicals and active pharmaceutical ingredients. The crystals come in different sizes and shapes. Subsequent steps in the manufacturing process, such as filtration, drying, blending and formulation of final products, require that the particle sizes and shapes lie within some desirable range. In order to provide monitoring and control of crystallisation processes it is necessary to develop techniques for estimating the shape and size Contents lists available at ScienceDirect journal homepage: www.elsevier.com/locate/ces distribution of particles in situ. There are a number of offline tools (Washington, 1992) that can be used to estimate the particle size distribution 1 (PSD) of crystals produced in a crystallisation process. However, of particular importance to the control of a crystallisation process are tools that can be used in situ. These tools should be suitable for estimating size and shape information of particles dispersed in a slurry without the need for sampling and/ or dilution. Examples of such instruments are the focused beam reflectance measurement (FBRM), the three dimensional optical reflectance (3D-ORM) (Heinrich and Ulrich, 2012) and the particle vision and measurement (PVM) (Barrett and Glennon, 2002) sensors.
In-line sensors such as FBRM and 3D-ORM measure a chord length distribution (CLD) 2 which is related to the size and shape of the particles in a slurry. It has been a long standing challenge to be able to deduce the actual PSD and particle shape from experimental CLD data. In order to do this, an inverse problem needs to be solved. This is usually achieved by suitably discretising both CLD (which is already measured as a discrete distribution) and PSD and then constructing an appropriate transformation matrix relating these two distributions (Wynn, 2003;Agimelen et al., 2015). The transformation matrix depends on the choice of size bins used to discretise the two distributions and the corresponding size ranges as well as the shape of particles. The transformation matrix is usually not known in advance and needs to be estimated along with the corresponding PSD (discretised) so that the convolution of the transformation matrix with the PSD yields a CLD which agrees with the experimentally measured CLD. However, this problem is ill posed. There are a number of different transformation matrices and PSDs whose convolutions give rise to the same CLD. Hence the challenge is how to estimate a combination of transformation matrix and PSD whose convolution will agree with an experimentally measured CLD for a given slurry as well as the PSD estimated being physically reasonable and representative of the particles in the slurry.
The approach which was used in previous works (Ruf et al., 2000;Worlitschek et al., 2005;Li et al., 2014;Kail et al., 2009) when estimating the transformation matrix was to assume the same shape (quantified by a metric referred to as aspect ratio) for all the particles in the slurry, and then use a previously estimated 3 range of particle sizes in the slurry to construct the transformation matrix. This approach is not suitable for monitoring a crystallisation process where nucleation and/or growth of particles is present as neither the range of particle sizes nor their aspect ratio would be known in advance. A technique which was suitable for estimating the range of particle sizes in a slurry in situ was presented in our previous work (Agimelen et al., 2015). However, like in other previous works, the transformation matrix was constructed with a single aspect ratio for all particles. This leaves open a possibility that the transformation matrix is constructed with inappropriate aspect ratio or that there is a wider range of aspect ratios present for particles of same or different sizes. It was demonstrated in our previous work (Agimelen et al., 2015) that it was still possible to calculate different CLDs that all had a very good agreement with an experimentally measured CLD even though some of the transformation matrices were constructed at aspect ratios that were far from the shape of the particles described. However, it was also shown that as the aspect ratio deviated further from the true shape of the particles, then the corresponding PSD became increasingly noisy. This situation led to the introduction of a penalty function in order to eliminate unrealistic aspect ratios. However, when there is a wide variation of aspect ratios of the particles in the slurry, there is a need to introduce further constraints on the aspect ratio to reduce the search space and regularise the inverse problem. One way to do this is to get estimates of aspect ratio (within some reasonable bounds) using imaging, and then use this information to constrain the search for a representative aspect ratio. However, the imaging needs to be done in situ in order to develop techniques for estimation of PSD and particle shape which could be used for real time monitoring and control of particle production processes.
While it would be desirable to get good estimates of both PSD and particle aspect ratio using in situ imaging alone, this is currently not the case. The currently available in-line imaging tools (for example, the PVM used in this work) produce 2D projection images. Furthermore, the objects in the images may be partially or completely out of focus, parts of imaged object may cross the image frame or objects may overlap each other. 4 Although advanced measurement equipment have been developed which can be used to capture 3D images of particles in a slurry and make good estimates of PSD and shape of particles, it requires sampling and dilution flow loops 5 to allow capturing 3D images of individual particles in a flow-through cell (Eggers et al., 2008;Kempkes et al., 2010;Schorsch et al., 2012Schorsch et al., , 2014. Therefore this approach may not be generally applicable for in-line monitoring of particle manufacturing processes. Hence the current situation is that PSD cannot be estimated to a good degree of accuracy using routinely available in-line imaging tools. To overcome this challenge, we propose to combine in-line CLD measurements with imaging data to provide more reliable estimates of quantitative particle attributes.
In this paper, we present two different methods for combining imaging data with CLD data for particle size and shape estimation. The first method presented here calculates an estimate of the mean aspect ratio of all the particles in the slurry and then uses this information to constrain the search space for size and shape estimation from the CLD data. In the second method, a distribution of aspect ratios for each particle size is used for the PSD estimation. The distribution of aspect ratios is based on the data from the captured images.

Experiment
To demonstrate the technique for estimating particle size and shape information using a combination of the CLD and imaging data, experiments were performed in slurries containing particles of different shapes. The materials and procedure are described below.

Materials
Three different samples were used for the measurements. Sample 1 consisted of polystyrene (PS) microspheres purchased from EPRUI nanoparticles and Microspheres Co. Ltd. with batch number 2012-5-7, and 0.2 g of the PS microspheres were dispersed in 100 g of isopropanol (IPA) purchased from VWR (20842.323) giving a concentration of 0.2% by weight. Sample 2 consisted of cellobiose octaacetate (COA) particles obtained from GSK. The particles were dispersed in 1 The term particle size distribution is broadly used here to refer both to continuous analytical probability density functions for particle sizes and discretised probability histograms of the particle sizes.
2 Similar to the case of PSD, the term chord length distribution is used to cover both continuous analytical functions and discretised probability histograms. 3 The approach was to estimate the range of particle sizes in a sample by techniques such as sieving, laser diffraction, microscopy or use information supplied by the manufacturer before suspending the particles. 4 The issue of objects overlapping each other would not be a problem if an appropriate image processing algorithm which can resolve the objects is used. 5 The dilution is necessary to avoid instances of overlapping particles in images.
methanol (purchased from VWR (20847.307)) with the same concentration as in sample 1. Sample 3 consisted of glycine (Glycine) crystals obtained by cooling crystallisation from an aqueous solution.
The solution with glycine concentration of 340 mg/ml was prepared using glycine (purchased from Sigma-Aldrich (G8898, Z 99% TLC)) and deionised water (from an in-house Millipore Water System (18 MΩ/cm)). The solution was cooled from a temperature of 90 1C to a temperature of 43 1C at a rate of 3 1C=min. During this process the glycine crystallised out of solution until an equilibrium particle size distribution was reached. The crystallisation of glycine was monitored with the FBRM probe which showed an initial increase (in time) of chord lengths before eventually reaching a steady state.

Experimental setup
The suspension of particles for all samples was made in the Mettler Toledo EasyMax 102 system. The EasyMax system consists of a cylindrical jacketed vessel of volume 100 ml with different stirrer and blade options. An anchored overhead stirrer with pitched (45°pitch angle) blades was used in all the experiments in this work. The stirrer shaft and probes were inserted into the slurries through ports located at the top of the setup. The stirring speed was set at 400 rpm in all experiments.
The CLD measurements were made with a Mettler Toledo FBRM G400 probe. The images of the particles were captured with a Mettler Toledo PVM V819 probe during the period of CLD measurement. The FBRM sensor consists of a system of lenses which focus a laser beam onto a spot near the probe window in the slurry. The laser spot moves in a circular trajectory and the back scattered light is detected. The chord length is then calculated as the speed of the laser spot multiplied by the duration of the back scattered light as the laser traverses a particle. The FBRM sensor records the lengths of the chords for a pre-set duration after which the CLD is reported (Heinrich and Ulrich, 2012;Kail et al., 2007Kail et al., , 2008Worlitschek and Mazzotti, 2003;Agimelen et al., 2015).
The PVM is an in situ microscope which consists of eight laser sources enclosed in a cylindrical tube. The six forward lasers and two back lasers (achieved by reflecting two lasers off a Teflon cap at the probe window) illuminate the particles in the slurry. The back scattered light is detected by a CCD element from which greyscale images are constructed. The image frame of the CCD array consists of 1360 Â 1024 pixels with a pixel size of 0.8 μm.
The PVM V819 sensor has a maximum acquisition rate of 5 images per second, although lower rates of acquisition could be set depending on requirements. The depth of the focal zone is restricted to about 50 μm so that all objects that are in focus result in images that have identical magnification levels. Each of the lasers can be switched on or off so that different degrees of illumination can be achieved.

Experimental data
The CLD data measured with the FBRM sensor for PS is shown in Fig. 1(a). Images were captured with the PVM sensor during the CLD measurement. A representative image from the PS sample is shown in Fig. 2(a).
Similarly, the CLD data recorded for COA is shown in Fig. 1 (b) while that of Glycine is shown in Fig. 1(c). Representative images for COA and Glycine are shown in Fig. 2(b) and (c) respectively. A total of 600 images were acquired each for PS and COA, and 121 images for glycine.

Image analysis
As mentioned in the introductory section, an estimate of the PSD and particle shape can be made from images alone without the need to include CLD data. However, due to the reasons discussed in the introductory section, this approach is not always convenient. The techniques presented here utilise images obtained with an in-line measuring tool. However, due to the limitations in the images (as discussed in the introductory section), it is necessary to combine the imaging data with CLD to obtain reasonable aspect ratio and/or size estimates. The images captured with the PVM are processed in order to detect the objects contained in them and hence obtain information about the shape of the particles in the slurry. However, the image processing algorithm used in this work does not have features to resolve an object completely when it does not lie entirely in focal plane of the PVM sensor. Also, it does not have functionalities to resolve overlapping objects in images. For this reason the samples used in this work were deliberately prepared dilute 6 to reduce instances of overlapping objects. The parameters 7 of the image processing algorithm were tuned to reject most of the particles that were not entirely in the focal plane of the PVM sensor. However, the imaging data still has some degree of inaccuracy as seen in the error bars of the data (see Section 4.6 of the Supplementary Information). This limitation not withstanding, the data from the images was sufficiently accurate to demonstrate the techniques developed in this work. Furthermore, images which do not contain objects that are contained completely within the image frame were also discarded. This situation of having to discard some images reduces the number of data sets that can be gathered from the images. However, it can be shown (see Section 2 of the Supplementary Information) that with a sample size (number of objects) of about 500 the error incurred in estimating the aspect ratio is reduced to a reasonable extent. However, for a more robust estimate of the PSD using imaging data, a larger number of objects will be required. The number 8 of detected objects used in this work is just sufficient to demonstrate the methods developed here. The issue of objects not completely in focus can be dealt with if additional functionalities are added to the image processing algorithm, but this is beyond the scope of this work. The key steps for detecting objects in the images captured by the PVM sensor are summarised in Section 4.1.

Object detection
The raw greyscale images from the PVM sensor are passed through a median filter to remove speck noise from the image background which is homogeneous. At this stage objects on the boundary of the image frame are removed. Any object with surface area below 900 pixels 9 is considered noise and excluded from processing. Finally, a closing operation with a disk structural element is used to join broken edges. The resulting blob properties such as area, centroid, eccentricity, convex area, and major and minor axes can be obtained. These steps are summarised in Fig. 3. The tunable parameters of the image processing algorithm are summarised in Section 4.5 of the Supplementary Information. 6 Low slurry densities have been used here for the purpose of methods development. Future work will involve the investigation of the applicability of the methods developed here at higher slurry densities (with a more advanced image processing algorithm) using suspensions of particles of known PSD, then the degree of deviation of the results from the known PSD can be quantified at different slurry densities. 7 The parameters of the algorithm need to be tuned for different samples due to variation of contrast. 8 A total of 1393 objects were detected for PS, 1810 for COA and 526 for glycine. 9 The surface area of 900 pixels represents length dimensions of approximately 30 Â 30 pixels (assuming a square geometry). This implies that objects that are smaller than approximately 24 μm are rejected by the image processing algorithm.
The consequence is that there is no estimate of aspect ratio for these small objects.
However, the particles used in this work have sizes mostly in the range of 100 μm so that the effects of this are minimal.

Characterising particle shape
The following procedure is used to obtain a shape descriptor, which is then used to characterise the shape of each particle. Each boundary pixel j of object i has coordinatesẽ ij ¼ ½ẽ ij;x ;ẽ ij;y , where the Cartesian coordinates system has been used. The centroid of object i has coordinatesẽ i . The distanced ij of the centroid of object i to the pixel j is given as Then for each object i, locate the pixel (with index j n ) whose distance from the centroidd ij n is largest as The angle between the line joining this farthest pixel to the centroid and the horizontal axis is φ as shown in Fig. 4(a). Then, for each object i, transform the coordinates of each pixel by performing a rotation through the angle φ so that the line joining the centroid to the farthest pixel becomes parallel to the horizontal axis. This operation transforms the coordinates of each pixel j toê ij and the centroid toê i . The line joining the pixel j to the centroid now makes an angle θ with the horizontal axis as shown in Fig. 4 (b). Due to the rotation, the farthest pixel from the centroid corresponds to θ ¼ 0. The distance of each pixel j from the centroid is the same of course.
Since the sample rate with respect to θ is not uniform, the pixel distances were resampled with N p uniformly spaced θ values constructed as θ p ¼ pΔθ; Δθ ¼ 2π=N p ; p ¼ 1; 2; …; N p . This allows the vector of all pixels distances (from the centroid) for object i to be written as Finally, the average vector d of all pixel distances for all objects detected by the image processing algorithm can be calculated as where N obj is the number of objects detected from all the images analysed.
A plot of d versus angle (θ) can be made as shown in Fig. 5(a)-(c). For near spherical particles, the shape descriptor d is nearly constant as in the case of PS in Fig. 5(a). However, for elongated particles, the shape descriptor d has two minima and maxima as in the cases of CoA and glycine in Fig. 5(b) and (c), respectively. The shape descriptor shown in Fig. 5(a)-(c) is similar to the type described in Schorsch et al. (2012).
In an ideal situation, the shape descriptor for spherical particles will be constant at a value representing the radius of the spherical particles. However, since the PS particles are not perfectly spherical there is slight variation in the dimensions so that an average aspect ratio r (the ratio of the minor to the major dimension) can be estimated. Similarly, the maxima in the shape descriptors for the CoA and glycine particles (in Fig. 5(b) and (c)) represent the major dimension of the particles while the minima represent the minor dimension of the particles.
For the case of the PS particles the mean aspect ratio r is estimated from the minimum dimension L min and maximum dimension L max (see Fig. 5(a)) as In the case of elongated particles, the maximum dimension (average length of particles) is given as Fig. 5(b) and (c)) and the minimum dimension (average  width of particles) is given as L min ¼ L 1min þ L 2min . So that the average aspect ratio can then be estimated using Eq. (5).
However, the aspect ratios for individual particles will be different from r. The aspect ratio for each particle is estimated from the shape descriptor corresponding to that particle. Once the aspect ratios of individual particles are estimated, then a scatter plot of aspect ratio versus particle length can be made as shown in Fig. 6(a)-(c). The shape descriptors for individual particles are not always smooth as in the cases shown in Fig. 5(a)-(c). They contain different degrees of variation due to imperfections in the particles and images (see Section 4.6 of the Supplementary Information for details).
Furthermore, the aspect ratios estimated from the shape descriptor for individual particles contain some artefacts (as can be seen in Fig. 6) due to a number of factors. These factors include deformations in the particles (that is, particles whose shapes deviate from the majority of particles in the slurry), impurity objects in the monitored slurry and particles not completely in focus. 10 For example, aspect ratios as low as about 0.2 in the case of the spherical particles in Fig. 6(a) or the minor peak at aspect ratio r % 0:5 in Fig. 6(d) are artefacts. However, the number of data points (in Fig. 6(b) and (c)) corresponding to these large particles may not be representative of the  10 The image processing algorithm parameters are tuned to remove particles that are out of focus. However, when the particle is only partially in focus, the image processing algorithm detects only part of the particle and this leads to an error in the estimated aspect ratio for that particle.
actual number of these large particles in the slurry. This is because the image processing algorithm has been designed to remove objects making contact with the image frame, and larger particles have a higher probability of making contact with the image frame.
In the situation where the PSD is to be estimated from image data alone, this probability will need to be taken into account (BSI, 2014). However, since the objective here is just to estimate the aspect ratio of particles of different sizes, this is not a crucial issue. The histogram (in Fig. 6(d)) 11 for the spherical particles has a dominant mode close to the aspect ratio of 0.85 which is close to the average aspect ratio r % 0:8 estimated from the average shape descriptor (in Fig. 5(a)) for this slurry. Similarly the modes of the histograms (in Fig. 6(e) and (f)) for COA and glycine occur close to aspect ratios of 0.2 and 0.4 respectively. These values are close to the average aspect ratios r % 0:2 (for COA) and r % 0:4 (for glycine) obtained from their respective shape descriptors in Fig. 5 (b) and (c).

Modelling chord length distribution
The sizes of particles in a slurry can be represented by the equivalent spherical diameter as was done in Agimelen et al. (2015). However, a characteristic length L could also be used, which can be chosen as the distance between the two extreme points in the particles' geometry. Since the estimated sizes from the images is L, this metric is used here for consistency with the image data. Once the metric for particle sizes has been chosen, then the PSD can then be expressed in terms of the chosen particle size metric. The PSD X is related to the CLD C by means of a convolution function (Hobbel et al., 1991;Heinrich and Ulrich, 2012;Agimelen et al., 2015) and the relationship can be written in matrix form as Agimelen et al. (2015) 12 where s is the chord length, andX is the length weighted PSD (Agimelen et al., 2015) given as The PSD X i (which is actually a histogram) consists of N bins. The characteristic size L i of particle size bin i is the geometric mean of sizes L i and L i þ 1 as The bin boundaries are calculated as where L min is the left boundary of the first particle size bin and L max is the right boundary of the last particle size bin. In previous works (Ruf et al., 2000;Worlitschek et al., 2005;Li et al., , 2013Li et al., , 2014Yu et al., 2008) the values of L min and L max were estimated from suitable measurements. However, the technique of estimating L min and L max directly from the bin boundaries of the CLD histogram using a moving window technique has been demonstrated to yield more accurate results (Agimelen et al., 2015). This window technique is more suitable for estimating the sizes of particles in-line in a process where particle size information is obtained from the CLD (Agimelen et al., 2015).
The length weighting applied to the PSD X in Eq. (7) is necessary because the CLD for a population of particles is biased towards particles of larger sizes (Agimelen et al., 2015;Hobbel et al., 1991;Simmons et al., 1999;Vaccaro et al., 2007). The forward problem in Eq. (6) is implemented by considering a chord length histogram C j of M bins where the characteristic chord length s j of bin j is the geometric mean of the chord lengths of s j and s j þ 1 as outlined in  and Agimelen et al. (2015).
If the PSD for a population of particles is known, then the CLD can be calculated using Eq. (6). However, in practical situations, the particle size histogram X i is not known in advance resulting in the inverse problem of calculating an unknown PSD X i from a known CLD C j . For this reason, the forward problem in Eq. (6) is reformulated as where the matrixÃ is obtained from matrix A by multiplying each column of A by the corresponding particle length as described in Agimelen et al. (2015). Each column i of matrix A ji is calculated from the CLD of a single particle (single particle CLD) of length L i and given aspect ratio r i . In the current work, the single particle CLD used in constructing the columns of matrix A are obtained from the analytical Li and Wilkinson (LW) model . 13 The process of calculating the single particle CLD involves computing the relative likelihood of obtaining a chord of length s from a particle of a given length and aspect ratio ). The LW model gives a probability density function (PDF) which can be used in making this calculation for ellipsoidal shaped particles. 14 The PDF is derived from an ellipsoid of semi major axis length a, semi-minor axis length b and aspect ratio r ¼b/a ). The LW model gives the probability p L i ðs j ; s j þ 1 Þ of obtaining a chord whose length lies between s j and s j þ 1 from an ellipsoid of characteristic length L i ¼ 2a i (see Section 1 of the Supplementary Information and Agimelen et al., 2015; for the mathematical expression for p L i ðs j ; s j þ 1 Þ). Once the probabilities are calculated, then for each row j of matrix A the columns are constructed as 6. Incorporating aspect ratio from images As stated in Section 5 the calculation of a column of matrix A requires the characteristic size of the particle size bin corresponding to that column, as well as the aspect ratio r of the particle of that characteristic size. In the previous work (Agimelen et al., 2015), all particles were assumed to have the same aspect ratio, and its value was estimated using an algorithm based solely on CLD data. The single aspect ratio approach will be used in Section 11 The uniform bin widths of the histograms in Fig. 6(d)-(f) were estimated using the Freedman-Diaconis rule (Izenman, 1991). The number of bins was then estimated as 18 bins for PS, 28 bins for COA and 13 bins for glycine. 12 Note that the symbol L was used to represent the length of a chord in Agimelen et al. (2015). However, the symbol L is used to represent the characteristic length and s the length of a chord in this work. The CLD C and PSDX in Eq.
(2) have been discretised. As such they are not continuous probability density functions and the term distribution is used in this work for simplicity.
13 Even though the model by Vaccaro et al. (2007) gave estimates of particle aspect ratios that were closer to the estimates from images in Agimelen et al. (2015), the LW model is used in this work for the CLD calculation. The reason is that the Vaccaro model is restricted to small values of aspect ratios r≲0:4, whereas the image data in Fig. 3 cover aspect ratios of r % 0:1 to r % 0:9. The LW model covers the entire range from r¼ 0 to r¼ 1. 14 The shapes of the COA and glycine particles in this work have been approximated as ellipsoids. However, this is only an approximation as Fig. 2 (c) clearly shows that the glycine particles are faceted. The use of ellipsoids to represent faceted objects introduces some discrepancy between the single particle CLDs of both objects (see Section 8 of the Supplementary Information for details). However, the ellipsoid approximation used here is sufficient to illustrate the methods presented here.
6.1, where the single aspect ratio value is estimated from imaging data. This approach is most suitable for the case of spherical particles where the aspect ratios of the individual particles are tightly packed around some mean value. However, for the case of particles where there is a wider spread of aspect ratios, a variation of aspect ratios for different particle sizes can also be used. The corresponding technique is outlined in Section 6.2.
6.1. Method 1: population CLD with a single aspect ratio When the aspect ratios are tightly packed around some mean value as in the case of spherical (or near spherical) particles in Fig. 6(a) and (d), it may be desirable to use a mean value for all particles. Although, the mean aspect ratio estimated from the images provides the best estimate based on available data, there is an uncertainty due to sampling limitations and various artefacts discussed above. This necessitates a search (within a suitable confidence interval) around the mean aspect ratio estimated from the images for an aspect ratio which best matches the experimentally measured CLD. This results in the search space being narrowed down leading to significantly lower computation time and less uncertainty in subsequent calculations.
In this section, the transformation matrixÃ in Eq. (10) is constructed with a single aspect ratio for all its columns. The aspect ratio r is chosen from a range given as where σ r is the standard deviation of all aspect ratios estimated from images and N is the number of standard deviations chosen. The purpose of Eq. (12) is to constrain the search space for the inverse problem. The details of the inverse problem will be given in Section 7. Hence, Method 1 corresponds to the case described in Agimelen et al. (2015) where each column of matrixÃ consists of the single particle CLD of a particle of length L and aspect ratio r, where the search space for r is reduced by means of Eq. (12).

Method 2: population CLD with multiple aspect ratios
The Method 1 presented in the previous subsection assigns the same aspect ratio to all the particles in the slurry. This method is capable of getting reasonable estimates of the PSD. However, to take the variation of particle shape into account, a second method is presented here in which multiple aspect ratios are assigned to particles of the same size. This method is particularly relevant for particles whose shape is needle-like or near rectangular as illustrated on the left of Fig. 7. The second method is outlined below.
In Method 2, the particles in bin i are assigned the same characteristic size L i but different aspect ratios. The aspect ratios assigned to the particles run from r i min to r i max as illustrated in Fig. 7. The procedure is to divide the particles in bin i into N r subgroups (N r ¼ 50 in this case, see Section 4.1 of the Supplementary Information for details), and the particles in subgroup k are assigned the same aspect ratio r i k . The aspect ratios of the subgroups are uniformly spaced so that the aspect ratios are given as where Δr ¼ However, the aspect ratios r i k of the different subgroups need to be weighted by different amounts as the numbers of particles with given aspect ratios in the images may not necessarily be uniform from r i min to r i max . The data on the right of Fig. 7 suggests an approximately uniform distribution of the aspect ratios from r i min to r i max , hence the aspect ratios were assigned to each of the subgroup k with equal weight in this case. Once aspect ratios have been assigned to different particle size bins, 15 then a 3D probability array A jki is constructed. Each slice i of the array corresponds to a particle of size L i , and each column k of slice i contains the probabilities p L k i ðs j ; s j þ 1 Þ of obtaining chords whose lengths lie between s j and s j þ 1 from a particle of size L i and aspect ratio r i k . Hence the 3D array consists of M rows, N r columns and N slices. The transformation matrix A ji in Eq. (10) is then obtained from the 3D array by averaging over the slices as This simple averaging is carried out since the aspect ratios r i k are assigned to the subgroups k with equal weights. This is the simplest way to construct the matrix A ji from the 3D array A jki , and this approach is supported by the data from the images as shown on the right of Fig. 7. It is possible to introduce a probability distribution for the aspect ratios assigned to the subgroups, but this simple approach has been used here for the purpose of illustrating the technique. Once the matrix A ji has been constructed, then the forward problem in Eq. (10) can be solved for a given PSD X i .

PSD estimation
As mentioned in Section 5, the problem encountered in practical situations is the estimation of the PSD X i corresponding to an experimentally measured CLD C n j . This is the inverse problem to the forward problem given in Eq. (10). One of the key steps in the process of the PSD estimation is to determine the transformation matrixÃ in Eq. (10) as accurately as possible. The level of accuracy of the matrixÃ depends on the values of L min and L max as well as the aspect ratio(s) used in calculating its columns.
To determine the best possible values of L min and L max , the forward problem in Eq. (10) is rewritten as where ϵ is an additive error between the calculated and experimentally measured CLD. Then for given values of L min , L max , values of the fitting parameter γ are found 16 which minimises the objective function f 1 given as where A trial solution of γ i ¼ 0 was used in the calculation of the vector X i from Eq. (17). Once the solution vector X i is obtained, then it is used to solve the forward problem in Eq. (10) to obtain a calculated CLD C j ; j ¼ 1; 2; …; M, where M is the number of bins in the CLD histogram. 17 The procedure is repeated until the optimum values of L min and L max are found for which there is the best match between the calculated and experimentally measured CLD.
The objective function f 1 given in Eq. (17) is suitable for estimating the optimum values of L min and L max whether the same aspect ratio is assigned to all particles or a distribution of aspect ratios is assigned to particles of the same size. In the case where a single aspect ratio is assigned to all particles, the objective function f 1 is not suitable for picking out the best aspect ratio within the confidence interval of aspect ratios. This task is accomplished with another objective function f 2 (to be introduced in Section 7.1). The calculation for estimating the best aspect ratio (within the confidence interval of aspect ratios) using the objective function f 2 is carried out with the values of L min and L max estimated with the objective function f 1 . However, in the other case where a distribution of aspect ratios is assigned to particles of the same size, the objective function f 2 is not used as the problem of determining the best aspect ratio has been removed.
Once the optimum values of L min and L max and (in the case of Method 1) the best aspect ratio have been estimated, then the PSDs (both number and volume based) can be calculated.
However, these PSDs may not be reasonably smooth, showing non-physical oscillations as is often the case when solving illposed problems. In such cases, a third objective function f 3 (to be introduced in Section 7.3) is used to calculate smooth PSDs. The calculation is carried out using the optimum values of L min and L max obtained with the objective function f 1 and in the case of Method 1, the best aspect ratio obtained with the objective function f 2 . The calculation of the smooth PSDs with the objective function f 3 is done using suitable criteria described in Section 4.4 of the Supplementary Information.
As stated above, the objective function f 1 is used to obtain the optimum values of L min and L max for both Methods 1 and 2. A given pair of L min and L max is said to be optimum when the corresponding calculated CLD C has the best match with the experimentally measured CLD C n . The level of agreement between the calculated and experimentally measured CLD is assessed by computing the L 2 norm The values of L min and L max for which the L 2 norm in Eq. (19) reaches a minimum are chosen as the optimum values.

PSD estimation for Method 1
In Method 1 the search for the optimum values of L min and L max using the objective function f 1 is done at each aspect ratio within the confidence interval in Eq. (12). However, for particles of a given shape, the L 2 norm in Eq. (19) initially decreases with increasing aspect ratio and then becomes level (see Section 4.3 of the Supplementary Information for details). This leads to non-uniqueness in determining the optimum value of r (Agimelen et al., 2015). This problem of non-uniqueness is removed by using a modified objective function f 2 which contains a penalty term to control the size of the calculated PSD vector as where the parameter λ 1 sets the level of imposed penalty. The value of λ 1 is chosen by comparing the magnitudes of the terms in Eq. (20) (see Section 4.3 of the Supplementary Information). The aspect ratio at which the objective function f 2 reaches a minimum is then chosen as the optimum. The solution vector (which is a number based PSD) obtained from Eq. (20) is not necessarily smooth as the penalty imposed on the solution vector only restricts its magnitude. With this penalty function, the LM algorithm could settle on a solution vector that contains some local fluctuations but whose value of f 2 is slightly less than a nearby solution that is smooth. For this reason, a new objective function f 3 (see Section 7.3 for details) which contains a penalty term to control the second derivative (to improve the smoothness of the solution vector) of the solution vector is used to estimate a number based PSD whose corresponding CLD is compared with the experimental data.

PSD estimation for Method 2
In Method 2 (described in Section 6.2) particles of different characteristic sizes L i are assigned a range of aspect ratios as outlined in Section 6.2. This eliminates the need to search for the best global aspect ratio which is the situation in Method 1. Hence in Method 2, it is only necessary to search for the best values of L min and L max using the objective function f 1 . Once the optimum transformation matrix is obtained using the objective function f 1 , then the corresponding smoothed solution is obtained with the objective function f 3 (given in Eq. (21)). 16 The Levenberg-Marquardt (LM) algorithm as implemented in Matlab was used in this work to solve the optimisation problem here. The PSD X i is estimated by means of the parameter γ i . An initial value of γ i is passed on to the LM algorithm which then searches for the optimum value of γ i to fit the given CLD. Since the PSD X i is defined as an exponential function in Eq. (18), the parameter γ i can take values in the interval ðÀ1; þ1Þ and still give X i Z 0. This implies that the non-negativity requirement on the PSD is maintained by the formulation of X i in Eq. (18). Therefore the LM algorithm was run without the use of lower or upper bounds as the parameter γ i is defined in ðÀ1; þ1Þ.

Volume based PSD
It is often necessary to recast the PSD X i (which is number based) in Eq. (17) as a volume based PSD since most instruments for measuring PSD give the data in terms of a volume based PSD. A new technique which allows suitable penalties to be imposed on the calculated volume based PSD X v was introduced in Agimelen et al. (2015). In the current work, a smoothing penalty (referred to in Section 7.1, see also Roths et al., 2001) is imposed. This is because the estimated volume based PSD X v may contain significant non-physical oscillations even though the corresponding number based PSD contains none or minor fluctuations. The objective function f 3 (see Eq. (21)) used to impose smoothness on the volume based PSD can also be used to obtain a smooth number based PSD. Hence the function f 3 is given in Eq. (21) in terms of generic quantities depending on whether the number based or volume based PSD is being computed.
The function f 3 is given as In the case of a number based PSD, the CLD C þ j ¼ C n j (the experimentally measured CLD), the matrix A þ ji ¼Ã ji , 18 and the solution vector X þ i ¼ X i . However, in the case of the volume based PSD, the vector X þ i ¼ X v i (in the case where smoothing is not required, 19 then the volume based PSD X v i is obtained from an objective function similar to f 1 in Eq. (17) as described in Section 3 of the Supplementary Information, otherwise Eq. (21) is used), the matrix A þ ji will be scaled accordingly (see Section 3 of the Supplementary Information for details), and the vector C þ j ¼Ĉ n where the transformed CLDĈ n is calculated aŝ wherê The operator ∇ h 2 is a finite difference approximation to the second derivative of the vector X þ i given as (Veldman and Rinzema, 1992) where and X þ i has been treated as a function of the characteristic particle size L. The parameter λ 2 sets the level of penalty imposed on the second derivative of X þ i . If the value of λ 2 is sufficiently large, then the penalty on the second derivative causes the LM algorithm to search for a solution vector which is smooth thereby avoiding solutions with localised oscillations (see Section 4.4 of the Supplementary Information). The volume based PSD obtained from Eq. (21) is normalised and converted to a probability density distribution as

Results and discussion
The results obtained with the two methods outlined in Sections 6 and 7 are presented in this Section. More details of the choice of parameter values are presented in the supplementary information.
8.1. Results from Method 1 Fig. 8(b) shows the objective function f 2 as a function of the aspect ratio r for PS. The function reaches a minimum at r ¼1 suggesting spherical particles. This is consistent with the shape of the particles in Fig. 2(a) and the mean aspect ratio of r % 0:8 obtained from the shape descriptor in Fig. 5(d) for this sample. This is also in agreement with the histogram in Fig. 6(d) which suggests that the majority of the particles in the sample are near spherical. Hence the aspect ratio predicted with Method 1 gives a reasonable description of the shape of the particles in the population as previously established (Agimelen et al., 2015).
The calculation in Fig. 8 (21)) using the optimum transformation matrix from Eq. (17). Using this optimum transformation matrix and r ¼1, a number based PSD is calculated from Eq. (21) with the smoothness penalty set by λ 2 ¼ 10 À 2 . The CLD corresponding to this number based PSD is shown by the solid line in Fig. 8(a). Furthermore, the volume based PSD (calculated at r ¼1) is shown in Fig. 8(c). In this case, the volume based PSD was calculated from the objective function f 2 (with the CLD C n j replaced with transformed CLDĈ n j and the matrixÃ ji rescaled as described in Section 3 of the Supplementary Information) at λ 1 % 0:95. The objective function f 3 was not used in calculating the volume based PSD in this case as smoothing was not required.
The calculated CLD in Fig. 8(a) has a near perfect match with the measured CLD for PS which is shown by the symbols in Fig. 8 (a). The calculations in Fig. 8 were done at a value of N ¼ 2 (where N is defined in Eq. (12)). This value was sufficient to give a wide enough range of aspect ratios to find a good match to the experimentally measured CLD. If the value of N is not large enough, then the calculated CLD may not match the experimentally measured CLD as the particles do not have exactly the same shape and Method 1 only uses a single aspect ratio to describe the shape of all the particles in the population. The single aspect ratio chosen will then not be representative of all the particles in the population. However, the imaging data narrows down the search space for a representative aspect ratio, and hence reduces the risk of predicting an unreliable aspect ratio.
Figs. 9 and 10 are similar to Fig. 8 but for COA and glycine respectively. Fig. 9(b) shows the objective function f 2 (in Eq. (20)) with aspect ratio for COA. The function f 2 in Fig. 9(b) predicts an aspect ratio r ¼0.3 for COA. This is reasonable when compared with crystals in Fig. 2(b) and the shape descriptor in Fig. 5(b). Also the mode of the histogram in Fig. 6(e) is close to the aspect ratio r ¼0.3. The predicted aspect ratio of r ¼0.3 in Fig. 9(b) is also close 18 The matrixÃ ij is the transformation matrix in the forward problem in Eq. For the volume based PSD, the corresponding number based PSD from Eq. (21) is used to construct a trial solution as described in Section 3 of the supplementary information. 19 The vector X þ i is defined as an exponential function of a fitting parameter similar to Eq. (18) for X i . The optimisation is then performed to obtain the optimum value of the fitting parameter using the LM algorithm similar to the case of X i in Eq. (18). 20 The form of the central difference approximation to the second derivative of the vector X þ i given in Eq. (18) is necessary since the grid for L is non-uniform as seen in Eq. (8).
to the estimated aspect ratio of r % 0:2 from the shape descriptor in Fig. 5(b). Furthermore, the calculated (in a manner similar to the case of PS in Fig. 8(a)) CLD for COA shown by the solid line in Fig. 9 (a) has a near perfect match with the measured CLD for the sample shown by the symbols in Fig. 9(a). The calculations were done with N ¼ 4 in Eq. (12).
The volume based PSD for COA (calculated in a manner similar to the case of PS in Fig. 8(c)) is shown in Fig. 9(c). The calculated Fig. 8. (a) Experimentally measured (symbols) and calculated (solid line) CLD for PS. The calculated CLD was obtained by solving the forward problem in Eq. (10) using the number based PSD calculated with the objective function f 3 (using λ 2 ¼ 10 À 2 ) in Eq. (21). The calculation was done at the aspect ratio r ¼1 where the objective function f 2 (in Eq. (20) using λ 1 % 0:95) reaches a minimum. The matrixÃ in Eq. (10) was calculated by Method 1 (described Section 6.1). (b) The objective function f 2 (using λ 1 % 0:95) in Eq. (20) for different aspect ratios for PS. (c) Calculated (by Method 1) volume based PSD for the PS sample. The volume based PSD was calculated using the objective function f 2 (at λ 1 % 0:95). The objective function f 3 was not used in the calculation of the volume based PSD in this case as smoothing was not required. Fig. 9. Similar to Fig. 8 but for COA. The calculation was done at the aspect ratio r¼ 0.3 where the objective function f 2 (using λ 1 ¼ 0:54) reaches a minimum in (b). The number based PSD (used for calculating the CLD) was calculated using λ 2 ¼ 0:05 in Eq. (21) while the volume based PSD (shown in (c)) was calculated using λ 2 ¼ 10 À 7 in Eq. (21). PSD in Fig. 9(c) has a left shoulder extending to needle lengths of about 10 μm. This gives a hint of the presence of a significant number of short needles in the COA sample. Some of these short needles can be seen in the image of Fig. 2(b).
The calculations for glycine in Fig. 10 are similar to the cases of PS and COA in Figs. 8 and 9 respectively. The predicted particle shape represented by r ¼ 0.4 in Fig. 10(b) is consistent with the particles in Fig. 2(c) and shape descriptor (which yields r % 0:4) in Fig. 5(c) as well as the histogram in Fig. 6(f). The calculated CLD (solid line Fig. 10(a)) also matches the experimentally measured CLD for the glycine sample (symbols in Fig. 10(a)). The calculated volume based PSD for this sample at r¼ 0.4 is shown in Fig. 10(c). The volume based PSD in Fig. 10(c) also has a left shoulder extending to about 10 μm similar to the case of COA in Fig. 9(c).
The calculations in Fig. 10 for glycine were done with N ¼ 2 which was sufficient to get a good match for the measured CLD.

Results from Method 2
The aspect ratios in Fig. 6(b) and (c) show a spread over a significant range of particle sizes. Hence the technique referred to as Method 2 in Sections 6.2 and 7.2 was also applied in the analysis of the data from COA and glycine.
The solid line in Fig. 11(a) shows the calculated CLD for COA using Method 2. The calculation was done by searching for the optimum values of L min and L max while constructing different transformation matrices as outlined in Section 6.2. The search for the optimum values of L min and L max (and hence the optimum transformation matrix) is done by minimising the objective function f 1 in Eq. (17). Once the optimum transformation matrix is found, then a number based PSD is calculated by minimising the objective function f 3 (with λ 2 ¼ 0:1) in Eq. (21). The CLD corresponding to this number based PSD is shown by the solid line in Fig. 11(a). The transformed CLDĈ n j in Eq. (22) is calculated using the optimum transformation matrix and the number based PSD obtained with Eq. (17).
The calculated CLD in Fig. 11(a) (solid line) has a near perfect match with the experimentally measured CLD (shown by the symbols in Fig. 11(a)) for COA. This is similar to the situation in Fig. 9(a) where the calculation was done with Method 1. The degree of agreement of the calculated CLD in Fig. 11(a) with the experimentally measured CLD demonstrates the level of accuracy that can be achieved with Method 2. Note that the aspect ratios of each of the subgroups of each bin (in Fig. 7) were assigned equal weights; a simple approach that is sufficient for reasonable results in this case. The volume based PSD for COA (obtained by Method 2) suggests particle sizes from about 3 μm to about 400 μm (Fig. 11 (b)). This is close to the prediction of particle sizes from about 7 μm to about 400 μm by Method 1. Even though the ranges of particle sizes predicted by both methods are close, Method 2 has the advantage that aspect ratio is not used as a fitting parameter which removes the issue of estimating the optimum aspect ratio from the problem. The aspect ratio is assumed to vary according to imaging data available.
Although the particle sizes estimated from these 2D images are not very accurate because of the focusing problem highlighted earlier, a comparison of the estimated PSD from the images with the volume based PSD obtained by both methods can still be made. This comparison shows good agreement of the estimated volume based PSD from the images with the volume based PSD estimated by Methods 1 and 2. Details are given in Section 6 of the Supplementary Information. The peak of the volume based PSD obtained by Method 2 is higher than that obtained by Method 1 in this case. This is because the volume based PSD by Method 2 is slightly narrower within the size range of about 50 μm to about 200 μm (Fig. 11(b)) so that the main peak gets higher to satisfy the normalisation constraint in Eq. (26). The main peak is accompanied by a smaller peak at a particle length close to 30 μm (Fig. 11 (b)) suggesting a bimodal distribution for the COA particles. However, this feature of a bimodal distribution is not picked up by Method 1 (Fig. 11(b)). This could be because Method 2 is more efficient in picking out bimodal distributions in a population of particles where there is a variation of aspect ratio for particles of different sizes (see Section 7 of the Supplementary Information for details) than Method 1.  20) and (21) respectively for the number based PSD. The value of λ 2 ¼ 10 À 6 was used in Eq. (21) for the volume based PSD.
The situation for glycine is similar to that of COA. The solid line in Fig. 11(c) shows the calculated (calculated in a manner similar to the case of COA in Fig. 11(a)) CLD for glycine. The calculated CLD in Fig. 11(c) also has a near perfect match with the experimentally measured (symbols) CLD in Fig. 11(c). This is similar to the case of COA in Fig. 11(a). The calculated volume based PSD (by Method 2) in Fig. 11(d) for glycine also covers about the same range of particle sizes as in the case of Method 1 (Fig. 11(d)) with the PSD very similar from both methods.
The volume based PSD for COA obtained by the two Methods ( Fig. 11(b)) cover a size range of r 10 μm to about 400 μm, while the CLD data for COA (in Fig. 11(a)) shows a maximum chord length of about 300 μm. 21 The PSD in Fig. 11(b) agrees with the image data in Fig. 6(b) for COA where the scatter plot is dense in the region between about 30 μm and about 300 μm with a small number of particles of sizes ≳300 μm. Particles of small sizes below about 30 μm are not picked up by the image processing algorithm because objects smaller than that are rejected by the algorithm to reduce the risk of processing background noise as real objects. This is the reason why particles of sizes ≲30 μm do not contribute to the scatter plot of Fig. 6(b) even though the volume based PSD for COA in Fig. 11(b) suggests the presence of these particles. The situation with glycine is similar to that of COA as seen in Fig. 11(d). The volume based PSD obtained by both methods cover a size range from about 10 μm to about 500 μm in agreement with the CLD for glycine (in Fig. 11(c)) which shows the longest chord to be about 400 μm. The data in Fig. 11(d) also agrees with the image data in Fig. 6(c) which shows particle sizes up to about 500 μm.
There may be a larger number of large particles (of sizes close to 500 μm) in the glycine slurry than in the COA slurry so that their contribution to the CLD is more significant.

Conclusions
Two different methods have been developed to constrain the search space of aspect ratio(s) for particle size estimation using CLD and imaging data. Both methods estimate aspect ratio from images and then use the information in the estimation of aspect ratio and/or PSD from CLD data.
In the first method, the PSD estimation from CLD data is carried out using a single representative aspect ratio for all the particles in the slurry. However, the search space for this representative aspect ratio is reduced by means of data from the images of the particles captured in-line during the process. This reduces the risk of predicting an aspect ratio which is not representative of the particles in the slurry, and hence an unreliable PSD.
In the second method, a range of aspect ratios (also estimated from the images of the particles captured in-line) is assigned to particles of different sizes. This takes aspect ratio estimation out of the problem, and hence eliminates the risk of estimating a PSD at an aspect ratio which is not representative of the particles in the slurry.
The techniques presented in this work have been developed to be applied in situ, and an in-line imaging tool has been used in this work. The currently available in-line imaging tools are not suitable (when used on their own) for obtaining accurate PSD and aspect ratio due to various issues outlined in the text. The limitations of using images from these in-line tools alone to get aspect ratio and/or PSD estimates also show up in the large error bars in Fig. 14(d)-(f) in Section 4.6 of the Supplementary Information. Hence the methods presented here combine imaging and CLD data obtained in-line to obtain more robust estimates of PSD and aspect Fig. 11. (a) Experimentally measured (symbols) and calculated (solid line) for COA. The calculated CLD was obtained by solving the forward problem in Eq. (10), where the matrixÃ was calculated by Method 2 as outlined in Section 6.2. The number based PSD used in solving the forward problem was obtained from the objective function f 3 in Eq. (21) for λ 2 ¼ 0:1. (b) The blue diamonds are the calculated (by Method 1) volume based PSD for COA shown in Fig. 9(c). The black asterisks are the volume based PSD calculated by Method 2 for COA. The volume based PSD by Method 2 was calculated from Eq. (21) at λ 2 ¼ 3 Â 10 À 6 . (c) Similar to (a) but for glycine. The number based PSD was obtained from Eq. (21) at λ 2 ¼ 0:01. (d) Similar to (b) but for glycine, where the value of λ 2 ¼ 10 À 5 has been used for the volume based PSD calculated by Method 2. (For interpretation of the references to colour in this figure caption, the reader is referred to the web version of this paper.) ratio. Note that the methods presented here can be applied to combine CLD with imaging captured with any in situ tools. The images need to be of sufficient quality so that aspect ratio information can be obtained from them using a suitable image processing algorithm.