Improvements for volume self-calibration

For multi-camera volumetric flow measurement techniques like tomographic PIV or shake-the-box, volume self-calibration (VSC) has become the standard procedure to correct remaining calibration errors using actual recorded images with particles. For high seeding densities and large allowed triangulation errors necessary to detect large camera shifts, the number of triangulated ghost particles can exceed the number of true particles by orders of magnitude, which makes the detection of the true disparity peaks more difficult. VSC has been improved here using the distinction between true particles with disparities always inside the true disparity peak in the disparity map for all cameras, while ghost particles are distributed over random position. This VSC with ghost particle suppression (VSC-GPS) makes VSC significantly more robust with many orders of magnitude fewer ghost particles and enables detection of disparities even larger than 10 pixels. In addition, an alternative volume self-calibration method is presented based on standard image correlation (VSC-IC) between dewarped images of two cameras similar to Stereo-PIV self-calibration without the need of particle detection and triangulation. For each camera combination, a correlation streak becomes visible in the correlation map, where the position of the streak is used for correcting the camera mapping functions. VSC-IC can easily detect very large camera shifts  >10–30 pixels. For inline camera configurations, VSC-IC needs to be modified by some slower triple-image correlation technique. As a useful side-benefit, the intensity along the correlation streak provides directly the intensity profile across the measurement volume depth. Both methods, VSC-GPS and VSC-IC, have been tested on many experiments, and in particular for the most difficult case of vibrations where each camera image needs a modified calibration function. Guidelines are given to detect and correct vibrations.


Introduction
Multi-camera volumetric flow measurement techniques require an accurate perspective calibration between cameras and measurement volume. The calibration procedure computes a mapping function between world points (X, Y, Z) and image pixel locations (x i , y i ) for all cameras enabling the inverse (tomographic or triangulation based) reconstruction of voxel intensities or particle locations in the measurement volume from recorded pixel intensities or particle image locations. Many functional forms for the mapping function have been proposed, including polynomials (Soloff et al 1997, Prasad 2000, DLT (direct linear transformation) and full camera pinhole with distortion parameter (Tsai 1987, Zhang 2000, Willert 2006, pinhole with Scheimpflug corrections (Fournel et al 2004, Louhichi et al 2007, Astarita 2012, Cornic et al 2016, accounting for multi-media models with changing index of refraction (Belden 2013) and others. All mapping functions are equally suitable as long as they accurately accommodate all optical distortions.
Error sources during the initial calibration step include manufacturing accuracy of the calibration plate, inaccuracies in the calibration procedure e.g. during translation of the calibration plate, measurement errors in the position of the detected calibration marks, or inadequate functional form of the mapping function with e.g. insufficient number of free parameters to account for the optical setup. Furthermore, just before or during data recording, the cameras could have moved or are sagging slowly due to instable mechanical setup, or the cameras are even subject to mechanical vibrations.
On the other hand, tomographic PIV (tomo-PIV) requires calibration errors smaller than 0.4 pixel (Elsinga et al 2006), ideally less than 0.1 pixel, everywhere in the measurement volume. The same is true for the iterative particle reconstruction technique (Wieneke 2013) as part of the recently developed 'shake-the-box' (STB) particle tracking method (Schanz et al 2016).
For correcting remaining calibration errors, a volume self-calibration (VSC) method has been developed (Wieneke 2008), using recorded images with particles to refine the calibration mapping functions. VSC determines first the 2D-position of particle images in the different camera images, and searches for possible 3D-particle positions in the measurement volume corresponding to the 2D-positions, where the maximum allowed triangulation error ε should be larger than the maximum expected calibration error. The line-of-sights from the 2D-particle positions in each camera image through the volume do not intersect exactly in one world point. The triangulation procedure calculates a best-fit world point which projected back onto the camera images leads to pixel positions slightly different than the original particle image position (figure 1).
For each camera, these dx-/dy-differences ('disparities') are plotted as small blobs in a 2D-disparity map of size ±ε in dx and dy. Averaging over many particles, the true disparity peak for each camera emerges quickly, while particle disparities related to false particle triangulations ('ghost particles') are randomly distributed. Usually, the measurement volume is divided up into n x × n y × n z sub-volumes to compute disparity maps for each sub-volume such that spatially varying calibration errors can be detected and corrected. A single subvolume with more particles and better statistics is sufficient for detecting constant global camera shifts. Further details are provided in Wieneke (2008).
VSC has become an indispensable tool for tomo-PIV and recently STB to visualize and quantify calibration errors in the first place before correcting them. A successful VSC step is in most cases a mandatory prerequisite for tomo-PIV or STB to work at all. When one does not see clear disparity peaks, it is most often useless to try (S)MART-based voxelreconstruction (Elsinga et al 2006, Atkinson andSoria 2009) or 3D-particle triangulation (Schanz et al 2016).
The performance of VSC as reported in the literature is shown in figure 2. In 78% of all experiments the initial calibrations had maximum disparities above 0.4 pixel. Quite a few experiments had to cope with calibration errors above 4 pixel. After applying VSC, all disparities are reduced to acceptable levels, on average even below 0.1 pixel.
While more cameras are beneficial for the MARTreconstruction of tomo-PIV, Discetti and Astarita (2014) report on the detrimental effect on VSC when increasing the number of cameras. In particular, with a possibly large search area (e.g. allowed triangulation error ε > 3-5 pixel), the number of erroneous particle triangulations (ghost particles) increases dramatically so that in some cases it becomes difficult or even impossible to find the barely visible true disparity peak. On the other hand, Lynch and Scarano (2014) reported successful VSC correction in a 12-camera tomo-PIV experiment.
One circumstance, where VSC has proven very useful, is in the case of vibrations which can easily amount to camera disparities of up to 5-10 pixels changing from image to image. Michaelis and Wolf (2011) implemented a singleimage vibration correction. First, the complete volume was used for 3D-particle triangulation with a search area larger than the expected maximum disparity, only using the brightest few percent of particles, to compute some initial rough global shift for all cameras. This has been followed by more detailed passes with multiple sub-volumes to correct remaining locally changing calibration errors, once the large shift has been corrected. The authors report initial disparities as large as 12 pixels being reduced to remaining 0.02 pixel after several passes of VSC. It is certainly tedious and time-consuming to compute a new calibration function for every image processed, but under such conditions this is the only way to process such data. Earl et al (2015) also investigated the effect of vibrations on tomographic reconstruction, which are especially noticeable in derived turbulent statistics quantities. The authors suggest checking for vibrations using single-image VSC even in case where the apparent reconstruction quality seems to be good. Small-scale vibrations smaller than the diameter of the particle images may be hidden by VSC when summing disparity maps for many images of the data set. Single-image corrections can potentially improve the final vector field accuracy considerably.
Recently, Cornic et al (2016) investigated VSC in more detail and proposed some improvements. A volumetric pinhole calibration model with Scheimpflug angles was implemented which is more appropriate for the standard setup in tomo-PIV with Scheimpflug adapters between camera and lens so that all cameras view the complete tilted volume in focus. The authors point to the possibility of easily refitting only the external pinhole parameters in case of camera drifts. They also mention the fact that VSC actually changes the origin of the coordinate system when disparities are corrected.
In practice, small shifts in the coordinate origin of e.g. 1-2 voxel are seldom critical. It is anyway difficult during the initial calibration step to accurately place a calibration plate at a specific location and orientation within a few pixels. Ideally some reference marks should be visible in the recorded images to (re)set the coordinate system. Zunino et al (2015) fixed a transparent dual-plane calibration target just outside the test section for calibration and to account for possible relative motion between the imaging system and the experiment. This is especially useful when access to the measurement volume is difficult. VSC works with any calibration models. The choice of mapping function is mostly a matter of enough but not too many free parameters to accommodate all optical distortions. In air a pinhole model preferably with Scheimpflug angles (up to 16 parameters per camera) is often sufficient, but it may not be appropriate e.g. in water or looking through curved optical surfaces or changing index of refraction. Higher order (polynomial) models (e.g. with 50-100 parameters per camera) may be required to achieve necessary accuracies in difficult optical setups.
While VSC has been applied successfully to the majority of volumetric flow experiments, it is sometimes not straight forward to find true disparity peaks accumulating from real particles among e.g. 1000-times more ghost particles in conditions of large camera shifts and high seeding density. The basic 3D-particle triangulation procedure usually consists of picking a detected 2D-particle location in the first camera image and searching along the epipolar line in the image of camera 2 image for matching particles within a stripe of length L z and width ±ε which is user selected allowed triangulation error ε. The length of the epipolar line L z is related to the depth of the measurement volume and the angle between cameras. With matching particles for camera 1 and 2, a 3D-particle position is already determined, and matching particles are searched in a little square of size (2ε) 2 around the corresponding pixel location in subsequent camera images. Thus e.g. for four cameras the ratio R of ghost particles to true particles is given roughly (depending on details of the triangulation process) by: with seeding density n ppp in particles per pixel. This immediately points to the cases where VSC becomes difficult due to a large number of ghost particles: deep volumes and large angles between cameras and therefore large L z , large camera disparities requiring a large search range ε, and high seeding densities. For example, for L z = 300 pixel, n ppp = 0.05, and ε = 2 and 5 pixel, the ratio R is 38 and 3750, respectively. There are several possible strategies to improve the robustness of VSC. First of all, if possible one may record first a few images (e.g. 10-100 is often sufficient) with lower seeding density. This works well, but, ideally, one would like to avoid such an extra step. Also later recordings may be still different with possible additional camera movements or even vibrations. So VSC needs to use the real data after all. A second approach is to reduce high seeding densities artificially by only considering a fraction of the particles in each recorded image, namely the brightest ones. Taking e.g. only the 10% brightest particle images reduces theoretically the number of ghost particles by a factor of 1000 for a four-camera experiment. This routinely used trick improves the robustness of VSC considerably, enabling successful VSC which would simply fail if one takes all true 2D-particle images above  some intensity threshold for the further 3D-triangulation step. Nevertheless, the set of e.g. 1000 brightest 2D-particles in each camera image may not be consistent in the sense that these particle images actually may not correspond to the same set of 1000 particles in space because: -A bright particle image in one image may result from the overlap of two particles. They likely do not overlap in the other camera images and may not be included there in the set of brightest particles. -Particles may be non-spherical leading to higher intensities for some cameras and lower for other ones. -Camera viewing angles vary across the image and result in changing ratios of particle image intensity between cameras in different part of the images.
In practice, the number of final true 3D-particles may be less than 10%-20% of the detected 2D-particle images, but this is often just sufficient for successful VSC corrections.
Furthermore, choosing only the brightest particle images may result in selecting only the ones e.g. in the center of the measurement volume with higher light intensity. It may then not be possible to correct the mapping function everywhere in the volume. For larger camera disparities, a good strategy is to correct first the large global camera shifts using a larger search range ε, a single sub-volume (already more if the statistics allows it) and a low number of brightest 2D-particle images, followed by one or more refinement step with much reduced ε-e.g. down to less than 1-2 pixels-using more sub-volume and larger number of brightest 2D-particles to correct remaining optical distortions.
Finally, for all experiments, it is recommended to do a calibration before and after the experiment to be able to select the most accurate one. This is especially useful when cameras are drifting substantially over time e.g. due to weak mechanical mounting or somebody bumping into a camera just before or during recording. In any case, after recording some images, a quick check with VSC to confirm the validity of the calibration is almost mandatory.
In the following, some improvements to VSC are presented to increase the robustness in particular for more difficult cases. Section 2 shows a simple, but highly effective way to remove the number of ghost particles by many orders of magnitude, making VSC far more robust even for large disparities of 10 pixels or larger. Section 3 presents a volume self-calibration technique not based on 2D-and 3D-particle detection, but on standard image correlation on dewarped camera images. Section 4 discusses the case of vibrations requiring singleimage calibration correction.

Suppression of ghost particles in standard VSC
There is a simple (and so far overlooked) distinction between true particles and ghost particles which can be used to efficiently eliminate ghost particles. As shown in figure 3, true particles always have disparities for all cameras located inside the true disparity peak (grey blob), while for ghost particles the disparity for each camera is randomly distributed and may fall inside the true disparity blob for one camera, but very unlikely for all cameras. For four cameras, the task is then to find the cluster of true particle disparities (Δx i , Δy i ) in this 8D space, which is reduced to six dimensions by fixing one camera. Ghost particles would be randomly distributed in this space, while all true particles accumulate close to the true set of disparities.
Such a cluster search in 6-8 dimensions for up to possibly 10 4 -10 6 particles per sub-volume is computationally very challenging. Here a simple fast procedure is implemented by introducing a weight W for each particle and displaying the disparities of a particle as Gaussian blobs in the disparity maps with a maximum intensity of W. The weight W is determined by the number of other particles having the same disparities (Δx i , Δy i ) for all cameras within some neighborhood given by ±1/20 of the allowed triangulation error ε, but at least ±0.2 pixel to still have enough matches for low image quality and at most ±0.5 pixel to keep the number of matches sufficiently low for large ε > 10 pixel. The choice of 1/20 is related to the selected disparity map size of 40 × 40 (virtual) pixels and plotting disparities as a Gaussian blob of size 2. This value has been found to work well in all experimental cases tested.
Computing the weights for n particles needs n 2 /2 checks (if two particles have similar disparities for all cameras), which already becomes too slow for more than 10 5 particles on a standard 12-core PC. Here the checks are performed once 10 4 particles have been collected for some sub-volume. True particles immediately have weights typically much larger than two (assuming that at least three true particles are among the 10 4 total number of particles), while almost all ghost particles have no match or at most one. Restricting the match to ±ε/20 corresponds to 1/100 of the area of the disparity map. Thus, for 4 cameras the probability that a ghost particle has a match with another ghost particle is of the order of 10 −6 . It is not 10 −8 since the triangulation procedure has two constraints, similar to the sum over Δx i and sum over Δy i being zero, equivalent to directly fixing one camera position.
Next, all particles with zero matches are removed, and also particles with consecutively higher number of matches are removed until the number of remaining particles falls below 25% (2500). Typically, only the few true particles are retained together with even fewer random ghost matches.
Then further particles are added to the list of particles until again 10 4 particles are reached and another check is performed. Also at the very end after processing all 2D-/3D-particles and all images, a final check is done and the particles are displayed with their weights in the disparity maps. When the number of matches for some particles has reached 10-20 in some subvolume, this corresponds to the true disparity peak with a very high probability and adding further particles could be stopped already. Implemented here is a scheme, where, if the number of matches exceeds 50, new 3D-particles are still added to the list, but only if they are in the neighborhood of the true disparity peak given by the disparities of the particle with the highest number of matches. This still increases the accuracy of the true disparity peak and also reduces the processing time, since no more n 2 -checks are needed, except at the very end and unless the number of true particles exceeds 10 000. The complete flowchart is shown in figure 4. This volume self-calibration with ghost particle suppression (VSG-GPS) has been tested successfully for many experiments. Shown here are results using time-resolved volumetric data from Violato and Scarano (2011) with a water jet illuminated cylindrically (figure 5). The flow has been recorded with a wide range of seeding densities, which makes it an ideal test case for volumetric flow measurement techniques and related algorithms like VSC. One particular recording with a maximum particle density of about 0.05-0.07 ppp ('run 184') has been processed using only a single image and single (sub-)volume, where the allowed triangulation error has been varied from 1 to 10 pixels, and the number N of the brightest 2D-particles in the camera images used for the 3D-triangulation is varied from 1 k to 8 k. The real number of particles is around 30-40 k.
Disparity maps are shown in figure 6 for camera 1. Obviously increasing N or ε increases significantly the number of ghost particles until at some point it becomes very difficult to find the correct disparity peak. The new ghost particle suppression procedure turns out to be amazingly effective. Even for the two most difficult cases with N = 1 k, ε = 10 pixel and N = 8 k, ε = 4 pixel, only a few ghost particles with at most 5 matches remain which leads to maximum ghost intensities of around 25-not visible within the dynamic range of the color scale-compared to the true disparity peak intensity of 500 and 2000, respectively. The same is true even for the extreme case of ε = 10 pixel and N = 8 k (not shown) with true disparity peak intensities of 2000 and maximum ghost intensities of 70. The original number of ghost particles is around 10 7 , reduced as expected by a factor of about 10 6 by VSC-GPS. Standard VSC without GPS completely fails in this case.
VSC-GPS has also been applied using a single image, single sub-volume, N = 1 k and ε = 10 pixel on data from the same experiment with higher seeding densities up to 0.15 ppp. Again the same clear disparity peaks appear with at least two orders of intensity higher than the remaining ghost particles. Furthermore, many other experiments have been checked. So far VSC-GPS worked successfully in all cases.
VSC with ghost particle suppression is also slightly more accurate than standard VSC where random ghost particles disturb the true disparity peak. For some synthetic data with n ppp = 0.08 ppp and perfect calibration, VSC-GPS reduced the average measured disparity from 0.2 pixel to 0.1 pixel.

VSC using image correlation
A new method for calculating camera disparities is presented here, which does not rely on particle detection in the recorded  images and 3D-particle locations in the measurement volume. It employs standard image cross-correlation between cameras in the same way as in the stereo-PIV self-calibration technique (Willert 1997, Coudert and Schon 2001, Scarano et al 2005, Wieneke 2005) by dewarping the recording images of the cameras to some Z-plane and cross-correlating portions of two camera images. So far it has been used only the context of correcting global camera shifts, corresponding to a single sub-volume in VSC(-GPS).

Method
Any calibration mapping functions relates world points (X, Y, Z ) to camera i pixel locations (x i , y i ): The inverse line-of-sight function is given by: providing all (X, Y, Z )-points in the measurement volume corre sponding to some pixel location (x i , y i ). Image dewarping ('rectification') maps the recorded image onto some Z-plane and provides intensities I * i as a function of world coordinates X and Y: Units in recorded images are given in pixels and in the world coordinates are in length units, e.g. mm, which are mapped to some virtual voxels in particular for tomo-PIV, where the voxel-to-pixel ratio-not constant across the image-is usually set close to 1. The necessary sub-pixel interpolation is performed by bi-linear interpolation in order to save processing time. More accurate interpolation is possible using e.g. splinebased higher-order schemes (Astarita 2006). In the following, dewarping and image correlation is done around the middle of volume at (X 0 , Y 0 , Z 0 ). In case, the middle of the volume is occupied by some object, this point should be moved to some other region, of course. World point (X 0 , Y 0 , Z 0 ) corresponds to pixel locations (x i0 , y i0 ) in the recorded image of camera i. Correlation between dewarped images of camera i and j is given by: where the average interrogation window intensity has been subtracted as usual to improve the correlation quality. Just like for standard VSC, correlation maps from many image pairs can be summed-up to improve the statistics. Correlation is done here by FFT. Let us assume a perfect calibration and that all particles are located at the plane Z = Z 0 . Then the correlation leads to a single correlation peak at (0,0). Now, if the particles-still at the same location in the recorded image of camera i-are actually located at some position Z 0 + ΔZ, then the location of the particles in the dewarped image of camera i are shifted by the derivative ∂X i /∂Z times the Z-displacement ΔZ, where the derivative at reference position (X 0 , Y 0 , Z 0 ) corresponding to camera pixel location (x i0 , y i0 ) is computed by: (6) Therefore, the correlation peak for all particles at Z 0 + ΔZ is shifted to location: Thus, the complete correlation streak of particles at all Z-locations has a direction: The direction perpendicular to the correlation streak is given by the unity vector: For real experiments, the correlation streaks are shifted due to calibration errors. This is illustrated in figure 7 for a periodic hill experiment (ERCOFTAC test case 81) at TU-Munich with six high-speed cameras in water (Schröder et al 2015).
The size of the measurement volume is 95 × 90 × 20 mm with a seeding density of about 0.04 ppp. The correlation streak between camera 3 and 5 is shown with an interrogation window size IW = 512 × 512 pixel summing disparity maps over 10 image pairs. The distance between the streak and the (0, 0)-correlation center is determined by first rotating the correlation map back by angle α (equation (9)), followed by summing the correlation values horizontally (along dx) for all vertical positions: Horizontal summing is limited to at most ±IW/4. The correlation function (equation (5)) is computed here by fast cyclic FFT, which becomes unreliable toward the rim, where the weighting of the true signal drops to zero. One could use slower direct or zero-padded FFT to avoid this. Optionally, the summation is further restricted to the range corresponding to a user-specified depth Z 1 to Z 2 of the measurement volume, using the fact that a particular position on the correlation streak corresponds to a definite Z-location according to equation (7). For a thin volume and a correspondingly short correlation streak, one should avoid summing outside random correlation signals where there is no contribution from real particles.
The result is a profile as shown in figure 7 (right), where the highest value indicates the location of correlation streak, i.e. the distance d ij of the streak from (0, 0) along direction e ij . The exact sub-pixel position of the correlation peak is determined by a three-point Gaussian fit. Even barely visible streaks in the correlation map often lead to clearly visible and accurate peaks in the disparity profile. As a good indicator, it is found that a peak ratio between the highest correlation peak and the second highest in the disparity profile should be above 3-4 to provide reliable camera disparities. Increasing the interrogation window size as well as summing correlation maps over many images improves the statistics significantly. Figure 8 shows the sum of all correlation streaks of camera combination 3-1, 3-2 to 3-6 before and after VSC-IC. Maximum disparities are around 3-4 pixels. Standard VSC(-GPS) provides the same results, as also shown later for another experiment.
The correlation streak remains a straight line as long as the derivatives ∂X i /∂Z stay constant across the interrogation window and in depth. Since the line-of-sight is actually very close to a straight line, equation (3) for the center point is roughly given by: Thus the derivatives related to c x and c y are constant at least for a single pixel position in the interrogation window. So far, for all experiments tested, correlation streak bending or fanning has not been observed. It should only occur for inaccurate initial calibrations with severe distortions of the measurement volume.

Relating correlation streak locations to camera dispari-
ties. Let us assume in a region around the reference point (X 0 , Y 0 , Z 0 ) that camera i has an offset (disparity) (ΔX i , ΔY i ) in the world coordinate system: which corresponds to a disparity (Δx i , Δy i ) in image position: with When only determining a single offset, it is recommended to use equation (14) for finally correcting the camera mapping functions M i , since most (larger) calibration errors like camera movement will result in a constant shift (Δx i , Δy i ) in the recorded images, and not a constant shift in world coordinates. But first, it is easier to use (ΔX i , ΔY i ) in the following.
Any disparity (ΔX i , ΔY i ) for camera i and (ΔX j , ΔY j ) for camera j would shift the correlation streak between camera i and j simply by (ΔX j − ΔX i , ΔY j − ΔY i ). The projection onto the direction e ij perpendicular to the correlation streak yields the (measured) distance d ij of the streak from the correlation origin (0, 0): with vector d ij For all combinations between cameras i and j and n cam cameras, there are n cam (n cam − 1)/2 camera independent equations since d ji = −d ij is not adding extra information. This system of equations has multiple solutions, since adding any global shift (ΔX, ΔY) in the coordinate system to all world disparities (ΔX i , ΔY i ) and mapping functions would obviously lead to the same d ij . In the same way, adding a global offset ΔZ also does not change d ij : For four cameras with eight unknown disparities (ΔX i , ΔY i ), there are six measured distances of the streaks (d 12 , d 13 , d 14 , d 23 , d 24 , d 34 ) and three degrees of freedom related to the free choice of coordinate origin. The problem is solved in two steps suitable for any number of cameras (n cam 3) and degrees of freedom. First, one of the many possible solutions for (ΔX i , ΔY i ) is found-subject to adding an arbitrary global shift (ΔX, ΔY, ΔZ), which is then optimized by adding an optimal shift in the global coordinate system to reduce the camera disparities as much as possible.
The first step is solved iteratively by successively adding the average differences between the LHS and RHS of equations (16) and (17) to all ΔX i = (ΔX i , ΔY i ): Obviously, (ΔX i , ΔY i ) are not changing anymore when equations (16) and (17)  where the disparities (ΔX i , ΔY i ) are modified by: with ΔX i (ΔZ) and ΔX i (ΔZ) evaluated according to equation (6). Then (ΔX i , ΔY i ) are converted to (Δx i , Δy i ) using equation (15). As mentioned before, it is preferred to minimize image disparities, since camera movements usually correspond to a globally constant shift in the recorded image and not a constant shift in world position. This is the same optimization as done in the standard VSC-procedure since the particle triangulation method used there also minimizes the same L 2 -cost function in image space as recommended by Hartley and Sturm (1994). Finally, the computed disparities (Δx i , Δy i ) are used to correct the mapping functions according to equation (14).
VSC-IC has been tested successfully on a number of experiments. The main advantage is the robustness since the calculation is almost independent of the magnitude of the camera disparities and there are no parameters to set like the number of brightest particles or the largest expected triangulation error as in standard VSC. Since no particle detection is necessary, it is also more tolerant to different particle shapes (e.g. astigmatism) and particle overlap. One can easily find and correct shifts of 10-30 pixel or even larger. In general, once large camera shifts are corrected, it is followed by standard VSC(-GPS) with multiple sub-volumes to correct remaining spatially varying calibration errors. In principle, one can compute VSC-IC for many smaller local regions around any (X, Y, Z)-location by dewarping to this Z-location and selecting (smaller) interrogation windows around (X, Y), which would correspond to the multiple sub-volumes in VSC, but this has not been explored yet.

Single camera correction.
Often, only a single camera is moving due to mechanical instability. Equation (20) would distribute such a movement to all cameras. As shown above and pointed out by Cornic et al (2016), volume self-calibration suffers from an arbitrary choice of coordinate system. At least, the L 2 -cost function of equation (20) minimizes the shift in coordinate system. In case when only a single camera has shifted, it would be advantageous not to change the global coordinate system for all cameras but to correct only the one moving camera. For this purpose, one can first apply equation (20) and find the camera k with the largest disparity |ΔX k |. Then the L 2 -cost function minimization is performed again, but only for the other cameras: For a water experiment with four cameras in square setup, VSC has been performed first to achieve a perfect calibration. The size of the measurement volume is 200 × 150 × 70 mm with a seeding density of around 0.03 ppp. Then the images of camera 3 are artificially translated by (dx, dy) = (5, 8) pixel, which leads to shifted correlation streaks as shown in figure 9. The summed correlation maps of camera 3 shows that all streaks of 3-1, 3-2, and 3-4 intersect in one point but not (0,0) and for the other cameras it is only the combination with camera 3 which does not go through zero. This is a clear indication that camera 3 is the single cause of streak dislocations, and shifting only camera 3 back would move all streaks back through the center. Standard L 2 -cost function minimization with all cameras distributes the (5, 8) pixel shift of camera 3 to all cameras (table 1). Without camera 3, which has by far the largest disparity, the cost function minimization recovers the true displacements, and thus preserves the original location of the global coordinate system. This procedure can also be applied to standard VSC(-GPS) with a single (sub-)volume and suitable adaption for the case of multiple sub-volumes.
In general, highly accurate location and orientation of the coordinate system may be best achieved by locating specific features in the cameras images (see e.g. Zunino et al (2015)). It may be not sufficient to rely on accurate placement of a Figure 9. Inner part of summed correlation maps for camera 1-4 from left to right, summing maps 1-2, 1-3, 1-4 for camera 1 etc. Sum-ofcorrelation over eight images. Red cross indicates correlation center (0, 0). calibration plate, when VSC may shift the coordinate origin later by a few pixels.

Triple image correlation for inline camera configuration
Unfortunately, the above procedure does not work for cameras positioned along a line ('inline') or close to a line. For example, when all cameras are placed horizontally along the x-axis, this would lead to horizontal correlation streaks for all camera combinations. Streaks shifted to non-zero dY-locations can be used to eliminate camera disparities in Y-direction, but each camera could have an individual offset in X or Z, which would shift the correlation streak onto itself in dX-direction. It is not possible to determine individual ΔX i , since all e ijX in equation (16) are zero. Figure 10 shows the correlation streaks for the tomographic PIV experiment at TU-Delft (Violato and Scarano 2011) with inline camera configuration. The streak length depends on the angle between the two cameras. Cameras closer together (e.g. cameras 1 and 3) lead to shorter streaks with higher correlation values and better statistics.
A modification to VSC-IC with a somewhat more complicated triple-image correlation procedure is presented here, which, while slower, can equally determine all camera disparities in a robust way. At the beginning, camera 1 is taken as a reference with zero disparities. Then the X/Y-location of correlation streak between camera 1 and 2 provides a shift (ΔX 2 , ΔY 2 ) which is subsequently used to correct camera 2 such that camera 2 images are shifted perfectly onto camera 1 images. The next step computes the disparity (ΔX 3 , ΔY 3 ) of camera 3 involving a 2D-correlation with three images: camera 1 dewarped image I * 1 multiplied with camera 2 dewarped image I * 2 with additional shift of (ΔX 2 , ΔY 2 ) and correlating with camera 3 dewarped image I * 3 : where the sum over X,Y is over the interrogation window, and the sum over Z is summing the correlation maps with image dewarping of camera 1-3 to this Z-location. This process is illustrated in figure 11. Let us assume that dewarping is done here at Z = Z 0 , and multiplying the images of camera 1 and 2 (already shifted by (ΔX 2 , ΔY 2 )) picks out the red particles at this Z-plane. Then the correlation with the dewarped image of camera 3 provides a correlation peak (green displacement vector) for the match between the red particles of camera 1 × 2 with the open black circle particles of camera 3 at Z 0 . This process is repeated for all Z-positions between the user specified depths of the measurement volume Z 1 to Z 2 , adding to the same correlation peak and improving the statistics. The result is a single correlation peak providing the displacement (ΔX 3 , ΔY 3 ) of camera 3 relative to camera 1 and 2. This is repeated for camera 4, 5, etc providing directly a possible solution for the set of disparities (ΔX i , ΔY i ) which is then optimized in the same way as described before by minimizing the cost function of equation (20) or (22).
This triple-image correlation is the correlation equivalent of the three-camera individual particle triangulation used in standard VSC. With three cameras there is no longer any ambiguity in the individual ΔX or ΔZ disparities, since camera 1 and 2 serve as a reference coordinate system. Surprisingly, this scheme is also very robust and provides clear correlation peaks for all experimental test cases where it has been tested.
For the TU-Delft experiment above, the final computed disparities are below 0.5 pixel in magnitude for all cameras. Standard VSC-GPS, of course, can also quickly and accurately calculate such disparities. Again, the purpose of VSC-IC and triple-image VSC-IC is mainly to detect large disparities, which can be difficult with standard VSC. Tripleimage VSC-IC is order-of-magnitude slower, since it dewarps each raw image maybe 100-500 times. It can be made faster by restricting the Z-range to e.g. 10-50 planes, which is often sufficient to achieve correlation peaks with peak ratios above 4. Alternatively, GPUs are ideal for such image transformation, which would reduce the 90 s needed so far (12-core PC with Core i7 CPU) for one TU-Delft image to probably less  figure 12 is shown a comparison of VSC still without GPS, VSC-IC and triple-image VSC-IC for a water experiment (measurement size 150 × 115 × 120 mm, seeding density 0.04 ppp) with 4 cameras in square setup to prove that all methods lead to the same disparity values within 0.1 to 0.2 pixel.

Intensity profiles
The computed correlation streaks in VSC-IC provide information about the light profile along the Z-direction since each point on the line corresponds to a certain Z-position according to equations (7) and (8) using ∂ΔX ij /∂Z for each camera combination i − j. Plotting the intensity along the streak is shown in figure 13 for all 15 camera combinations of the 6-camera periodic hill experiment of Schröder et al (2015), here summed  over 200 images. The intensity needs to be renormalized since for FFT without zero-padding the signal decreases linearly toward the rim of the correlation map (50% at ±IW/4). One needs sufficiently large interrogation windows for the whole streak to be contained in the correlation map. It works equally well for cameras in square or inline configuration and provides useful information about the quality of illumination and the Z-depth of the measurement volume, which is often a range not known very accurately, or only after e.g. MART reconstruction as part of later tomo-PIV processing (see e.g. figure 16 in Scarano (2013)).

Vibration correction
Vibrations are the most difficult task for VSC since only a single image per camera is available for processing. For double-frame recordings with a second frame shortly after the first one, both images can usually be taken since the vibration time scale is typically much longer than the inter-frame time. The first task is to realize that the cameras were actually subject to vibrations. As pointed out by Earl et al (2015), vibrations with small magnitude of e.g. 0.1-0.3 pixels might escape attention completely. It only leads to some small broadening of the disparity peaks in VSC(-GPS) or of the disparity streaks in VSC-IC. Larger vibrations are more obvious since standard VSC fails to produce compact disparity peaks when summing disparity maps over many images.
In most cases a good strategy to detect vibrations consist of using first a single image and the complete measurement volume (one 'sub-volume') and VSC-GPS or VSC-IC to find the possibly large global shift of each camera by increasing the allowed triangulation ε error in VSC-GPS until a good disparity peak becomes visible. For VSC-IC, there is no need to specify ε, only to set the interrogation window size large enough to achieve a peak ratio of larger than about 3-4. This is followed by checking if the next images have the same disparities. In VSC-GPS, if again no clear disparity peak is visible for later images, ε needs to be increased further. If one suspects small vibrations, one could determine the standard deviation of the disparities for some number of images and decide if this is larger than the error expected for the given image quality and number of detected true particles, which is a number available when using VSC-GPS.
Once it is clear that vibrations are present, then for every single image of the recorded data set the camera shifts need to be determined by VSC-GPS or VSC-IC. Instead of generating a complete new calibration function for each image, it is easier to store the disparity values (camera shifts) along with each image, and use these values later whenever the mapping function is evaluated within some processing function like tomo-PIV, STB or even VSC itself. Indeed, single-image vibration correction only corrects the global shift of cameras, but not small remaining optical distortions, i.e. disparities which may vary across the measurement volume. Here, many sub-volumes need to specified for VSC(-GPS) and most often the disparity maps of many images are summed for sufficient statistics, using already the available individual image shifts from the vibration correction pass. It is also possible to refine the global vibration correction itself. After the first pass, reducing ε down to 1-3 pixels increases the accuracy of the newly computed image shifts to be added to the old ones. For large optical distortions, a few refinement steps may be needed, alternating between single-image vibration correction and multi-image multi-sub-volume optical distortion correction.
It is in principle possible that vibration expresses itself not as a linear translation, but as some rotation around a center point not too far away from the camera sensor. This has not yet been detected in any experimental data set. If so, at least 2 × 2 or 3 × 3 sub-volumes need to be specified in the XYplane for VSC-GPS to detect such rotations. Of course, with true particles distributed over many sub-volumes, the statistics becomes less reliable.
Below are shown results both for VSC-GPS and VSC-IC for the extreme case reported by Michaelis and Wolf (2011) with vibrations of up to ±10 pixel. The measurement volume is 60 × 34 × 13 mm with a seeding density of about 0.035 ppp. The authors managed to correct the vibrations using standard VSC on each single image with a hierarchical scheme from 1 sub-volume to detect the large global shift to n sub-volumes to detect remaining small distortions.
For the first image of the data set and using VSC without GPS, increasing ε up to 9 pixels shows more than 5000 random ghost particles (figure 14 (left), map for camera 1). VSC-GPS has only a few dozen ghosts at ε = 9. Starting with ε = 10 pixels, the true disparity peaks from more than 200 true particles become clearly visible (figure 14, VSC and VSC-GPS with ε = 12). Note that the color scaling in figure 14 is always such that the maximum is red. For VSC-GPS, the few ghost particles are 30 k times less bright than the true disparity peak at ε = 12 (not visible in the ε = 12 map).
For all maps, the maximum number n of brightest particles is set to 1000. VSC-GPS is rather insensitive to n producing the same map at ε = 12 with different maximum intensities but almost no ghost particles for N = 100 to N = 10 000 with less than 1 s to 1 min processing time per image, respectively. VSC without GPS works for a small range of N from 500 with barely enough statistics to N = 2000 where the true disparity peak becomes drowned in ghost particle background.
The same data has also been processed with VSC-IC. Shown in figure 15 (left) is the overlap of the correlation streaks for camera 1 for camera combinations 1-2, 1-3, and 1-4, IW = 512 × 512 pixel and summing over 10 images. Two somewhat smearedout correlation streaks are visible horizontally and diagonally, with a hint of several vertical lines. After correcting each image with the measured camera shifts, the correlation streaks become clearly visible in figure 15 (right). For correlation maps from a single image, despite the visually patchy correlation data, the average peak ratio in the profile plot is 8.5 (figure 16), more than enough for measuring accurate single-image disparity values (also IW = 256 × 256 is possible), which are within about 0.05 pixels compared to VSC-GPS.
So far, only very few data sets have been found with noticeable vibrations. In all cases, it was easy to correct the individual image shifts by VSC-GPS or VSC-IC.

Conclusions
For multi-camera volumetric flow measurement techniques like tomographic PIV or shake-the-box, volume self-calibration (VSC) has become the standard procedure to correct remaining optical distortions and camera shifts due to mechanical instability or vibrations. VSC detects 2D-particle positions in the camera images and triangulates the 2D-positions from all cameras into a best-fit 3D-position. This is projected back to the images to a position deviating slightly from the original 2D-particle position. These small differences (disparities) are averaged over many particles by plotting the disparities in a disparity map. Usually, the measurement volume is divided into many sub-volumes to compute camera disparities for each sub-volume enabling the correction of spatially varying calibration errors. For better statistics, the disparity maps of many images can be summed. For high seeding densities and large allowed triangulation errors necessary to detect large camera shifts, the number of triangulated ghost particles can exceed the number of true particles by orders of magnitude, which can make the detection of the true disparity peaks more difficult. One common trick virtually reducing the seeding density is to take only a small fraction of brightest detected particles in the camera images.
A previously overlooked distinction between ghost particles and true particles has been found: true particles have disparities always inside the true disparity peak in the disparity map for all cameras, while ghost particles are distributed over Figure 14. From left to right: VSC ε = 9 px, VSC-GPS ε = 9 px, VSC ε = 12 px, VSC-GPS ε = 12 px. Number of brightest 2D-particles = 1000. Color scale is adjusted such that maximum intensity is always red. The maximum intensity is 39, 5, 409, and 153 k from left to right. Figure 15. VSC-IC for camera combinations 1-2 + 1-3 + 1-4 with interrogation window size of 512 × 512 pixel and sum over 10 images, center inlet with increased color scaling (left). Same after vibration shift correction (right), sum over 10 images. random position. A simple separation scheme has been implemented which reduces the number of ghost particles by many orders of magnitude. This VSC with ghost particle suppression (VSC-GPS) is significantly more robust to enable easy detection of disparities even larger than 10 pixels.
An alternative volume self-calibration method is presented based on standard image correlation (VSC-IC) between dewarped images of two cameras similar to Stereo-PIV self-calibration without the need of particle detection and triangulation. For each camera combination, a correlation streak becomes visible in the correlation map. These streaks may not pass through the center point (dx, dy) = 0 due to calibration errors and the distances from the center are converted to global camera shifts used for correcting the camera mapping functions. The intensity along the streak provides directly the intensity profile across the measurement volume depth, which is a useful information often only gained much later after MART-reconstruction or STBhistograms of particle or track locations.
VSC-IC has been tested successfully on a number of experiments. The main advantage is the robustness since the calculation is almost independent of the magnitude of the camera disparities and there are no parameters to set like the largest expected triangulation error or the number of brightest particles as in standard VSC. Since no particle detection is necessary, it is also more tolerant to different particle shapes (e.g. astigmatism) and particle overlap. One can easily find and correct shifts of 10-30 pixel or even larger. In general, once large camera shifts are corrected, it is followed by standard VSC(-GPS) with multiple sub-volumes and small allowed triangulation error of e.g. 1-2 pixel to correct remaining spatially varying calibration errors.
Standard VSC(-GPS) as well as VSC-IC is prone to a shift-or rotation which is not considered here-of the global coordinate system, which is a free parameter. Usually, the L 2norm of the measured camera disparities is minimized by varying the global coordinate system shift. In case, only a single camera is the source of calibration errors e.g. due to mechanical instability, it can be excluded from the L 2 -minimization to maintain the original coordinate origin.
For inline-camera configurations, VSC-IC is unable to correct calibration errors along the axis of camera setup. Here, a modification to VSC-IC has been implemented by extending the cross-correlation between two dewarped camera images to a triple-image correlation. While considerably slower, it can equally determine the camera disparities with high accuracy and robustness.
Finally, the topic of vibrations of the optical setup is investigated, which requires the refinement of the calibration function for every single image. For small camera shifts, such vibrations-leading to some smearing of the disparity peaks when summing over many images-may simply remain unnoticed but can still degrade the accuracy of MART reconstruction or IPR-particle reconstruction considerably. While large vibration shifts are more obvious, first of all due to lack of a clear disparity peak when the chosen triangulation error was too small, both methods, VSC-GPS and VSC-IC, help to find the correct disparities in such a case of limited statistics. Guidelines are given to detect and correct vibrations.
Improving standard VSC by ghost particle suppression has reduced the need for the new and very robust VSC-IC technique. Further investigation is needed to find out in which situations which method is preferable.