International Journal of Multiphase Flow

The slug flow pattern is one of the most common gas–liquid flow patterns in multiphase transportation pipelines, particularly in the oil and gas industry. This flow pattern can cause severe problems for industrial processes. Hence, a detailed description of the spatial distribution of the different phases in the pipe is needed for automated process control and calibration of predictive models. In this paper, a deep-learning based image processing technique is presented that extracts the gas–liquid interface from video observations of multiphase flows in horizontal pipes. The supervised deep learning model consists of a convolutional neural network, which was trained and tested with video data from slug flow experiments. The consistency of the hand-labelled data and the predictions of the trained model have been evaluated in an inter-observer reliability test. The model was further tested with other data sets, which also included recordings of a different flow pattern. It is shown that the presented method provides accurate and reliable predictions of the gas–liquid interface for slug flow as well as for other separated flow patterns. Moreover, it is demonstrated how flow characteristics can be obtained from the results of the deep-learning based image processing technique.


Introduction
Multiphase flow phenomena are often encountered in different sectors of the energy industry, particularly in oil and gas production, where the two phases of liquid and gas flow simultaneously through transportation pipelines. Field measurements of these flows have a high degree of uncertainty, reaching up to 20% (Elliott et al., 2021). Based on the operating conditions, different flow patterns can form, which describe the spatial distribution of the two phases in the pipe (Hanratty, 2013). One of the most common flow patterns in multiphase transportation pipelines is the slug flow pattern (Al-Kayiem et al., 2017). Slug flow is characterized by a continuous liquid phase with coherent blocks of aerated liquid, which are separated by volumes of gas, see Fig. 1, left. These aerated blocks of liquid are called slugs. They move downstream in the pipe on top of a slowly flowing liquid layer at approximately the same velocity as the gas (Hanratty, 2013; Taitel and Dukler, 1977; Al-Safran, 2009). The slug flow pattern can cause severe problems in industrial operations, e.g., due to the pressure drop and the intermittent loads induced by the slugs. An important characteristic is the mean slug frequency, defined as f_s = N_s/Δt, where N_s denotes the number of slug units in the considered time interval Δt, see Al-Kayiem et al. (2017), Baba et al. (2018), Dukler and Fabre (1994) and Olbrich et al. (2021a). The length and time scales of slug flow are illustrated in Fig. 1. These characteristics are often measured with non-intrusive imaging techniques, such as videometric approaches, where the flow is observed with a high-speed camera through a transparent pipe segment (do Amaral et al., 2013).
The flow parameters provided by these type of measurement techniques were used for example to identify the flow pattern (Baghernejad et al., 2019), to verify and investigate flow pattern maps (Crawford, 2018), to investigate shapes of slugs and bubbles for specific operating conditions (do Amaral et al., 2013), to validate numerical simulations (Olbrich et al., 2018), as well as to investigate the effects of slug frequency on induced pipe stresses and develop predictive models and correlations (Mohmmed et al., 2019).
One way to obtain these characteristic parameters is to consider the time series of the vertical position of the liquid-gas interface at a fixed x-position in the pipe, as illustrated on the right side of Fig. 1. This non-dimensional parameter has a range of [0, 1] with respect to the inner pipe diameter and is hereinafter referred to as the liquid level time series at a fixed x-position, denoted by h(t) (Schmelter et al., 2021a). It reveals the dynamics of the spatial distribution of the two phases in the pipe and can therefore reliably indicate slugs, waves or other liquid structures, similarly to the hold-up parameter, see Olbrich et al. (2021b). Typically, for analyses of hold-up or liquid level time series, the conventional length and time scales of slug flow are calculated by simple thresholding procedures, see Zhao et al. (2015), Baba et al. (2018), Schmelter et al. (2021a), Olbrich et al. (2020, 2021b) and Fig. 1. In this paper, the liquid level time series are derived from high-speed video recordings of gas-liquid flows observed from the side through a transparent pipe section. These flows are two-phase gas-oil and gas-water flows, as well as three-phase gas-oil-water flows, where the liquid phase is a homogeneous mixture of oil and water. For this, a deep learning based image processing technique has been developed. In an earlier work, the time series were extracted from the video data using a fixed sequence of image filters, see Olbrich et al. (2018). This was similar to other approaches in the field, see e.g., do Amaral et al. (2013), and provided reasonable results for video data with (exactly) the same conditions it was developed for, such as the colours of the fluids and the background as well as lighting and reflections. However, changes in these conditions as well as noisy data led to incorrect liquid level estimations, and individual adaptations were needed.
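Such a thresholding procedure for estimating the mean slug frequency from a liquid level time series might be sketched as follows. This is a minimal illustration, not the code from the cited works; the threshold value and the synthetic signal are purely hypothetical.

```python
import numpy as np

def slug_frequency(h, fps, threshold=0.7, min_gap=1):
    """Estimate the mean slug frequency from a liquid level time series.

    h         : liquid level time series, values in [0, 1]
    fps       : sample rate in Hz (the video frame rate)
    threshold : liquid level above which a slug is assumed (hypothetical value)
    min_gap   : minimum number of samples between separate slugs
    """
    above = h > threshold
    # rising edges of the thresholded signal mark the start of a new slug
    edges = np.flatnonzero(np.diff(above.astype(int)) == 1) + 1
    # merge edges that are closer together than min_gap samples
    if edges.size > 1:
        edges = edges[np.concatenate(([True], np.diff(edges) >= min_gap))]
    duration = len(h) / fps          # observed time interval in seconds
    return edges.size / duration     # slug units per second

# synthetic example: two slug-like peaks in a 10 s recording sampled at 240 fps
t = np.linspace(0, 10, 2400)
h = 0.3 + 0.6 * (np.exp(-((t - 3) / 0.2) ** 2) + np.exp(-((t - 7) / 0.2) ** 2))
f_s = slug_frequency(h, fps=240)
```

In practice, a second threshold or a minimum slug duration is often added to make the detection robust against short wave crests.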
In contrast to this, deep learning models have the potential to overcome such difficulties and to provide reliable and more versatile image processing techniques.
Deep learning describes a family of learning algorithms in the field of machine learning and artificial intelligence (Emmert-Streib et al., 2020). It is used to learn complex and robust prediction models, e.g., multi-layer neural networks with many hidden units, directly from the data without the need of carefully engineering suitable features (Emmert-Streib et al., 2020;LeCun et al., 2015). Deep learning has a wide range of applications in science, business, and technology, e.g., image or speech recognition, see LeCun et al. (2015), Krizhevsky et al. (2017), Ronneberger et al. (2015) and Graves et al. (2013). However, in the field of multiphase flows, deep learning has only rarely been applied. In the following, we give a short summary on such applications.
For the numerical simulations of multiphase flows, deep learning models were trained for example to approximate the governing equations, estimate simulation errors, and predict flow parameters as well as closure coefficients, see e.g., Wang and Lin (2020), Bao et al. (2020) and Ma et al. (2015). Other applications of deep learning models are for example the correction or prediction of certain parameters for multiphase flow measurements, such as flow rates, phase fractions or velocities, see e.g., Yan et al. (2018), Alakeely and Horne (2021), Dang et al. (2019) and Li et al. (2021). Furthermore, in Lin et al. (2020), a deep learning model is used to predict different two-phase flow patterns in inclined pipes based on superficial velocities of the individual phases and inclination angles. Moreover, image processing techniques based on deep convolutional neural networks have been presented in Poletaev et al. (2020), Haas et al. (2020) and Cerqueira and Paladino (2021) for the detection, reconstruction, and analysis of gas bubbles in vertical pipes and micro-channels, for the recognition of flow patterns in micro pulsating heat pipes (Kamijima et al., 2020; Ahmad et al., 2022), as well as for the extraction of relevant water regions as a pre-processing step for two-phase PIV-measurements in the field of ship and ocean engineering (Yu et al., 2021). For the quantification of separated and intermittent gas-liquid flow patterns in horizontal pipes, such as stratified wavy or slug flow, such advanced image processing techniques have not been reported. For this, image filter based methods are typically used, which are sensitive to changes in image quality, contrast, and recording set-up. The deep-learning based image processing technique presented here overcomes these problems to a certain extent and provides a quantification for other separated flow patterns, namely, stratified, wavy, plug, and slug flow.
The proposed deep-learning model was trained in a supervised manner to correctly predict the liquid level time series from video recordings of gas-liquid flows. The model, a convolutional neural net, was extensively trained and tested with video data from real slug flow experiments and classifies each region in a video frame into its respective phase being either liquid or gas. For supervised learning the data has to be labelled. To do so, the respective video frames were used to create hand-labelled segmentation maps. Furthermore, the consistency of the hand-labelled data and the predictions of the trained model have been evaluated in an inter-observer reliability test. For further evaluations of the reliability and versatility of the trained model, data from experiments are considered, which differ from the ones used for training and testing. These data also include an experiment with a different flow pattern, namely a stratified wavy flow.

Methods
In this section, the architecture of the convolutional neural network as well as the experiments, the data, and its acquisition are described. Furthermore, it is explained how the deep learning model is used with pre- and post-processing steps to extract the liquid level time series from the video data of the experiments. Moreover, the training and test procedure for the deep learning model is described. Finally, the accuracy and error metrics are provided that are used to evaluate the trained model.

Convolutional neural network
Deep convolutional neural networks are state-of-the-art machine learning techniques for image classification and segmentation problems (Dhillon and Verma, 2020; Aloysius and Geetha, 2017). In this paper, the liquid level extraction is considered as an image segmentation problem, where regions of liquid and gas need to be identified and segmented in the video frames. For this purpose, a specific convolutional neural network, the so-called U-net, was chosen. This architecture was introduced by Ronneberger et al. (2015) and has been successfully applied in many image-to-image learning problems, e.g., computed tomography, see Mao et al. (2016), Dosovitskiy et al. (2015) and Jin et al. (2017). The U-net is able to achieve accurate results with only a few labelled training data and was therefore chosen for this paper. The structure was altered from its original form and adapted from Sterbak (2018). The final architecture is illustrated in Fig. 2. An RGB-image input with dimensions of 128 × 1024 px is passed through several layers of convolutions comprising a contracting part (left part of the U-shape), a bottleneck with minimal dimension in the centre of the U-shape, and an expansive part (right part of the U-shape). The contracting part transfers the input image into a feature map with lower spatial dimensions but a higher number of feature channels. The expansive part generates the resulting segmentation map with the same spatial dimensions as the input image from the lower dimensional feature maps. In Fig. 2, the blue rectangles represent multichannel feature maps. Here, their dimensions are also given, where the first two entries of the 3-tuple are related to the spatial dimensions (height and length) of the input image and the last entry corresponds to the number of feature channels. The basic operation in this network is the 3 × 3-convolution, which is followed by a batch normalization operation and a rectified linear unit (ReLU) as activation function.
For the contracting part, the spatial dimensions are down-sampled with 2 × 2-max-pooling operations with stride 2. After each max-pooling operation, the number of features produced by the 3 × 3-convolution is doubled. For the expansive part, the feature maps are up-sampled by 3 × 3-(up)-convolutions, which halve the number of feature channels but double the spatial dimensions. For additional information in the reconstruction of the higher dimensional map, the feature maps from the corresponding level of the contracting part are copied and concatenated after each up-convolution. Finally, the segmentation map results from a 1 × 1-convolution and an activation operation using the Sigmoid function. The operations are illustrated as coloured arrows in Fig. 2. For details on the architecture and the operations, see Ronneberger et al. (2015) and Sterbak (2018).
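The architecture described above can be sketched in Keras roughly as follows. Since Fig. 2 is not reproduced here, the number of levels (depth) and the initial number of feature channels (base_filters) are assumed values, not taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # two 3x3 convolutions, each followed by batch normalization and ReLU
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    return x

def build_unet(input_shape=(128, 1024, 3), base_filters=16, depth=4):
    """U-net sketch; base_filters and depth are assumed, not from the paper."""
    inputs = layers.Input(input_shape)
    x, skips = inputs, []
    # contracting part: feature count doubles after each 2x2 max pooling
    for level in range(depth):
        x = conv_block(x, base_filters * 2 ** level)
        skips.append(x)
        x = layers.MaxPooling2D(2, strides=2)(x)
    # bottleneck at the minimal spatial dimension
    x = conv_block(x, base_filters * 2 ** depth)
    # expansive part: 3x3 up-convolutions halve channels, double spatial size
    for level in reversed(range(depth)):
        x = layers.Conv2DTranspose(base_filters * 2 ** level, 3,
                                   strides=2, padding="same")(x)
        # copy-and-concatenate from the corresponding contracting level
        x = layers.Concatenate()([x, skips[level]])
        x = conv_block(x, base_filters * 2 ** level)
    # 1x1 convolution with sigmoid yields the continuous segmentation map
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)

model = build_unet()
```

The output has the same 128 × 1024 spatial dimensions as the input, with one channel for the gas/liquid segmentation value.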

Experiments
The deep-learning model was trained and tested with data from video observations of horizontal gas-liquid flows. These flows are two-phase gas-oil or gas-water flows, as well as three-phase gas-oil-water flows, where the liquid phase is a homogeneous mixture of oil and water. The experiments were performed by TÜV SÜD NEL and DNV as part of the projects Multiphase flow metrology in oil and gas production (MultiFlowMet I) (Crawford, 2018) and Multiphase flow reference metrology (MultiFlowMet II) (Pieper, 2020). The experimental set-ups are illustrated in Fig. 3. They consist of a horizontal inflow section followed by a vertical measurement section, but here the latter part is of minor interest since the flows in the horizontal pipe are investigated. For the set-up of the MultiFlowMet I project, the horizontal inflow section consists of a straight horizontal pipe with an inner diameter D = 0.0972 m and two different lengths L_inflow ∈ {100D, 500D}, followed by a transparent Perspex viewing section with a length of L_viewSec = 5D. For the set-up of the MultiFlowMet II project, the horizontal inflow section consists of a straight horizontal pipe with an inner diameter D = 0.06664 m and three different lengths L_inflow ∈ {100D, 300D, 600D}, followed by a transparent Perspex viewing section with a length of L_viewSec = 9D.

Table 1: Laboratory that conducted the experiments, the set-up for the video recordings (see Fig. 3) and fluid properties for some operating conditions used in the experiments.
In Table 1, the laboratory that conducted the experiments, the recording set-up for the video data (see also Fig. 3), and the fluids and their properties for some operating conditions are given for the two projects; for details see Crawford (2018) and Pieper (2020). In Table 2, the superficial velocities and the lengths L_inflow of the horizontal inflow sections of the considered two- and three-phase flows of MultiFlowMet I and MultiFlowMet II are given, respectively. For all flows except the stratified wavy flow of Experiment Nr. 13, the slug flow pattern was observed. Please note that, for the stratified wavy flow experiment Nr. 13 and for the slug flow experiments Nr. 1-7 and Nr. 9-12, the interface is clearly visible from the side, which is necessary for the algorithm to work. These experiments have been considered in the training, testing, and evaluation procedure (see Sections 3.1 and 3.3). For the slug flow experiments Nr. 8, 14, 15, and 16, the gas-liquid interface is (partially) unrecognizable due to dispersed phenomena, such as foam or spray. These flows have been considered for the investigations on the limitations of the proposed image processing technique, see Section 4.
The flows were recorded at the viewing sections from the side using a high-speed RGB-camera with a frame rate of 240 fps. For each experimental set-up, two different video recording set-ups were used, see Fig. 3. The set-ups 1, 2 and 3 were recorded at NEL, and set-up 4 was recorded at DNV. In set-up 1, the Paraflex oil has an orange-brown colour, the background is black and the viewing section is illuminated from the front. In set-up 2, the Brine water is of grey colour but slightly transparent, the background has a dark blue colour and the viewing section is illuminated from below. In set-up 3, the Paraflex oil (HT9) has a red-brown colour, the background is blue and the viewing section is illuminated from the front. And in set-up 4, the Exxsol oil (D120) is ocher-green, the background has a blue colour and the viewing section is illuminated from behind. Please note that nitrogen gas is transparent and colourless, similar to other common gases in the oil and gas industry, e.g., natural gas and argon (Pieper, 2020). Hence, the background colour is visible through the gas for all recording set-ups. Furthermore, for the three-phase gas-oil-water flows recorded in set-ups 1, 3 and 4, the watercut is relatively small. Hence, the water does not form a separate liquid layer and the liquid phase appears in the video observations as a homogeneous oil-water mixture with a similar colour as the oil. In Tables 3 and 6, the video recording set-up as well as the recorded time (length of the videos) are given for the considered experiments of MultiFlowMet I and II.

Pre- and post-processing
The U-net, described in Section 2.1, is used to segment liquid and gas regions in parts of the video frames. However, before the U-net is applied, the video frames need to be prepared in several pre-processing steps. Furthermore, to extract a time series from the gas-liquid segmentation maps of the U-net, additional post-processing steps are necessary. In Fig. 4, the complete processing pipeline of the liquid level extraction from the video observations is illustrated. The first step is to extract a vertical line (pixelcolumn) through the pipe at a fixed x-position for every frame, i.e., time step, and stack it. From this procedure, an RGB-pixelcolumn over time is obtained, which represents the phase distribution along the observed vertical line through the pipe and its temporal changes at a fixed x-position. Because of this, the gas-liquid interface visible in this image is associated with the liquid level time series with respect to the inner pipe diameter. Under this construction, the frame rate of the video represents the sample rate of the time series. Therefore, the RGB-pixelcolumn over time provides the basis for further calculations. In the second step, the RGB-pixelcolumn over time is interpolated to a uniform height (y-component) of 128 px and cut into segments with a length (t-component) of 1024 px, to meet the input dimension criterion for the chosen U-net architecture. In the third step, the evenly sized segments are normalized to reduce the influence of disturbances in the video recording set-up, such as differences in luminance or colour. For this normalization, the z-score (also called statistical standardization or standard score, see Larsen and Marx (2012)) is applied RGB-component wise. This step completes the pre-processing and the standardized image segments are passed to the U-net to perform the segmentation.
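The pre-processing steps (column extraction and stacking, interpolation to 128 px, segmentation into 1024 px pieces, z-score normalization) might be sketched in NumPy as follows. The interpolation method, the handling of the trailing remainder, and the per-segment application of the z-score are assumptions not fixed by the text.

```python
import numpy as np

def preprocess(frames, x_pos, height=128, seg_len=1024):
    """Build normalized U-net input segments from video frames (sketch).

    frames : array of shape (n_frames, H, W, 3), the RGB video
    x_pos  : pixel column (fixed x-position) to extract from every frame
    """
    # step 1: stack the vertical pixel column of every frame -> (H, n_frames, 3)
    column = np.stack([f[:, x_pos, :] for f in frames], axis=1).astype(float)
    # step 2a: interpolate the height to 128 px, channel by channel
    H = column.shape[0]
    old, new = np.arange(H), np.linspace(0, H - 1, height)
    column = np.stack(
        [np.stack([np.interp(new, old, column[:, t, c]) for c in range(3)],
                  axis=-1)
         for t in range(column.shape[1])], axis=1)
    # step 2b: cut into segments of length 1024 (the remainder is dropped here)
    n_seg = column.shape[1] // seg_len
    segments = [column[:, i * seg_len:(i + 1) * seg_len] for i in range(n_seg)]
    # step 3: z-score normalization, applied per segment and per RGB channel
    out = []
    for seg in segments:
        mu = seg.mean(axis=(0, 1), keepdims=True)
        sd = seg.std(axis=(0, 1), keepdims=True) + 1e-8
        out.append((seg - mu) / sd)
    return np.stack(out) if out else np.empty((0, height, seg_len, 3))

# synthetic example: 2048 frames of 64 x 4 px yield two 128 x 1024 segments
rng = np.random.default_rng(0)
segs = preprocess(rng.random((2048, 64, 4, 3)), x_pos=2)
```

Each resulting segment then has zero mean and unit standard deviation per RGB channel, regardless of the original luminance or colour balance.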
The output of the U-net is a continuous segmentation map with values in [0, 1], where 1 (white) indicates gas and 0 (black) indicates liquid. In the first post-processing step, the continuous segmentation maps are binarized (with a threshold of 0.5) to obtain a sharp gas-liquid interface. Afterwards, the segments are concatenated in correct order to get a segmentation map for the complete RGB-pixelcolumn over time. In the last step, the vertical position of the gas-liquid interface (edge between black and white regions) is detected in the binarized and concatenated segmentation map. In case of multiple vertical interface positions at one time step, such as for bubbles or droplets, the values are averaged to get a unique representation of the interface over time. This ensures that the extracted liquid level time series is a mathematically well-defined function of time, even in the case of multiple vertical interface positions, so that it can be used for further time series analyses. It should be noted that this averaging can lead to a misrepresentation of certain flow structures in the liquid level time series. With this procedure, the liquid level time series are obtained from the video recordings of multiphase flows. The code for the extraction of the liquid level time series from the RGB-pixelcolumns over time is available as Jupyter Notebook in Olbrich et al. (2021c).
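The post-processing chain (binarization, concatenation, interface detection, averaging of multiple interfaces) can be sketched as follows. The image orientation (gas at the top of the column) and the exact edge convention are assumptions; the published notebook may differ in these details.

```python
import numpy as np

def liquid_level_series(seg_maps):
    """Extract the liquid level time series from U-net outputs (sketch).

    seg_maps : list of continuous segmentation maps of shape (height, time),
               values in [0, 1] with 1 = gas (assumed on top) and 0 = liquid
    """
    # step 1: binarize with a threshold of 0.5 and concatenate in time order
    binary = np.concatenate([m > 0.5 for m in seg_maps], axis=1)
    height, n_t = binary.shape
    h = np.empty(n_t)
    for t in range(n_t):
        col = binary[:, t]
        # interfaces are the edges between gas (True) and liquid (False) pixels;
        # the edge index is taken as the first pixel after the change
        edges = np.flatnonzero(np.diff(col.astype(int)) != 0) + 1
        if edges.size == 0:
            h[t] = 0.0 if col.all() else 1.0   # column is all gas or all liquid
        else:
            # multiple interfaces (bubbles, droplets) are averaged so the
            # liquid level stays a well-defined function of time
            h[t] = 1.0 - edges.mean() / height
    return h

# synthetic example: top quarter of the column is gas -> liquid level 0.75
m = np.full((128, 1024), 0.1)
m[:32, :] = 0.9
h = liquid_level_series([m])
```

The result is non-dimensional in [0, 1] with respect to the column height, i.e., the inner pipe diameter.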

Training and testing
The U-net was trained and tested with data from horizontal gas-liquid slug flows. For this, the video data of 18 different slug flows were used. That includes data from 9 experiments of the MultiFlowMet I project and 9 experiments of the MultiFlowMet II project, specifically Nr. 1-6, as well as Nr. 9-11 for all inflow lengths, see Table 3. For the testing in the optimization process of the training procedure, a subset of the data is needed, which is disjoint from the training data. Here, the randomly chosen experiments Nr. 5-100D and Nr. 11-300D are used for testing and the remaining 16 experiments are used for training. Please note that the names of the individual experiments are given in the form (Nr.-L_inflow).
To prepare the video data for the training of the U-net, the RGB-pixelcolumns over time are extracted and normalized as described in Section 2.3 and Steps 1-4 in Fig. 4. Furthermore, binary segmentation masks are needed as reference in the training and testing process, which represent a correct classification into gas and liquid. These masks were generated from hand-labelled gas-liquid interfaces in the RGB-pixelcolumns over time for all experiments. They have a sharp interface with values of 1 for gas and 0 for liquid. Since they are extracted from the RGB-pixelcolumns over time, the masks represent an approximation of the temporally resolved gas volume fraction fields in a vertical line through the pipe at a fixed position. In the training and testing of the U-net, the masks are compared with the predicted segmentation maps to determine an accuracy for the prediction. For this, the masks are also transformed into evenly sized segments of 128 × 1024 px (see Step 2 in Fig. 4). This results in 483 pairs of evenly sized RGB-segments and corresponding mask segments for the training set as well as 61 of such pairs for the test set.
Since the segmentation includes only 2 classes, i.e., gas and liquid, the binary accuracy function was chosen as accuracy metric, see Eq. (2) and Chollet et al. (2015). Furthermore, for the training process, the stochastic gradient descent optimization method adaptive moment estimation (ADAM) was used together with the binary cross-entropy loss function and a dropout of 5%. Details can be found in Ronneberger et al. (2015), Sterbak (2018), Chollet et al. (2015) and Kingma and Ba (2017). For the training, the dropout layer is located after every max-pooling operation in the contracting part of the U-net and after every concatenation operation in the expansive part of the U-net, see Fig. 2. Furthermore, a mini batch size of 32 was used to train the U-net over a maximum number of 50 epochs. Early stopping (Prechelt, 1998) was applied. For details, see Sterbak (2018) and Chollet et al. (2015). The model was trained and tested using Python version 3.7, TensorFlow version 2.3 and Keras version 2.4.3, see Chollet et al. (2015) and Abadi et al. (2015). The implementation of the model, the code for the training of the model as Jupyter Notebook, and the weights of the trained model are available in Olbrich et al. (2021c).
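The described training configuration might look as follows in Keras. A tiny stand-in model and random data are used only to keep the sketch runnable; the actual architecture is the U-net of Fig. 2, the number of epochs is reduced from 50 to 2 here, and the early-stopping patience is an assumed value not stated in the text.

```python
import numpy as np
import tensorflow as tf

# small stand-in for the U-net, only to demonstrate the training configuration
inputs = tf.keras.Input((16, 32, 3))
x = tf.keras.layers.Conv2D(4, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Dropout(0.05)(x)               # dropout of 5 %
outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adam(),          # ADAM (Kingma and Ba, 2017)
    loss="binary_crossentropy",                    # binary cross-entropy loss
    metrics=["binary_accuracy"],                   # binary accuracy, Eq. (2)
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5,                # patience is an assumed value
    restore_best_weights=True)

# tiny random data only to make the sketch executable
rng = np.random.default_rng(0)
x_data = rng.random((8, 16, 32, 3)).astype("float32")
y_data = (rng.random((8, 16, 32, 1)) > 0.5).astype("float32")
history = model.fit(x_data, y_data, batch_size=32, epochs=2,
                    validation_split=0.25, callbacks=[early_stop], verbose=0)
```

With the real U-net, the 483 RGB/mask segment pairs take the place of the random arrays, and the two held-out experiments provide the validation data for early stopping.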

Accuracy and error metrics
In this section, the metrics are given, which are used to evaluate the segmentation maps from the U-net as well as the extracted liquid level time series.
The binary accuracy of a predicted segmentation map S_pred and a corresponding mask S_mask is given by

acc_bin(S_pred, S_mask) = 1/(N_y · N_t) · Σ_{i=1}^{N_y} Σ_{j=1}^{N_t} e(S_pred(i,j), S_mask(i,j)),   (2)

where N_y and N_t denote the number of pixels of the mask in y-direction and t-direction, respectively, and the pixelwise binary evaluation function is given by

e(p, m) = 1 if round(p) = m, and 0 otherwise.   (3)

Then, e indicates if a pixel in the predicted segmentation map was successfully classified as gas or liquid. This function is pre-implemented in the open source software library Keras, see Chollet et al. (2015). The deviation of the liquid level time series h_pred and h_mask, which are extracted from the U-net output and the hand-labelled segmentation map, respectively, can be measured in terms of the mean absolute error (MAE), given by

MAE(h_pred, h_mask) = 1/N_t · Σ_{k=1}^{N_t} |h_pred(t_k) − h_mask(t_k)|.   (4)
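Both metrics, the binary accuracy of Eq. (2) and the mean absolute error of Eq. (4), can be implemented in a few lines of NumPy, e.g.:

```python
import numpy as np

def binary_accuracy(s_pred, s_mask):
    """Eq. (2): fraction of pixels whose binarized prediction matches the mask."""
    return float(np.mean(np.round(s_pred) == np.asarray(s_mask)))

def mae(h_pred, h_mask):
    """Eq. (4): mean absolute error between two liquid level time series."""
    return float(np.mean(np.abs(np.asarray(h_pred) - np.asarray(h_mask))))

# small example: 3 of 4 pixels are classified correctly
acc = binary_accuracy([[0.9, 0.2], [0.4, 0.8]], [[1, 0], [1, 1]])
err = mae([0.5, 0.7], [0.4, 0.9])
```

In Keras, the same binary accuracy is available as the pre-implemented metric "binary_accuracy".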

Results
In this section, the results of the liquid level extraction with the deep learning model are presented for the training and testing procedure, as well as for additional evaluations on different data sets. The results include the accuracy of the predicted segmentation maps with respect to the hand-labelled masks and the mean absolute error of the extracted liquid level time series. Furthermore, the consistency of the hand-labelled data and the prediction is investigated in an inter-observer reliability test. Moreover, the limitations of the proposed image processing technique are demonstrated and discussed.

Training and testing
The U-net was trained on a set of 483 pairs and tested on a set of 61 pairs of RGB-images from horizontal slug flow and corresponding masks, as described in Sections 2.3 and 2.4. The training was terminated after 42 epochs due to early stopping. The best model was found after 31 epochs. For this model, a mean binary accuracy of 97.91% for the training set and a mean binary accuracy of 97.74% for the test set, was achieved. These accuracies are high compared to the reported training and test accuracies in between 89% and 98.7% of other successfully trained deep-learning based gas and liquid segmentation models, see e.g. Cerqueira and Paladino (2021), Yu et al. (2021) and Ahmad et al. (2022).
In Fig. 5, the RGB-pixelcolumn over time from Experiment Nr. 10-100D, the prediction of the U-net, its binarization, the corresponding hand-labelled mask, as well as the extracted liquid level time series are given. Please note that Experiment Nr. 10-100D belongs to the training set. The prediction of the U-net and its binarization show good agreement with the mask. Furthermore, the gas and liquid regions are segmented in more detail in the prediction, compared to the hand-labelled mask, as can be seen for instance at the slug between 52 s and 52.5 s in Fig. 5. Here, the hand-labelled mask shows one larger slug, but the prediction shows two slugs, which are separated by a short gas bubble and foam. Due to the foamy areas between the rear of the first slug and the front of the second slug, this (optical) separation is not obvious. Nevertheless, the separation of the two slugs can be verified in the RGB-pixelcolumn over time, see Fig. 5a. Hence, in this case, the prediction from the trained model is more consistent and detailed compared to the hand-labelled mask on which it was trained.
In Table 3, the binary accuracy (see Eq. (2)) of the predictions and the masks in full length, as well as the mean absolute error (see Eq. (4)) of the extracted liquid level time series from prediction and mask are given for all experiments used for training and testing. The binary accuracy of the segmentation maps varies from 96.86% to 98.85% and the mean absolute error of the liquid level time series varies from 1.15% to 3.12%. From these high accuracies and low errors for the training and testing, it can be concluded that the model performs well for the considered types of data. Furthermore, since the net accurately predicts the segmentation maps also for the two test sets that were not used in training, it is able to generalize to unseen data from both experimental set-ups.

Table 3
The number of the experiment, the recording set-up, the recorded time (length of the video), the belonging to training or test set, the binary accuracy acc_bin(S_pred, S_mask) (see Eq. (2)) of the predicted segmentation map and corresponding mask, as well as the mean absolute error MAE(h_pred, h_mask) (see Eq. (4)) of the liquid level time series extracted from prediction h_pred and mask h_mask.

Inter-observer test
In this section, the inter-observer reliability is considered to evaluate the consistency of the hand-labelled data sets, which were used for the training of the U-net. For this, three independent observers have labelled the gas-liquid interface for the first 60 s of four chosen experiments from the training and test set, namely Nr. 5-100D, Nr. 6-100D, Nr. 9-100D and Nr. 10-300D (see Table 3). Please note that the U-net was trained and tested with labels from Observer 1. The inter-observer or inter-rater reliability (IRR) is defined as the degree of relationship between the labels of different observers that are operating independently (Kottner and Dassen, 2008; Tinsley and Weiss, 1975).
Here, the IRR for the hand-labelled time series is quantified by the normalized cross-correlation coefficient. In the IRR-context, this coefficient is also referred to as Pearson's r or Pearson-correlation, see Kottner and Dassen (2008) and Berman (2016). For every experiment, the labelled data sets of the different observers show a strong correlation to each other with values between 0.88 and 0.985, see Table 4. This indicates a high degree of relation between the hand-labelled liquid level time series of the different observers. Hence, the hand-labelled liquid level time series show consistency and reliability among the different observers and are therefore suited for the training of the U-net. This correlation was calculated using the function pearsonr of Python's SciPy module (Virtanen et al., 2020). Please note that it returned p-values for the null-hypothesis significance testing of less than 0.001 for all cases. Furthermore, this analysis is applied to the prediction of the U-net to quantify the degree of relation between the prediction and the labels of the independent observers, see Xiao et al. (2017).
As given in Table 5, the Pearson-correlation values for the pairwise comparisons of the prediction and the labels of the different observers range from 0.901 to 0.993 with p-values of less than 0.001 for all cases. This shows a strong correlation in a similar range as for the observers (see Table 4), and therefore indicates a consistency between the predictions of the U-net and the labels of the independent observers.
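This correlation analysis can be reproduced with SciPy as follows; the two time series here are synthetic stand-ins for the observers' hand-labelled data, not the actual labels.

```python
import numpy as np
from scipy.stats import pearsonr

# hypothetical stand-ins for two observers' hand-labelled liquid level series:
# the same underlying signal plus small, independent labelling noise
rng = np.random.default_rng(1)
t = np.linspace(0, 60, 60 * 240)                       # 60 s at 240 fps
h_obs1 = 0.5 + 0.3 * np.sin(0.5 * t) + 0.02 * rng.standard_normal(t.size)
h_obs2 = h_obs1 + 0.02 * rng.standard_normal(t.size)   # observer 2

# normalized cross-correlation coefficient (Pearson's r) and its p-value
r, p = pearsonr(h_obs1, h_obs2)
```

Since the two series differ only by small labelling noise, r is close to 1 and the p-value of the null-hypothesis significance test is far below 0.001, mirroring the values reported in Tables 4 and 5.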
In addition to the Pearson-correlation, the pointwise errors between the hand-labelled time series of the different observers as well as between the labels and the prediction of the U-net are considered. For a quantification of the error between the observers, the three time series of the pointwise errors between the different observers, |h_obs,i(t_k) − h_obs,j(t_k)| for k = 1, …, N_t and i ≠ j ∈ {1, 2, 3}, are ensemble-averaged (Walburn et al., 1983) to obtain one time series of the average pointwise error between the observers for every experiment, i.e.,

⟨|h_obs − h_obs|⟩(t_k) = 1/3 · Σ_{i<j} |h_obs,i(t_k) − h_obs,j(t_k)|.

The same is done for the comparison of h_pred and h_obs,i with i ∈ {1, 2, 3}.
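The ensemble-averaging of the pairwise pointwise errors might be implemented as follows; the function is written for an arbitrary number of observers, with three pairs resulting for three observers as in the text.

```python
import numpy as np
from itertools import combinations

def ensemble_avg_error(series):
    """Average pointwise absolute error between observers (sketch).

    series : list of equally long liquid level time series, one per observer;
             the pairwise absolute differences are ensemble-averaged over
             all observer pairs, cf. Walburn et al. (1983)
    """
    pairs = combinations(range(len(series)), 2)
    errors = [np.abs(np.asarray(series[i]) - np.asarray(series[j]))
              for i, j in pairs]
    # one time series of the averaged pointwise error
    return np.mean(errors, axis=0)

# toy example with three observers and two time points
e = ensemble_avg_error([[0.0, 0.5], [0.1, 0.5], [0.2, 0.5]])
```

The same function applied to the prediction against each observer's labels yields the series ⟨|h_pred − h_obs|⟩ shown in the boxplots of Fig. 6.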
These ensemble-averages are shown as boxplots on the right side of Fig. 6. Here, the boxplots represent the distribution of these errors. It can be seen that the statistical quantities (mean, median) and ranges (interquartile range and range between 5th and 95th percentile) of the errors are smaller for ⟨|h_pred − h_obs|⟩ compared to ⟨|h_obs − h_obs|⟩ for all experiments. On the left side of Fig. 6, the range between the minima and maxima of the liquid level time series for every time point, labelled by the three observers, is given for an interval of 5 s. This range represents a tolerance in the observation of the gas-liquid interface or the liquid level. In addition to this, the liquid level time series from the prediction of the U-net is superimposed. Here, it can be seen that the predicted time series is near or in the tolerance range. Altogether, the hand-labelled parameters of the different observers have a strong correlation as well as low pointwise errors between each other. Hence, this parameter shows consistency between the observers and is therefore a reliable parameter for the training of the U-net. Furthermore, the comparison of the predicted liquid level time series with the observers shows not only similarly strong correlation values, but also smaller statistical quantities of the considered pointwise errors. Hence, the predictions of the U-net also provide liquid level time series which are consistent with respect to the different observers.

Table 6: The number of the experiment, the recording set-up, the recorded time (length of the video), the binary accuracy acc_bin(S_pred, S_mask) (see Eq. (2)) of the predicted segmentation map and corresponding mask, as well as the mean absolute error MAE(h_pred, h_mask) (see Eq. (4)) of the liquid level time series, extracted from prediction h_pred and mask h_mask, for the three additional independent evaluations.

Further evaluations on different data sets
To further evaluate the reliability and versatility of the trained model, data from three additional experiments are considered, which differ from those used for training and testing.
The model was trained and tested on Paraflex oil–nitrogen slug flows with black or blue background (recording set-ups 1 and 3), see Fig. 3 and Section 2.4. In contrast, the flows considered in this section are either recorded in a different set-up with different fluids, or show a different flow pattern. They include the brine water–nitrogen slug flow Experiment Nr. 7-100 from recording set-up 2 with grey liquid colour and dark blue background, the Exxsol oil–nitrogen slug flow Experiment Nr. 12-100 from recording set-up 4 with an ocher-green liquid colour, and the stratified wavy Paraflex oil–nitrogen flow Experiment Nr. 13-100 from recording set-up 3. Note that the lighting conditions and occurring reflections also differ from the training and test set, e.g., reflections in the back of the pipe for Experiment Nr. 7-100 and white colour on top of the slugs for Experiment Nr. 12-100, see Fig. 7. These differences cause changes in contrast and RGB intensity values. Together with the different flow pattern, this changes the conditions for the model compared to the training and testing data.
In Table 6, the binary accuracy of the prediction with respect to the corresponding hand-labelled masks as well as the mean absolute error of the extracted liquid level time series are given. For the brine water–nitrogen slug flow of Experiment Nr. 7-100, the binary accuracy of 96.99% and the error value of 3% lie within the ranges observed for training and testing, which are [96.86%, 98.85%] and [1.15%, 3.12%], respectively. The same holds for the stratified wavy flow of Experiment Nr. 13-100, with a binary accuracy of 97.68% and an error of 2.32%. The prediction for the slug flow Experiment Nr. 12-100 did not achieve as high an accuracy as the other experiments. Nevertheless, with a binary accuracy of 95.41% and an error of 4.92%, it is still close to the other values and a reasonable result.
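The two accuracy measures can be written down in a few lines (a sketch assuming binarized segmentation maps and diameter-normalized liquid level series as NumPy arrays; this is not the authors' published implementation):

```python
import numpy as np

def binary_accuracy(seg_pred, seg_mask):
    """Fraction of pixels where the binarized prediction agrees with
    the hand-labelled mask (cf. Eq. (2))."""
    return float(np.mean(seg_pred == seg_mask))

def mean_abs_error(h_pred, h_mask):
    """Mean absolute error between two liquid level time series,
    both normalized by the inner pipe diameter (cf. Eq. (4))."""
    return float(np.mean(np.abs(h_pred - h_mask)))
```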
In Fig. 7, the RGB-pixelcolumn over time with the extracted liquid levels from the prediction and the hand-labelled mask is given for a time interval of 5 s for Experiments Nr. 7-100, Nr. 12-100 and Nr. 13-100. As can be seen in Fig. 7(i), the liquid level from the prediction for the brine water–nitrogen slug flow Experiment Nr. 7-100 is frequently underestimated in foamy areas between shorter slugs.

Fig. 6. Inter-observer comparison with prediction for the slug flow experiments Nr. 5-100, Nr. 6-100, Nr. 9-100 and Nr. 10-300. Left: range of the hand-labelled liquid level time series by the three independent observers with the superimposed prediction for 5 s. Right: boxplots of the ensemble-averaged pointwise differences between the observers and between prediction and observers. The whiskers in the boxplots represent the 5th and 95th percentiles, the coloured box represents the interquartile range (Q3-Q1), the horizontal line in the box represents the median, and the black cross represents the mean. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

For the Exxsol oil–nitrogen slug flow Experiment Nr. 12-100 in Fig. 7(ii), the prediction shows differences for slugs that are close to each other, see for instance the slugs at [50 s, 50.5 s] and [52 s, 52.5 s]. Furthermore, for this flow, the slug rears are predicted later than in the mask. One reason for this is the liquid film that flows down the inner walls of the pipe after a slug has passed. This also causes the top of the slugs to appear smeared out in the RGB-pixelcolumn over time and leads to the differences between prediction and mask. Apart from the aforementioned deviations, the liquid level time series from prediction and mask are in good agreement for all three flows.
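The RGB-pixelcolumn-over-time representation used in Figs. 6 and 7 can be built by stacking one pixel column from every video frame; a minimal sketch, assuming the video is available as an (n_frames, height, width, 3) array:

```python
import numpy as np

def pixelcolumn_over_time(frames, col):
    """Stack the pixel column `col` of every video frame side by side,
    giving an image with axes (height, time, RGB)."""
    return np.stack([frame[:, col, :] for frame in frames], axis=1)
```

The resulting image shows the vertical gas–liquid distribution at one axial position as a function of time, which is the view the extracted liquid level time series are superimposed on.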
In addition to this evaluation, the two previously unseen slug flow experiments Nr. 7-100 and Nr. 12-100 are further validated in the same manner as in the inter-observer test in Section 3.2. To this end, the three independent observers have labelled the gas–liquid interface of these two slug flows. For the validation, the Pearson correlation between the different observers, as well as between the prediction and the observers, is considered, see Tables 7 and 8. Furthermore, the ensemble-averaged pointwise errors ⟨|ℎ_pred − ℎ_obs|⟩ and ⟨|ℎ_obs − ℎ_obs|⟩ are considered for this evaluation, see Fig. 8. For the two additional experiments, the hand-labelled data sets of the different observers show a strong correlation to each other, with values between 0.922 and 0.974 and p-values of less than 0.001, see Table 7.

Table 7. The Pearson correlation values for the pairwise comparison of the different labels: (ℎ_obs,1, ℎ_obs,2), (ℎ_obs,1, ℎ_obs,3), (ℎ_obs,2, ℎ_obs,3) for each experiment.

These values are similar to the ones obtained for the inter-observer test in Section 3.2 and indicate a high degree of relation between the hand-labelled liquid level time series of the different observers. The Pearson correlation values for the pairwise comparisons of the prediction and the labels of the different observers range from 0.868 to 0.967 with p-values of less than 0.001, see Table 8. This also shows a strong correlation and, therefore, indicates consistency between the predictions of the U-net and the labels of the independent observers.

Table 8. The Pearson correlation values for the pairwise comparison of the prediction with the different labels.

As mentioned in the discussion of Fig. 7, the predictions of the liquid level time series for Experiments Nr. 7-100 and Nr. 12-100 show some systematic deviations from the hand-labelled data. For the previously unseen nitrogen-water slug flow Experiment Nr. 7-100, the prediction of the liquid level in the aerated liquid film region behind the slugs is often lower than in the hand-labelled data. Moreover, for the previously unseen nitrogen-oil-water slug flow Experiment Nr. 12-100, the slug rears often appear later in the prediction than in the hand-labelled data. This is also shown in Fig. 8. This behaviour leads to slightly lower (but still high) Pearson correlation values between the prediction and the observers for Experiment Nr. 12-100 compared to Experiment Nr. 7-100. Furthermore, it also leads to a slightly larger variation in the ensemble-averaged pointwise errors ⟨|ℎ_pred − ℎ_obs|⟩, compared to ⟨|ℎ_obs − ℎ_obs|⟩, for both experiments, see Fig. 8. Nevertheless, the liquid level predictions of both experiments are near or within the tolerance range of the observers, see left side of Fig. 8.
Moreover, the statistical parameters of the ensemble-averaged pointwise errors between the predictions and the observers ⟨|ℎ_pred − ℎ_obs|⟩ are very similar to those between the different observers ⟨|ℎ_obs − ℎ_obs|⟩, see right side of Fig. 8. Hence, for the two difficult unseen slug flow experiments Nr. 7-100 and Nr. 12-100, a strong consistency between the prediction and the observers can be concluded.
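The Pearson correlations reported in Tables 7 and 8 can be computed as below (a self-contained NumPy sketch; `scipy.stats.pearsonr` would additionally return the p-values):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two time series."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))

def pairwise_correlations(series):
    """Correlation for every pair of series, e.g. the three observers'
    hand-labelled liquid level time series."""
    n = len(series)
    return {(i, j): pearson_r(series[i], series[j])
            for i in range(n) for j in range(i + 1, n)}
```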
In the following, slug flow characteristics are calculated from the extracted liquid level time series for Experiment Nr. 12-100. The characteristics considered in this paper are the slug unit times and the slug body times, as well as their mean values, the mean slug frequency, the mean slug unit length, the mean slug body length, and the mean translational velocity of the slugs, see Fig. 1 and Eq. (1). Considering these slug characteristics, which can be obtained from the predicted time series, exemplifies the physical insights provided by the results of the proposed image processing technique. It also allows further validation of the predicted time series. In that regard, it is shown that the predicted liquid level time series provide reasonable slug characteristics, also for the unseen data of Experiment Nr. 12-100 with the lowest binary accuracy (95.41%) and largest mean absolute error (4.92%) of all considered cases, see Tables 3 and 6. In Fig. 9a, the predicted and hand-labelled liquid level time series of Experiment Nr. 12-100 are given for a time interval of 5 s, similar to Fig. 7(ii). For the calculation of the slug unit times and slug body times, a threshold value of 0.95 has been set to detect the slug fronts and slug rears in the time series, as illustrated in Fig. 9a. Note that thresholding is the conventional procedure for this task, see Zhao et al. (2015), Baba et al. (2018) and Schmelter et al. (2021a), and is therefore also applied in this investigation. Generally, the choice of the threshold value for the detection of slugs is not obvious and needs to be made individually for every time series. It should not be too high, otherwise larger slugs are separated by their entrained gas bubbles. However, it should also be chosen high enough to avoid miscounting large-amplitude waves as slugs.
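The threshold-based detection of slug fronts and rears can be sketched as follows (an illustrative implementation under the stated assumptions; the paper does not publish this exact routine):

```python
import numpy as np

def detect_slug_times(h, threshold=0.95, fs=240.0):
    """Detect slug fronts (rising threshold crossings) and slug rears
    (falling crossings) in a diameter-normalized liquid level series h,
    sampled at fs Hz.  Returns slug body times and slug unit times
    (front-to-front intervals) in seconds."""
    above = h >= threshold
    fronts = np.flatnonzero(~above[:-1] & above[1:]) + 1  # rising edges
    rears = np.flatnonzero(above[:-1] & ~above[1:]) + 1   # falling edges
    if fronts.size == 0:
        return np.array([]), np.array([])
    rears = rears[rears > fronts[0]]        # keep only complete slugs
    n = min(fronts.size, rears.size)
    body_times = (rears[:n] - fronts[:n]) / fs
    unit_times = np.diff(fronts) / fs       # front-to-front intervals
    return body_times, unit_times
```

Since a slug unit spans from one slug front to the next, the mean slug frequency follows as the reciprocal of the mean slug unit time.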
Furthermore, the histograms and probability density functions (pdfs) of the calculated slug unit times and slug body times are given in Fig. 9b and c for both the hand-labelled and the predicted time series, together with the resulting mean slug frequency, which is about 2.4 Hz for both time series, see also Table 9. In addition, the mean translational velocity of the slugs is calculated for the hand-labelled and predicted liquid level time series. This was achieved by a cross-correlation analysis, which is typically used to calculate the mean translational velocity and to approximate the length scales of the slugs from hold-up and liquid level time series, see Baba et al. (2018), Viggiano et al. (2018) and Olbrich et al. (2021a). For this, the proposed image processing technique is applied to extract the liquid level time series at two different positions along the pipe in the video, separated by a distance of 2.2 inner diameters, as illustrated in Fig. 10. For comparison, hand-labelled liquid level time series have also been considered at these positions. Then, the time lag between the time series at the two positions is calculated using the cross-correlation coefficient, resulting in 8 time steps (0.033 s) for the prediction and 7 time steps (0.029 s) for the hand-labelled time series at a sample rate of 240 Hz. The mean translational velocity of the slugs is then obtained by dividing the distance between the two positions (0.146 m) by the calculated time lag, resulting in 4.390 m s−1 for the prediction and 5.017 m s−1 for the hand-labelled time series, see also Table 9.
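The cross-correlation step can be sketched as follows (a minimal NumPy version, not the authors' code; with the values from the text, a lag of 8 samples at 240 Hz over 0.146 m, it yields a velocity of about 4.4 m/s):

```python
import numpy as np

def translational_velocity(h1, h2, distance, fs):
    """Estimate the mean translational slug velocity from liquid level
    time series at two axial positions via cross-correlation.
    h1 is upstream, h2 downstream; distance in m, sample rate fs in Hz."""
    a = h1 - h1.mean()
    b = h2 - h2.mean()
    xcorr = np.correlate(b, a, mode="full")
    lag = int(np.argmax(xcorr)) - (len(a) - 1)  # samples h2 trails h1
    return distance * fs / lag, lag
```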
Moreover, the mean slug body length and the mean slug unit length are approximated by multiplying the corresponding mean time scales with the mean translational slug velocity. The approximated length scales for both the prediction and the hand-labelled time series can also be found in Table 9. Altogether, the slug flow characteristics obtained from the predicted liquid level time series are in good agreement with those from the hand-labelled time series. This holds in particular for the temporal scales (mean slug unit time, mean slug body time, and mean slug frequency). For the spatial scales (mean slug unit and body lengths), on the other hand, slight deviations between the predicted and the hand-labelled data can be observed, due to the one-time-step difference in the lag detected in the cross-correlation procedure. Hence, this analysis also gives insight into how much error propagation plays a role when parameters are considered that are not measured directly but calculated from other derived quantities.
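The effect of a one-sample lag difference on the derived quantities can be checked with a short calculation using the spacing and sample rate quoted above (the lags of 8 and 7 samples are those reported for prediction and hand-labelled data):

```python
# Spacing between the two measurement positions (m) and sample rate (Hz):
d, fs = 0.146, 240.0

v_pred = d * fs / 8        # velocity from the predicted lag, ~4.4 m/s
v_mask = d * fs / 7        # velocity from the hand-labelled lag, ~5.0 m/s

# A single-sample lag difference shifts the velocity, and hence every
# derived length scale L = t_mean * v, by the same relative amount:
rel_dev = v_mask / v_pred - 1   # = 8/7 - 1, i.e. about 14%
```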
The characterization of slug flow with length and time scales as well as frequency spectra of the complete liquid level time series, obtained from the results of the proposed deep-learning based image processing technique, has already been used in the investigation of flow experiments and the validation of numerical simulations, see Schmelter et al. (2021b). Altogether, it can be concluded that the trained model can handle different types of data and provides reliable results for its specific task. Furthermore, the proposed image processing technique provides accurate liquid level time series that allow a detailed characterization of the flow.

Limitations
The visual recognition of gas and liquid regions, i.e., of the gas–liquid interface, in the video data constitutes a major limitation for the successful extraction of the liquid level time series with the proposed deep-learning based image processing technique. For flows where the interface cannot be observed from the side, a meaningful gas–liquid segmentation cannot be provided by the trained model. This is, for instance, the case for dispersed or annular flow patterns as well as for flows with large amounts of liquid spray, e.g., slug or wavy flows in the transition to a dispersed or annular flow pattern. Furthermore, for foamy/bubbly regions in the flow, the segmentation often includes foam in the liquid phase, leading to overestimated absolute values of the liquid level time series. This overestimation was investigated in detail in Olbrich et al. (2021b), where hand-labelled liquid level time series of slug flow were compared to reference parameters of a conventional tomography measurement system.
To demonstrate the limitations of a meaningful liquid level extraction with the proposed image processing technique, the method is applied to four instances of slug flow with larger amounts of dispersed phenomena, such as foam/bubbles or spray/mist. These are Experiments Nr. 8 and 14 with high liquid and low gas flow rates, leading to a high liquid level in the pipe with large amounts of short slugs and foam/bubbles (see Tables 2 and 10 and Figs. 11i, 11iii), as well as Experiments Nr. 15 and 16 with high gas and low liquid flow rates, leading to a low liquid level with fewer, shorter slugs and large amounts of spray/mist (see Tables 2 and 10 and Figs. 11ii, 11iv). Due to the dispersed phenomena, the gas–liquid interface becomes (at least partially) unrecognizable for the observer in the considered image data. Thus, a meaningful hand-labelled segmentation map as ground truth for a validation of the prediction is not obtainable. However, the predictions are presented together with the RGB-pixelcolumns over time in order to give an impression of the limitations of the proposed deep-learning based image processing technique.

Table 10
The number of the experiment, the recording set-up, the recorded time (length of the video), and the superficial liquid and gas velocities.
For Experiments Nr. 8 and 15, the binarized predictions show artefacts caused by the foam/bubbles or the spray/mist, see for instance the segmented bubble in the foam/bubbles at [51.5 s, 52 s] in Fig. 11i(b) as well as the segmented lump of liquid in the spray/mist at 53 s in Fig. 11ii(b). These artefacts lead to deviations in the extraction of the liquid level time series. Although a detailed validation with hand-labelled segmentation masks cannot be made, the extracted liquid level time series show the dominant liquid structures in the flow and can therefore be used for a quantification of the slugs in these two cases. For Experiments Nr. 14 and 16, on the other hand, with more dispersed phenomena than Experiments Nr. 8 and 15, the binarized predictions show larger areas of segmented liquid compared to what can visually be observed in the corresponding RGB-pixelcolumns over time (see Figs. 11iii(a,b) and 11iv(a,b)). This includes foam/bubbles and spray/mist, which are identified as liquid in the gas–liquid segmentation by the trained deep learning model, leading to impractical approximations of the liquid level time series.

Conclusions
In this paper, an image processing method based on a supervised deep learning model was presented, which extracts the liquid level time series from video recordings of gas–liquid flows in horizontal pipes. The method consists of a deep convolutional neural network of the U-net type and several pre- and post-processing steps. The U-net was trained and tested with video data from horizontal oil–gas slug flows for the task of segmenting liquid and gas regions in the video frames. For further evaluation of this model, additional independent video data were considered, covering different fluids, recording set-ups, and flow patterns. It was shown that the trained model provides an accurate segmentation of oil and gas in the video data, even for previously unseen video recordings. In that regard, the model has proven to be versatile and is also applicable to other transparent gases. Furthermore, the liquid level time series extracted from the predicted segmentation maps show low errors. For the quantification of accuracy and error values, hand-labelled data were used as reference. The consistency between these hand-labelled data and the predictions of the U-net was shown in an inter-observer reliability test. Moreover, it was demonstrated how flow characteristics can be obtained from the results of the deep-learning based image processing technique.
Altogether, the presented method accurately extracts the liquid level time series from the considered video data. It can handle different types of data, even unseen data sets. Furthermore, it can overcome various noise effects that are generally present in such image or video data. Once the net is successfully trained, it predicts highly accurate segmentation maps in a very short time. Prospectively, this method can provide parameters for the analysis and characterization of certain types of multiphase flows, in particular the wavy and slug flow patterns, where temporal and spatial scales of the slugs, waves and bubbles, as well as their translational velocities, can be derived from the extracted liquid level time series. The achieved characterization helps to assess and quantify problems in industrial operations that are induced by specific flow patterns. In addition, the proposed model has the potential to segment more complex three-phase flows, such as gas–oil–water flows with separate phases, provided it is trained with such data. Moreover, the proposed deep-learning based image processing technique can be used for monitoring multiphase flows for operation control if a transparent viewing section can be installed. This includes academic investigations under laboratory conditions as well as industrial applications, e.g., transportation pipelines in the oil and gas industry and cooling systems in the nuclear energy sector. In that regard, the link between the extracted liquid level time series and control parameters, such as pressure, flow rates, and phase ratio, can be the subject of further research.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
The data set of Experiment Nr. 11 and the source code related to this article can be found at https://gitlab1.ptb.de/mfm2/liquid_level_extraction_unet, an open-source online GitLab repository hosted at Physikalisch-Technische Bundesanstalt (PTB) (Olbrich et al., 2021c).