Deghosting dual‐component streamer data using demigration‐based supervised learning

Ghost reflections from the free surface distort the source signature and generate notches in the seismic amplitude spectrum. For this reason, removing ghost reflections is essential to improve the bandwidth and signal-to-noise ratio of seismic data. We have developed a novel approach that involves training a convolutional neural network to remove source and receiver ghosts from marine dual-component data. High-quality training data are essential for the network to produce accurate predictions on real data. We have used the demigration of a stacked depth-migrated image to create training shot gathers. Demigrated pressure and vertical velocity data are used to train the network. We apply the trained network to real pressure and vertical velocity data with ghosts. The network's output may be data after source deghosting, receiver deghosting, or both. We test our method on synthetic Marmousi and real North Sea data with dual-component streamers. The method is compared with conventional dual-component deghosting using the summation of pressure and vertical velocity. Results show that the method can accurately remove the ghosts with only minor errors in synthetic data. Based on a decimation test, the method is less affected by spatially aliased data than a conventional method, which could benefit data with high frequencies and/or large receiver or cable separations. On real data, the results are consistent with conventional deghosting, both within and outside the training area. This indicates that the method is a viable alternative to conventional methods on real data.


INTRODUCTION
A seismic ghost is a reflection of seismic waves from the sea surface. The sea-surface pressure-wave reflection coefficient is close to −1, meaning that the ghost has the opposite polarity and almost the same amplitude as the up-going waves. Towed-streamer acquisitions generate three ghosts: one from the source side, one from the receiver side and, finally, a ghost from both the source and receiver sides, as shown in Figure 1. Because of the extra distance the ghosts travel, they arrive at the receiver with a slight time delay relative to the primary. The ghosts elongate and distort the seismic signal and its frequency spectrum. Both the pressure and particle-velocity measurements contain ghosts. However, pressure and particle-velocity data have the opposite receiver-ghost polarity relative to the primary polarity (Figure 1b,c). This is due to the fact that pressure measurements do not differentiate between up- and down-going wavefields arriving at the receiver, whereas particle velocity is a vector, measuring opposite polarities in up- and down-going wavefields. In the pressure amplitude spectrum, there are ghost notches (Figure 1d) at frequencies given by (Aytun, 1999)

f_n = n c_W / (2 Δz cos θ), n = 0, 1, 2, …, (1)

where c_W, Δz and θ are the water velocity, source or receiver depth and incidence angle (positive downwards, Figure 1), respectively. The amplitude spectrum of the vertical particle velocity (V_z) has receiver notches (Figure 1d) at frequencies given by (Carlson et al., 2007)

f_n = (2n + 1) c_W / (4 Δz cos θ), n = 0, 1, 2, …. (2)

The receiver notch frequencies are different for hydrophone and particle-velocity data because the primary reflection and its receiver ghost have opposite polarities in pressure data but the same polarities in particle-velocity data (Figure 1b,c). However, on the source side, both pressure and particle velocity have the same notch frequencies, described by Equation (1). This is because the primary reflection and its source ghost are recorded with opposite polarities in both pressure and particle-velocity data. Increasing the source or receiver depth results in a larger time separation of the primary and ghost arrivals and, consequently, a lower-frequency notch. Ghost notches are problematic because they attenuate some frequencies, which reduces the temporal resolution (Carlson et al., 2007; Hammond, 1962; Schneider et al., 1964). Usually, the ghosts are removed during processing to improve the bandwidth, resolution and signal-to-noise ratio. Deghosting is also beneficial for seismic inversion and geological interpretation (Song et al., 2015).

Figure 1. (a) Illustration showing four different ray paths from a source to a receiver, where the black, blue, red and green ray paths correspond to the primary, source ghost, receiver ghost and source–receiver ghost, respectively. (b) Measurement of pressure. (c) Measurement of particle velocity. (d) The amplitude spectrum illustrating notches for the pressure and velocity data. The receiver ghost arrives later and has a lower notch frequency than the source ghost because the receiver is located deeper than the source. Source: Adapted from de Jonge, Vinje, Zhao et al. (2022).
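The notch relations above are easy to evaluate numerically. The sketch below (our own helper functions, assuming a 1500 m/s water velocity) computes the notch frequencies of Equations (1) and (2) at vertical incidence for an 18 m depth, the streamer depth used in the North Sea example later in the paper.

```python
import numpy as np

def pressure_notches(depth_m, c_w=1500.0, angle_deg=0.0, n_max=5):
    """Ghost notch frequencies (Hz) in the pressure spectrum, Equation (1):
    f_n = n * c_w / (2 * dz * cos(theta)), n = 0, 1, 2, ..."""
    cos_t = np.cos(np.radians(angle_deg))
    n = np.arange(n_max + 1)
    return n * c_w / (2.0 * depth_m * cos_t)

def vz_receiver_notches(depth_m, c_w=1500.0, angle_deg=0.0, n_max=5):
    """Receiver notch frequencies (Hz) in the V_z spectrum, Equation (2):
    f_n = (2n + 1) * c_w / (4 * dz * cos(theta))."""
    cos_t = np.cos(np.radians(angle_deg))
    n = np.arange(n_max + 1)
    return (2 * n + 1) * c_w / (4.0 * depth_m * cos_t)

# For an 18 m depth, the first non-zero pressure notch is near 42 Hz and
# the first V_z receiver notch near 21 Hz.
p_notch = pressure_notches(18.0)
v_notch = vz_receiver_notches(18.0)
```

Note that each V_z receiver notch falls exactly halfway between two pressure notches; this is the complementary (notch-diversity) behaviour that dual-component summation exploits.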
Historically, the ghost problem in seismic acquisition was solved by locating the sources and receivers at shallow depths, usually between 5 and 9 m, to push the notch above the usable frequency range. More recently, other acquisition methods were proposed, such as slant streamers (Bearnth & Moore, 1989), variable-depth streamers (Soubaras & Dowle, 2010), over-under streamers (Hill et al., 2006) and multi-component streamers (Carlson et al., 2007). Slant and variable-depth streamers aim to attenuate the receiver ghost using the notch diversity along the streamer. Over-under streamers have different ghost notches. Consequently, combining the data from these two streamers can help fill the receiver-ghost notches. The dual-component streamer contains both hydrophones and geophones, measuring pressure and V_z. As shown in Figure 1, ghost notches in the hydrophone data correspond to ghost peaks in the vertical geophone data. This complementary amplitude behaviour makes these data well suited for receiver deghosting: instead of towing shallow and relying on hydrophone-only recordings, we can tow deeper and use vertical-geophone energy to fill in the hydrophone notches. Towing deeper has the advantage of reducing swell noise (Tenghamn & Dhelie, 2009). In this paper, we focus on receiver deghosting using dual-component (pressure and V_z) measurements.

Figure 2. The deghosting using demigration-based supervised learning (DEGDEM) workflow with dual-component data used in this paper. Source: Adapted from de Jonge, Vinje, Zhao et al. (2022).
A central problem with the dual-component streamers is that V_z data can be noisy at low frequencies, to the extent that it can be difficult to incorporate them in conventional deghosting approaches (Mellier & Tellier, 2018; Peng et al., 2014; Poole & Cooper, 2018). Therefore, the low-frequency deghosting relies heavily on the hydrophone data.
For dual-component data, the conventional approach is to sum the pressure and V_z data, after an obliquity correction on the V_z data to correct for the fact that it records particle velocity only in the vertical direction (Tenghamn & Dhelie, 2009). As mentioned above, low-frequency noise on V_z data can contaminate this sum, so normally only pressure data is used in the deghosting of the lowest frequencies. As opposed to the pressure and V_z (P-V_z) sum, a hydrophone-only deghosting requires knowledge of the water-surface reflectivity and the source and receiver locations. Note, however, that some methods still use pressure and V_z data for the low frequencies while avoiding contamination by V_z noise, such as Telling and Grion (2022), who used an inversion-based approach with a hybrid operator in frequency and space, and Poole and Cooper (2018), who used an inversion-based approach in the τ−p domain. However, like many multi-component deghosting methods, both approaches use a weighting scheme to reduce the influence of the V_z data at lower frequencies.

Figure 3. Pressure and V_z with ghosts are the input of the network. Pressure without ghosts is the output. The demigrated pressure data without ghosts is used to calculate the deghosting error and update the weights of the network using backpropagation.
This paper aims to expand the DEGDEM method to also include V_z data during training and prediction, in contrast to the paper by de Jonge, Vinje, Zhao et al. (2022). Most importantly, we want to improve deghosting at low frequencies exhibiting substantial V_z noise. However, this requires training data with both pressure and V_z generated by demigration. Below we describe how we produce V_z data using demigration and train the network using pressure and V_z data. Dual-component DEGDEM is tested on synthetic Marmousi data and real data from the Northern North Sea. We compare our results with conventional dual-component deghosting as used in the seismic industry. In this paper, the term 'conventional deghosting' refers to τ−p domain inversion-based pressure-only deghosting at low frequencies and the P-V_z sum at high frequencies.

Figure 5. Illustration of how the source, receiver and mirror positions are used to create four different datasets. Equations (9) or (10) and these four datasets can be used to create data with source and receiver ghosts. Source: Adapted from de Jonge, Vinje, Zhao et al. (2022).

Convolutional neural networks
This paper uses a convolutional neural network (CNN) with a U-net structure (Ronneberger et al., 2015). A CNN uses convolutions, in contrast to a dense neural network that uses matrix multiplications. As a result, the CNN has sparse interactions between neurons, which makes the network more efficient. The kernel size determines the number of interactions of neurons from one layer to the next. The U-net structure uses an encoder to down-sample the feature maps and a decoder to up-sample them. The U-net also contains skip connections that copy feature maps from one layer to another. Pooling operations down-sample the size of the feature maps by using a function that summarizes subregions (e.g. max pooling and average pooling). Transposed convolution up-samples the size of the feature maps and is an operation that goes in the opposite direction of a standard convolution. Dumoulin and Visin (2018) showed examples of transposed convolution and pooling functions. In contrast to CNNs without down-sampling or up-sampling, the U-net structure is beneficial because of the increased receptive field due to the down- and up-sampling (Lucas et al., 2018). In addition, down- and up-sampling makes the network more efficient.

Figure 6. Common-channel gathers showing (a) demigration of pressure data with ghosts using Equation (9) and (b) demigration of V_z data with ghosts using Equation (10). A zoom is used in both figures, focusing on a single event where arrows highlight the primary (black), source ghost (red), receiver ghost (green) and source–receiver ghost (white). The V_z data is scaled with −ρc_W for illustration purposes.
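The encoder/decoder bookkeeping can be made concrete with the standard output-size formulas for convolution and transposed convolution. The sketch below (layer counts and sizes are illustrative, not the network used in the paper) shows that a stride-2 transposed convolution exactly undoes a 2×2 pooling, so encoder and decoder feature maps line up for the skip connections.

```python
def conv_out(n, kernel, stride=1, pad=0):
    """Output length along one axis for a standard convolution."""
    return (n + 2 * pad - kernel) // stride + 1

def tconv_out(n, kernel, stride=1, pad=0):
    """Output length for a transposed convolution (reverses conv_out)."""
    return (n - 1) * stride - 2 * pad + kernel

n = 256  # input samples along one axis (hypothetical window size)
for _ in range(4):                        # encoder
    n = conv_out(n, kernel=3, pad=1)      # 'same' convolution keeps the size
    n = conv_out(n, kernel=2, stride=2)   # 2x2 pooling halves it
assert n == 16
for _ in range(4):                        # decoder
    n = tconv_out(n, kernel=2, stride=2)  # stride-2 transposed conv doubles it
assert n == 256  # matches the input size, so skip connections line up
```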
A few other papers have also used the U-net structure for pressure-only deghosting with high-quality results (de Jonge, Vinje, Zhao, et al., 2022; Peng et al., 2021; Vrolijk & Blacquière, 2021). In our case, we are confident that a U-net structure will also give good results on dual-component data. The exact structure used in our paper is shown in the papers by de Jonge, Vinje, Poole et al. (2022) and de Jonge, Vinje, Zhao et al. (2022), and is similar to the original U-net by Ronneberger et al. (2015) but with more down- and up-sampling layers. That said, the goal of this paper is not to find the optimal network structure but to develop a new CNN-based training method for deghosting dual-component data.

Deghosting using demigration-based supervised learning workflow
This research is an extension of the work done by de Jonge, Vinje, Zhao et al. (2022) on pressure-only deghosting. In the following section, we review the basic features of that method. The approach is called deghosting using demigration-based supervised learning (DEGDEM) and involves generating synthetic shot gathers using Kirchhoff demigration (Santos et al., 2000a) from a seismic image. We create the seismic image in our workflow by migrating seismic data, usually with pre-stack depth migration (PSDM). Kirchhoff migration and demigration are reverse processes (Santos et al., 2000b). As a result, the demigrated data will resemble the seismic data before migration. This is described in more detail in the 'Modelling demigrated dual-component data' section.
Using demigrated data to train the network has two main advantages: first, the training data resembles the recorded seismic data, and second, it is possible to generate training data with no ghosts. The workflow is shown in Figure 2 and is a modification of the workflow shown in de Jonge, Vinje, Zhao et al. (2022), as it includes dual-component data. After recording the data (Figure 2 R1), it usually goes through a standard processing flow with multiple steps. These processing steps include denoising, deblending, debubbling, designature, deghosting and demultiple, followed by binning, interpolation, regularization, migration and stacking to produce a PSDM image. We use the PSDM image and a smooth PSDM velocity model for the demigration (Figure 2 M1, M2 and M3) to create shot gathers containing pressure (Figure 2 S2 and S3) and vertical particle velocity (V_z) (Figure 2 S1) data with and without ghosts. We train a CNN using these shot gathers. Figure 3 illustrates how we train the network. Pressure and V_z data with ghosts (Figure 3 T1 and T2) are the input to the network. Pressure data without ghosts is the desired output of the network. During training, demigrated pressure data without ghosts (Figure 3 T3) is used to update the weights of the network using back-propagation. Thousands to tens of thousands of shot gathers are used to train the network. The trained network is later applied to real pressure and V_z data with ghosts to predict pressure data without ghosts.
The shot and receiver positions from the real marine acquisition were used when creating the training data. It is possible to apply the network (to predict deghosted data) to data from the same area from which the training data were created. Alternatively, it is possible to use a small portion of an area to create training data and apply the network to the entire area. de Jonge, Vinje, Zhao et al. (2022) showed on synthetic and real data that the deghosting quality is still high even if the training and prediction areas are not the same. They also showed that the deghosting required to create the PSDM image for demigration does not have to be optimal for the network to perform well. In addition, DEGDEM is quite robust to errors in sea-surface reflection coefficients and to swell waves. In theory, it is possible to train a neural network on data from one survey geometry and apply it to another, but de Jonge, Vinje, Zhao et al. (2022) did not explore this. It is believed that similar geology is an advantage when applying the network to another area.

Modelling demigrated dual-component data
Kirchhoff demigration was used to produce pressure data for training. Kirchhoff demigration can produce synthesized seismograms, as shown by Santos et al. (2000a). True-amplitude Kirchhoff demigration and true-amplitude Kirchhoff migration are considered reverse processes (Santos et al., 2000b). The Kirchhoff constant-offset migration integral for any migration-domain point (x, z) can be represented by (Bleistein, 1987; Santos et al., 2000a)

IM(x, z) = ∫ dξ w_M(ξ, x, z) D(ξ, τ(ξ, x, z)), (3)

where IM(x, z) is the migrated data, w_M(ξ, x, z) is the true-amplitude migration weight function and D(ξ, t) is a trace in the data domain for a given surface location ξ. The two-way traveltime τ(ξ, x, z), shown in Figure 4, is calculated by ray tracing from constant-offset sources s(ξ) and receivers g(ξ) down to a fixed migration-domain point (x, z). Equation (3) integrates traces along τ(ξ, x, z) in the data domain.
Figure 4 shows how we integrate data in the data domain and map it to a point (x, z) in the migration domain (Bleistein et al., 2001; Schleicher et al., 2007). Kirchhoff demigration involves a weighted mapping process from the depth-migrated image to the seismic data. The Kirchhoff demigration integral is (adapted from Santos et al., 2000a; Tygel et al., 1996) as follows:

D(ξ, t) = ∫ dx w_D(ξ, x, t) IM(x, ζ(ξ, x, t)), (4)

where D(ξ, t) is the demigrated data, w_D(ξ, x, t) is the true-amplitude data weight function, IM(x, z) is data in the migrated domain and ζ(ξ, x, t) is the depth at lateral position x corresponding to the time sample t of the trace at location ξ in the data domain. For a constant t, ζ(ξ, x, t) forms an isochron surface in the depth domain, as shown in Figure 4.
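To make the diffraction-stack form of Equation (3) concrete, the toy below migrates a single point diffractor in a constant-velocity medium, taking w_M = 1 and a zero-offset geometry for simplicity (our own illustrative setup, not the paper's implementation). Demigration, Equation (4), would be the reverse mapping: spreading each image point back along its isochron to reconstruct the hyperbola.

```python
import numpy as np

# Constant-velocity, zero-offset diffraction-stack toy; grids are illustrative.
v = 1500.0                       # velocity (m/s)
dt, dx, dz = 0.002, 12.5, 5.0
nt, nx, nz = 400, 64, 100
xs = np.arange(nx) * dx          # surface trace positions (xi)
zgrid = np.arange(nz) * dz

# Data D(xi, t): a single point diffractor at (x0, z0).
x0, z0 = xs[nx // 2], 250.0
data = np.zeros((nx, nt))
t_diff = 2.0 * np.hypot(xs - x0, z0) / v          # two-way traveltime tau
data[np.arange(nx), np.round(t_diff / dt).astype(int)] = 1.0

# Migration: for each image point (x, z), stack the data along tau(xi, x, z).
image = np.zeros((nx, nz))
for ix, x in enumerate(xs):
    tau = 2.0 * np.hypot(xs[:, None] - x, zgrid[None, :]) / v
    it = np.round(tau / dt).astype(int)
    ok = it < nt
    image[ix] = np.where(ok, data[np.arange(nx)[:, None],
                                  np.minimum(it, nt - 1)], 0.0).sum(axis=0)

# The stack focuses at the diffractor: the image maximum sits at (x0, z0).
iz0 = int(round(z0 / dz))
```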
We now turn to the problem of demigration to V_z data. We have not seen this attempted in the literature. As mentioned above, the demigration in Equation (4) integrates the migrated data along isochron surfaces and maps it to a point (ξ, t) in the data domain. The output of Equation (4) is pressure data, as the input data to migration, shown in Equation (3), is pressure data and these are reverse functions. This means that each of the traces in the integrand in Equation (4) is also pressure data. So we need to scale the pressure data in the integrand appropriately before integrating to convert from pressure to V_z. Tenghamn and Dhelie (2009) defined the obliquity filter as

O = ρω / k_z, (5)

where ρ is the water density, ω is the angular frequency and k_z is the vertical wavenumber. The obliquity filter is used to scale V_z data such that the up-going wavefields of pressure and V_z have equal magnitude and the opposite polarity to the down-going wavefield. This filter is applied before the pressure and V_z data are summed together to remove the receiver ghost. More information about this filter is given in the next section. We want to use the inverse obliquity filter, 1/O, to convert pressure to V_z. The vertical wavenumber, k_z, can be expressed as

k_z = ω cos θ / c_W, (6)

where θ is the incidence angle at the receiver (Figure 4) and c_W is the water velocity. As can be seen from Figure 4, the incidence angle θ is zero for a wave travelling vertically upwards. If we insert Equation (6) into the inverse of Equation (5), we obtain

1/O = k_z / (ρω) = cos θ / (ρ c_W). (7)

We include the inverse obliquity filter from Equation (7) in the integrand of Equation (4) to have V_z as the output of demigration:

D_Vz(ξ, t) = −∫ dx [cos θ / (ρ c_W)] w_D(ξ, x, t) IM(x, ζ(ξ, x, t)), (8)

where the negative sign follows our polarity convention that the up-going wavefields of pressure and V_z have opposite polarity. In the further derivation of the demigrations for machine learning described below, we use Equations (4) and (8) to generate demigrated pressure and V_z data, respectively.
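A quick numerical check of the algebra in Equations (5)-(7), using illustrative values (ρ = 1000 kg/m³, c_W = 1500 m/s): the inverse obliquity filter collapses to the frequency-independent factor cos θ/(ρc_W) that scales the integrand in Equation (8).

```python
import numpy as np

rho, c_w = 1000.0, 1500.0        # water density (kg/m^3) and velocity (m/s)
theta = np.radians(25.0)         # incidence angle at the receiver
omega = 2.0 * np.pi * 40.0       # angular frequency at 40 Hz

k_z = omega * np.cos(theta) / c_w       # Equation (6)
O = rho * omega / k_z                   # obliquity filter, Equation (5)
inv_O = np.cos(theta) / (rho * c_w)     # Equation (7)

# The omega dependence cancels: 1/O is frequency independent, which is why
# a simple cos(theta)/(rho*c_W) factor suffices inside Equation (8).
assert np.isclose(1.0 / O, inv_O)
```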
Figure 9. Illustration of how real V_z noise and clean demigrated V_z data are combined to create demigrated noisy V_z data. The amplitude spectrum shows that most of the added V_z noise is found below ∼30 Hz.
As we use an angle-muted and stacked PSDM image, we cannot get back exactly to the original seismic data before migration when we apply demigration. However, the demigrated training data looks realistic, which is beneficial when training the neural network. We also need a velocity model, which generally would be the same model used for migration. Figure 5 illustrates how we create four datasets, which can be combined to generate pressure data with ghosts using the following equation (de Jonge, Vinje, Zhao, et al., 2022):

D_P(x_h, t_h) = D(x_h, t_h) + r D_SG(x_h, t_h) + r D_RG(x_h, t_h) + r² D_SRG(x_h, t_h), (9)

where D(x_h, t_h) is the demigrated data, shown in Equation (4), using the actual source and receiver positions, D_SG(x_h, t_h) is the data using the mirror source location, D_RG(x_h, t_h) is the data using the mirror receiver location and D_SRG(x_h, t_h) is the data using both the mirror source and receiver locations. Here, x_h is the offset, t_h is the time and r is the sea-surface reflection coefficient, usually close to −1. The source and receiver mirror locations are the mirror positions above the sea surface shown in Figure 5. To produce the V_z data with ghosts, we modify Equation (9) using V_z demigrated data, shown in Equation (8), and reverse the polarity of the V_z receiver ghost:

D_Vz,ghost(x_h, t_h) = D_Vz(x_h, t_h) + r D_Vz,SG(x_h, t_h) − r D_Vz,RG(x_h, t_h) − r² D_Vz,SRG(x_h, t_h). (10)

We scale the V_z data with ghosts by ρc_W before using it as training and prediction data. This scaling is used so that pressure and V_z have the same amplitude range. Figure 6 shows the demigrated pressure and V_z data in a common-channel gather.
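A sketch of how the four demigrated datasets of Figure 5 combine into data with ghosts, with random traces standing in for the demigrated gathers and a scalar stand-in for the obliquity scaling (a vertical-incidence toy under our sign conventions, not the paper's implementation). It also checks consistency with the P-V_z sum of Equation (11): the receiver-side ghost terms cancel.

```python
import numpy as np

rng = np.random.default_rng(0)
nt = 256
# Random traces standing in for the four demigrated pressure datasets of
# Figure 5 (actual, mirror-source, mirror-receiver, mirror-both).
D, D_sg, D_rg, D_srg = (rng.standard_normal(nt) for _ in range(4))
r = -1.0                          # sea-surface reflection coefficient
inv_O = 1.0 / (1000.0 * 1500.0)   # 1/O at vertical incidence: 1/(rho*c_W)

# Pressure with ghosts, Equation (9).
p = D + r * D_sg + r * D_rg + r**2 * D_srg
# V_z with ghosts, Equation (10): pressure terms converted via Equation (8)
# (note the overall polarity convention) with receiver-side terms reversed.
vz = -inv_O * (D + r * D_sg - r * D_rg - r**2 * D_srg)

# Consistency with the P-V_z sum, Equation (11): all receiver-side ghost
# terms cancel, leaving the up-going pressure (primary plus source ghost).
p_up = 0.5 * (p - vz / inv_O)
assert np.allclose(p_up, D + r * D_sg)
```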

Pressure and V z summation
The V_z data with ghosts will have a different receiver and source–receiver ghost polarity to the pressure data. Conventionally, pressure and V_z data are summed (P-V_z sum) together to get the up-going pressure using the following equation (Tenghamn & Dhelie, 2009):

P_up = (P − O V_z) / 2, (11)

where the obliquity or scaling filter, O, is given by Equation (5). We have chosen a polarity convention for the V_z data such that the up-going wavefields of pressure and V_z data have opposite polarity (Figure 1), which is why we use a negative sign in Equation (11) to remove the down-going wavefield. The vertical wavenumber, k_z, in Equation (5) is estimated from the in-line wavenumber, k_x, using the following equation:

k_z = √(ω²/c_W² − k_x²). (12)

The in-line wavenumber, k_x, is usually sampled by receivers every 12.5 m along the seismic streamer. An example of a normalized obliquity filter and a finite-difference-modelled shot gather from the Marmousi model in the frequency–wavenumber amplitude domain is shown in Figure 7. As we approach k_z = 0, meaning horizontally travelling waves, the obliquity-filter amplitude tends to infinity. Therefore, it is necessary to mute the filter before the amplitude becomes too big. The V_z data shown in Figure 7 become aliased as we go beyond the Nyquist wavenumber, and we observe wavenumber wrap-around due to spatial aliasing. As a result, scaling aliased data with the obliquity filter will not be correct.
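A minimal f-k-domain sketch of the obliquity filter with a mute, assuming a 2D grid and an arbitrary 60° maximum propagation angle (all values illustrative). The filter equals ρc_W at vertical incidence (k_x = 0) and grows without bound toward the evanescent boundary k_z → 0, which is why the mute is needed.

```python
import numpy as np

rho, c_w = 1000.0, 1500.0
dx, dt = 12.5, 0.002
nx, nt = 128, 512
kx = 2 * np.pi * np.fft.fftfreq(nx, dx)        # in-line wavenumber (rad/m)
omega = 2 * np.pi * np.fft.rfftfreq(nt, dt)    # angular frequency (rad/s)

KX, W = np.meshgrid(kx, omega, indexing="ij")
kz_sq = (W / c_w) ** 2 - KX ** 2               # squared Equation (12)
prop = kz_sq > 0                               # propagating (non-evanescent) part
k_z = np.sqrt(np.where(prop, kz_sq, np.nan))

O = np.where(prop, rho * W / k_z, 0.0)         # obliquity filter, Equation (5)

# Mute where k_z is small, i.e. beyond a chosen maximum propagation angle
# (60 degrees here, an arbitrary illustrative threshold), to keep the
# filter amplitude bounded near horizontally travelling waves.
keep = prop & (k_z >= (W / c_w) * np.cos(np.radians(60.0)))
O_muted = np.where(keep, O, 0.0)
```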
It is also possible to create a 3D obliquity filter, O(ω, k_x, k_y). However, the cable separation is often large compared to the in-line receiver separation, which results in sparse data in the crossline direction. Too sparse data will lead to aliasing in the wavenumber domain in the crossline direction.
A problem with V_z data is the low-frequency mechanical tow noise measured during acquisition (Carlson et al., 2007; Tenghamn & Dhelie, 2009). Figure 8 shows an example of V_z noise on a shot gather. On raw data, the amplitude of the noise does not increase with time; however, as we have used a t² gain correction, it appears to increase in this figure. A low-cut filter is often used (e.g. at 20 Hz) on the V_z data to avoid propagating noise into the P-V_z sum. As a result, we might not use V_z data below a specific frequency (e.g. 20 Hz). Conventionally, we use a hydrophone-only deghosting method below a specific frequency. In this paper, we use a τ−p inversion-based deghosting method by Poole (2013) for the low frequencies and the P-V_z sum for the high frequencies. In this paper, this combination is called conventional deghosting.
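A minimal zero-phase low-cut filter sketch of the kind applied to the V_z data (a hard spectral cut for brevity; production filters taper the cut-off). The synthetic trace mixes a low-frequency "tow noise" component with a weaker, higher-frequency "signal" component; both frequencies are illustrative choices.

```python
import numpy as np

def low_cut(trace, dt, f_cut=20.0):
    """Zero-phase low-cut via FFT: zero all components below f_cut.
    (A hard cut for brevity; production filters taper the edge.)"""
    spec = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(trace.size, dt)
    spec[freqs < f_cut] = 0.0
    return np.fft.irfft(spec, n=trace.size)

dt = 0.002
t = np.arange(1000) * dt
# Synthetic V_z trace: 10 Hz 'tow noise' plus a weaker 60 Hz 'signal'.
trace = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 60 * t)
filtered = low_cut(trace, dt)   # the 10 Hz component is removed
```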

Using V z noise in training data
In our training process, V_z noise is essential because we expect it, or part of it, to be present in the real data. To find suitable V_z noise for our training, we directly use parts of the real V_z data containing low-frequency V_z noise but no signal. This signal-free data can be data at the end of the sail line where no source is fired. Alternatively, we can use data before the first arrival on the shot gathers. The low-frequency noise is then added to the V_z training data, as shown in Figure 9. It is clear from Figure 9 that the amplitude of the noise decreases with increasing frequency and is almost absent above 30 Hz.
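The noise-injection step of Figure 9 can be sketched as follows, with random arrays standing in for the demigrated gather and the extracted signal-free noise records (the smoothing used to fake low-frequency noise, and the noise-to-signal ratio, are our own illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(1)
n_traces, nt = 8, 500

# Stand-in for a clean demigrated V_z shot gather.
vz_clean = rng.standard_normal((n_traces, nt))

# Stand-in for signal-free noise records; in practice these are V_z data
# recorded at the end of the sail line (no source fired) or ahead of the
# first arrivals. Smoothing white noise fakes the low-frequency character.
raw = rng.standard_normal((n_traces, nt + 50))
kernel = np.ones(51) / 51.0
noise = np.stack([np.convolve(tr, kernel, mode="valid") for tr in raw])

# Scale the noise to a chosen noise-to-signal level (0.5 here, an
# illustrative choice) and add it to the clean training data (Figure 9).
target_ratio = 0.5
scale = target_ratio * np.std(vz_clean) / np.std(noise)
vz_noisy = vz_clean + scale * noise
```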

RESULTS
This section looks at both synthetic and real data. The synthetic data will verify whether the deghosting using demigration-based supervised learning (DEGDEM) method works and help us understand its problems and advantages under controlled circumstances. The real data show whether the method gives good results on an example from the Northern Viking Graben (NVG) area in the North Sea.

Synthetic data
We used the P-wave velocities and densities of the elastic Marmousi2 model (Martin et al., 2006) and acoustic finite-difference (FD) modelling to create the input data. Figure 10 shows the survey configuration used. The shot and receiver spacings are 6.25 and 12.5 m, respectively. The offset to the first receiver is 147 m, and the offset to the last receiver is 3884.5 m. The receiver number increases as we move away from the vessel. We modified the workflow in Figure 2 for our synthetic tests (shown in Figure 11). The FD-modelled data without ghosts were migrated for different offset classes. These pre-stack depth migration (PSDM) offset classes were then stacked together to create one PSDM image. This PSDM image, a smooth velocity model (shown in de Jonge, Vinje, Zhao, et al., 2022) and Kirchhoff demigration create shot gathers without ghosts and with ghosts for pressure (Figure 11 S2 and S3) and vertical particle velocity (V_z) (Figure 11 S1) data. We trained a convolutional neural network (CNN) with these shot gathers and used the CNN on FD data with ghosts (Figure 11 R1 and R2). As mentioned in the 'Methodology' section, the training V_z data (Figure 11 S1) contained mechanical V_z noise from real data. The FD V_z data (Figure 11 R1) also contains real V_z noise; however, the noise added to the training data and to the FD-modelled data is extracted from different parts of the real data. The V_z noise amplitude is similar to what we would expect on real data. Figure 12 shows the receiver deghosting results on a common-channel gather (receiver 10) using pressure and low-cut filtered V_z data as input to the network (i.e. approach 2; in approach 1, the V_z data is used without a low-cut filter). Notice that we remove most of the V_z noise with a low-cut filter at 20 Hz (Figure 12a,b). The DEGDEM error is negligible, as shown in the amplitude spectrum in Figure 12. The receiver deghosting results using approaches 1 and 2 are indistinguishable when displayed as common-channel gathers. Therefore, we have not shown the DEGDEM results using V_z data without a low-cut filter (i.e. approach 1) in Figure 12.
We also compare dual-component DEGDEM (using approach 1 or 2) with pressure-only DEGDEM and conventional dual-component deghosting. In the conventional approach, we combined the P-V_z sum with low-frequency deghosting below 33 Hz, which was the optimal combination. Figure 13 shows the amplitude spectra of the errors. Using V_z data with a low-cut filter as input to DEGDEM gives a better result than the 'raw' V_z data. Dual-component DEGDEM improves the results significantly for all frequencies compared to pressure-only DEGDEM. Conventional deghosting is generally better than dual-component DEGDEM above 50 Hz and worse below 50 Hz, but both errors are small. The V_z noise contamination is most likely the reason why DEGDEM has a smaller error than the P-V_z sum below 50 Hz. In addition, conventional deghosting is better than pressure-only DEGDEM for all frequencies.
Figure 14a shows a shot gather with a 12.5 m receiver spacing, typical for many towed-streamer datasets; the corresponding frequency–wavenumber (FK) spectrum is given in Figure 14c. Even though many recording systems include either analogue or digital arrays to reduce spatial aliasing, strongly dipping energy may still be aliased at higher frequencies, evident in Figure 14c above 60 Hz. This may result in some inaccuracies when applying the obliquity filter in the FK domain. In the crossline direction, the cable separation is usually between 50 and 100 m. This large spacing, along with the lack of any receiver array, makes the crossline aliasing problem even more challenging. Figure 14b shows the data from Figure 14a after decimation to a 50 m spacing. Figure 14d shows the corresponding FK domain, where we can see that the energy has now become aliased at 15 Hz, a frequency four times lower than in the 12.5 m spacing case. The black lines in Figure 14c indicate the Nyquist wavenumber for a 50 m cable separation. For this reason, the obliquity correction is challenging in the crossline direction and will be the focus of the following synthetic example.
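The aliasing onset quoted above follows from comparing an event's horizontal wavenumber with the Nyquist wavenumber of the receiver sampling. A small helper (ours, assuming a 1500 m/s water velocity and horizontally travelling energy as the worst case) reproduces the 60 Hz and 15 Hz figures for 12.5 m and 50 m spacings.

```python
import numpy as np

def alias_onset_hz(spacing_m, v=1500.0, theta_deg=90.0):
    """Frequency above which an event with apparent slowness sin(theta)/v
    aliases: its wavenumber f*sin(theta)/v exceeds the Nyquist wavenumber
    1/(2*spacing)."""
    p = np.sin(np.radians(theta_deg)) / v   # apparent slowness (s/m)
    k_nyq = 1.0 / (2.0 * spacing_m)         # Nyquist wavenumber (cycles/m)
    return k_nyq / p

f_inline = alias_onset_hz(12.5)   # 12.5 m in-line receiver spacing
f_xline = alias_onset_hz(50.0)    # 50 m decimated/crossline spacing
```

Decimating the spacing by a factor of four lowers the onset frequency by the same factor, consistent with Figure 14.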
In our first synthetic test, we used training and prediction data with a 12.5 m receiver spacing (Figure 15b). As an experiment, we removed every fourth receiver such that the receiver separation was 50 m (Figure 15c). We then trained a network on this sparse data and applied it to sparse prediction data. The conventional deghosting method was applied to the sparse data to see how it could cope with a significant amount of energy being aliased. Due to the severe aliasing, we used a simple P-V_z sum without the obliquity filter for the conventional route. Figure 16 shows the results for DEGDEM and conventional deghosting in common-channel gathers for receivers 3 and 10. Data from receiver 3 contain mostly non-aliased energy, whereas data from receiver 10 contain more aliased energy. The results show that conventional deghosting works well for receiver 3 but struggles for receiver 10. However, DEGDEM performs well for both receivers 3 and 10. Figure 16 also shows the DEGDEM results using 12.5 m data. The difference between DEGDEM trained on 12.5 m or 50 m receiver-spacing data is small compared to the difference between DEGDEM and conventional deghosting on aliased data. Current practices may include dealiasing input data through interpolation, applying the obliquity correction in a sparse transform domain, or making assumptions about the kinematic behaviour of events (e.g. hyperbolic moveout). It is beyond the scope of this paper to examine these options on real data.

Real data -Northern Viking Graben dataset
The real data used in this paper are from a survey in the NVG area in the Northern North Sea, off the western coast of Norway. Between 2020 and 2022, 26,000 km² of east–west data was acquired to provide a dual-azimuth survey of the NVG (CGG, 2022). The survey used a multi-component streamer at a constant depth of 18 m. Figure 17 shows the survey geometry.
Before the final processing flow of this data, a fast-track processing flow was completed to produce a PSDM image. In the fast track, several processing steps were expedited to allow a preliminary evaluation of a migrated image. Consequently, the data and processing steps (including deghosting) are not optimized as fully as in the final processing flow. The PSDM image and a smooth velocity model acquired from the fast track cover a smaller area than the whole acquisition area. We used the PSDM image to create pressure and V_z data using demigration. We trained the neural network on pressure and low-cut filtered V_z data and applied it to real pressure and low-cut filtered V_z data before deghosting (Figure 18). We used the area to the left of the dashed red line in Figure 18 to create training and validation data. Next, we tested the trained network on all the data shown in Figure 18. The neural network input window size is 3.2 s. In addition, we compared our method with the conventional dual-component deghosting used in the final processing flow as quality control. We combined source and receiver deghosting for both DEGDEM and conventional deghosting. The results are shown in different domains (common-midpoint [CMP] stacks, common-channel gathers and shot gathers) for both the outer and central cables. Figures 19 and 20 show CMP stacks for the outer and central cables, respectively. Figures 21 and 22 show common-channel gathers. Shot gathers are shown in Figure 23. Finally, we show frequency panels of CMP stacks in Figure 24. Both DEGDEM and conventional deghosting show similar results. However, some differences at the lower frequencies (0-10 Hz), around the first receiver notch (∼42 Hz) and at the higher frequencies (above 70 Hz) are seen in the amplitude spectra. The frequency panels between 0 and 10 Hz (Figure 24a) also show noticeable differences, which are reviewed further in the 'Discussion' section.
When we include the network training process, DEG-DEM is approximately 2.3 times slower in time using the same resources compared to conventional deghosting when deghosting the sail line used here.Both methods are running on 4 node GPU's.This includes creating demigrated training data, training the neural network and deghosting the data.However, if we use the already trained network to predict the ghost-free data, it is approximately 125 times faster in time when using the same resources than conventional deghosting which makes DEGDEM suited for large data volumes.This data.In addition, they added residual ghost noise to the prestack depth migration (PSDM) image to simulate the case of sub-optimal initial deghosting.Even though DEGDEM was trained on data from the PSDM image with a residual ghost, it improved the initial deghosting significantly.They tested the effects of swell waves and changing the sea-surface reflection coefficient used in the training and prediction data.The swell waves and sea-surface reflection coefficient tests indicated that DEGDEM was more robust than inversion/modellingbased conventional deghosting.The robustness of this method could be an advantage on real data where, for example, we could have swell waves or incomplete control of the receiver positions and the sea surface reflection coefficient.The results from de Jonge, Vinje, Zhao et al. 
(2022) should also be relevant to the dual-component DEGDEM method shown here. One drawback of DEGDEM is that we depend on a PSDM or a reflectivity image to create the training data. However, in a standard modern processing project, the workflow shown in Figure 2, from the 'raw' data to the PSDM image, is usually repeated iteratively to create ever better images. Therefore, a PSDM image should be available at an early stage in most processing projects. In addition, DEGDEM is not the only method that depends on a PSDM image: some demultiple methods (e.g. Brittan et al., 2011; Martin et al., 2011) and some velocity-model-building methods (e.g. Chang et al., 1996) also require one. It is also possible to use a PSDM image from another acquisition. Conventional modelling- or inversion-based methods often require comprehensive testing. However, they are not dependent on a PSDM image and are not affected by changing geology. An advantage of our method is that it is easy to use, robust and can give good results.
Another drawback of DEGDEM is that demigration cannot create refractions or multiples. The results from de Jonge, Vinje, Zhao et al. (2022) show that, despite multiples not being in the training data, the network was able to deghost multiples. While multiples behave similarly to primary reflections, refractions behave differently, and for this reason our neural network struggled more to deghost them than multiples. Peng et al. (2021) showed that it is possible to create training data directly from data after conventional deghosting (pre-stack). One issue with this approach is that the trained neural network will most likely never improve on the deghosting, as it will reproduce the weaknesses of conventional deghosting. The main advantage of their method is that it can reduce the computational cost by training a network on a small part of a survey and applying it to the rest of the survey or to another survey. However, our method can use the same strategy during training to reduce the cost. In addition, our method has the possibility of improving the deghosting (de Jonge, Vinje, Zhao, et al., 2022).
It is also possible to use time demigration instead of depth demigration to generate training data. Time migration is less computationally expensive than depth migration, and preliminary time-domain images are often available relatively early in the processing sequence (Iversen et al., 2012). It should also be possible to use reverse time migration (RTM) to produce a seismic image and then use demigration to create training data. It is unclear whether using RTM instead of Kirchhoff migration before demigration influences the final results. However, it should be mentioned that demigration is the reverse process of Kirchhoff migration. Therefore, demigration from Kirchhoff-migrated images will be more similar to the data before migration.
In the following paragraphs, we discuss the results of the synthetic and real-data tests and draw conclusions from them.
The dual-component DEGDEM results on synthetic data are encouraging.Figure 12c,d  the ghosts effectively.In Figure 12E, we have almost identical amplitude spectra for the ground-truth and dual-component DEGDEM.In addition, Figure 13 shows that dual-component DEGDEM is better than pressure-only DEGDEM, with an improvement between ∼5 and 15 dB for all frequencies.Conventional deghosting gave good results, especially above 50 Hz.However, below 50 Hz, dual-component DEGDEM gave a smaller error at most frequencies.Based on our results, we could use the pressure and vertical particle velocity (P-V z ) sum for the higher frequencies (e.g.above 50 Hz) and DEGDEM on the lower frequencies (e.g.below 50 Hz) for the optimal result.We speculate that it might be easier for a neural network to separate ghosts in this synthetic data compared to real data because of the 'spiky' nature of the synthetic data.
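The suggested combination, DEGDEM below ∼50 Hz and the P-V_z sum above it, amounts to a frequency-domain merge with a crossover taper. A minimal sketch, assuming a cosine taper and hypothetical crossover parameters (the 50 Hz value comes from the text; the taper width and test signals are our own choices):

```python
import numpy as np

def crossover_merge(low_trace, high_trace, dt, f_cross=50.0, f_width=10.0):
    """Merge two traces in the frequency domain: take low_trace below the
    crossover and high_trace above it, blended with a half-cosine taper.
    f_width (the taper width in Hz) is an illustrative assumption."""
    n = len(low_trace)
    f = np.fft.rfftfreq(n, d=dt)
    # weight = 1 below the taper band, 0 above it, half-cosine in between
    w = np.clip((f_cross + f_width / 2 - f) / f_width, 0.0, 1.0)
    w = 0.5 - 0.5 * np.cos(np.pi * w)
    spec = w * np.fft.rfft(low_trace) + (1.0 - w) * np.fft.rfft(high_trace)
    return np.fft.irfft(spec, n)

# toy test: 2 s traces at 4 ms sampling; tones chosen to fall on exact FFT bins
dt = 0.004
t = np.arange(500) * dt
low = np.sin(2 * np.pi * 10 * t)    # 10 Hz component, kept by the merge
high = np.sin(2 * np.pi * 80 * t)   # 80 Hz component, kept by the merge
merged = crossover_merge(low, high, dt)
```

Because each input contains only one tone, the merged trace here equals the sum of the two, confirming that each side of the crossover passes through unchanged.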
Figure 16 illustrates that, on synthetic data, DEGDEM was not significantly affected by spatially aliased data compared to conventional deghosting. This result indicates that DEGDEM could have an advantage on sparse data, where we have problems using an obliquity filter on the V_z data. DEGDEM could also have an advantage on 3D data with wide cable separation. For example, if we used 3D data as input to the network using all cables (one cable for each channel), we would have sparse data in the crossline direction. However, testing this is outside the scope of this paper. It is worth mentioning that there are some ways of dealing with aliased or sparse data. Streamers usually have anti-alias filters that prevent some aliased energy. It is also possible to interpolate traces in the inline or crossline direction (Gulunay, 2003; Wang et al., 2019) before using the P-V_z sum. An alternative way to estimate the crossline wavenumber and the obliquity factor is to use a 1D velocity model to construct hyperbolic wavefronts. The differential of these wavefronts can be used to estimate the incidence angle and, consequently, the crossline wavenumber.
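The spatial-aliasing limit behind this test follows from the plane-wave relation k_x = f sin(θ)/v together with the spatial Nyquist wavenumber k_nyq = 1/(2Δx). A short sketch, assuming a water velocity of 1500 m/s and the worst case of horizontally travelling energy:

```python
def max_unaliased_freq(v, dx, sin_theta=1.0):
    """Highest frequency free of spatial aliasing for receiver spacing dx,
    from f * sin_theta / v <= 1 / (2 * dx) (plane-wave relation)."""
    return v / (2.0 * dx * sin_theta)

# worst case: horizontally travelling energy (sin_theta = 1) in water
print(max_unaliased_freq(1500.0, 12.5))  # 60.0 Hz at 12.5 m spacing
print(max_unaliased_freq(1500.0, 50.0))  # 15.0 Hz at 50 m spacing
```

This illustrates why the decimated 50 m data in the test are heavily aliased over most of the usable bandwidth, while the 12.5 m data are aliased only above 60 Hz for the steepest arrivals.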
Our results show that dual-component DEGDEM was able to deghost real 3D data. We compared DEGDEM with conventional dual-component deghosting to quality control our results. DEGDEM and conventional deghosting gave similar results for both source and receiver deghosting, as shown in Figures 19-24. However, we observe differences at the lower frequencies (0-10 Hz), around the first receiver notch (∼42 Hz) and at the higher frequencies (above 70 Hz). The biggest difference seems to be below 5 Hz. Figure 24a,d shows a difference in the amplitude of the low frequencies. At higher frequencies, the waveforms are more similar. To investigate the low-frequency difference further, we scaled the amplitude of the DEGDEM result slightly to match the amplitude of conventional deghosting. Figure 25 shows both DEGDEM and conventional deghosting with the same amplitude at the low frequencies. We can observe that the waveforms are similar apart from three places, marked with arrows, where we observe some inconsistencies. The red arrows indicate places where conventional deghosting seems to have a more coherent signal, and the blue arrows indicate places where DEGDEM seems to have a more coherent signal. Conventional deghosting may over-boost the low-frequency amplitudes and/or DEGDEM may suppress them. It is unclear why we observe this amplitude difference at the low frequencies, and it is a subject of future research. These results give us confidence that dual-component DEGDEM works on real data with good results. We used only part of the area from Figure 18 to create training data. However, the data quality inside and outside the training area is the same.
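The amplitude scaling used for this comparison can be sketched as a single scalar that matches the low-frequency RMS of one result to the other. The brick-wall FFT low-pass, the 0-10 Hz band and the toy signals below are illustrative assumptions; the paper does not specify how the scalar was derived.

```python
import numpy as np

def lowpass(trace, dt, f_max):
    """Zero-phase low-pass via the FFT (brick-wall; adequate for a sketch)."""
    spec = np.fft.rfft(trace)
    f = np.fft.rfftfreq(len(trace), d=dt)
    spec[f > f_max] = 0.0
    return np.fft.irfft(spec, len(trace))

def low_freq_scalar(degdem, conventional, dt, f_max=10.0):
    """Scalar that matches the 0-10 Hz RMS of the DEGDEM result to the
    conventional result (one possible form of the matching in the text)."""
    a = lowpass(degdem, dt, f_max)
    b = lowpass(conventional, dt, f_max)
    return np.sqrt(np.mean(b ** 2) / np.mean(a ** 2))

# toy check: same 5 Hz waveform, one at half amplitude
dt = 0.004
t = np.arange(1000) * dt
conv = np.sin(2 * np.pi * 5 * t)        # unit amplitude
deg = 0.5 * np.sin(2 * np.pi * 5 * t)   # half amplitude
s = low_freq_scalar(deg, conv, dt)
print(round(s, 3))  # 2.0
```

Applying `s * deg` would then equalize the low-frequency amplitudes before the waveform comparison.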
DEGDEM is computationally inexpensive if a trained neural network is available. However, creating demigrated training data and training a neural network carries a significant computational cost. It is possible to train a neural network on a smaller area and apply the trained network to the rest of the survey, making the method quite efficient. DEGDEM is approximately 2.3 times slower than conventional deghosting using the same resources when we include demigration, training and prediction. In this calculation, we include the real-data results along the sail line shown in this paper for all guns and cables. However, if the network is already trained, it is approximately 125 times faster than conventional deghosting. The computation time depends on the parameters used for conventional deghosting, demigration and the neural network.
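The reported 2.3× and 125× figures imply a quick break-even estimate: the training cost amortises after only a few sail lines. A sketch under the simplifying assumption that training is paid once and every further line costs only prediction (an illustrative model, not a measured benchmark):

```python
def degdem_total(n_lines, train_cost=2.3, predict_cost=1 / 125):
    """Total cost in units of 'conventional deghosting of one sail line',
    assuming training is paid once on the first line (illustrative model)."""
    return train_cost + (n_lines - 1) * predict_cost

def conventional_total(n_lines):
    return float(n_lines)

# break-even: first number of lines where the trained network pays off
n = 1
while degdem_total(n) >= conventional_total(n):
    n += 1
print(n)  # 3
```

Under these assumptions DEGDEM already becomes cheaper than conventional deghosting from the third sail line onwards, consistent with the statement that it suits large data volumes.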

CONCLUSION
In this paper, we demonstrate that dual-component deghosting using demigration-based supervised learning (DEGDEM) can remove ghosts. Tests on synthetic data show that using DEGDEM with dual-component data improves deghosting compared to using only pressure data. In addition, DEGDEM improved deghosting below 50 Hz on synthetic data compared to conventional deghosting. We also tested our method on spatially aliased synthetic data, and the results indicate that DEGDEM is less affected by aliasing than conventional deghosting. This is an indication that DEGDEM could be beneficial on sparse data.
We used DEGDEM on real data from the North Sea and compared our method with conventional deghosting. Our results show that both methods can remove the ghosts and are similar in quality for most frequencies. These results are an indication that DEGDEM works well on real data. In addition, a pre-trained DEGDEM network will be up to a couple of orders of magnitude faster than conventional deghosting.

ACKNOWLEDGEMENTS
The authors would like to thank CGG and the University of Bergen for providing the working environment for this research. We are grateful to CGG Earth Data for permission to use their NVG data in our study. We would also like to thank the CGG internal reviewers, Kawin Nimsaila, James Cooper and Rob Schouten, for helpful comments and suggestions that improved this work. We extend our gratitude to the Research Council of Norway, the University of Bergen and CGG for funding this work through an industrial Ph.D. grant, project no. 305450. We wish to pay tribute to former geophysics professor Norman (Norm) Bleistein, who recently passed away. Norm is recognized as a key contributor to the field of seismic imaging. In this paper, his basic data-mapping concept has been applied in the form of seismic demigration.

DATA AVAILABILITY STATEMENT
The real data associated with this research are confidential and cannot be released. The Marmousi model can be found online: https://wiki.seg.org/wiki/AGL_Elastic_Marmousi.

REFERENCES
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2023 CGG Services Norway AS. Geophysical Prospecting published by John Wiley & Sons Ltd on behalf of the European Association of Geoscientists & Engineers.

FIGURE 4 An illustration showing the basic concept of constant-offset migration and demigration. The ray paths and two-way traveltimes are calculated from S and R to all depth points. Migration is a mapping from the data domain (a) to the depth domain (b), and demigration is the reverse process.

FIGURE 7 An example of an obliquity filter (left) and a finite-difference (FD)-modelled shot gather from the Marmousi model (right) in the frequency-wavenumber (FK) amplitude domain. The black stippled line indicates where the obliquity factor equals zero.
FIGURE 8 (a) An example of mechanical V_z noise on a real shot gather. (b) The same V_z shot gather, but with a low-cut filter (20 Hz) applied.

FIGURE Illustration showing the acquisition geometry used to create the synthetic finite-difference (FD) data from above the ship (a) and from the side of the ship (b).
FIGURE 11 Illustration of the deghosting using demigration-based supervised learning (DEGDEM) workflow with dual-component data used on synthetic data from the Marmousi model. Source: Adapted from de Jonge, Vinje, Zhao et al. (2022).

FIGURE 12 Receiver deghosting in a common-channel gather (receiver 10): (a) V_z data with 'raw' noise and ghosts; (b) low-cut (20 Hz) filtered V_z data with ghosts; (c) pressure data with ghosts; (d) deghosting using demigration-based supervised learning (DEGDEM) using both pressure and low-cut filtered V_z data as input; (e) amplitude spectra of pressure with ghosts, low-cut filtered V_z data with ghosts, DEGDEM, the ground truth and the DEGDEM error.

FIGURE 14 (a) and (b) V_z shot gathers with 12.5 and 50 m receiver separation, respectively. (c) and (d) Frequency-wavenumber (FK) amplitude spectra of the shot gathers for 12.5 and 50 m receiver separation, respectively.
FIGURE 15 (a) Illustration of typical receiver positions in a 3D marine seismic acquisition. (b) A 2D shot gather with 12.5 m receiver separation. (c) A 2D shot gather with 50 m receiver separation.

FIGURE 16 Deghosting error in common-channel gathers: (a) deghosting using demigration-based supervised learning (DEGDEM) error for receiver 3; (b) DEGDEM error for receiver 10; (c) conventional deghosting error for receiver 3; (d) conventional deghosting error for receiver 10; (e) and (f) amplitude spectra for receivers 3 and 10, respectively, showing pressure with ghosts, V_z with ghosts, the DEGDEM error, the conventional deghosting error and the 12.5 m DEGDEM error.
[…] the survey for the training and prediction data. Using the same noise in training and prediction could give a biased result. We test two different training approaches: (1) the training data contain pressure data and 'raw' V_z data; we train a neural network using these training data and, after training, test the network on FD-modelled data that contain pressure data and 'raw' V_z data. (2) The training data contain pressure data and V_z data with a low-cut filter at 20 Hz to remove most of the V_z noise; we train a neural network using these training data and, after training, test the network on FD-modelled data […].
FIGURE 17 Illustration showing the acquisition geometry used by the Northern Viking Graben (NVG) survey from above the ship (a) and from the side of the ship (b).

FIGURE 18 Real Northern Viking Graben (NVG) data shown as CMP stacks for the central cable. These data were used as input for deghosting using demigration-based supervised learning (DEGDEM) and conventional deghosting: (a) V_z data with ghosts and a low-cut filter and (b) pressure data with ghosts. The area to the left of the dashed red line is used to create training and validation data.

FIGURE 19 Source and receiver deghosting on the outer cable shown as CMP stacks. (a) and (b) P-V_z deghosting using demigration-based supervised learning (DEGDEM) and conventional deghosting, respectively, with pressure and low-cut filtered V_z data as input. (c) Difference between conventional deghosting and P-V_z DEGDEM. (d) Amplitude spectra showing pressure with ghosts, low-cut filtered V_z with ghosts, DEGDEM and conventional deghosting.

FIGURE 20 Source and receiver deghosting on the central cable shown as CMP stacks. Descriptions of (a)-(d) are given in Figure 19.
This method would also be useful when, for example, a dataset area is extended from one year to the next, or for new vintages of data for 4D seismic. However, if the acquisition geometry is different from the current one (i.e. different source and receiver depths), we cannot use the same trained network. In this case, we have to create demigrated training data that matches the new vintage and train a new neural network.

DISCUSSION
de Jonge, Vinje, Zhao et al. (2022) compared pressure-only deghosting using demigration-based supervised learning (DEGDEM) with conventional τ-p pressure-only deghosting by Poole (2013); both showed similar results on synthetic data. They carried out various tests on synthetic data, such as including multiples in the prediction data but not the training data.
FIGURE 21 Source and receiver deghosting on the outer cable shown as common-channel gathers (receiver 10). Descriptions of (a)-(d) are given in Figure 19.

FIGURE 22 Source and receiver deghosting on the central cable shown as common-channel gathers (receiver 10). Descriptions of (a)-(d) are given in Figure 19.

FIGURE 23 Source and receiver deghosting on the outer and central cables shown as shot gathers. (a)-(c) P-V_z deghosting using demigration-based supervised learning (DEGDEM), conventional deghosting and the difference for the outer cable, respectively. (e)-(g) P-V_z DEGDEM, conventional deghosting and the difference for the central cable, respectively. (d) and (h) Amplitude spectra showing pressure with ghosts, low-cut filtered V_z with ghosts, DEGDEM and conventional deghosting for the outer and central cables, respectively.

FIGURE 25 Source and receiver deghosting on the central cable shown as CMP stacks from 0 to 10 Hz: (a) scaled P-V_z deghosting using demigration-based supervised learning (DEGDEM) and (b) conventional deghosting.