Test of CP invariance in vector-boson fusion production of the Higgs boson in the H → τ τ channel in proton–proton collisions at √ s = 13 TeV with the ATLAS detector

A test of CP invariance in Higgs boson production via vector-boson fusion is performed in the H → ττ decay channel. This test uses the Optimal Observable method and is carried out using 36 . 1 fb − 1 of √ s = 13 TeV proton–proton collision data collected by the ATLAS experiment at the LHC. Contributions from CP-violating interactions between the Higgs boson and electroweak gauge bosons are described by an effective ﬁeld theory, in which the parameter ˜ d governs the strength of CP violation. No sign of CP violation is observed in the distributions of the Optimal Observable, and ˜ d is constrained to the interval [ − 0 . 090, 0 . 035] at the 68% conﬁdence level (CL), compared to an expected interval of ˜ d ∈ [− 0 . 035 , 0 . 033 ] based upon the Standard Model prediction. No constraints can be set on ˜ d at 95% CL, while an expected 95% CL interval of ˜ d ∈ [− 0 . 21 , 0 . 15 ] for the Standard Model hypothesis was expected.

The investigation of Higgs boson production and decay at the LHC offers a novel opportunity to search for new sources of CPV in the interaction of the Higgs boson with other SM particles. No observable effect of CPV is expected in the production or decay of the SM Higgs boson. Hence any observation of CP violation involving the observed Higgs boson [21,22] would be an unequivocal sign of physics beyond the SM.
The measured Higgs boson production cross sections, branching ratios, and derived constraints on coupling-strength modifiers, assuming the tensor structure of the SM, agree with the SM predictions within the current precision [23][24][25]. Investigations of spin and CP quantum numbers strongly indicate that the observed particle is of scalar nature and that the dominant coupling structure is CP-even and consistent with the SM expectation [26][27][28]. Various measurements have been used in the framework of effective field theories to derive limits on Wilson coefficients which multiply CPeven and CP-odd operators and modify the structure and strength of the coupling of the Higgs boson to gluons and electroweak gauge bosons. These include measurements of differential cross sections as functions of CP-even observables in the decay H → γ γ [29], measurements of event rates in specific event categories and phase-space regions in the decay H → Z Z * [30], and measurements of the V H invariant mass in Higgs boson production in association with a weak gauge boson V (V = W ± , Z ) [31]. These analyses use CP-even observables and event rate information and hence are not directly sensitive to possible interference between the CP-even SM operators and new CP-odd operators. The shapes of distributions of CP-odd and CP-even observables (without exploiting CP-even rate information) have been used to set limits on CP-odd and CP-even couplings of the Higgs boson to gauge bosons. This is done by investigating the decay H → V V * (V = W ± , Z ), using only information from the decay [27,32] and combining it with information from vector-boson fusion (VBF) or associated V H production [33,34]. Another analysis using the decay H → τ τ exploits information from VBF and V H production [35]. The shape of the distribution of a single CP-odd observable constructed from kinematic information in VBF production in H → τ τ candidate events has been previously used to set a limit on the parameter d [36], which governs the strength of CPV in an effective field theory ansatz as described in Section 2. This analysis constrained d to the interval [−0.11, 0.05] at the 68% confidence level (CL) using ATLAS data collected at √ s = 8 TeV in 2012, while a 68% CL interval of d ∈ [−0.16, 0.16] was expected. No hints of CPV have been observed in these studies.
In this Letter, a direct test of CP invariance in Higgs boson production via VBF is presented in the H → τ τ channel, based on proton-proton collision data corresponding to an integrated luminosity of 36.1 fb −1 collected with the ATLAS detector at √ s = 13 TeV in the years 2015 and 2016. A CP-odd Optimal Observable [37-39] is employed. The Optimal Observable combines the information from the multidimensional phase space in a single quantity calculated from leading-order matrix elements for VBF production, independent of the decay mode of the Higgs boson. VBF production provides a promising physics process to test CP invariance in the H V V vertex [40]. The decay mode H → τ τ allows the selection of signal events with a good signal-to-background ratio and the reconstruction of the four-momentum of the Higgs boson candidate with adequate precision.
In the present work a direct test of CP invariance is obtained through a measurement of the mean value of the CP-odd Optimal Observable, neglecting possible effects from rescattering by new light particles in loops [40]. A measurement of the parameter d is also performed. Limits on d are derived by analysing the shapes of distributions of the Optimal Observable measured in H → τ τ candidate events with two jets in the final state consistent with VBF production. The event selection, estimation of background contri-butions, and systematic uncertainties closely follow the analysis employed for the observation of the H → τ τ decay [41]. In order to increase the signal-to-background ratio, the final event selection utilizes multivariate discriminants.

Theoretical framework and methodology
The effective Lagrangian L eff considered is the SM Lagrangian augmented with CP-odd operators of mass dimension six, involving the Higgs field and electroweak gauge fields. No CP-even dimension-six operators built from these fields are taken into account. All interactions between the Higgs boson and other SM particles (fermions and gluons) are assumed to be as predicted in the SM, i.e. the coupling structure in gluon-gluon fusion production and in the decay into a pair of τ -leptons is considered to be the same as in the SM. The theoretical ansatz considered and the methodology is the same as in Ref. [36], which contains further details. After electroweak symmetry breaking, the Lagrangian can be written in the mass basis of the Higgs boson H , photon A and weak gauge bosons W ± and Z as in Ref. [42]: where g is the SU(2) coupling constant and θ W is the weak mixing angle. Adopting the arbitrary choice d =d B yields the following relations 2 : In an effective field theory (EFT), the coupling parameters are real valued. However, rescattering effects from new particles in loops, with masses lower than the scale of new physics assumed in the EFT, may introduce an imaginary part [40]. Such effects are not considered in the analysis presented here, as d is assumed to be real valued.
The strength of CP violation in VBF Higgs boson production is then described by a single parameter d . The corresponding matrix element M for VBF production is the sum of a CP-even contribution M SM from the SM and a CP-odd contribution M CP-odd from the dimension-six operators considered: where the dependence on d has explicitly been factored out. The squared matrix element has three contributions: |M| 2 = |M SM | 2 +d · 2 Re(M * SM M CP-odd ) +d 2 · |M CP-odd | 2 .
The first term |M SM | 2 and third term d2 · |M CP-odd | 2 are both CP-even and hence are not a source of CPV. The second term d · 2 Re(M * SM M CP-odd ) stems from the interference of the two contributions to the matrix element and is CP-odd, representing a possible new source of CPV in the Higgs sector. The interference term integrated over a CP-symmetric part of phase space vanishes and therefore does not contribute to the total cross section and observed event yield after CP-symmetric selection criteria are applied. The third term increases the total cross section by an amount quadratic in d , but this is not exploited in the analysis presented here as the observed rate can also be influenced by additional CP-conserving new physics. The final state consisting of the reconstructed decay of the Higgs boson and the two tagging jets corresponding to the VBF topology can be characterized by seven phase-space variables, by fixing the mass of the Higgs boson, neglecting jet masses, and exploiting momentum conservation in the plane transverse to the beam line. The concept of the Optimal Observable (O opt ) combines the information from the seven-dimensional phase space into a single observable, which is shown to have the highest sensitivity to small values of the parameter of interest and neglects contributions proportional to d2 in the matrix element.
The Optimal Observable for the determination of d is given by the ratio of the interference term in the matrix element to the SM contribution: In order to make an almost model-independent test of CP invariance, the mean value of the Optimal Observable can be measured. If no CPV is present in the H V V vertex, then the expectation value of the Optimal Observable vanishes: O opt = 0, as the Optimal Observable is a CP-odd (and T -odd 3 ) variable. Since the initial state of VBF production of the Higgs boson is not CPsymmetric, this argument assumes that effects from rescattering are negligible [40]. Thus an observation of a non-vanishing mean value or an asymmetry in the Optimal Observable distribution would indicate physics beyond the SM, either stemming from CPV, or originating from rescattering effects (i.e. new particles being on the mass shell in loop corrections to the H V V vertex). Example distributions of the Optimal Observable for signal events after the full event selection, as described in Section 5, are shown for various values of d in Fig. 1. In the SM the distribution is symmetric and has a mean value of zero, whereas a non-vanishing value of d causes an asymmetry and a non-vanishing mean value of the Optimal Observable.
The values of the leading-order matrix elements (ME) needed for the calculation of the Optimal Observable are extracted from HAWK [45][46][47]. The evaluation requires the four-momenta of the Higgs boson and the two tagging jets ( jj). The momentum fraction x 1 (x 2 ) of the initial-state parton from the proton moving in the positive (negative) z-direction (along the beam) can be derived by exploiting energy-momentum conservation from the Higgs boson and tagging jet four-momenta as The best estimate and confidence intervals for d in this analysis are determined by a fit of the predicted distribution of the Optimal Observable to that measured in data. It has been shown in Ref.
[36] that the Optimal Observable yields a significantly higher sensitivity in the determination of d than the CP-odd signed difference in the azimuthal angle φ jj between the two tagging jets, as suggested in Ref. [44].

ATLAS detector
The ATLAS experiment [49][50][51] at the LHC is a multipurpose particle detector with a forward-backward symmetric cylindrical geometry and a near 4π coverage in solid angle. 4 It consists of an inner tracking detector surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic and hadron calorimeters, and a muon spectrometer. The inner tracking detector covers the pseudorapidity range |η| < 2.5. It consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. Lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic (EM) energy measurements with high granularity. A steel/scintillator-tile hadron calorimeter covers the central pseudorapidity range (|η| < 1.7). The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to |η| = 4.9. The muon spectrometer surrounds the calorimeters and is based on three large air-core toroidal superconducting magnets with eight coils each. The field integral of the toroids ranges between 2.0 and 6.0 T m across most 4 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η ≡ − ln tan(θ/2). Angular distance is measured in units of of the detector. The muon spectrometer includes a system of precision tracking chambers and fast detectors for triggering. The integrated luminosity recorded by ATLAS is obtained with the LUCID-2 detector [52]. A two-level trigger system is used to select events [53]. The first-level trigger is implemented in hardware and uses a subset of the detector information to reduce the accepted rate to at most 100 kHz. This is followed by a software-based trigger that reduces the accepted event rate to 1 kHz on average depending on the data-taking conditions.

Simulated event samples
Samples of signal and background events were simulated using various Monte Carlo (MC) event generators. The generators and the PDF sets used for the hard-scattering process and the models used for the parton showers, hadronization, and underlying-event activity (UEPS) are summarized in Table 1. In addition, the order of the total cross-section calculation is given.
Only Higgs boson production via VBF is considered as signal, including the signals observed as H → τ τ decay and H→ W W * → ν ν decay. The analysis is not sensitive to CPV in the H → W W * decay vertex and the shape of the Optimal Observable is the same for the H→ W W * → ν ν and H → τ τ → 4ν decay modes regardless of the value of d . The other Higgs boson production modes -gluon-gluon fusion (ggF H ), V H, tt H -are considered as background in this analysis, and all couplings other than the H V V coupling were set to SM values. All SM signal and background samples used in this analysis are the same as those employed in Ref.
[41], and the same normalization of those samples is used. The only exception is the normalization of the electroweak Z jj process. Here, the leading-order (LO) cross section calculated by the Sherpa 2.2.1 generator [54][55][56][57] is multiplied by a factor of 1.7 to match the cross-section value measured by the ATLAS experiment at √ s = 13 TeV [58]. An uncertainty of 25% from the measured cross-section of the electroweak Z jj process is applied to the normalization.
To simulate the presence of non-vanishing values of d in the H V V vertex, a matrix-element reweighting method is applied to the VBF SM signal sample. The weight is defined as the ratio of the squared ME value of the VBF process associated with a specific amount of CP mixing (given in terms of d ) to that obtained from the SM. To extract the weights, the leading-order MEs from HAWK are used for the 2 → 2 + H and 2 → 3 + H processes separately. The MEs are evaluated using the four-momenta and particle identification codes of the initial-and final-state partons and the Higgs boson of each event. The reweighting procedure has been validated [36] against samples generated with Mad-Graph5_aMC@NLO [98] and proves to be a good approximation of a full NLO description of the process with non-vanishing values ofd. For all samples, a full simulation of the ATLAS detector response [99] using the Geant4 program [100] was performed. The effect of multiple pp interactions in the same and neighbouring bunch crossings (pile-up) was included by overlaying minimumbias events simulated with Pythia 8 using the MSTW2008LO PDF [101] and the A2 set [102] of tuned parameters on each generated signal and background event. The number of overlaid events was chosen such that the distribution of the average number of interactions per pp bunch crossing in the simulation matches that observed in data.

Event selection
In this analysis, events with at least two jets and a H → τ τ decay candidate in the final state are selected. Decays of the τleptons with all combinations of leptonic (τ → νν with = e, μ) and hadronic (τ → hadrons ν) final states are considered. In the following, the event preselection, which closely follows Ref.
[41], is summarized and the analysis strategy using gradient boosted decision trees (BDTs) [103] is described. After data quality requirements [104], the integrated luminosity of the √ s = 13 TeV dataset used is 36.1 fb −1 . The definition of the reconstructed objects as well as the triggers used in this analysis correspond to those used in Ref.
Depending on the reconstructed decay modes of the two τleptons, events are separated into four analysis channels: the dileptonic same-flavour (τ lep τ lep SF), the dileptonic different flavour (τ lep τ lep DF), the semileptonic (τ lep τ had ), and the fully hadronic (τ had τ had ) channel. All channels require an exact number of identified and isolated τ -lepton decay candidates, i.e. electrons, muons, and visible products of hadronic τ decays (τ had-vis ), as defined  Summary of the event selection requirements for the four analysis channels. In the case of the p T requirements on the τ -lepton decay candidates, the asterisk marks the lowest p T threshold, which varies depending on the trigger used. Details of this are given in Ref. [41]. The transverse momentum of the visible decay products of the τ -lepton candidate with the higher (lower) transverse momentum is denoted by p τ1 T (p τ2 T ). The input variables used for the BDT training and the BDT score threshold used to define the signal regions are also reported.
Two isolated τ -lepton decay candidates with opposite electric charge an E miss T calculation considering only contributions from reconstructed objects and neglecting contributions from inner-detector tracks originating from the vertex of the hard-scattering process, but not associated with any of the reconstructed objects. In addition, a requirement on the invariant mass of the two light leptons, m , is applied in the τ lep τ lep channels. A requirement on the di-τ mass calculated in the collinear approximation [105] of m coll τ τ > m Z − 25 GeV is introduced in the τ lep τ lep channels to ensure orthogonality between this analysis and the analysis of H → W W * → ν ν [106], which has a similar final state. In the τ lep τ lep and τ lep τ had channels, the top quark background is suppressed by requiring that no jet with p T > 25 GeV and |η| < 2.5 contains b-hadrons (b-jets). A multivariate algorithm [107,108] is used to identify and select b-jets with a working point corresponding to an average efficiency of 85%, as measured on a sample from top quark pair production. Low transverse mass 5 (m T < 70 GeV) is required in the τ lep τ had channel to reject events with leptonic W decays.
Requirements on the angular distance between the visible products of the two selected τ -lepton decays, R τ τ , and their pseudorapidity difference, | η τ τ |, are applied in the τ had τ had channel to reject non-resonant background events.
To select Higgs boson events produced by VBF, all channels require at least two jets with transverse momentum of the leading jet p j 1 T > 40 GeV and of the subleading jet p j 2 T > 30 GeV, a large invariant mass of the two leading jets, m jj > 300 GeV, and a pseudorapidity separation of | η jj | > 3. In the τ had τ had channel, the requirements on the leading jet are raised to p j 1 T > 70 GeV and |η j 1 | < 3.2 to achieve a uniform trigger selection efficiency as a function of p j 1 T . This selection is referred to as the VBF event selection in the following.
To construct a region enriched in VBF signal events, BDTs trained to discriminate between the VBF signal and the backgrounds are used in all channels. Kinematic variables used in the BDT training can be categorized as follows: 5 The transverse mass is defined as is the azimuthal separation between the directions of the lepton and the missing transverse momentum. • Properties of a resonant di-τdecay which discriminate against processes with jets that are misidentified as τ -decay candidates (referred to as "Misidentified τ "): the angular distance R τ τ , the difference in pseudorapidity | η τ τ |, and the difference in azimuth φ τ τ between the two visible τ- • Properties of the VBF topology: m jj , the total transverse momentum p tot T , which is defined as the transverse momentum of the system composed of all objects in a VBF event (τ 1 , τ 2 , j 1 , j 2 , E miss T ), η-centralities, C jj (τ 1 ) and C jj (τ 2 ), of each τ-candidate relative to the pseudorapidity of the two leading jets, 7 and the transverse momentum of the third leading jet p j 3 T which is set to zero for events with exactly two jets.
The most important variables in the training are m MMC τ τ , m jj , and C jj (τ 1 ). The resulting BDT score (BDT score ) distributions are shown in Fig. 2 for events surviving the VBF event selection , where η τ , η j1 and η j2 are the pseudorapidities of the τ-candidate and the two leading jets, respectively. This variable has a value of unity when the object is halfway in η between the two jets, 1/e when the object is aligned with one of the jets, and < 1/e when the object is not between the jets in η. and show the ability of the BDT to separate the signal process from background processes. All figures in this Letter use signal strength μ (defined as the ratio of the measured cross section times branching ratio to the SM prediction for the VBF signal process), background normalizations, and systematic uncertainties as fitted by the final statistical analysis discussed in Section 8 and referred to as post-fit. The signal purity increases at high values of BDT score . A threshold value of BDT score is used to define the final signal region (SR) in each channel. This threshold is chosen to yield a high signal significance and is given in Table 2

Background estimation
Several background processes contribute to the SR event yields in the four analysis channels. The dominant contributions in the τ lep τ lep DF, τ lep τ had , and τ had τ had channels arise from Z → τ τ production and from light-and heavy-flavour jets that are misidentified as prompt leptonic or hadronic τ decays. The misidentified τ decays in the τ lep τ lep and τ lep τ had channels originate largely from W +jets production with smaller contributions from multijet and top quark production, while in the τ had τ had channel the contribution from multijet production dominates. In the τ lep τ lep SF channel the contribution from Z → production is dominant. Other background contributions in all analysis channels originate from top quark pair and associated W t production (denoted by "tt/W t" in the following), diboson production, and other Higgs boson production modes. To construct a CR for Z → τ τ production, the SR requirement on the BDT score (given in Table 2) is inverted for each analysis channel. This CR is called the "low-BDT score CR" in the follow- ing. Since the purity of Z → τ τ production in the low-BDT score CR ranges from 30% to 54% depending on the analysis channel, Z → τ τ production is normalized to data in the Z boson mass peak of the m MMC τ τ distributions, shown in Fig. 3. In the fit the Z → τ τ normalization is correlated across all analysis channels and the fit yields a normalization factor of 0.93 ± 0.08. To ensure that the normalization is valid in the SR, the modelling of the Z -boson and jet kinematic properties was checked in a validation region which is composed of Z → events with kinematic properties similar to those of the Z → τ τ events in the VBF region of each analysis channel. This region is defined by selecting two same-flavour leptons of opposite charge with a dilepton mass of m > 80 GeV and low missing transverse momentum (E miss T < 55 GeV). All VBF selection requirements given in Table 2 are applied as well. As in Ref.
[41], a slight positive slope in the ratio of the data to the Sherpa simulation as a function of m jj is observed. In this analysis, the simulated Z → τ τ and Z → events are reweighted to the observed m jj distribution after the VBF event selection, which results in a small change in the acceptance of Z → τ τ and Z → events in the SR.
In each of the two τ lep τ lep channels, a top quark CR is defined by inverting the veto on b-tagged jets and not applying the selection on the BDT score . The normalization of tt/W t production is constrained by the event yield in these CRs, corresponding to a normalization of 1.16 ± 0.06 from the combined fit to the data. Additionally, another CR is defined to normalize the Z → process for the τ lep τ lep SF channel. Again, the selection on the BDT score is not applied, and the requirement on the dilepton invariant mass is changed to 80 < m < 100 GeV. The observed event yield in the Z → CR constrains the normalization of simulated Z → events in the τ lep τ lep SF channel to 1.0 ± 0.4. In the τ had τ had channel, the background from misidentified hadronic τ decays is dominated by multijet events. This background process is modelled using a template extracted from τ had-vis candidates with one, two, or three associated tracks that pass all selection requirements, but fail the opposite-charge requirement. Before the final fit, the template is normalized to data by a fit of the | η τ τ | distribution after the preselection, but removing the requirement on | η τ τ |. In the final fit the template is normalized to data in the m MMC τ τ distribution of the low-BDT score CR in the τ had τ had channel. Then, the multijet background is normalized with a factor of 0.99 ± 0.09 relative to the pre-fit normalization.
The modelling of the Optimal Observable distribution for the background processes is validated in all CRs. Fig. 4 shows Optimal Observable distributions in the low-BDT score CR for all analysis channels, where the background processes have been normalized to the result of the fit. Neither the observed nor the predicted distributions in any CR show hints of an asymmetry or non-vanishing mean values of the Optimal Observable caused by event reconstruction and selection within uncertainties. The data and the predicted distributions are observed to be compatible within uncertainties here as well as in the top quark and Z → CRs of the τ lep τ lep channels.

Systematic uncertainties
The effects of the systematic uncertainties on the yields in both the SRs and CRs and on the shape of the Optimal Observable in the SRs, as well as the m MMC Theoretical uncertainties affecting the total cross section are evaluated for the Higgs boson production cross sections for ggF H , V H, and tt H production by varying the QCD factorization and renormalization scales as well as the PDF model following the recommendations in Ref. [119]. Also, uncertainties in the H → τ τ and H → W W * branching ratios are considered [119]. Theoretical uncertainties in the MC modelling are considered for the VBF and gluon-gluon fusion production of the Higgs boson as well as for Z → τ τ production. For all simulated background contributions other than Z → τ τ , no theoretical uncertainties are considered, as their impact is negligible. Uncertainties in MC modelling are assessed by a comparison between nominal and alternative event generators and UEPS models, as indicated in Table 1. In addition, the effects of QCD factorization and renormalization scale variations, matching-scale variations (in the case of Z → τ τ only), and PDF model uncertainties are evaluated. As an additional uncertainty in the Z → τ τ and Z → processes, the full difference between the sample reweighted to the observed m jj distribution and the sample without reweighting is applied to the full analysis. An uncertainty to account for the signal reweighting procedure described in Section 4 was considered and found to be negligible. The uncertainty due to limited MC sample size is evaluated for the sum of all MC-based background processes in each analysis bin.

Fitting procedure
The estimate of d is obtained using a binned maximum-

Results
For a CP-even Higgs boson, the mean value of the Optimal Observable for the signal and background processes is expected to be zero if any effects from the rescattering of new particles in loops can be neglected. However, CP-violating effects could result in the mean value of the Optimal Observable in data deviating from zero, allowing an almost model-independent test for CP-violating effects in this measurement.
The observed values for the mean of the Optimal Observable in data, along with their statistical uncertainties, are summarized in Table 3 for the four channels in this analysis, as well as their combination. The combined mean is obtained by weighting the mean value of each individual channel by the inverse of its respective variance. These values are fully consistent with zero, so no evidence of CPV is observed.
To extract confidence intervals for the CP-mixing parameter d ,  While the predicted background distributions for the Optimal Observable are not perfectly symmetric, they are statistically consistent with a symmetric distribution. This slight asymmetry causes the expected confidence intervals for d to also be asym- The observed and expected NLL curves are shown in Fig. 6(a) as a function of d . The expected curves are obtained in a two-step process: firstly, nuisance parameters and background normalization factors are constrained via a ML-fit to all analysis CRs, excluding the SRs; then another fit is performed in all SRs and CRs to pseudo-data which were created with the best-fit parameter values from the first step. This two-step process ensures that the nuisance parameters and the background normalization factors for the expected sensitivity are set to values that are consistent with the observed data in the analysis CRs. The expected NLL curve is shown for d = 0 and μ = 1 and represents the best estimate of the sensitivity of the analysis based on SM expectations. Another NLL curve with d = 0 and the signal strength μ set to the observed value of 0.73 is also shown in order to demonstrate the decrease in sensitivity due to the lower than expected event yields (see Tables 4 and 5). Also shown for comparison in Fig. 6(a) is the pre-fit expected NLL curve, which is obtained using a pseudo-dataset where the event yields and distributions in the SRs and CRs are set to the SM expectations for both the signal (with d = 0 and μ = 1) and background processes. This demonstrates that the preferred values of the nuisance parameters and normalization factors based on the observed data in the background CRs in the expected NLL curve result in a decrease in sensitivity to d when compared with the pre-fit expected curve.
The effect of systematic uncertainties on the sensitivity to d can be seen in Fig. 6(b). Here, the expected NLL curves are shown for d = 0 and μ = 1, with and without the effect of systematic uncertainties. To assess the impact of systematic uncertainties stemming from jet reconstruction, τ -lepton identification, and MC sample size, expected NLL curves are also shown where the nuisance parameters corresponding to the systematic uncertainties in question have been removed from the likelihood function. It is evident that the experimental uncertainties related to jet reconstruction have the largest effect on the sensitivity of this analysis to d . To obtain insight into the preferred values of d obtained for the individual Optimal Observable distributions in the different analysis channels, NLL curves for each individual channel are shown in Fig. 6(c), and compared with the combined result. For these individual NLL curves, only event yield information from the other three signal regions that are not being featured is used, so that the distribution of events in the Optimal Observable in these other signal regions is not exploited in the ML-fit. For these individual channel NLL curves, the signal strength is constrained to be positive so that the ML-fit is stable and insensitive to event yield fluctuations in the individual channel SRs that arise from smaller prediction are set to those which minimize the NLL. The ratios of the data to the prediction are shown in the lower panels. The size of the combined statistical, experimental and theoretical uncertainties in the predicted event yields is indicated by the hatched bands.

Table 4
Post-fit event yields in the SRs for the τ lep τ lep SF and τ lep τ lep DF analysis channels.  Fig. 6(a) is Table 5 Post-fit event yields in the SRs for the τ lep τ had and τ had τ had analysis channels.
The line "Other backgrounds" includes top quark (tt and single top), diboson, and Z → backgrounds. Backgrounds from W (→ τ had ν)+jets production in the τ had τ had channel are also included in "Other backgrounds". For comparison, the expected signal yields for the SM expectation (μ = 1, d = 0) are also shown.  The intervals based upon the pre-fit expected NLL curve in Fig. 6(a), where the nuisance parameters and background normalization factors do not take into account the observed data in the CRs, are d ∈ [−0.032, 0.031] at 68% CL and d ∈ [−0.12, 0.10] at 95% CL.

Conclusion
The CP invariance of the Higgs boson coupling to vector bosons has been tested in the VBF H → τ τ process in 36.1 fb −1 of √ s = 13 TeV proton-proton collision data obtained with the ATLAS detector at the LHC. In this analysis, an Optimal Observable was used and confidence intervals were set on the CP-mixing parameter d .
Since the mean of the Optimal Observable observed in data is consistent with zero, and the obtained confidence intervals for

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.  [120]. [1] A.D. Sakharov, Violation of CP invariance, C asymmetry, and baryon asymmetry of the universe, Pisma Zh. Eksp. Teor. Fiz. 5 (1967)