A quantum information theoretic analysis of reinforcement learning-assisted quantum architecture search

In the field of quantum computing, variational quantum algorithms (VQAs) represent a pivotal category of quantum solutions across a broad spectrum of applications. These algorithms demonstrate significant potential for realising quantum computational advantage. A fundamental aspect of VQAs involves formulating expressive and efficient quantum circuits (namely ansatz) and automating the search of such ansatz is known as quantum architecture search (QAS). RL-QAS involves optimising QAS using reinforcement learning techniques. This study investigates RL-QAS for crafting ansatzes tailored to the variational quantum state diagonalization problem. Our investigation includes a comprehensive analysis of various dimensions, such as the entanglement thresholds of the resultant states, the impact of initial conditions on the performance of RL-agent, the phase change behavior of correlation in concurrence bounds, and the discrete contributions of qubits in deducing eigenvalues through conditional entropy metrics. We leverage these insights to devise an optimal, admissible QAS to diagonalize random quantum states. Furthermore, the methodologies presented herein offer a generalised framework for constructing reward functions within RL-QAS applicable to variational quantum algorithms.


I. INTRODUCTION
Quantum computing represents a paradigm-shifting approach to computation, leveraging the principles of quantum mechanics to process information in ways fundamentally different from classical computers. At its core, this technology employs quantum bits, or qubits, which can exist in superposition states, enabling parallel computation paths. The information in qubits is transformed via quantum circuits consisting of quantum logic gates that are capable of acting on these computation paths in tandem. A quantum circuit represents a corresponding quantum algorithm that can effectuate a computational advantage for a target application [1][2][3]. These algorithms are constructed either manually, based on conventional logical reasoning, or automatically, by composing quantum gates to replicate a required input-output behaviour via training. The latter approach is termed quantum architecture search (QAS) [4][5][6][7].
QAS typically consists of two parts. First, a template of the circuit, called the ansatz, is built. The ansatz can have parametric quantum gates, e.g., gates parametrised by rotation angles. Then, these parameters are determined via the variational principle using a classical optimiser in a feedback loop. Algorithms constructed via this technique are called variational quantum algorithms (VQA) [8,9]. QAS can also be used to determine non-parametric circuits as an approach for quantum program synthesis [10]. In this research, however, we will focus on QAS in the context of VQA.
* abhisheks@rri.res.in
† a.sarkar-3@tudelft.nl
‡ akundu@iitis.pl, corresponding author
The ansatz design is critical, as it directly influences the expressivity and efficiency of the quantum solution. Given the vast configuration space of quantum circuits, identifying a suitable ansatz is a challenging problem. In previous works such as [5, 11-13], the search for quantum circuits that solve various VQAs has been automated using reinforcement learning (RL) techniques [14]. We call a quantum ansatz capable of expressing the solution of a VQA an admissible ansatz. The automated search for such admissible ansatzes using an RL-agent is termed reinforcement learning-assisted quantum architecture search (RL-QAS). In this setting, after a certain number of episodes, the RL-agent returns possible configurations of the quantum ansatz that solve the problem. For $E_{\text{tot}}$ episodes, the agent returns $E_s$ successful episodes, and in the remaining $E_{\text{tot}} - E_s$ episodes, the agent fails to find a configuration within a specific predefined accuracy and depth. It has been shown in [11-13, 15, 16] that classical machine learning-driven QAS algorithms outperform the state-of-the-art ansatz structures and effectively optimise the number of parameters, the depth, and the impact of noise in VQAs.
The use-cases of such admissible ansatzes cover applications in quantum information theory [9,[17][18][19][20]], chemistry [21][22][23] and combinatorial optimization [24][25][26]. Yet, a thorough investigation of such admissible ansatzes is still missing in recent literature. It has been previously shown [13] that in RL-QAS methods, it is crucial to feed proper information about the problem to the agent to enhance the agent's performance. One way of providing this information is by engineering a reward function. In RL, the reward function can be formulated as either sparse or dense. The choice of an optimal reward function relies on trial and error, i.e., on investigating the performance of the agent under various reward configurations.
arXiv:2404.06174v1 [quant-ph] 9 Apr 2024
In this article, we investigate the admissible ansatzes proposed by the RL-agent from the perspective of quantum information theory [27]. For the investigation, we specifically focus on the reinforcement learning enhanced variational quantum state diagonalisation (RL-VQSD) problem [12]. In the VQSD algorithm, the problem-inspired ansatz is based on density matrix exponentiation [28], but its depth and number of gates increase exponentially with the number of qubits. Hence, searching for an optimal ansatz (with a minimum number of gates and depth) that diagonalises a quantum state is an open research problem. In RL-VQSD, the search for the optimal configuration of the ansatz is done using a double deep-Q network (DDQN). See Appendix B for further details on the VQSD algorithm and the agent and environment specifications. The RL-agent proposes $E_s < E_{\text{tot}}$ admissible ansatzes that solve the VQSD problem.
Our investigation shows that the concurrence of the $E_s$ admissible ansatzes lies between an upper and a lower bound, independent of the initialisation of the double deep-Q network (DDQN), where each initialisation of the DDQN corresponds to a different random initialisation of the weights. Moreover, we show that the upper and lower bounds are anti-correlated with respect to the increment in the initial state entanglement. The anti-correlation turns into a mild, and later strong, correlation when the initial state entanglement goes beyond 0.322, indicating a phase change in the correlation between the upper and lower bounds of the concurrence. The phase transition provides an in-depth insight into the relation between the input and the output of the VQSD algorithm. Further, our investigation reveals that the RL-agent can generate high-concurrence admissible ansatzes with fewer 2-qubit gates and a smaller circuit depth than ansatzes with very low concurrence. Furthermore, we quantify the contribution of individual qubits of the RL-ansatz using conditional quantum entropy. Using this measure, we observe a correlation between entanglement and quantum entropy, revealing why RL-VQSD can efficiently find the largest eigenvalues but fails to find the smallest ones.
The structure of the paper is as follows. We present an overview of our main results in Sec. II. In Sec. III A, we analyse the upper and lower bounds on the concurrence of the admissible ansatzes that solve the VQSD problem. Specifically, in Sec. III A 1 we present the variation of the entanglement bounds for different weight initialisations of the DDQN. We observe the correlation properties of the entanglement bounds dependent on the concurrence of the initial state in Sec. III A 2. In Sec. III A 3, we present the dependence of the entanglement bounds on the number of gates and circuit depth of the ansatz. The analysis of the contribution of individual qubits to the admissible ansatz is provided in Sec. III B. In particular, we present the contribution of each qubit for change in the entanglement of the ansatz in Sec. III B 1. In Sec. III B 2, we explain why RL-VQSD can efficiently find the largest eigenvalues of the initial state but not the smallest eigenvalues. We provide concluding remarks and discuss possible future directions in Sec. IV.

II. CONTRIBUTIONS
Through the following points, we summarise the contribution of the paper.
1. Investigating admissible ansatzes of reinforcement learning based quantum architecture search problem from a quantum information theory perspective. We focus on the RL-VQSD algorithm, where the proposed ansatzes diagonalise random quantum states.
2. Demonstrating that the entanglement of the states generated by the RL-agent-proposed admissible ansatzes lies within a lower and upper concurrence bound.
3. The upper and lower bounds of concurrence remain consistent across various weight initialisations of the DDQN.
4. Bounds are anti-correlated with respect to increasing entanglement until concurrence surpasses 0.322; after this point, the anti-correlation turns into correlation. Hence, we introduce the concurrence 0.322 as the phase change point.
5. Investigation of ansatzes in upper and lower bounds, showing optimal ansatz for VQSD generates high concurrence states.
6. RL-agent requires fewer gates and a smaller circuit depth to produce highly entangled admissible ansatzes than weakly entangled ansatzes. We utilise this information to minimise RL-agent and environment (which is the configuration of ansatz) interaction, thereby optimising the convergence time of the RL-VQSD.
7. We evaluate individual qubits' contribution in diagonalising 2-qubit quantum states using conditional quantum entropy.
8. Our investigation shows that the qubits have equal contributions on average in diagonalising random quantum states.
9. We further investigate the correlation in conditional entropy between qubits for different inferred eigenvalues. It requires mild anti-correlation for the first two largest eigenvalues and strong anti-correlation for the smallest eigenvalues.
10. The majority of admissible ansatzes lie in a mild anti-correlation regime, explaining VQSD's limitation in finding the smallest eigenvalues.

III. MAIN RESULTS
In this section, we investigate the upper and lower bounds of concurrence of the ansatzes produced by the RL-agent in diagonalising 2-qubit random quantum states sampled from IBM Qiskit's random density matrix module. For the simulation, we consider multiple such quantum states (specified in the captions of the figures), and for each state, we use 10000 episodes of RL-VQSD to diagonalise the state. The configuration of the RL-VQSD is provided in detail in Appendix B.
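As an illustration of the sampling step, the following is a minimal sketch of how a random 2-qubit density matrix can be drawn. It does not call Qiskit; instead it assumes a Ginibre-matrix construction (which yields Hilbert-Schmidt distributed states) written in plain NumPy. The function name `random_density_matrix_hs` and the seed are hypothetical and serve only this example.

```python
import numpy as np


def random_density_matrix_hs(dim, rng):
    """Sample a density matrix via a complex Ginibre matrix (assumed construction)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T              # Hermitian and positive semidefinite by construction
    return rho / np.trace(rho).real   # normalise to unit trace


# Draw a handful of 2-qubit (4x4) states with a fixed, arbitrary seed.
rng = np.random.default_rng(seed=7)
states = [random_density_matrix_hs(4, rng) for _ in range(5)]
```

Each sampled matrix is Hermitian, positive semidefinite and has unit trace, so it is a valid input state for the diagonalisation task described above.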
A. Entanglement

DDQN weight independent entanglement bound
To benchmark the performance of the RL-agent for a specific problem, it is ideal to take multiple initialisations of the DDQN (see Appendix B for a brief description). To efficiently investigate the upper and lower bounds of concurrence [29,30] produced by the RL-ansatz, we first sample different random quantum states, then diagonalise them using RL-VQSD [12] for five different initialisations of the DDQN weights. Thereafter, the agent suggests different possible configurations for the RL-ansatz, each of which diagonalises the sampled random quantum state. For a specific state, we collect such valid quantum circuits and find the concurrence of the state obtained after evolving through the RL-ansatz. It should be noted that the concurrence of the evolved state indicates how much entanglement the RL-ansatz induces in the sampled input state. See Appendix C for further analysis of the sampled random quantum states.
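For reference, the concurrence of a 2-qubit density matrix can be evaluated with the Wootters formula. The sketch below is a plain NumPy version under that assumption; it is not taken from the RL-VQSD code, and the function name `concurrence` is chosen here only for illustration.

```python
import numpy as np

SIGMA_Y = np.array([[0, -1j], [1j, 0]])
YY = np.kron(SIGMA_Y, SIGMA_Y)


def concurrence(rho):
    """Wootters concurrence of a 2-qubit density matrix (4x4 array)."""
    rho = np.asarray(rho, dtype=complex)
    rho_tilde = YY @ rho.conj() @ YY              # spin-flipped state
    eigs = np.linalg.eigvals(rho @ rho_tilde)     # real and non-negative up to round-off
    lam = np.sqrt(np.clip(eigs.real, 0.0, None))  # clip tiny negative round-off before sqrt
    lam = np.sort(lam)[::-1]                      # decreasing order
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])


# Example: a Bell state is maximally entangled, a product state is not.
bell = np.zeros(4)
bell[0] = bell[3] = 1 / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())
rho_product = np.diag([1.0, 0.0, 0.0, 0.0])
```

Applying `concurrence` to the state obtained after the RL-ansatz, and taking the maximum and minimum over all admissible circuits for one input state, gives the upper and lower bounds discussed in this section.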
The results are presented in Table I. We observe that the optimal configuration of the RL-ansatz always gives us a state with concurrence close to one. Hence, the deviation in the maximum concurrence over different weights of the DDQN is more significant than the deviation in the minima of the concurrence. As seen in Table I, the deviation in the maximum concurrence is of the order of $10^{-3}$. The low standard deviation implies that there is a negligible deviation in concurrence across different initialisations of the DDQN for a specific state. This indicates that the bounds on concurrence are invariant for different configurations of the RL-ansatz and, hence, that the entanglement bounds of the RL-ansatz are independent of the initialisation of the DDQN. Due to this observation, it is sufficient to consider a single initialisation of the DDQN to benchmark the outcomes in the remaining simulations.
In the following subsection, we investigate how the upper and lower bounds on the entanglement of the RL-ansatz depend on the entanglement of the input quantum state.

Anti-correlation between entanglement bounds
Corresponding to a predefined threshold, the agent proposes different architectures of quantum circuits that diagonalise the quantum state. To observe the amount of entanglement generated by the RL-ansatz, we calculate the concurrence of the state after passing through the RL-ansatz, just before the dephasing operation. By selecting the quantum circuits with maximum and minimum entanglement, we observe their variation with the entanglement of the input state. It has been shown in the previous section that the upper and lower bounds of entanglement are independent of the initialisation of the DDQN; hence, for further benchmarking we can constrain ourselves to just one initialisation of the DDQN weights.
In Figure 1, we illustrate the variation of the entanglement bounds with the entanglement of the input quantum state. To investigate the dependency, we use the Pearson correlation coefficient (PCC) [31] as a quantifier of correlation. See Appendix A for a brief description. We see that the upper and lower bounds of entanglement are strongly anti-correlated when the input state entanglement is below 0.3 and strongly correlated afterwards, indicating a phase change in the correlation between the upper and lower bounds of entanglement.
Next, we obtain the variation of the PCC between the upper and lower bounds with the concurrence of the initial state. Specifically, we introduce a parameter $k \in (i, j)$ to split the range of initial state concurrence $u := [i, j]$ into two intervals: $u_1 := [i, k)$ and $u_2 := [k, j]$. Also, let us denote the sets of upper and lower bounds on the concurrence of the ansatz as $u$ and $l$, respectively. For the ranges $u_1$ and $u_2$, we introduce the following correlation function:
$$\eta^{k}_{w_{ik}} = w_{ik}\,\mathrm{PCC}^{ik}_{ul} + (1 - w_{ik})\,\mathrm{PCC}^{kj}_{ul}, \qquad (1)$$
where $w_{ik} \in \{0, 1\}$ and $\mathrm{PCC}^{ik}_{ul}$, $\mathrm{PCC}^{kj}_{ul}$ are the values of the PCC between the upper and lower bounds of concurrence of the ansatz for the intervals $u_1$ and $u_2$, respectively.
Note 1. We observe that, depending on the value of $w_{ik}$, $\eta^{k}_{w_{ik}}$ takes the value either $\mathrm{PCC}^{ik}_{ul}$ or $\mathrm{PCC}^{kj}_{ul}$. For $\eta^{k}_{w_{ik}} = \mathrm{PCC}^{ik}_{ul}$, we obtain the amount of (anti-)correlation between the lower and upper bounds of concurrence of the ansatz for the interval $u_1$, while for $\eta^{k}_{w_{ik}} = \mathrm{PCC}^{kj}_{ul}$, we obtain the amount of (anti-)correlation for the interval $u_2$. We leave proving that $\eta^{k}_{w_{ik}}$ is a valid measure of correlation for arbitrary values of $w_{ik} \in (0, 1)$ as an open problem.
We present in Figure 2 the variation of $\eta^{k}_{0}$ (in blue) and $\eta^{k}_{1}$ (in red) for different values of $k$. We observe that $\eta^{k}_{1} \approx -1$ for all $k \in (i, j)$. This indicates strong cumulative anti-correlation between the upper and lower bounds of concurrence. For $\eta^{k}_{0}$, we observe a gradual change of $\mathrm{PCC}^{kj}_{ul}$ from $-0.582$ to $1$. We note that $\eta^{k}_{0}$ changes from $-0.0451$ to $0.279$ for $k \in [0.313, 0.379]$, with $\eta^{k}_{0} = 0$ at $k = k^{*} = 0.322$. We call this point the phase transition point of correlation between the upper and lower bounds of concurrence for the VQSD ansatz. For $k < k^{*}$, there is anti-correlation between the upper and lower bounds of concurrence of the ansatz on both intervals $u_1$ and $u_2$, while for $k > k^{*}$, the bounds remain anti-correlated on $u_1$ but become correlated on $u_2$.
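The split-interval analysis can be sketched numerically as follows. The code assumes the weighted form $\eta^{k}_{w} = w\,\mathrm{PCC}^{ik}_{ul} + (1 - w)\,\mathrm{PCC}^{kj}_{ul}$, which reproduces the two cases of Note 1 for $w \in \{0, 1\}$. The arrays below are synthetic placeholders, not the data behind Figures 1 and 2, and the names `pcc` and `eta` are hypothetical.

```python
import numpy as np


def pcc(x, y):
    """Pearson correlation coefficient between two 1-D samples."""
    return float(np.corrcoef(x, y)[0, 1])


def eta(c_in, upper, lower, k, w):
    """Weighted PCC of the concurrence bounds on [i, k) and [k, j].

    Each interval must contain at least two points for the PCC to be defined.
    """
    c_in, upper, lower = map(np.asarray, (c_in, upper, lower))
    first = c_in < k                               # u1 = [i, k)
    second = ~first                                # u2 = [k, j]
    pcc_ik = pcc(upper[first], lower[first])
    pcc_kj = pcc(upper[second], lower[second])
    return w * pcc_ik + (1 - w) * pcc_kj


# Synthetic illustration only: bounds move oppositely below k and together above it.
c_in = np.linspace(0.1, 0.4, 12)
upper = c_in.copy()
lower = np.where(c_in < 0.25, -c_in, c_in)
eta_first = eta(c_in, upper, lower, k=0.25, w=1)   # PCC on u1
eta_second = eta(c_in, upper, lower, k=0.25, w=0)  # PCC on u2
```

Sweeping `k` over the interior of $(i, j)$ and recording `eta(..., w=0)` and `eta(..., w=1)` would give curves of the kind plotted in Figure 2; the zero crossing of the `w=0` curve plays the role of $k^{*}$.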

Agent and environment interaction reduction based on entanglement bound
Here, we provide a detailed investigation of the admissible ansatzes produced by the RL-agent in terms of ansatz depth, the total number of 2-qubit gates and the total number of gates in the upper and lower bounds of concurrence.
Our investigation reveals that, over 25 random quantum states, the admissible ansatzes corresponding to the upper bound of concurrence have, on average, a shorter depth (13.23) and fewer gates (18.5) than the admissible ansatzes corresponding to the lower bound of concurrence, where the average depth is 13.9 and the average number of gates is 19. Moreover, Figure 3 makes it more prominent that the minimum number of gates and the depth of the admissible ansatzes are smaller for the upper bound of concurrence than for the lower. This observation is crucial in reducing the subspace of all possible admissible ansatzes. The main motivation of RL-QAS in VQA is to find ansatzes that solve a VQA with a minimum number of gates and circuit depth; in the case of RL-VQSD, we can narrow down the search for an optimal admissible ansatz to the vicinity of the upper bound of entanglement by encouraging the RL-agent to produce ansatzes with very high concurrence and penalising it for other ansatz structures.

FIG. 3. We observe that the RL-agent can generate ansatzes with very high concurrence using fewer 2-qubit gates and a smaller circuit depth than ansatzes with very low concurrence. As the main aim of RL-VQSD is to find a diagonalising unitary with a small number of gates and depth, our analysis shows that if we focus on admissible ansatzes near the upper bound of concurrence, we can not only reduce the computational cost of RL-VQSD severely but also find optimal admissible ansatzes. The numbers in the bars correspond to the amount of entanglement generated by the RL-ansatz when initialised in the vacuum state.

Contribution of individual qubits for change in entanglement
We quantify the contribution of each qubit using the conditional quantum entropy $S^{\rho'_{q_0 q_1}}_{q_0|q_1}$ [27,32,33]. The conditional entropy of an individual qubit in an $N$-qubit quantum system indicates the amount of uncertainty remaining about that qubit once the rest of the system is known. Hence, it tells us how much information the state of the remaining qubits provides about that qubit. For a detailed discussion and notation, see Appendix A.
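A minimal numerical sketch of this quantity for two qubits is given below. It assumes the standard definition $S_{q_0|q_1} = S(\rho_{q_0 q_1}) - S(\rho_{q_1})$ with entropies in bits, and a basis ordering in which $q_0$ is the most significant qubit; both are assumptions of this example, since Appendix A is not reproduced here. The function names are hypothetical.

```python
import numpy as np


def von_neumann_entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def reduced_state(rho, keep):
    """Reduced state of qubit `keep` (0 or 1) of a 2-qubit density matrix."""
    r = np.asarray(rho).reshape(2, 2, 2, 2)   # indices (q0, q1, q0', q1')
    if keep == 0:
        return np.einsum('ajbj->ab', r)       # trace out q1
    return np.einsum('iaib->ab', r)           # trace out q0


def conditional_entropy(rho, given):
    """Entropy of the other qubit conditioned on qubit `given`."""
    return von_neumann_entropy(rho) - von_neumann_entropy(reduced_state(rho, given))


# Example: q0 pure and q1 maximally mixed, so the two conditional entropies differ.
rho_example = np.kron(np.diag([1.0, 0.0]), np.eye(2) / 2)
s_q0_given_q1 = conditional_entropy(rho_example, given=1)
s_q1_given_q0 = conditional_entropy(rho_example, given=0)
```

Unlike its classical counterpart, the quantum conditional entropy can be negative; for a Bell state the function returns $-1$, which is why it is sensitive to the entanglement generated by the ansatz.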
Let the change in the conditional entropy of the qubit $q_0$ ($q_1$) after evolving through the VQSD ansatz be given by $\Delta S^{\rho \to \rho'}_{q_0|q_1}$ ($\Delta S^{\rho \to \rho'}_{q_1|q_0}$). We quantify the relative contribution of individual qubits via the quantity $\Delta^{q_1}_{q_0}$. We note that $\Delta^{q_1}_{q_0} > 0$ implies a greater change in the conditional entropy of $q_0$ as compared to $q_1$, and hence the ansatz has a greater contribution from $q_0$ than from $q_1$. Following similar reasoning, $\Delta^{q_1}_{q_0} < 0$ implies that $q_1$ has a greater contribution than $q_0$, while $\Delta^{q_1}_{q_0} = 0$ implies that $q_0$ and $q_1$ contribute equally to the ansatz.
We present the variation of $\Delta^{q_1}_{q_0}$ with the change in concurrence due to the ansatz, given by $\Delta_c := C_{\rho_{q_0 q_1}} - C_{\rho'_{q_0 q_1}}$, in Figure 4. We observe that the distribution of $\Delta^{q_1}_{q_0}$ is bounded in the interval $[-0.7, 0.7]$ and is approximately evenly spread about $\Delta^{q_1}_{q_0} = 0$. This implies an equal contribution to the VQSD ansatz from both qubits over the range of change in entanglement. In the region $\Delta^{q_1}_{q_0} > 0$, the relative contribution of $q_0$ first increases and then decreases. However, no such pattern is observed in the region $\Delta^{q_1}_{q_0} < 0$. When $\Delta_c > 0.4$, the value of $\Delta^{q_1}_{q_0}$ decreases monotonically with increasing $\Delta_c$. Also, for $\Delta_c < 0.2$, $\Delta^{q_1}_{q_0}$ increases monotonically with increasing $\Delta_c$.

Problem with variational quantum state diagonalization
In this section, we investigate the contribution of individual qubits to the performance of the RL-VQSD algorithm. Through this investigation, we get a deeper insight into why the variational quantum state diagonalization algorithm cannot find the smallest eigenvalues.
We present the main observations in Figure 5, where we plot the four eigenvalues of 21 2-qubit Haar random mixed quantum states against the correlation in conditional entropy among the qubits. For an individual quantum state, the RL-agent proposes $s$ different structures of admissible ansatzes. We then get states $\rho'_i$, where $i \in [0, s]$. Afterwards, we calculate the conditional entropy of individual qubits for each state and take the median. Then, we calculate the correlation in conditional entropy among the qubits using the PCC.
In Figure 5, we observe that a mild anti-correlation between the qubits is required to find the two largest eigenvalues. As the magnitude of the largest eigenvalues decreases, the mild anti-correlation turns into a strong correlation between the qubits. This indicates a strong correlation between the qubits is required to achieve the two largest eigenvalues. Meanwhile, we require a strong anti-correlation between the qubits to find the smallest eigenvalues of the same states. Then, as the magnitude of the smallest eigenvalue increases, the strong anti-correlation turns into mild anti-correlation.

FIG. 5. The conditional entropy correlation between the first and the second qubit decreases as the magnitude of the largest and the 2nd largest eigenvalues increases. Meanwhile, the same correlation between the qubits increases with the magnitude of the 3rd and the smallest eigenvalues. The RL-agent proposes specific structures of admissible ansatzes, and for each such ansatz, it is not possible to have two different kinds of correlations. Hence, our investigation suggests that it is not feasible to find the smallest eigenvalues with the same ansatz that provides a good approximation of the largest eigenvalues. The correlation of conditional entropy between qubits is calculated using the Pearson correlation coefficient (PCC) and is the median over all the configurations of the RL-ansatz proposed by the agent. The ⋆ signifies the eigenvalues corresponding to the same state.
We recall that the correlation is calculated among the qubits based on the conditional entropy of individual qubits. As it is not possible to have two different kinds of correlation between the qubits (such as mild anti-correlation and strong anti-correlation at the same time) in the same ansatz structure, we cannot find the largest and smallest eigenvalues utilising the same ansatz. For example, we mark the eigenvalues of a state with ⋆, where we observe that to find the largest two eigenvalues, the RL-ansatz needs to induce a mild anti-correlation between the two qubits; meanwhile, to obtain the smallest eigenvalues, we require strong anti-correlation.
In the algorithm, by contrast, as the agent adds a new gate to the RL-ansatz, the correlation in conditional entropy among the qubits increases. That is why it is easier for the RL-agent to propose admissible ansatzes with mild anti-correlation. We further certify this observation in Figure 6, where we observe the correlation in conditional entropy between qubits for 21 random quantum states. We see that most of the admissible ansatzes provide a mild anti-correlation among the qubits, indicating that the VQSD algorithm primarily focuses on finding the largest eigenvalues rather than the smallest ones. Hence, the variational diagonalisation algorithm finds the largest eigenvalues more easily than the smallest ones.

IV. DISCUSSION
In this article, we use tools from quantum information theory to analyse the admissible ansatzes proposed by the RL-agent for solving reinforcement learning-based quantum architecture search problems, and we particularly investigate the ansatzes proposed by the RL-agent for solving the recently proposed RL-VQSD problem.
We observe that the concurrence of the admissible ansatz ranges between an upper and a lower bound, which are independent of the initialisation of the weights of the deep-Q network. The upper and lower bounds are initially anti-correlated with respect to the initial state entanglement. The anti-correlation turns into a mild and eventually strong correlation as the entanglement of the initial state surpasses a phase change point at concurrence 0.322. We also see that the optimal configuration of the admissible ansatz lies at the upper bound of concurrence, which has a smaller circuit depth and requires fewer gates than the lower bound. These observations provide insight regarding the relation between the entanglement of the initial state and the entanglement generated by the RL-VQSD admissible ansatzes. We effectively utilise these observations to greatly reduce the RL-agent and environment interaction, i.e., the computational time needed to solve the RL-VQSD problem. Additionally, the admissible ansatzes not only diagonalise the random quantum states with high accuracy but can also be used to generate close to maximally entangled states from the vacuum state.
Furthermore, we quantify the contribution of each qubit in the admissible ansatz using conditional quantum entropy.For the admissible RL-VQSD ansatz, we observe an equal contribution from each qubit for the full range of change in the entanglement of the state due to the ansatz.Focusing on the task of obtaining the eigenvalues of the starting state, we observe that a strong correlation between the qubits of the admissible ansatz is required to obtain the two largest eigenvalues, while strong anti-correlation between the qubits is needed to find the smallest eigenvalues of the same state.Noting that it is impossible to have both strong correlation and anti-correlation between the qubits of the same ansatz, we conclude that we cannot simultaneously obtain both the largest and smallest eigenvalues from the RL-VQSD admissible ansatz.From the observation, most of the admissible ansatz has mild anti-correlation among the qubits, which explains why it is easier for the diagonalisation algorithm to find the largest eigenvalues than the smallest ones.
For future work, it would be interesting to see the implication of introducing entanglement [34] or nonlocality [35,36]-based reward functions of the RL-agent for solving the VQSD problem. Specifically, given that most of the admissible ansatzes have high values of concurrence, it is expected that an entanglement or nonlocality-based reward function may further reduce the time required to find the optimal configurations. Also, we may expect this to provide an optimal quantum circuit that provides both the lowest and highest eigenvalues of the input quantum state.

... is calculated. The goal of the RL-agent is to reach the minimum error for a predefined threshold $\zeta$ (which is a hyperparameter of the model) by optimising the parameters of the $U(\vec{\theta})$ using the COBYLA optimiser. The cost function at each step $t$ is calculated for the ansatz which outputs a state $\rho_t(\vec{\theta})$.
Appendix C: Analysis of the Haar random input quantum states
In this section, we analyse the 2-qubit quantum states sampled from the random density matrix function of IBM Qiskit's quantum info module. Here, we sample 100000 Haar random quantum states and observe how the eigenvalues and the qubits' conditional entropy change with the state's entanglement.
We see that the difference in conditional entropy between the qubits of the input state varies evenly on either side of $S^{\rho_{AB}}_{AB} = 0$, where $A$ and $B$ are subsystems containing one qubit each and $\rho_{AB}$ is the sampled Haar random state. Meanwhile, the area of the distribution of $S^{\rho_{AB}}_{AB}$ on either side of zero entropy converges towards zero as the input state entanglement increases. Furthermore, the magnitude of the input quantum state's largest eigenvalue increases with the increase in entanglement, and all the other eigenvalues converge towards 0.
FIG. 1. Reciprocal behaviour in the upper and lower bounds of concurrence with respect to the increasing concurrence in the input random state. A clear anti-correlation between entanglement minima and maxima which breaks down after 0.3. From the interpolating function, the Pearson correlation coefficient (PCC) for the first eleven points is -0.982, while that for the remaining points is 0.049. At this point, the correlation after the first eleven points changes from -0.055 to 0.049. There is a phase transition from weak anti-correlation to weak correlation.
FIG. 2. The PCC, indicating the Pearson correlation coefficient, is calculated over an interval $i < k < j$. The coefficient is described in Equation 1, where $i$ is the minimum concurrence, $j$ is the maximum concurrence, and $k$ splits the intermediate values of concurrence into two parts. Hence, $i$ and $j$ are fixed. The $w_{ik} \in [0, 1]$ is the weight of the first interval, which defines how much each of the two intervals contributes when varying $k$. We vary the cardinality of the two intervals, causing variation in the weights. In the figure, we consider $i = 0.119$ and $j = 0.402$, and when $k = 0.322$, i.e. $i < x \leq 0.322$, the weight corresponds to the first interval.

FIG. 4. Relative change in the contribution of qubits 0 and 1 as a function of change in entanglement.Weight of the positive part 5.524 and that of the negative part is −4.854.

FIG. 6. The correlation in conditional entropy among qubits is in the range of mild anti-correlation for 21 random quantum states. This observation, along with the observation in Figure 5, certifies that the RL-agent only proposes a particular configuration of admissible ansatz that can find only the largest eigenvalues. The average correlation among all the 21 states is −0.4263, which is in the regime of mild anti-correlation.

FIG. 7. The difference in conditional entropy converges towards zero as the entanglement of the sampled input state increases. Meanwhile, as the input state gets closer to a maximally entangled state (with concurrence 1), the magnitude of the largest eigenvalue increases while the other eigenvalues decrease.

TABLE I. In this table, we present the average and standard deviation of the concurrence of the final state after evolution through the circuit for five different initialisations of the DDQN weights for different quantum seed states.