PairDiag: an exact diagonalization program for solving general pairing Hamiltonians

We present a program for solving exactly the general pairing Hamiltonian based on diagonalization. The program generates the seniority-zero shell-model-like basis vectors via the `01' inversion algorithm. The Hamiltonian matrix is constructed in that seniority-zero space. The program evaluates all non-zero elements of the Hamiltonian matrix"on the fly"using the scattering operator and the search algorithm that act on the generated basis. The matrix is diagonalized by using the iterative Lanczos algorithm. The program thus developed, PairDiag, can calculate efficiently the ground-state eigenvalue and eigenvector of any pairing Hamiltonian. The program can be easily implemented to replace the BCS approximation in standard self-consistent mean-field calculations. The code is parallelized using OpenMP. For larger systems with dimension around 10$^{8-9}$, the calculation can be done within a day on standard desktop computers.


Introduction
It is well established that the pairing correlation is an essential ingredient in describing the ground-state properties of finite atomic nuclei, and the solving of the pairing Hamiltonian is important for describing not only nuclei but also other many-body systems [1] including neutron stars [2], superconductors [3], and trapped twocomponent Fermi gases [4].The nuclear pairing theory based on the Bardeen-Cooper-Schrieffer (BCS) approximation [5] was considered for the first time 60 years ago [6].But the BCS condensate with indefinite particle numbers is problematic in weak pairing finite systems, and the pairing condensate around shell closures will collapse.To overcome those drawbacks, there have been extensive efforts in developing particle-numberconserved approximations (see, Ref. [7] and references therein) and exact pairing models which include the diagonalization [8,9,10,11,12,13] and the Richardson (or the Richardson-Gaudin) method [14,15,16].In 1966, Richardson [17] studied the exact and the BCS solutions of the standard pairing Hamiltonian with Ω = 8 ∼ 32 at half-fillings, and concluded that the BCS model strongly underestimates the pairing correlations even for relatively large pairing strength.
The Richardson algebraic approach can be applied for large systems but is limited to the standard pairing Hamiltonians with constant pairing force.Therefore, it is important to develop algorithms that can handle the general pairing Hamiltonians.The most straightforward method for the numerically exact solution is to diagonalize the Hamiltonian in configuration spaces of fixed seniority [9,12].In diagonalization, the number of particles and orbits that can be included is usually limited due to the rapid growth of the dimension.In our previous calculations with exact diagonalization [10,11], the dimension of the problem is limited to 10 6 on standard desktops.The limitation is related to the inefficient generation and storage of the basis vectors and the high time complexity of the matrix operations.The limited capacity of the diagonalization makes it challenging to perform realistic calculations and to compare the results with those of BCS or other approximate approaches.
In this study, we developed an efficient diagonalization program, PairDiag, for solving the general pairing Hamiltonian in the doubly-degenerate deformed system.We have applied in the code a novel basis generation algorithm, dynamic evaluation of the non-zero Hamiltonian matrix elements, and the Lanczos [18] algorithm for diagonalization.The system with oddnumber of particles can also be treated within the blocking approximation.The program is optimized using OpenMP parallelization and packaged in a fortran module which allows it to be easily combined with existing nuclear structure programs (e.g., self-consistent mean field codes EV8 [19] and Sky3D [20]) as an alternate to the problematic BCS solver.The vector generation and search algorithms in the program can also be transplanted to other programs including the large-scale shell model code.

The General Pairing Hamiltonian
The general pairing Hamiltonian acting in doublydegenerate time-reversed states is given by where ǫ i are the single-particle energies and V ii ′ are the orbit-dependent pairing interaction strength.a † i and a i is the particle creation and annihilation operator, respectively.The Hamiltonian with V ii ′ = G, a constant strength, is usually called the standard pairing Hamiltonian.For the convenience of description, we will use the orbit i to refer to the time-reversed states i and ī, and treat two particles distributed in the time-reversed states as one pair.We can then define S + i = a † i a † ¯i and S − i = a ¯ia i as the pair creation and annihilation operator of orbit i.For a system of even-number paired particles in the finite space with Ω orbits, the Hamiltonian can be written as S + i S − i and S + i S − j can be understood as the pair number and the scattering operator.V ii is the diagonal pairing element which is sometimes referred to the self-energy of a pair.For an odd-mass system with only one unpaired particle, the odd particle blocks the pairs scattering into the orbit occupied by itself due to the Pauli principle (blocking effect).The space can be expressed as the direct sum of seniority-one subspaces corresponding to different blocked orbits, and the Hamiltonian can be given by where Ĥb is the Hamiltonian for the subspace in which orbit b is blocked.

Principles of the Method
In the present program, we solve the pairing Hamiltonian via diagonalization to get the ground-state eigenvalue and eigenvector.This shell-model style approach can be divided into three parts, first generating the basis with fixed seniority, then constructing and diagonalizing the Hamiltonian matrix.In the following content, we focus on the Hamiltonian in Eq. 2 for the even-mass seniority-zero system, while the Hamiltonian in Eq. 3 for the odd-mass seniority-one system can be solved based on the even-mass system.

Basis Generation
Let's consider an even-mass system with m pairs and n orbits (m ≤ n).In the seniority-zero scheme, the basis consists of all possible Slater determinants of m identical pairs distributed in n different orbits.Each determinant can be represented by a binary word in the computer, while each bit of the word being associated to an individual orbit, with a value of 1 or 0 depending on whether the orbit is fully occupied.A set of all binary numbers with m occupied bits distributed in the first n digits is equivalent to a seniority-zero space, and the dimension of the space from the binomial coefficient is The definition of extension to n < m will be used in the vector hash search.For the case where 2 pairs occupy 4 orbits, a set of 6 binary numbers from 0011B to 1100B can be used to represent the space.
To generate all the required binary numbers efficiently, an iterative combination algorithm, '01' inversion algorithm, is used.Each iteration takes an integer as input and searches from its lowest binary digit until the 2 adjacent bits with pattern '01' is found, then the found '01' will be inverted to '10', and all bits '1' below the turned '10' will be moved to refill this number from the lowest digit.After the two steps, a larger integer is obtained which will be the input for the next iteration.The pseudocode shown in Algorithm. 1 is an efficient implementation based on bit operations.Since one iteration only calculates one next larger integer while keeping the total number of occupied bits conserved, the minimum and the maximum integer in the space representing the start and the end of the iteration must be specified in advance, and the remaining numbers will be calculated iteratively from the minimum.
For the previous example of C 2  4 with the minimum 0011B and the maximum 1100B, the iteration should start at 0011B and end when the output reaches 1100B.In the first iteration, the '01' in the 2nd and 3rd digits of 0011B should be inverted to '10' to get 0101B, because the bit '1' in the 1st digit is already at the lowermost, the output is 0101B.In the same way, 0110B is the second output.For the input 0110B, 1010B will be obtained by inverting the '01' and then the bit '1' in the 2nd digit needs to be moved to the lowermost to get the 1001B.Iteratively, 1010B and 1100B will be created in order, then the calculation should be terminated since 1100B Algorithm 1 '01' inversion algorithm.BTEST(), IB-CLR(), and others refer to the Fortran intrinsic bit manipulation functions Input: integer I in Output: integer I out has reached the maximum of the space.With five iterations starting from 0011B, all six numbers summarized in Table 1 are obtained, and the index are assigned according to the order of generation, which is also the order of the values in the space.
For a space of dimension n, the time complexity of the '01' inversion algorithm over the entire space can be roughly estimated as a linear order O(n).In PairDiag code, a 64-bit integer is used to represent a valence vector, so the total number of orbits allowed is less than 63 after excluding the sign bit.In the calculation, all the integers created are stored in an 1D array.Each element in the array, like in Table 1, has two properties, one is the binary value representing the wave function |i , and the other is the index number representing the position i.Since the array generated by the algorithm is strictly in ascending order and obeys a special combination rule, in addition to extracting the wave function |i directly from the given index i, the index i of any element can also be calculated from its binary value |i .

Vector Search
For locating the index i for a element |i in the basis array, the program provides two search methods for different situations: the binary search and the hash search.The binary search can always be used for a sorted array.The search process starts from the middle element of the array.If the middle element is exactly the element to be found, the search process ends, if the element is greater than (or less than) the middle element, the search is performed in the half of the array that is greater than (or less than) the middle element, and starts with the "sub middle" element as before.This search algorithm makes full use of the order relationship between the elements in a divide and conquer strategy by halving the search range after every comparison.The search task can be completed within O(log n) in the worst case.The use of binary search in the program is when the total number of base vectors is less than C m n due to the truncation, which will be introduced later.
Without truncation, the total element number is equal to C m n , and the structure of array remains intact, a more efficient hash search from a function f (|i ) = i is used in the program.To write the hash function, we use O m to represent the orbit number for the m th occupied orbit in the vector |i .For example 11010B, there are three occupied bits, O 1 for the first occupied bit is 2, O 2 for the second one is 4, O 3 = 5 is for the third one.With these definitions, we can represent the function f (|i ) for a vector |i with m occupied bits as The time complexity of this hash search algorithm is O(1).

Matrix Construction
With the basis generated and the search algorithms provided, we can now construct the sparse Hamiltonian matrix in an efficient way by evaluating all the non-zero matrix elements directly.The diagonal elements in the Hamiltonian matrix are usually non-zero, and the value of elements H i,i = i| Ĥ|i determined by the first item of the Hamiltonian in Eq. 2 is Only when the orbit n in vector |i is fully occupied, the value of i|S + n S − n |i will be 1, otherwise 0. The value of non-diagonal elements H i, j = H j,i is determined by the second term of the Eq. 2. For a vector |i with index i in a space of C m n , if we mark one of the m occupied orbits as O and one of the n-m empties as E, then "scatter" the pair from O to E with the pair scattering operator to form a new vector | j = S + E S − O |i , the matrix element i| Ĥ| j = V EO will be non-zero if V EO 0. The position of this element (i, j) in the matrix can be obtained by searching the index j of the vector | j .Combining the different O and E in |i , the total number of such | j and also the non-zero H i, j is m(n − m).In the PairDiag code, binary search or hash search is used to locate index of different | j .Through the simple bit operations and search, we can evaluate all the non-zero elements in the Hamiltonian matrix directly.Still using the previous example in Table 1 with assigning single-particle energies from 1 to 4, and using the constant -0.2 as the overall pairing interaction strength.The Hamiltonian can be expressed as a 6×6 real symmetric matrix.For the first row, the diagonal element referring to Eq. 5 is H 1,1 = 0011| Ĥ|0011 = (2×1-0.2)+(2×2-0.2) = 5.6.The position of the 4 non-diagonal elements with the element values -0.2 are (1,3) . The final matrix is

Matrix Diagonalization
For diagonalization, the Lanczos [18] algorithm appears as the most suitable method since only the first few states of pairing Hamiltonian are needed.As a simplification of Arnoldi method [21] for the Hermitian matrix, the principle of the Lanczos (in Algorithm.2) is a projection technique on a Krylov subspace [21].From a starting vector p, a new vector q i is generated in each iteration, and these Lanczos vectors are needed when performing reorthogonalization and Rayleigh-Ritz projection.Usually high-quality results require a large number of iterations, but the computer memory to store the vectors also grows as the iteration number increases.A restart method can avoid the difficulty by limiting the maximum number of iterations, and when reaching the maximum, the process is restarted with new starting vectors.Since Algorithm.2 can start with only one vector, the most straightforward way is to use the groundstate Ritz vector.
In the PairDiag program, the Lanczos [22] + QR [21] algorithm is performed.The default starting vector is [1, 0, • • • , 0] T .During iterations, reorthogonalization to all Lanczos vectors through the Gram-Schmidt procedure is used to cure the loss of orthogonality.The maximum number of iterations is an adjustable parameter Lanc Limit.When reaching the maximum, The QR [21] algorithm is performed to calculate the Ritz pairs.According to the user's choice, the program can Algorithm 2 Lanczos iteration.Input: starting vector p Output: Lanczos vectors q i , tridiagonal matrix L return the ground state of this subspace without restart.and can also perform Restart Lanczos, in which the calculation will be restarted by the ground-state Ritz vector.The Hamiltonian matrix is mainly used for matrixvector multiplication in the Lanczos iterations, all the non-zero matrix elements are calculated dynamically without taking up much memory, and this on the fly approach also reduces the time complexity since the matrix is sparse.

Truncation
One will need to truncate the model space if the dimension becomes too large to be handled efficiently on a desktop.Several algorithms can be considered to help truncate the model space including the so-called importance truncation approach (see, e.g., [23]).In the present program we have implemented a simple truncation algorithm by ordering all basis vectors accordingly the values of the corresponding diagonal matrix elements and exclude all the basis vectors with diagonal matrix elements above certain cutoff factor.The users only need to define a maximum dimension to be considered, Dimension Limit.If the full-space dimension C m n exceeds the value of Dimension Limit, the program will truncate the space to the desired number by excluding vectors with the highest values of the diagonal matrix elements.When truncation is required, the program will calculate the maximum and minimum diagonal elements of the Hamiltonian matrix, and fill all the diagonal elements into the histogram form the minimum to the maximum with 10 8 bins.A cutoff truncation value is obtained by counting the bases bottom up until the number of diagonal elements below this value is approximately equal to Dimension Limit.Finally, all vectors with diagonal elements greater than this cutoff value will be excluded.After implementing truncation, the vector space is no longer complete and the vector can only be located by the inefficient binary search.

Description of the Code
The PairDiag code is written in Fortran 95 and packaged in a Fortran module called PairDiag.The use of the module requires three steps.

step 1: Initialize the Inputs
4 public variables represent the inputs of the module must be explicitly initialized by the user before the calculation.The first two determine the space, and the last two define the Hamiltonian matrix elements.
• N Orbit: Integer(kind=8).The number of orbits included which should not exceed 63. • N Pairs: Real(kind=8).The number of pairs in the system which should not exceed the value of N Orbit.For even-mass systems, N Pairs should be an integer (N Pairs = 3 for 6 particles).For oddmass systems, half integers are expected (N Pairs = 3.5 for 7 particles).• SPE: Real(kind=8),dimension(63).1D array for the single-particle energy of each orbit, the first N Orbit elements will be used.• P F: Real(kind=8),dimension(63, 63).2D array for the pairing interaction strength between the involved orbits, the first N Orbit × N Orbit elements should be initialized in a real symmetric manner.
There are four parameters that can be optionally adjusted in the source code, PairDiag.f90.
• Lanc Limit: Integer(kind=8).The step size of the Lanczos algorithm, the default value is 50 and the recommended range is between 10 and 50.• Lanc Error: Real(kind=8).The convergence control of the Restart Lanczos algorithm, the default value is 1×10 −5 , which meets general accuracy requirements.• Dimension Limit: Integer(kind=8).The dimension limit of the valance space, the default value is 1×10 9 .Truncation will be applied if the dimension exceeds the number.• Print Mode: Integer(kind=1).Only when the value is 0 (default), the program will print calculation information on the terminal.

step 2: Call the Subroutine
There is only one public subroutine that can be called in the PairDiag module.

• Diag Sovler([Mode], [Block]):
The subroutine calculates the pairing Hamiltonian in Eq. 2 or Eq. 3 depending on the input N Pairs.For even-mass systems, all the algorithms described above for the basis and matrix will be used.For odd-mass systems, the program calculates the Ĥb in Eq. 3 based on the method of even-mass systems.
The subroutine Diag Sovler([Mode], [Block]) can optionally accept two integer parameters.Mode affects the process of Lanczos, while Blocks only affect how odd systems are handled.If the user want to use the parameter Block, the first parameter Mode must be explicitly initialized.
• Block = 0 (default): Integer(kind=4).For oddmass systems, the program calculates all the possible Ĥb in Eq. 3, and the solution of the subspace with the lowest ground-state energy is returned as the final result.
For odd-mass systems, the program only calculates ĤBlock and return the result.

step 3: Analyze the Outputs
After the calculation, the information of the ground state and the Hamiltonian matrix will be stored in the following public variables.Since the details of the eigenvector are not essential and the amount of data is usually large, only the corresponding occupation numbers are saved, and also the quantity that can be derived from the occupation numbers is not given.

A simple program example for the standard pairing
Hamiltonian with the PairDiag module can be found in the Appendix A. Users can also modify the program according to their own requirements.A brief description of the variables and subroutines in the module can refer to the Appendix B.

Parallelization and Compilation
The "0" inversion algorithm for generating base vectors cannot be easily parallelized due to its iterative nature.The parallelization of the program is mainly performed in the process of matrix construction and diagonalization.In the present program, only OpenMP [24] parallelism has been implemented in order to facilitate the implementation of the code into other nuclear structure programs.The code runs in OpenMP parallel mode by default after being compiled with the -fopenmp option in the provided Makefile.The number of parallel threads is not set by the code, so the user can set the environment variable OMP NUM THREADS to the desired number.The PairDiag code has been tested under both the ifort and gfortran compilers in the Linux system, and we recommend the ifort compiler due to the higher efficiency shown.

Discussion
We now briefly discuss the performance of the program in specific calculations.The reference machine is a desktop computer with an Intel Core i7-7700K 4.2GHz×8 CPU and a total of 47GB memory.The compiler used is the Intel Fortran compiler (ifort version 19.0.0.117) under the Ubuntu 16.04 system.In the following calculations, single-particle energies take integers incremented from 1, and the constant pairing interaction strength V i j = G are used.The constant pairing interaction strength here is mainly for the convenience of comparison.In actual calculations, the matrix can take the general orbit-dependent form.

Iteration and Convergence
For the Lanczos as a projection-based approach, the larger subspace formed by more iterations often leads higher quality results.Fig. 1 shows the convergence of the subspace ground-state eigenvalues (Energy Ground) as a function of the number of iterations (Lanc Limit) without restart (Mode = 1).In the case where only the ground state is desired, it generally takes about 50 iterations to achieve the convergence with good accuracy.For the calculation with dimension N and iteration R, the memory needed to store the basis and Lanczos/Ritz vectors is about 8N(R+2)×10 −9 GB in total, which means at least 41.6GB of memory is required for N = 10 8 and R = 50.In the program, the iteration number can be adjusted according to the local memory condition.But a larger value is recommended whenever possible.When the value of Lanc Limit is   small, the accuracy of a single calculation becomes poor, and a restart strategy will be necessary.
In the restart mode (Mode = 0), the convergence condition is | β i /α i | ≤ Lanc Error.Since the Lanczos is restarted with the ground-state Ritz vector, the ground state will converge first with | β 1 /α 1 | reaching the threshold in a restart.| β 1 /α 1 | represents the quality of the ground-state eigenvector which can be expanded as The closer q 1 is to the ground-state, the smaller the | β 1 /α 1 | will be.Fig. 2 shows the changes in | β 1 /α 1 | after the first restart corresponding to different Lanc Limit with G = −0.4.For Lanc Limit = 50, | β 1 /α 1 | after the first restart is far less than the default threshold, so the iteration will be simply terminated.When Lanc Limit is small, the Lanczos may be restarted in several rounds until convergence.

Comparison with Other Programs
Below we show the numerical accuracy of the Pair-Diag module by comparison with other programs.First, we use the Lapack package as a reference.In the space of C 8  16 with dimension 12870, we compared the results of the ground states between the two packages with different pairing interaction strength but fixed singleparticle energies incremented from 1.In calculations, Lanc Limit was 50, and Lanc Error was 10 −5 .In Table 2, we present the ground-state eigenvalues from the two packages with different strength G.For the eigenvectors in PairDiag, the user can access Q Matrix in the subroutine Result Output() described in Appendix B. The negligible difference between the results indicates that the calculation of PairDiag is very reliable.
Calculations with large dimension is not accessible with Lapack due to memory limitations.For the stan-   dard pairing problem, the eigenvalues can also be obtained by numerical algorithm based on the Richardson approach [14].We have developed a very efficient and robust solver for the Richardson equation [25,26].In Table 3, we present the ground-state eigenvalues from the PairDiag package and our Richardson solver for systems with different dimensions and with G = −0.4.In the calculation, Lanc Limit was set to 50 for the first three calculations, 42 and 7 for the last two due to memory limitations.As shown in Table 3, the difference between the two estimates is small even for systems with large dimensions.

Running Time
The most time-consuming part in the program is the matrix-vector multiplication during Lanczos iterations.Therefore, the running time of the entire calculation is mainly determined by the total number of iterations and the time for the matrix-vector multiplication per iteration.The total number of iterations can vary depending on interactions, spaces, and also the user's choice of error tolerance.The time cost of a single iteration is expected to be proportional to the total number of nonzero elements in the Hamiltonian matrix.For the space  of C m n , the number of non-zero matrix elements is 3 represents the relationship between the CPU time per iteration and the number of non-zero matrix elements from about 200 different calculations with the hash search, in which a good linearity is shown.
Fig. 4 shows the running time of a single iteration as a function of dimension under the hash search algorithm from dimension 10 4 to 10 8 .The fitting curve in the figure, y = 2.054×10 −6 x 1.104 , can be used as an empirical formula to estimate the CPU time per iteration.The CPU time is not the actual clock time.One iteration with dimension 1.5×10 8 takes about 2268 seconds of CPU time, but for the case of eight cores in parallel, it only takes about 290 seconds actually, which is about one-eighth of the CPU time.The binary search is used in the program under truncations, it is more time consuming and usually takes about three times longer than the hash search as shown in Fig. 5.

Convergence of the truncation calculation
When the dimension of the system to be calculated far exceeds the memory limit, the truncation method can be used to reduce the memory demand.The deviation of the truncated space calculation from that of the full space depends on the strength of the interaction and the level of the truncation.Fig. 6 shows the results of the truncated calculation under different interaction strength.As can be seen from the figure, the calculation with weaker interactions can withstand more truncations.The calculated eigenvalues are more sensitive to truncation than the occupancy numbers from the corresponding eigenvectors.

Summary
In summary, we presented an efficient algorithm for solving the eigenproblem of the general pairing Hamiltonian based on diagonalization in the seniority-zero or seniority-one space.We presented an efficient algorithm for solving the eigenvalue problem of the general pairing Hamiltonian based on diagonalization in the seniority-zero space.Basis vectors are generated by the '01' inversion algorithm.All the non-zero elements of the Hamiltonian matrix are evaluated on the fly.The restart Lanczos algorithm is used for the diagonalization.The present code is parallelized with OpenMP.The program can perform calculations on both the evenmass and odd-mass systems.The program also provides adjustable parameters for flexibility and a simple truncation method.The program can replace the BCS approximation in self-consistent iterative Hartree-Fock-BCS calculations.For the large space with Ω ∼ 30, the calculations can be done within a few hours.

Acknowledgement
The support to X. Arrays: • SPE: The 1D arrays for single-particle energies.
• P F: The 2D array for pairing strength.
• B Array: The 1D array for the basis vectors.
• C Array: The 2D array for binomial coefficients in hash search.

Figure 1 :
Figure 1: Convergence of the ground-state eigenvalue as a function of iterations with G = −0.8(left red Y-axis, red curve, and red solid square dots) and G = −0.2(right blue Y-axis, blue curve, and blue solid circle dots) for the space of C13  26 with dimension about 10 7 .The inset shows the difference between results from two consecutive iterations.

Figure 3 :
Figure 3: The CPU time per Lanczos iteration as a function of the total non-zero matrix elements number with the hash search.The solid points are from the measurements and the dotted curve, y = 6.469×10 −8 x, is the result of fitting.

Figure 4 :Figure 5 :
Figure 4: The CPU time per Lanczos iteration as a function of the dimension with the hash search.The dotted curve, y = 2.054×10 −6 x 1.104 , is the result of fitting.

Figure 6 :
Figure 6: Convergence of the calculation as a function of truncation level for a system with full dimension C 10 30 ∼ 3.0 × 10 7 with different interaction strength G. Left panel shows the convergence of the ground-state energy as a function of the dimension of the truncated model space.Right panel shows the changes of occupation numbers for different orbitals under different truncations where N corresponds to the full space dimension.
Y. Liu provided by China Scholarship Council of No. 201700260183 during his study in Sweden is acknowledged.• Lanc Error: In restart mode.the convergence condition in | β i /α i | ≤ Lanc Error.• Dimension Limit: Dimension limit for truncation.• N Orbit and N Pairs: The input number of orbits and pairs.• Total and Occup: The number of orbits and pairs used in the calculation.• B Dimension and L Dimension: The Dimension of the space and step size of the Lanczos.• Convergence and Truncated: Flags for convergence and truncation.• Run Mode, Block Mode, and Print Mode: Flags for run, block, and print.• Energy Blocked: Single-particle energy of the blocked orbit.• Energy Ground: The output ground-state eigenvalue.• Monopole Min and Monopole Max: The minimum and maximum of the diagonal elements.• Posit Min: The position of the vector with the minimum diagonal element.
• Q Matrix: The 2D array for the Lanczos/Ritz vectors.• L Matrix: The 2D array for the Lanczos Matrix and eigenvalues.• N Occup: The 1D array for occupation numbers.• Monopole Hist: The temporary 1D array for truncation histogram.• I Vector and Q Vector: The temporary 1D array for Lanczos.• O Array and V Array: The temporary 1D array for vector search.• T Matrix and P Matrix: The temporary 2D array for QR.Subroutines and functions: • Diag Solver(Mode, Block): The only public subroutine that analyzes the input.• Combin Num(N, M): The function retruns the value of C M N according to Eq. 4. • Next State(State): The subroutine operates the input (State) according to the '01' inversion algorithm.1. • Monopole E(State): The function retruns the diagonal element value of input (State) according to Eq. 5. • Trun State(State): The function retruns logical .TURE.if Monopole E(State) ≤ Monopole Trun.• Vector Initialize() and Vector Restart(): The subroutines that initialize the starting vector of Lanczos to [1, 0, • • • , 0] T and Q Matrix(1, :).• Bina State(D, L) and Hash State(D, L): The subroutines that calculate non-zero matrix elements and positions related to the input State using binary and hash search.• Lanczos Iteration(): The subroutine for Lanczos iteration from starting vector I Vector.• QR Decompose(): The subroutine for QR decompose to the L Matrix.• Even System() and Odds System(): The subroutine for the even-mass and the odd-mass system calculation.• Lanczos Restart(): The subroutine that combines the Lanczos Iteration() and QR Decompose() in Restart Mode.• Initialize(): The subroutine that allocates memory for dynamic arrays and initializes basis vectors.• Results Output(): The subroutine that calculates the occupation numbers and other outputs.• Destory(): The subroutine that releases all dynamic memories.

Table 1 :
Index and binary values of all integers in the space of 2 pairs in 4 orbits, The decimal values shown displays the ascending order.

Table 2 :
Numerical comparisons of PairDiag and Lapack.G are the constant pairing interaction strength, E PairDiag and E Lapack are the ground-state eigenvalues, ∆ vector are defined as |V 2 PairDiag (i) − V 2 Lapack (i)|, where V PairDiag and V Lapack are the calculated groundstate eigenvectors.

Table 3 :
[26]nd-state eigenvalues for the standard pairing problem with G=-0.4 from the PairDiag calculation and the polynomial algorithm solution of the Richardsons equations[26].