A hybrid classical-quantum workflow for natural language processing

Natural language processing (NLP) problems are ubiquitous in classical computing, where they often require significant computational resources to infer sentence meanings. With the appearance of quantum computing hardware and simulators, it is worth developing methods to examine such problems on these platforms. In this manuscript we demonstrate the use of quantum computing models to perform NLP tasks, where we represent corpus meanings and perform comparisons between sentences of a given structure. We develop a hybrid workflow for representing small and large scale corpus data sets to be encoded, processed, and decoded using a quantum circuit model. In addition, we provide results showing the efficacy of the method, and release our developed toolkit as an open software suite.


I. INTRODUCTION
Natural language processing (NLP) is an active area of both theoretical and applied research, and covers a wide variety of topics from computer science, software engineering, and linguistics, amongst others. NLP is often used to perform tasks such as machine translation, sentiment analysis, relationship extraction, word sense disambiguation and automatic summary generation [1]. Most traditional NLP algorithms for these problems are defined to operate over strings of words, in what is commonly referred to as the "bag of words" approach [2]. The challenge, and thus limitation, of this approach is that the algorithms analyse sentences in a corpus based on the meanings of the component words, and lack information from the grammatical rules and nuances of the language. Consequently, the quality of results from these traditional algorithms is often unsatisfactory as the complexity of the problem increases.
On the other hand, an alternate approach called "compositional semantics" incorporates the grammatical structure of sentences from a given language into the analysis algorithms. Compositional semantics algorithms include the information flows between words in a sentence to determine the meaning of the whole sentence [3]. One such model in this class is "(categorical) distributional compositional semantics", known as DisCoCat [4-6], which is based on tensor product composition to give a grammatically informed algorithm that computes the meaning of sentences and phrases. This algorithm has been noted to potentially offer improvements to the quality of results, particularly for more complex sentences, at the cost of increased memory and computational requirements. Indeed, the main challenge in its implementation is the need for large classical computational resources.
With the advent of quantum computer programming environments, both simulated and physical, a natural question is whether one can exploit the available Hilbert space of such systems to carry out NLP tasks. The DisCoCat methods have a natural extension to a quantum mechanical representation, allowing a problem to be mapped directly to this formalism [5]. Using an oracle-based access pattern, one can bound the number of accesses required to create the appropriate states for use by the DisCoCat methods [7]. However, this requires the use of a quantum random access memory, or qRAM [8,9]. Currently, qRAM remains unrealised, and expectations are that the resources necessary to realise it are as challenging as those for a fault tolerant quantum computer [10]. As such, it can be useful to examine scenarios where qRAM is not part of the architectural design of the quantum circuit. This allows us to examine proof-of-concept methods, and to explore and develop use-cases that may later be improved by its existence.
In this paper we examine the process for mapping a corpus to a quantum circuit model, and use the encoded meaning-space of the corpus to represent fundamental sentence meanings. With this representation we can examine the mapping of sentences to the encoding space, and additionally compare sentences with overlapping meaning-spaces. We follow a DisCoCat-inspired formalism to define sentence meaning and similarity based upon a given compositional sentence structure, and relationships between sentence tokens determined using a distributional method of token adjacency. This paper will be laid out as follows: Section II will give an introduction to NLP, the application of quantum models to NLP, and discuss the encoding strategy for a quantum circuit model. Section III will discuss the preparation methods required to enable quantum-assisted encoding and processing of the text data. Section IV will demonstrate the proposed methods using our quantum NLP software toolkit [11] sitting atop Intel Quantum Simulator (IQS) [12]. For this we showcase the methods, and compare results for corpora of different sizes and complexity. Finally, we conclude in Section V.

II. NLP METHODS
One of the main concerns of NLP methods is the extraction of information from a body of text, wherein the data is not explicitly structured; generally, the text is meant for human, rather than machine, consumption [13]. As such, explicit methods to infer meaning and understand a body of text are required to encode such data in a computational model.
Word embedding models, such as word2vec, have grown in popularity due to their success in representing and comparing data using vectors of real numbers [14]. Additionally, libraries and toolkits such as NLTK [15] and spaCy [16] offer community developed models and generally incorporate the latest research methods for NLP. The use of quantum mechanical effects for embedding and retrieving information in NLP has seen much interest in recent years [17][18][19][20][21][22][23].
An approach that aims to overcome the ambiguity of traditional NLP methods, such as the bag-of-words model, is the categorical distributional compositional (DisCoCat) model [4,5]. This method incorporates semantic structure, where sentences are constructed through a natural tensoring of individual component words following a set of rules determined from category theory. These rule-sets, by which sentence structures may be composed, are largely based on the framework of pre-group grammars [24].
The DisCoCat approach offers a means to combine the grammatical structure of sentences with the relationships between tokens in those sentences. Words that appear closer together in texts are more likely to be related, and sentence structures can be determined using pre-group methods. These methods can easily be represented in a diagrammatic form, and allow for a natural extension to quantum state representation [6]. This diagrammatic form, akin to a tensor network, allows for calculating the similarity between sentences. This similarity measure assumes an encoded quantum state representing the structure of the given corpus, and an appropriately prepared test state to compare with, alluding to a tensor-contraction approach to perform the evaluation.
While this approach has advantages in terms of accuracy and generalisation to complex sentence structures, state preparation is something we must consider. Given the current lack of qRAM, the specified access bounds are unrealised [7], and so it is worth considering state preparation as part of the process. Ensuring an efficient preparation approach will also be important to enable processing on a scale to rival that of traditional high-performance computing NLP methods.
As such, we aim to provide a simplified model, framework and hybrid workflow for representing textual data using a quantum circuit model. We draw inspiration from the DisCoCat model to preprocess our data into a structure easily implementable on a quantum computer. We consider simple sentences of the form "noun - verb - noun" to demonstrate this approach. All quantum circuit simulations and preprocessing are performed by our quantum NLP toolkit (QNLP), sitting atop the Intel Quantum Simulator (formerly qHiPSTER) to handle the distributed high-performance quantum circuit workloads [12,25]. We release our QNLP toolkit as an open source (Apache 2.0) project, and have made it available on GitHub [11].

III. DATA PREPARATION AND ENCODING METHODS

A. Representing meaning in quantum states
In this section, we discuss the implementation of the algorithms required to enable encoding, processing, and decoding of our data. We consider a simplified restricted example of the sentence structure "noun-verb-noun" as the representative encoding format. To represent sentence meanings using this workflow, we must first consider several steps to prepare our corpus data set for analysis:

1. Data must be pre-processed to tag tokens with the appropriate grammatical type; stop-words (e.g. "the", "a", "at", etc.) and problematic (e.g. non-alphanumeric) characters should be cleaned from the text to ensure accurate tagging, wherein type information is associated with each word.

2. The pre-processed data must be represented in an accessible/addressable (classical) memory medium.

3. There must be a bijective mapping between the pre-processed data and the quantum circuit representation to allow both encoding and decoding.
Assuming an appropriately prepared dataset, the encoding of classical data into a quantum system can be mapped to two different approaches: state (digital), or amplitude (analogue) encoding [26,27]. We aim to operate in a mixed-mode approach: encoding and representing corpus data using state methods, then representing and comparing test sentence data through amplitude adjustment, measurement, and overlap.
Our approach to encoding data starts with defining a fundamental language (basis) token set for each representative token meaning space (subject nouns, verbs, object nouns). The notion of similarity, and hence orthogonality, in language is a difficult problem. Do we consider the words "stand" and "sit" to be complete opposites, or are they similar because of the type of action taken? For this work, we let the degree of 'closeness' be determined by the distributional nature of the terms in the corpus; words further apart in the corpus are more likely to be opposite.
To efficiently encode the corpus data, we choose to represent the corpus in terms of the n most fundamentally common tokens in each meaning space. This is similar to the use of a word embedding model to represent a larger space of tokens in terms of related meanings in a smaller space [28-30]. This is necessary because representing every token in the corpus matching the sentence structure type can create a much larger meaning space than is currently representable, given realistic simulation constraints. However, as we increase the number of fundamental tokens in our basis, we tend towards the full representative meaning model.
Taking inspiration from the above methods, we implement an encoding strategy that, given the basis tokens, maps the remaining non-basis tokens onto them, subject to some distance cut-off in the corpus. A generalised representation of each token t_i, in its respective meaning space m, is then

t_i = Σ_j f(d_i,j) m_j,    (1)

where d_i,j defines the distance between the base token m_j and non-base token t_i, and f is a distance-dependent weighting function. As such, we obtain a linear combination of the base tokens with representative weights to describe the mapped tokens.
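As a concrete illustration of this projection, the sketch below maps a non-basis token onto the basis set given recorded corpus positions. The inverse-distance weighting f(d) = 1/d, the cut-off handling, and all names here are our own illustrative assumptions; the toolkit's actual weighting (including the binary, equal-weighted variant used later for state encoding) may differ.

```python
def map_to_basis(token_pos, basis_pos, cutoff):
    """Project a non-basis token onto the basis tokens (cf. eq. (1)).

    token_pos: list of corpus positions where the non-basis token occurs.
    basis_pos: dict of basis token -> list of its corpus positions.
    cutoff:    maximum separation W for a basis token to contribute.

    Returns a dict of basis token -> normalised weight, assuming an
    illustrative inverse-distance weighting f(d) = 1/d. Tokens with no
    basis neighbour inside the cutoff map to the empty dict.
    """
    weights = {}
    for base, positions in basis_pos.items():
        # Minimum separation over all occurrence pairs of the two tokens.
        d = min(abs(t - p) for t in token_pos for p in positions)
        if 0 < d <= cutoff:
            weights[base] = 1.0 / d
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}
```

For example, a token occurring at position 8 with basis tokens "sit" (position 2) and "sleep" (position 10) receives weights proportional to 1/6 and 1/2 respectively, yielding the normalised linear combination of eq. (1).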
We have identified the following key steps to effectively pre-process data for encoding:

1. Tokenise the corpus and record the position of occurrence of each token in the text.

2. Tag tokens with the appropriate meaning space type (e.g. noun, verb, stop-word, etc.).

3. Separate tokens into noun and verb datasets.

4. Define basis tokens in each set as the N_nouns and N_verbs most frequently occurring tokens.

5. Map the basis tokens in each respective space to a fully connected graph, with edge weights defined by the minimum distance between each pair of basis tokens.

6. Calculate the shortest Hamiltonian cycle for the above graph. The token order within the cycle is reflective of the tokens' separation within the text, and a measure of their similarity.

7. Map the basis tokens to binary strings, using a given encoding scheme.

8. Project composite tokens (i.e. non-basis tokens) onto the basis token set using representation cut-off distances for similarity, W_nouns and W_verbs.

9. Form sentences by matching composite noun-verb-noun tokens using relative distances and a noun-verb distance cut-off, W_nv.
After conducting the pre-processing steps, the corpus is represented as a series of binary strings of basis tokens. At this stage the corpus is considered prepared and can be encoded into a quantum register.
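The first few pre-processing steps can be sketched classically as follows. This is a minimal stdlib illustration, assuming a pre-computed part-of-speech lookup (in practice supplied by a tagger such as spaCy or NLTK, mentioned above); the function names and the stop-word subset are our own, not the toolkit's API.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "at", "in", "on", "and", "of"}  # illustrative subset

def tokenise(text):
    """Step 1: tokenise the corpus, recording each token's positions
    of occurrence after basic stop-word cleaning."""
    positions = defaultdict(list)
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOP_WORDS]
    for i, word in enumerate(words):
        positions[word].append(i)
    return positions

def select_basis(positions, tags, space, n):
    """Steps 2-4: keep tokens tagged with the given meaning space type,
    and return the n most frequently occurring as the basis set."""
    counts = Counter({tok: len(pos) for tok, pos in positions.items()
                      if tags.get(tok) == space})
    return [tok for tok, _ in counts.most_common(n)]
```

Calling `select_basis(positions, tags, "noun", n)` then yields the candidate basis nouns; the cut-offs and sentence matching of steps 8-9 operate on the recorded positions.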

B. Token encoding
To ensure the mapping of the basis words to the encoding pattern reflects the underlying distributional relationships between words in the corpus, it is necessary to choose an encoding scheme such that the inter-token relationships are preserved. While many more complex schemes can give insightful relationships, we choose a cyclical encoding scheme in which the Hamming distance, d_H, between each pair of bit-strings reflects the distance between those bit-strings in the ordered data set. For the simple 2-bit pattern this equates to a Gray code mapping (the cycle 00, 01, 11, 10), but the scheme differs for larger register sizes. With this encoding scheme, we can show that the Hamming distances between each pattern and the others in the set have a well-defined position-to-distance relationship.
As an example, let us consider a 4-element basis of tokens given by b = {up, down, left, right}. We define up and down as opposites, and so should preserve the largest Hamming distance between them. This requires mapping these tokens to either 00, 11 or to 01, 10. Similarly, we follow the same procedure with the remaining tokens. In this instance, we have mapped the tokens as up → 00; down → 11; left → 01; right → 10; which preserves the relationships we have discussed earlier in III A.
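This toy mapping is easily checked in code; the Hamming distance between two equal-length bit-strings is simply the count of differing positions. The helper below is a sketch verifying the stated properties of the example mapping:

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit-strings."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# The mapping chosen in the text: opposites receive maximal distance.
encoding = {"up": "00", "down": "11", "left": "01", "right": "10"}

assert hamming(encoding["up"], encoding["down"]) == 2     # opposites
assert hamming(encoding["left"], encoding["right"]) == 2  # opposites
assert hamming(encoding["up"], encoding["left"]) == 1     # related pair
assert hamming(encoding["down"], encoding["right"]) == 1  # related pair
```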
Once again, it is worth noting that the notion of similarity is complex when considering words, as is the concept of orthogonality. It may be argued that the up-down relationship has more similarities than, say, a left-down relationship, but for the purpose of our example this definition is sufficient. Care ought to be taken in defining inter-token relationships, requiring some domain expertise of the problem being investigated. The choice of inter-token relationship taken during preparation will influence the subsequent token mappings determined later in the process.
For our work we have deemed it sufficient to define these similarities by distance between the tokens in a text; larger distances between tokens define a larger respective Hamming distance, and smaller distances a smaller one. We can similarly extend this method to larger datasets, though the ordering problem requires a more automated approach.

FIG. 1. Edge weights between the bit-strings represent the Hamming distances, d_H, between connected nodes. By mapping the tokens to the appropriate basis bit-string we can use the Hamming distances to represent differences between tokens.

For the 4-qubit encoding scheme, we must define a strategy to map the tokens to a fully connected graph, where again the respective positions of the bit-strings reflect the Hamming distance between them, as shown in Fig. 1. To effectively map tokens to these bit-strings, we use the following procedure:

1. Given the chosen basis tokens, and their positions in the text, create a graph where each basis token is a single node.
2. Calculate the distances between all token positions, pairing each token with every other, for all (n^2 − n)/2 pairings.
3. For simplicity, choose the minimum distance for each of the pairings, and create edges with this as the given weight. Alternative measures, such as the mean or median distance, can also be used.
4. With the given fully-connected graph, find the minimum Hamiltonian cycle, and use the returned ordering to map the tokens onto the bit-strings.
For the calculated minimum Hamiltonian cycle, the relationships between each of the tokens will be preserved, and can effectively be mapped onto the bit-string encoding scheme. Alternative encoding schemes and distance orderings could also be investigated, but remain beyond the scope of this work. For our purposes we make use of the networkx package for finding the minimum Hamiltonian cycle [31].
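The toolkit delegates this step to networkx [31]; for the small basis sets considered here (N ≤ 8), an exhaustive stdlib search makes the procedure explicit. The graph data below is hypothetical, purely to illustrate steps 3 and 4:

```python
import itertools

def min_hamiltonian_cycle(nodes, dist):
    """Exhaustive minimum Hamiltonian cycle over a complete graph.

    dist maps frozenset({u, v}) -> minimum inter-token distance
    (the edge weight from step 3). The first node is fixed to remove
    rotational symmetry; feasible only for small basis sets.
    """
    first, rest = nodes[0], list(nodes[1:])
    best, best_cost = None, float("inf")
    for perm in itertools.permutations(rest):
        cycle = [first, *perm]
        cost = sum(dist[frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))]
                   for i in range(len(cycle)))
        if cost < best_cost:
            best, best_cost = cycle, cost
    return best, best_cost

# Hypothetical minimum pairwise distances between four basis tokens.
d = {frozenset(p): w for p, w in [
    (("alice", "queen"), 1), (("queen", "hatter"), 1),
    (("hatter", "turtle"), 1), (("turtle", "alice"), 1),
    (("alice", "hatter"), 5), (("queen", "turtle"), 5)]}
order, cost = min_hamiltonian_cycle(["alice", "queen", "hatter", "turtle"], d)
```

The returned ordering is then zipped with the cyclic bit-string sequence (e.g. 00, 01, 11, 10 for two qubits) to fix the token-to-bit-string mapping.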

C. Methods for quantum state encoding
To simplify our encoding procedure, we can assume a binary representation of distance for eq. (1), wherein all tokens within the given cutoff are equally weighted. This allows us to encode the states as an equal-weighted superposition, and is easily implemented as a quantum circuit [32,33].
For notational simplicity, we define the following mappings:

X_a : |a⟩ → |¬a⟩,
CX_a,b : |a⟩|b⟩ → |a⟩|a ⊕ b⟩,
nCX_a1,...,an,b : |a_1⟩ ··· |a_n⟩|b⟩ → |a_1⟩ ··· |a_n⟩|b ⊕ (a_1 ∧ ··· ∧ a_n)⟩,

where |a⟩ and |b⟩ are computational basis states, X is the Pauli-X (σ_x) gate, and CX and nCX are the controlled-X and n-controlled X (nCX) operations, respectively. Additionally, we may define controlled operations using any arbitrary unitary gate via a similar construction. The goal of this algorithm is to encode a set of bit-strings representing our token meaning-space as an equal-weighted superposition state. For a set of N unique binary patterns p^(i) = {p_1^(i), ..., p_n^(i)}, each of the binary vectors is encoded sequentially. For each iteration of the encoding algorithm, a new state is generated in the superposition (excluding the final iteration). The new state generated is termed the active state of the next iteration; all other states are said to be inactive. Note that, in each iteration of the algorithm, the active state will always be selected with |u⟩ = |01⟩.
During a single iteration, a binary vector is stored in integer format, which is then serially encoded bit-wise into the auxiliary register |a⟩, resulting in the state |ψ_1⟩. This binary representation is then copied into the memory register |m⟩ of the active state by applying a 2CX gate on |ψ_1⟩. Next, we apply a CX followed by an X gate to all qubits in |m⟩, using the corresponding qubits in |a⟩ as controls. This sets each qubit in |m⟩ to 1 if the respective qubits in |m⟩ and |a⟩ match, else to 0. Thus, the state whose register |m⟩ matches the pattern stored in |a⟩ will be set to all 1's, while the other states will have at least one occurrence of 0 in |m⟩. Now that the state being encoded has been selected, an nCX operation is applied to the first qubit in the auxiliary register using the qubits in |m⟩ as the controls. The target qubit, whose initial value is 0, will be set to 1 if |m⟩ consists of only 1's. This is the case when the pattern in |m⟩ is identical to the pattern being encoded (i.e. the pattern stored in |a⟩).
In order to populate a new state into the superposition, it is required to effectively 'carve off' some amplitude from the existing states so the new state has a non-zero coefficient. To do this, we apply a controlled unitary matrix CS^(i) to the second auxiliary qubit u_2, using the first auxiliary qubit u_1 as a control, where S^(i) is a rotation through the angle φ(i) = −cos⁻¹((i − 2)/i), with i ∈ Z⁺. The newly generated state will be selected with |u⟩ = |11⟩, while the previous active state used to 'carve off' this new state is selected with |u⟩ = |10⟩. All other states will be selected with |u⟩ = |00⟩. To apply the next iteration of the algorithm we uncompute the steps from equations (5)-(7). This results in the previous active state now being selected with |u⟩ = |00⟩, while the new state is selected with |u⟩ = |01⟩, which identifies it as the new active state. The previous active state's memory register now contains the pattern that was stored in |a⟩. Finally, the register |a⟩ of every state must be set to all zeroes by sequentially applying X gates to each qubit in |a⟩ according to the pattern that was just encoded. The quantum register is then ready for the next iteration to encode another pattern. Following the encoding of all patterns, our state is |ψ⟩ = |a⟩|u⟩|m⟩, with the memory register |m⟩ holding the equal-weighted superposition of encoded patterns. Note that this algorithm assumes the number of patterns to be encoded is known beforehand, which is required to generate the set of S^(i) matrices and apply them in the correct order. The total number of qubits used in this algorithm is 2n + 2, of which n + 2 are reusable after the implementation, since the qubits in |a⟩ and |u⟩ are all reset to |0⟩ upon completion. These n + 2 additional qubits can be used as intermediate scratch to enable the large n-controlled operations during the encoding stages. This ensures that we can perform the nCX operations with a linear, rather than polynomial, number of two-qubit gate calls [34].
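The effect of the CS^(i) 'carve-off' rotations can be checked classically. Under one consistent reading of φ(i), a rotation by φ(i) transfers a sin(|φ(i)|/2) = 1/√i fraction of the active amplitude to the newly created state; iterating with i = N, N−1, ..., 2 (our assumed ordering of the S^(i) set, not stated explicitly in the text) leaves every encoded pattern with amplitude exactly 1/√N:

```python
import math

def phi(i):
    """Angle of the S^(i) rotation: phi(i) = -acos((i - 2) / i)."""
    return -math.acos((i - 2) / i)

def carve_amplitudes(n_patterns):
    """Classically track the amplitude handed to each newly carved
    state, assuming the iteration index runs N, N-1, ..., 2.

    sin(|phi(i)|/2) = sqrt(1/i), so each step carves a 1/sqrt(i)
    fraction of the current active amplitude into the new state.
    """
    active, amps = 1.0, []
    for i in range(n_patterns, 1, -1):
        half = abs(phi(i)) / 2.0
        amps.append(active * math.sin(half))  # new state's amplitude
        active *= math.cos(half)              # remaining active amplitude
    amps.append(active)  # the final pattern keeps the leftover amplitude
    return amps
```

Running `carve_amplitudes(8)` for the k = 8 meaning-space states of the later example yields eight equal amplitudes of 1/√8, confirming the equal-weighted superposition.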

D. Representing patterns using encoded data
The purpose of this methodology is to represent a single test pattern using the previously encoded meaning-space. The relative distance between each meaning-space state pattern and the single test pattern x = {x_1, ..., x_n} is then encoded into the amplitude of each respective meaning-space state pattern. Thus, each represented state will have a coefficient proportional to the Hamming distance, d_H, between itself and the test pattern. The method we present below calculates this binary difference between the target state's bit-string and the test pattern.
The algorithm assumes that we already have N states of length n encoded into the memory register |m⟩. The subsequent encoding requires 2n + 1 qubits; n qubits to store the test pattern, a single-qubit register on which the rotations will act, and n qubits for the memory register. As our previously used encoding stage required 2n + 2 qubits, we can repurpose the |a⟩ and |u⟩ registers as the test pattern and rotation registers, respectively. Our meaning-space patterns are encoded in the memory register |m⟩, with registers |a⟩ and |u⟩ initialised as all 0's. Hence, our initial state is given by eq. (13).
Next, the test pattern x = {x_1, ..., x_n} is encoded into the register |a⟩ sequentially by applying an X gate to each qubit whose corresponding classical bit x_i is set. Rather than overwriting register |a⟩ with the differing bit-values, a two-qubit-controlled R_y(θ) (2CR_y) gate is applied, with θ = π/n. This is done by iteratively applying the 2-controlled R_y gate with a_j and m_j as control qubits to rotate |u⟩ if both control qubits are set, for j = 1, ..., n. The operation is performed twice: once for a_j = 1, m_j = 1, and once for a_j = 0, m_j = 0, by appropriately flipping the bits prior to use.
Finally, the test pattern stored in register |a⟩ is reset to consist of all 0's by applying an X gate to each qubit in |a⟩ whose corresponding classical bit is set to 1.
After applying these operations, each meaning-space state carries a rotation of |u⟩ determined by the Hamming distance between its bit-string and the test pattern. Applying the linear map, we represent the meaning-space states weighted by the Hamming distance to the test pattern, x, where the qubit registers |a⟩ = |0⟩^⊗n and |u⟩ = |01⟩ are left out for brevity.
With the above method we can examine the similarity between patterns mediated via the meaning space. While one may directly calculate the Hamming distance between both register states as a measure of similarity, by doing so we lose the distributional meaning discussed in Section III B. As such, we aim to represent both patterns in the meaning-space, and examine their resulting similarity using the state overlap, with the result defined in terms of P^(i) = ⟨x^(i)|P|x^(i)⟩, where x^(i) is test pattern i.
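A classical emulation of this similarity measure is sketched below, under our reading of the rotation sequence: each of the (n − d_H) matching bit positions rotates |u⟩ by π/n, so an encoded pattern p acquires an amplitude factor cos(π d_H(p, x)/2n) before normalisation. The exact amplitudes depend on the map P, so this is an illustrative model rather than the toolkit's implementation.

```python
import math

def weighted_amplitudes(patterns, test, n):
    """Normalised amplitudes of the encoded meaning-space states after
    weighting by Hamming distance to the test pattern.

    patterns, test: integers interpreted as n-bit strings.
    Assumes each pattern picks up a factor cos(pi * d_H / (2 n)).
    """
    amps = [math.cos(math.pi * bin(p ^ test).count("1") / (2 * n))
            for p in patterns]
    norm = math.sqrt(sum(a * a for a in amps))
    return [a / norm for a in amps]

def similarity(patterns, x1, x2, n):
    """Overlap of two test patterns mediated via the meaning space:
    the inner product of their weighted meaning-space representations."""
    v1 = weighted_amplitudes(patterns, x1, n)
    v2 = weighted_amplitudes(patterns, x2, n)
    return sum(a * b for a, b in zip(v1, v2))
```

Identical test patterns give a similarity of exactly 1, and patterns sharing meaning-space support remain close to 1, matching the qualitative behaviour reported in the examples below (exact numerical agreement depends on the map P).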

IV. RESULTS

A. Small-scale example
We now demonstrate an example of the method outlined in Sec. III for a sample representation and sentence comparison problem.
We opt for the simplified noun-verb-noun sentence structure, and define sets of words within each of these spaces, through which we can construct our full meaning space, following an approach outlined in [4].
For nouns, we have: (i) subjects, n_s = {adult, child, smith, surgeon}; and (ii) objects, n_o = {outside, inside}. For verbs, we have v = {stand, sit, move, sleep}. With these sets, we can represent the full meaning-space as the tensor product of the subject, verb and object sets (eq. (21)). Whilst all combinations may exist, subject to a given training corpus only certain patterns will be observed, allowing us to restrict the information in our meaning-space. For simplicity, we choose our corpus to be a simple pair of sentences: "John rests inside. Mary walks outside." To represent these sentences using the bases given by eq. (21), we must determine a mapping between each token in the sentences and the bases. In this instance, we manually define the mapping by taking the following meanings:

• John is an adult, and a smith. The state is then given as |John⟩ = 1/√2 (|adult⟩ + |smith⟩), which is an equal superposition of the matched entities from the basis set.
• Mary is a child, and a surgeon.
We also require meanings for rests and walks. If we examine synonyms for rests and cross-compare with our chosen vocabulary, we can find sit and sleep.
Similarly, for walks we can have stand and move. We can define the states of these words as |rest⟩ = 1/√2 (|sit⟩ + |sleep⟩) and |walk⟩ = 1/√2 (|stand⟩ + |move⟩). Now that we have a means to define the states in terms of our vocabulary, we can begin constructing states to encode the data.
We begin by tokenising the respective sentences into the 3 different categories: subject nouns, verbs, and object nouns. With the sentences tokenised, we next represent them as binary integers, and encode them using the processes of Sec. III. The basis tokens and their mappings are defined in Table I. If we consider the John and Mary sentences separately for the moment, they are respectively given by the states (1/2)|0⟩ ⊗ (|10⟩ + |11⟩) ⊗ (|00⟩ + |10⟩) for John, and (1/2)|1⟩ ⊗ (|00⟩ + |01⟩) ⊗ (|01⟩ + |11⟩) for Mary. Note that we choose a little-endian encoding schema, wherein the subject nouns are encoded to the right of the register and object nouns to the left. Tidying these states up yields

John rests inside → |J⟩ = 1/2 (|01100⟩ + |01000⟩ + |01110⟩ + |01010⟩),

Mary walks outside → |M⟩ = 1/2 (|10001⟩ + |10011⟩ + |10101⟩ + |10111⟩),
where the full meaning is given by |m⟩ = (|J⟩ + |M⟩)/√2, which is a superposition of the 8 unique encodings defined by our meaning-space and sentences.
From here we next encode a test state, stored in register |a⟩, to be represented using the encoded meaning-space. We use the pattern denoted by "Adult(s) stand inside", which is encoded as |a⟩ = |00000⟩, and construct our full state in the format of eq. (14). By following the steps outlined in Sec. III D, rotating a single qubit of the control register |u⟩ based on the Hamming distance between both registers, and applying the map from eq. (18), the state of register |m⟩ encodes a representation of the test pattern in the amplitude of each unique meaning-space state. Through repeated preparation and measurement of the |m⟩ register we can observe the patterns closest to the test. Figure 2 shows the observed distribution using two different patterns: adult, stand, inside (00000, orange), and child, move, inside (00111, green), compared with the encoded meaning-space patterns following eq. (13) (blue).
Given this ability to represent patterns, we can extend this approach to examine the similarity of different patterns using eq. (20). One can create an additional memory register |m′⟩, and perform a series of SWAP tests between both encoded patterns, to determine a measure of similarity. For the above example, we obtain an overlap of F(00000, 00111) = 0.8602, denoting a good degree of similarity given our chosen meaning-space.

FIG. 2. Sentence encoding state distribution taken by multi-shot preparation and measurement of |m⟩ prior to, and post, the encoding of test patterns. Two distinct patterns are used: 00000 → (adult, stand, inside) (orange) and 00111 → (child, move, inside) (green). The distribution is sampled 5 × 10⁴ times, and shows how the Hamming distance weighting modifies the distribution relative to the k = 8 unweighted meaning-space states (blue).
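The SWAP-test step relies on the standard identity P(0) = (1 + |⟨ψ|φ⟩|²)/2 for the ancilla measurement. A sketch of the classical post-processing (the function names are ours) is:

```python
def swap_test_p0(psi, phi):
    """Ancilla |0> probability for a SWAP test between two real state
    vectors: P(0) = (1 + |<psi|phi>|^2) / 2."""
    inner = sum(a * b for a, b in zip(psi, phi))
    return (1.0 + inner * inner) / 2.0

def overlap_from_counts(zeros, shots):
    """Invert the identity to estimate |<psi|phi>|^2 from the measured
    ancilla statistics, clamping shot noise below zero overlap."""
    return max(0.0, 2.0 * zeros / shots - 1.0)
```

Identical states give P(0) = 1 and orthogonal states give P(0) = 1/2; for example, observing 9301 ancilla zeros in 10^4 shots corresponds to an estimated overlap of 0.8602.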

B. Automated large-scale encoding
As the previous example was artificially constructed to showcase the method, an automated workflow that determines the basis and mapped tokens, and performs the subsequent experiment, is beneficial. Here we perform the same analysis, but using Lewis Carroll's "Alice in Wonderland" in an end-to-end simulation.
To showcase the basis choice, we consider the noun basis set. We define a maximum basis set of 8 nouns (N_nouns = 8), taken by their frequency of occurrence. Following the process outlined in Sec. III, we define a graph from these tokens, and use their inter-token distances to determine an ordering following a minimum Hamiltonian cycle calculation. The resulting graph is shown in Fig. 3. From here we map the tokens to an appropriate set of encoding bit-strings for quantum state representation, making use of eq. (2); the resulting set of mappings is given in eq. (22). We can now map the composite tokens onto the chosen basis encoding using a distance cut-off, W_nouns. Following the inter-word distance calculation approach used to determine the basis order, we calculate the distance between every other corpus token and the respective basis set, taking the set of all nouns in the corpus as s_n and the noun basis set as b_n ⊂ s_n, for every token t_n in s_n. Tokens that fall outside W_nouns are mapped to the empty set, ∅. This approach is then repeated for verbs, and lastly for inter-dataset distances between noun-verb pairings, W_nv, which are used to discover viable sentences. The mapped composite tokens may then be used to create a compositional sentence structure by tensoring the respective token states. Following the previous example, we may examine the automatic encoding and representation of the string "Hatter say queen" in terms of the meaning-space patterns. Given that representing the text in its entirety would be a substantial challenge, we limit the amount of information to be encoded by controlling the pre-processing parameters as N_nouns = 8, N_verbs = 4, W_nouns = 5, W_verbs = 5 and W_nv = 4.
Here N_nouns is again the number of basis nouns in both the subject and object datasets, N_verbs the number of basis verbs, W_nouns and W_verbs the cut-off distances for mapping the other nouns and verbs in the corpus to the basis tokens, and W_nv the cut-off distance used to relate noun and verb tokens.
For the above parameters, the method finds a subset of 75 unique patterns to represent the corpus. Following Section IV A, one obtains the associated similarity of encoded elements from the resulting likelihood of occurrence, as indicated by Fig. 4, where we have prepared and sampled the |m⟩ register 5 × 10⁴ times to build the distribution. Clear step-wise distinctions can be observed between the different categories of Hamming-weighted states, with the full list presented in Appendix C, Table III. Given the basis encoding tokens from eq. (22), the string "Hatter say queen" can be mapped to the value 995 (1111100011 in binary). As before, we can also compare patterns mediated via the meaning-space. For the pattern "Hatter say Queen", the most similar patterns are "Hatter say King" (0111100011), "Hatter go Queen" (1111110011) and "Turtle say Queen" (1111100001), with overlaps of 0.974, 0.974 and 0.973, respectively. We include a variety of other encoded comparisons in Table IV in the Appendix to showcase the method.

V. CONCLUSIONS
In this paper we have demonstrated methods for encoding corpus data as quantum states. Taking elements from the categorical distributional compositional semantic formalism, we developed a proof-of-concept workflow for preparing small and large scale data sets to be encoded, processed, and decoded using a given quantum register. We showed the preparation, encoding, comparison, and decoding of small and large datasets using the presented methods.
Recent works have shown the importance of the reduction in classical data to be represented on a quantum system [35]. The approach defined above follows an analogous procedure, representing the important elements of the corpus data using a fundamental subset of the full corpus data. Using this subset, we have shown how to represent meanings, and subsequently the calculation of similarity between different meaning representations. We have additionally released all of this work as part of an Apache licensed open-source toolkit [11].
For completeness, it is worth mentioning the circuit depths required to realise the above procedures. Taking the large-scale example, we obtain single and two-qubit gate call counts of 2413 and 33175, respectively, to encode the meaning space. This may be difficult to realise on current NISQ-generation quantum systems; the use of simulators instead allows us to make gains in understanding how to apply these methods to real datasets.
The potential for circuit optimisation through the use of the ZX calculus [36], or circuit compilation through tools such as CQC's t|ket⟩, may offer more realistic circuit depths, especially when considering mapping to physical qubit register topologies [37].
Very recent works on the implementation of the DisCoCat formalism on physical devices without the need for qRAM have also emerged [38]. These methods may provide a more generalised approach to investigating quantum machine learning models in NLP and beyond, and have the potential to overcome the limitations discussed earlier with data encoding. We imagine that merging this generalised approach [39] with the hybrid quantum-classical methods we have devised will allow for interesting results and further development of this field. We leave this to future work.

ACKNOWLEDGMENTS
We would like to thank Prof. Bob Coecke and Dr. Ross Duncan for discussions and suggestions during the early stages of this work. The work leading to this publication has received funding from Enterprise Ireland and the European Union's Regional Development Fund. The opinions, findings and conclusions or recommendations expressed in this material are those of the authors and neither Enterprise Ireland nor the European Union are liable for any use that may be made of information contained herein. The authors also acknowledge funding and support from Intel during the duration of this project.