BY 4.0 license Open Access Published by De Gruyter October 26, 2021

Evolution of group-theoretic cryptology attacks using hyper-heuristics

  • Matthew J. Craven and John R. Woodward

Abstract

In previous work, we developed a single evolutionary algorithm (EA) to solve random instances of the Anshel–Anshel–Goldfeld (AAG) key exchange protocol over polycyclic groups. The EA consisted of six simple heuristics which manipulated strings. The present work extends this by exploring the use of hyper-heuristics in group-theoretic cryptology for the first time. Hyper-heuristics are a way to generate new algorithms from existing algorithm components (in this case, simple heuristics), with EAs being one example of the type of algorithm which can be generated by our hyper-heuristic framework. We take the above EA as a starting point and allow hyper-heuristics to build on it by making small tweaks to it. This adaptation proceeds by taking the EA and injecting chains of heuristics built from the simple heuristics. We demonstrate that we can create novel heuristic chains which, when placed in the EA, create algorithms that outperform the existing EA. The new algorithms solve a greater number of random AAG instances than the EA. This suggests the approach may be applied to many of the same kinds of problems, providing a framework for the solution of cryptology problems over groups. The contribution of this article is thus a framework to automatically build algorithms to attack cryptology problems given an applicable group.

MSC 2010: 20P05; 68W30; 90C27; 94A60

1 Introduction

For NP-hard problems, the time taken to produce an algorithm that solves them is often vast. In such cases, users may use an “off the shelf” algorithm to obtain approximate solutions within an appropriate time. In this article, we take a different approach and attempt to design an algorithm in response to feedback from similar instances of the problem. Examples of such problems are those in group-theoretic cryptology (multiple conjugacy, Anshel–Anshel–Goldfeld (AAG) [1], and word decomposition, for instance). These problems have been posed over varying types of groups, serving as the base problems for key exchange protocols (KEPs) [1,2,3,4,5], and subsequently attacked [6,7,8,9,10,11,12,13,14]. The group structures used are often intended to provide an extra encryption layer through the scrambling induced by the group presentation.

In this work, a preliminary hyper-heuristic framework is detailed which takes as inputs a proposed cryptographic base problem and a group structure, and, via machine learning techniques, generates operations for an enhanced EA which aims to solve an acceptable proportion of random instances of the problem. One of the main benefits of the hyper-heuristic approach is that we take an existing algorithm (which could, conceivably, be any algorithm) and allow the hyper-heuristic to suggest modifications to it. Therefore, the existing algorithm provides a baseline for any suggestions the hyper-heuristic proposes.

The framework is implemented in the GAP 4.8.7 [15] language (due to its compatibility with the ParGAP package [16], allowing use of MPI intra-core communication). This is tested on a case study of an AAG KEP [1,2] posed over polycyclic groups defined by a number field [17]. The aim is to generate mutation operators for algorithms which outperform the existing human-designed EA. These mutation operators are chains of simple heuristics, which are composed or learned. The generation of crossover, selection, and other heuristic components is outside the scope of this work.

Our contribution is an approach that, contrary to the above manual design of attacks, automatically builds attack mechanisms and attempts to break the above KEP. This approach is trained on a small set of instances and then validated on a second larger independent set of instances, illustrating that it generalises. This article is not proposing a single algorithm to attack, but rather a framework in which algorithms can automatically be generated and then tested, and is an example of the generate-and-test paradigm which has many applications in science, engineering, mathematics, and daily life. One of the drawbacks of our method is the large amount of computation time required; it takes relatively little time to generate an algorithm but a relatively long time to test it. One of the benefits of our approach, however, is that we can take existing approaches (as we do in this article, an EA [18,19]), and use it as a starting point from which we can improve.

In ref. [10], we observed that a human-designed EA performs better than the length attack algorithm of ref. [17]. In this article, we observe that an automatically designed EA performs better than the human-designed EA. We also conjecture that a random search algorithm will perform poorly on this problem. This is a pattern of performance typically seen in the metaheuristics literature. The reason for this ordering of these four types of solvers lies in the nature of the resulting search landscape. A human-designed EA is essentially a more sophisticated length attack algorithm, and a machine-designed EA is essentially slightly more sophisticated than a human-designed EA.

Typically, designing an algorithm requires an understanding of the problem. The algorithm thus captures our intuition about how to solve that problem (consider the problem of sorting and the large number of algorithms available, for instance). An algorithm is an explicit formalisation of our intuition: with cryptology, we have very little in the way of intuition to guide us. This is an opportunity for an automated method (largely unbiased) to invent new algorithms.

It is acknowledged that the detailed KEP has already been broken by refs [10,11] (the latter reference being a “field-based attack”), but the wish is to present this work as a preliminary study with a view towards application to other cryptanalytic problems. That is, given inputs of a group (e.g. polycyclic) and a cryptographic problem (e.g. AAG), the aim is to output an attack algorithm that solves the particular problem over the given group. It is thus argued that this type of algorithmic framework has a future in the disciplines of cryptology and possibly algorithmic questions in combinatorial group theory and may be extended to other structures and problem types. Recall that the inputs of the hyper-heuristic approach are a base problem and a group structure, and the output is an algorithm. This is in contrast to ref. [10], which exposes a particular algorithm where the input was a problem instance (i.e. a set of conjugacy equations) and the outputs were exact solutions (i.e. keys) or failure. The point of a hyper-heuristic approach is that it is able to generate algorithms such as ref. [10] among many others.

This work is organised as follows: in Section 2, we give an introduction to group-based cryptography, reviewing previously proposed KEP problems, before turning to an overview of hyper-heuristics. This is followed by Section 3, which introduces the notation and formalisation. In Section 4, we describe the experimental approach and detail parameter settings, discussing the results of our approach in Section 5. In Section 6, we conclude the article, including a discussion of further work resulting from this study and raising future research directions.

2 Background

In this section, we will first introduce group-based cryptography. We then give an introduction to hyper-heuristics.

2.1 Introduction to group-based cryptography

Group-based cryptography uses groups in the construction of cryptosystems and KEPs and has been an active area of research since the late 1990s. Proposed cryptosystems and their subsequent attacks (purported breaks) iterate one after the other with the aim of producing increasingly secure cryptography over time.

Group-based cryptography began in earnest in the late nineties, when refs [1,2,3] proposed KEPs based upon braid groups. As mentioned in the introduction, the braid groups were used due to the scrambling induced by the presentation of the group, and the consequent belief that the underlying problems (various guises of the conjugacy problem) were extremely difficult to solve. Solving the underlying problem would, in many cases, break the KEP and render any keys exchanged open to misuse by adversaries.

Both the KEPs and the underlying problems were attacked in the next few years. Examples of such attacks were super summit set attacks [8] and the more practical length-based attacks (LBAs) [20]. The latter algorithms (also known as hillclimbers) build up solutions to instances of the problem gradually, beginning with a short candidate solution and making random alterations to it. The altered solution is then compared to the old solution by some metric, mostly with regard to how “well” the candidate solves the instance (e.g. how many symbols remain after all possible cancellations have been conducted). If the altered solution is an improvement, then the current solution is set equal to the altered solution and the process repeats. If not, then the altered solution is discarded.
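The hillclimbing loop described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the LBA of any cited reference: words are lists of non-zero integers (`k` for a generator, `-k` for its inverse), free cancellation stands in for the platform group's normal form, and the only alteration is a random single-generator insertion. The names `free_reduce`, `cost` and `hillclimb` are hypothetical.

```python
import random

def free_reduce(word):
    """Cancel adjacent inverse pairs (k, -k); a stand-in for a real normal form."""
    out = []
    for g in word:
        if out and out[-1] == -g:
            out.pop()
        else:
            out.append(g)
    return out

def inverse(word):
    """Inverse of a word: reverse it and invert every symbol."""
    return [-g for g in reversed(word)]

def cost(candidate, equations):
    """Symbols left in x^-1 b x c^-1 over all (b, c) pairs; 0 means solved."""
    inv = inverse(candidate)
    return sum(len(free_reduce(inv + b + candidate + inverse(c)))
               for b, c in equations)

def hillclimb(equations, n_gens, max_steps=2000, seed=0):
    """Keep a random alteration only if it strictly lowers the cost."""
    rng = random.Random(seed)
    symbols = [s * k for k in range(1, n_gens + 1) for s in (1, -1)]
    current, best = [], cost([], equations)
    for _ in range(max_steps):
        if best == 0:
            break
        trial = list(current)
        trial.insert(rng.randint(0, len(trial)), rng.choice(symbols))
        trial = free_reduce(trial)
        trial_cost = cost(trial, equations)
        if trial_cost < best:          # improvement: accept the altered solution
            current, best = trial, trial_cost
    return current, best

# Toy instance: conjugate two public words by a hidden word.
secret = [1, 2, -1]
publics = [[2], [1, 1, 2]]
equations = [(b, free_reduce(inverse(secret) + b + secret)) for b in publics]
found, remaining = hillclimb(equations, n_gens=2)
```

A remaining cost of zero would mean every equation cancels completely; the sketch makes no claim about how often that happens.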

Being practical and fast, LBAs became increasingly sophisticated through refs [6,7,12,14]. As LBAs also became increasingly capable of solving instances of the aforementioned KEPs, researchers searching for more attack-resistant structures began to look for new groups and problems while keeping the general methodology. Examples of these platform groups are right-angled Artin groups [9] (a homomorphic pre-image of braid groups), small cancellation groups [21], matrix groups, Thompson’s group, and Grigorchuk’s group, to name but a few.

Polycyclic groups were first proposed as a platform group in 2004 [4] and were followed 10 years later by the works of refs [5] and [17], applying two distinct types of polycyclic groups to the AAG [1] problem (multiple conjugacy). The systems introduced were, in turn, broken by the works of refs [22] (for generalised Heisenberg groups), [11], and [10] (via a parallel EA). The latter work was shown to be more efficient, and more successful, than previous LBAs. Although the approach on the proposed KEP was successful, we wish to take it further into the domain of hyper-heuristics and use the KEP as a test bed for our framework. An excellent summary of group-theoretic cryptology in general can be found in ref. [13].

2.2 Introduction to hyper-heuristics

Informally, hyper-heuristics offer to take a number of existing computational search techniques and combine them to make a new heuristic. This new heuristic is intended to have more of the strengths of each of the heuristics and fewer of their weaknesses. The motive of a hyper-heuristic is not to outperform a state-of-the-art algorithm on a single instance of a problem. Rather, the aim of hyper-heuristic approaches is to perform well across a range of problem instances. That is, hyper-heuristics attempt to offer robust performance across a set of problems rather than specialised performance on a narrow set of specific instances. These problems may be problem instances from a given domain, such as the travelling salesman problem, or they may be drawn from different problem domains (e.g. timetabling or vehicle routing). In this article, we are developing a hyper-heuristic framework to solve problem instances from a single domain: cryptology.

We should also be careful to distinguish between optimisation and supervised machine learning. Optimisation typically has an objective function we wish to evaluate and a parameter value which is a global optimum. Often this is difficult to achieve and also difficult to know when it has been achieved. In contrast, with supervised machine learning, we typically have a set of example cases which we use to train a model. We then have a second set of independent example cases which are used to determine if the model performs well in general on cases which were not included in the training phase. Optimisation has a single stage (optimising), while machine learning has two main stages (training and testing). Nor do we have the issue of over-fitting in optimisation, but the issue of over-fitting may arise in machine learning. In summary, in this article we are using a machine learning approach (hyper-heuristics), with an independent training and test set, to build a heuristic used for optimisation, the objective being to minimise the length.

Hyper-heuristics can be viewed in the context of heuristics and metaheuristics. These three terms are often confused. Let us begin by looking first at heuristics, metaheuristics, and finally hyper-heuristics.

A heuristic is a domain-specific algorithm (often called a rule of thumb) which does not solve a problem to optimality (as such problems are often NP-hard or NP-complete), but rather offers to deliver suboptimal solutions in feasible time. That is, a heuristic is a strategy that aims to deliver an approximation to a solution to a given problem in a fast, rather than an overly elaborate, way. An example of a heuristic is the Lin–Kernighan algorithm, which is applied to the travelling salesman problem (TSP). It does not make sense to apply the Lin–Kernighan algorithm to the knapsack problem, as it is specific to the TSP. The Lin–Kernighan algorithm could be applied to other graph-based problems with a representation similar to the TSP, but the algorithm may not perform well as this is not what it was intended for. A metaheuristic is a general search-based algorithm which can be applied to spaces consisting of bit strings or permutations, for example, depending on the representation of the problem instances. An example of a metaheuristic is a genetic algorithm which searches the space of bit strings of a given length.

Hyper-heuristics are different again. Typically, a hyper-heuristic uses a metaheuristic to search the space of problem-specific heuristics. That is, a hyper-heuristic is a “search methodolog[y] for choosing or generating (combining, adapting) heuristics [...], in order to solve a range of optimisation problems” [23, p. 2]. For example, see ref. [24]. Hyper-heuristics have successfully been applied to a number of different problem domains.

As combinatorial optimisation problems are a subset of all NP-hard problems, it is not surprising that hyper-heuristics have been a popular approach. Applications include exam timetabling [25], bin packing [26], and employee rostering [27]. There have also been a number of well-referenced survey articles, including refs [28,29,30].

Hyper-heuristics typically do not generate complete algorithms; rather a component of an algorithm is targeted to be automatically designed by a generate and test approach. Hyper-heuristics have been used, for example, to generate components of evolutionary algorithms (EAs) such as genetic algorithms and evolutionary programming (e.g. crossover operators [31], mutation operators [32]) and form a large part of the literature in the automated design of algorithms [33].

In the context of this article, we are using hyper-heuristics in the following manner. We will take seven low-level heuristics, which are chained together randomly to effectively create new heuristics. These new chains of heuristics are then inserted into a standard EA (depicted in the work of ref. [10]) which is used to tackle the problem. This work begins in the next section.

3 Notation and formalisation

In this section, the AAG KEP over a certain type of polycyclic group is discussed. This is followed by the notation needed for the implementation of the hyper-heuristic. In this section, the notation broadly follows that of ref. [10] which describes the aforementioned EA.

3.1 Setup of problem

The AAG KEP [1,2] was posed over polycyclic groups in refs [5,17] and subsequently attacked in two distinct ways by the work of refs [10] and [11]. The main details of the protocol, following the exposition given in ref. [10] for a group $G = \langle g_1, g_2, \ldots, g_n \mid R \rangle$, are as follows.

First, Alice chooses a subgroup $\mathcal{A} = \langle a_1, a_2, \ldots, a_N \rangle \leq G$ generated by words $a_i$ in the generators of $G$ such that $L_1 \leq l_G(a_i) \leq L_2$. Bob then does similarly to produce a subgroup $\mathcal{B} = \langle b_1, b_2, \ldots, b_N \rangle \leq G$. All of $\mathcal{A}$, $\mathcal{B}$, and $G$ are made public. Alice chooses her private key $A = a_{\mu_1}^{\varepsilon_1} a_{\mu_2}^{\varepsilon_2} \cdots a_{\mu_L}^{\varepsilon_L}$, where each $a_{\mu_i} \in \mathcal{A}$ and $\varepsilon_i = \pm 1$ (for all $i = 1, \ldots, L$). She now calculates

$$A^{-1} b_1 A,\; A^{-1} b_2 A,\; \ldots,\; A^{-1} b_N A,$$

and sends these to Bob. Bob does similarly, producing $B^{-1} a_i B$ for $i = 1, \ldots, N$, and sends these to Alice (his private key is $B$). From the information now exchanged, each individual can produce the shared key (the commutator) $[A, B] = A^{-1} B^{-1} A B$.
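To make the exchange concrete, the following sketch runs the protocol over a toy platform and checks that both parties arrive at the commutator. It assumes permutations of 0..n-1 (stored as tuples) in place of a polycyclic group, which offers no security and is chosen only because multiplication and inversion are easy to write down. All function names are hypothetical.

```python
import random

def compose(p, q):
    """The permutation 'apply p, then q' on 0..n-1."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def product(perms, n):
    """Left-to-right product of a list of permutations."""
    result = tuple(range(n))
    for p in perms:
        result = compose(result, p)
    return result

def conj(x, key):
    """The conjugate key^-1 x key."""
    return product([inverse(key), x, key], len(x))

def word_value(word, gens, n):
    """Evaluate [(index, exponent), ...], exponent = +1 or -1, in gens."""
    return product([gens[i] if e == 1 else inverse(gens[i]) for i, e in word], n)

def aag_exchange(n=8, n_gens=4, key_len=6, seed=1):
    rng = random.Random(seed)
    a = [tuple(rng.sample(range(n), n)) for _ in range(n_gens)]  # Alice's public generators
    b = [tuple(rng.sample(range(n), n)) for _ in range(n_gens)]  # Bob's public generators
    word_a = [(rng.randrange(n_gens), rng.choice((1, -1))) for _ in range(key_len)]
    word_b = [(rng.randrange(n_gens), rng.choice((1, -1))) for _ in range(key_len)]
    A, B = word_value(word_a, a, n), word_value(word_b, b, n)   # private keys
    from_alice = [conj(x, A) for x in b]    # A^-1 b_i A, sent to Bob
    from_bob = [conj(x, B) for x in a]      # B^-1 a_i B, sent to Alice
    # Alice rebuilds B^-1 A B by spelling her own word in Bob's conjugates.
    key_alice = product([inverse(A), word_value(word_a, from_bob, n)], n)
    # Bob rebuilds A^-1 B A the same way, inverts it, and multiplies by B.
    key_bob = product([inverse(word_value(word_b, from_alice, n)), B], n)
    expected = product([inverse(A), inverse(B), A, B], n)
    return key_alice, key_bob, expected

key_alice, key_bob, expected = aag_exchange()
```

The step that makes this work is that conjugation is a homomorphism, so substituting the received conjugates into a private word yields the conjugate of the whole private key.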

If an adversary wishes to find either the private key $A$ (or, equivalently, $B$), they may intercept the above conjugates either party sends to the other. Thus, the problem to be solved may be expressed as a subgroup-restricted multiple conjugacy problem in the following way. Each instance of this problem is a set of $N$ (frequently 20) conjugacy equations $E = \{E_1, \ldots, E_N\}$,

$$\begin{aligned} E_1 &: c_1 = A^{-1} b_1 A \\ E_2 &: c_2 = A^{-1} b_2 A \\ &\;\;\vdots \\ E_N &: c_N = A^{-1} b_N A, \end{aligned}$$

posed over a finitely presented platform group $G$. A solution to the problem means that all the above equations are satisfied. One function of the rewriting rules (relators) $R$ of $G$ is to serve cryptographically as word obfuscators and thus hide the secret word (private key) $A$.

The problem is posed over polycyclic groups $O \rtimes U$, where, by ref. [10], $O$ is the additive group of the ring of integers of a number field $K$ and its group of units is $U$. The number field is written as $K = \mathbb{Q}[x]/(f)$, for $f \in \mathbb{Z}[x]$ a monic irreducible polynomial of degree $d$. To recap, the instance parameters associated with this setup are the number of equations, $N$, the polynomial $f$, the length $L$ of the private key $A$ in $\mathcal{A}$, and $L_1$ and $L_2$ (the lower and upper bounds, respectively, on the lengths of the $a_i$ in $G$). Note that, in this work, we refer to either an exact solution or a candidate solution as appropriate. Most references will be to candidate solutions, which for the sake of brevity are simply called solutions. In the context of hyper-heuristics and cryptology, we are using hyper-heuristics to generate candidate solutions to find an exact solution to the cryptographic problem. In this spirit, there are several functions at work that we need to distinguish between.

3.2 Pertinent functions

The following functions are recapped from ref. [10]. Let a word $w$ be expressed in the form $w = f_{i_1}^{e_{i_1}} f_{i_2}^{e_{i_2}} \cdots f_{i_r}^{e_{i_r}}$ for non-zero $e_j \in \mathbb{Z}$, where $f_1, f_2, \ldots, f_n$ are the generators of the free group $F$. The length functions associated with the group $G$ are then given by

$$\ell(w) = \sum_{k=1}^{r} |e_{i_k}|$$

and

$$\mathrm{wt}(w) = \sum_{k=1}^{r} \omega_{i_k} |e_{i_k}|,$$

where, as in the above, $\omega_j$ is the “sum of the lengths of the normal forms of the commutators $[g_j, g_k]$ in $G$ for $k = 1, \ldots, n$.” That is, the length of $w$ is the sum of the absolute powers (respectively, the weighted absolute powers) of individual generators $f_i$ that make up the word $w$. The basic EA cost function measures the quality of the candidate solutions produced by the EA and is given by

$$c(\alpha) = \sum_{i=1}^{N} \ell(\alpha^{-1} b_i \alpha c_i^{-1}),$$

where $\alpha$ is the current EA solution (i.e. the approximation of the private key $A$). The output of this function is the sum of the lengths of the (normal form) reduced equations $E_1, E_2, \ldots, E_N$. That is, summand $i$ (where $i \in \{1, \ldots, N\}$) of the cost function is equal to the reduced length of equation $E_i$ after substitution of $\alpha$. This function is used to drive search in the EA, since the population ranking is performed with respect to it. The cost used in the EA is broadly the cost vector produced by this basic function, involving the sum $c$ and the maximum and mean lengths of the summands of $c$ for the weighted and non-weighted length functions, given in ref. [10, pp. 8–9]. The global optimum (minimum value) of $c$ is zero; at this value, no fragments of the equations remain and the instance is completely solved.
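As a small worked illustration of the two length functions, the sketch below evaluates them on a word stored as a list of (generator index, exponent) pairs. The weights in `omega` are made-up values for illustration only; in the text they are derived from commutator normal forms in $G$, which this sketch does not compute.

```python
def length(word):
    """Sum of absolute exponents of a word [(generator, exponent), ...]."""
    return sum(abs(e) for _, e in word)

def weighted_length(word, weights):
    """Sum of absolute exponents, each scaled by its generator's weight."""
    return sum(weights[g] * abs(e) for g, e in word)

w = [(1, 2), (3, -1), (1, -3)]   # the word f_1^2 f_3^-1 f_1^-3
omega = {1: 2, 3: 5}             # hypothetical weights, not taken from the source

plain = length(w)                     # 2 + 1 + 3 = 6
weighted = weighted_length(w, omega)  # 2*2 + 5*1 + 2*3 = 15
```

The cost function then sums the unweighted length over the reduced form of each equation, which requires normal-form computation in the platform group and is therefore left out here.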

The heuristic objective function is the metric used to compare the current heuristic chain over the given set of instances (training or testing) and is a vector given in the following order, each element computed over the set of instances:

  1. The mean best cost c over the unsuccessful EA runs;

  2. The negative of the success rate as a proportion of the total number of runs;

  3. From the successful runs, the mean number of generations used by the EA.

That is, this function tells the hyper-heuristic how good a given heuristic chain is. For the validation process the first and second elements of the above objective function are swapped, since we are now more concerned with the success rate. The hyper-heuristic attempts to minimise the above objective function, indicating a successful heuristic chain, as far as possible. Comparison of heuristic objective vectors, produced by two distinct heuristic chains, is performed lexicographically. Note that this function is often termed a fitness function in the evolutionary computation community.
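The lexicographic comparison, and the effect of swapping the first two elements for validation, can be illustrated as follows. This is a sketch under assumptions: the run records, the key names, and the conventions for an empty set of successful or unsuccessful runs are all hypothetical. Python tuples compare lexicographically, which is what the comparison relies on.

```python
from statistics import mean

def objective(runs, validation=False):
    """Objective vector for one chain; lower is better in every element.

    runs: list of dicts with keys 'solved', 'best_cost', 'generations'.
    """
    failed = [r["best_cost"] for r in runs if not r["solved"]]
    solved = [r["generations"] for r in runs if r["solved"]]
    mean_cost = mean(failed) if failed else 0.0           # assumed convention
    neg_success = -len(solved) / len(runs)
    mean_gens = mean(solved) if solved else float("inf")  # assumed convention
    if validation:                                        # success rate first
        return (neg_success, mean_cost, mean_gens)
    return (mean_cost, neg_success, mean_gens)

chain_a = [
    {"solved": True, "best_cost": 0, "generations": 30},
    {"solved": True, "best_cost": 0, "generations": 50},
    {"solved": False, "best_cost": 10, "generations": 50},
    {"solved": False, "best_cost": 14, "generations": 50},
]
chain_b = [
    {"solved": True, "best_cost": 0, "generations": 40},
    {"solved": True, "best_cost": 0, "generations": 40},
    {"solved": True, "best_cost": 0, "generations": 40},
    {"solved": False, "best_cost": 20, "generations": 50},
]

a_wins_training = objective(chain_a) < objective(chain_b)
b_wins_validation = objective(chain_b, True) < objective(chain_a, True)
```

In this made-up data, chain A ranks first under the training ordering because its failed runs end closer to a solution, while chain B ranks first under the validation ordering because it solves more instances.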

3.3 Simple heuristics on the group

In previous work [10], six simple heuristics were used in an EA to break a proposed key exchange [17]. These are listed in Table 1 as $H_1$–$H_6$. In this article, we are building new heuristic chains to inject into an EA. We have also added a seventh heuristic $H_7$ (swap) to this set of heuristics. Evolutionary operators may otherwise be thought of as heuristics on group elements $w = f_{i_1}^{e_{i_1}} f_{i_2}^{e_{i_2}} \cdots f_{i_r}^{e_{i_r}}$.

Heuristic $H_7$ (swap) is designed to assist when symbols are in the “wrong place” in a word $w$, swapping two symbols at random positions and potentially triggering subsequent cancellation of symbols (and so an EA cost reduction). Essentially, all heuristics in Table 1 are random, with operations performed with random words or generators at random positions. Table 1 is not a list of minimal heuristics: it is noted, for example, that heuristic $H_1$ can be achieved through repeated application of $H_2$, as can $H_5$ and $H_6$ (which were specialised to the conjugacy problem).

Table 1

The seven simple heuristics used to build new heuristics. The first six heuristics were used in ref. [10]

Heuristic   Description
$H_1$       Insertion of a subgroup generator $a_i$
$H_2$       Insertion of a single generator $f_i$
$H_3$       Deletion of a single generator
$H_4$       Substitution with a single generator
$H_5$       Position conjugation: conjugating a given position by $f_i$
$H_6$       Subword conjugation: conjugating a subword by $f_i$
$H_7$       Swap
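One possible reading of the seven heuristics as string operations is sketched below, together with the application of a chain. The encoding (a word as a list of non-zero integers, `-k` being the inverse of generator `k`), the function names, and the exact form of the two conjugation heuristics are assumptions made for illustration; the source gives only the one-line descriptions in Table 1. No cancellation is performed here.

```python
import random

def rand_gen(rng, n_gens):
    """A random generator or inverse as a non-zero int in [-n_gens, n_gens]."""
    return rng.choice((1, -1)) * rng.randint(1, n_gens)

def insert_subgroup_gen(w, rng, n_gens, sub):   # H1: insert a subgroup generator a_i
    p = rng.randint(0, len(w))
    return w[:p] + rng.choice(sub) + w[p:]

def insert_gen(w, rng, n_gens, sub):            # H2: insert a single generator
    p = rng.randint(0, len(w))
    return w[:p] + [rand_gen(rng, n_gens)] + w[p:]

def delete_gen(w, rng, n_gens, sub):            # H3: delete a single generator
    if not w:
        return w
    p = rng.randrange(len(w))
    return w[:p] + w[p + 1:]

def substitute_gen(w, rng, n_gens, sub):        # H4: substitute a single generator
    if not w:
        return w
    p = rng.randrange(len(w))
    return w[:p] + [rand_gen(rng, n_gens)] + w[p + 1:]

def conj_position(w, rng, n_gens, sub):         # H5: f^-1 x f at one position
    if not w:
        return w
    p, f = rng.randrange(len(w)), rand_gen(rng, n_gens)
    return w[:p] + [-f, w[p], f] + w[p + 1:]

def conj_subword(w, rng, n_gens, sub):          # H6: f^-1 u f around a subword u
    s = rng.randint(0, len(w))
    t = rng.randint(s, len(w))
    f = rand_gen(rng, n_gens)
    return w[:s] + [-f] + w[s:t] + [f] + w[t:]

def swap(w, rng, n_gens, sub):                  # H7: swap two random positions
    if len(w) < 2:
        return w
    i, j = rng.sample(range(len(w)), 2)
    w = list(w)
    w[i], w[j] = w[j], w[i]
    return w

HEURISTICS = {1: insert_subgroup_gen, 2: insert_gen, 3: delete_gen,
              4: substitute_gen, 5: conj_position, 6: conj_subword, 7: swap}

def apply_chain(chain, w, rng, n_gens, sub):
    """Apply a heuristic chain such as [2, 2, 3] to a word, left to right."""
    for h in chain:
        w = HEURISTICS[h](w, rng, n_gens, sub)
    return w
```

For example, `apply_chain([2, 2, 3], word, random.Random(0), 3, subgroup_words)` inserts two random generators and then deletes one symbol, returning a new list and leaving `word` unchanged.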

3.4 EA parameter settings

The EA parameters are given in Table 2. The parameters were produced by early experimentation and by scaling down the parameters in ref. [10] to approximately one quarter of their original values to achieve an effective set of EA parameters. This increases the speed of the EA. The original population size was 100. We do not claim optimality for these parameter settings, but they are fixed for the entirety of the hyper-heuristic run. We suggest the hyper-heuristic approach is not particularly sensitive to these parameters since we are aiming to improve upon an already well-performing EA.

Heuristics are executed by first choosing a solution at random from the top 40% of the population (by cost). The selection operator is elitist; i.e. if $n_s$ solutions are to be selected, then the first is the minimum cost solution, with the remaining $n_s - 1$ solutions selected at random from the top 40% of the population (i.e. after ranking by minimum cost). All random choices are uniform (as in ref. [10]).
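The elitist selection step can be sketched as below. The function name and signature are hypothetical, and drawing the remaining solutions with replacement is an assumption, since the text does not say whether repeats are allowed.

```python
import random

def select(population, costs, n_s, rng, top_fraction=0.4):
    """Return n_s solutions: the cheapest one, then uniform draws from the top 40%."""
    ranked = sorted(range(len(population)), key=lambda i: costs[i])
    top = ranked[:max(1, int(top_fraction * len(population)))]
    chosen = [ranked[0]] + [rng.choice(top) for _ in range(n_s - 1)]
    return [population[i] for i in chosen]

population = ["w%d" % i for i in range(25)]
costs = list(range(25, 0, -1))       # made-up costs: w24 is the cheapest
picked = select(population, costs, 4, random.Random(0))
```

With a population of 25, the top 40% is the ten lowest-cost solutions.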

Table 2

EA parameter settings

Parameter               Value
Population size         25
Truncation selection    40%
$H_1$                   6
$H_2$                   1
$H_3$                   1
$H_4$                   5
$H_5$                   1
$H_6$                   1
Crossover               4
Selection               2
Chains                  4
Number of generations   Depends on experiment

It was chosen to have four solutions from each generation created by a heuristic chain. Testing this alongside the remaining nineteen solutions in each generation created by the same heuristic chain, it was found that this choice of four solutions turned out to be more advantageous (the average number of generations to solve decreased). $H_7$ does not appear in Table 2 as it does not operate in isolation (as part of the EA of ref. [10]), only in the context of the other six heuristics. Crossover is performed by choosing two words $w_1$, $w_2$ from the top 40% of the population. Choosing two random positive integers $r_1 \leq \ell(w_1)$, $r_2 \leq \ell(w_2)$, one of the two words

$$w_1[1 \ldots r_1]\, w_2[r_2 + 1 \ldots \ell(w_2)] \quad \text{and} \quad w_2[1 \ldots r_2]\, w_1[r_1 + 1 \ldots \ell(w_1)]$$

is output [10] (where $w[s \ldots t]$ is the subword between and including positions $s$ and $t$ of the word $w$). The next section details the operation of the hyper-heuristic and the experimental setup.
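The crossover above translates directly into list slicing. In this sketch words are Python lists, so the 1-indexed subword from position 1 to $r_1$ becomes `w1[:r1]` and the subword from $r_2 + 1$ to the end becomes `w2[r2:]`. The function names are hypothetical, and choosing between the two children uniformly is an assumption.

```python
import random

def crossover_pair(w1, w2, r1, r2):
    """Both candidate children for cut points 1 <= r1 <= len(w1), 1 <= r2 <= len(w2)."""
    return w1[:r1] + w2[r2:], w2[:r2] + w1[r1:]

def crossover(w1, w2, rng):
    """Pick random cut points and return one of the two children."""
    r1 = rng.randint(1, len(w1))
    r2 = rng.randint(1, len(w2))
    return rng.choice(crossover_pair(w1, w2, r1, r2))

child_a, child_b = crossover_pair([1, 2, 3, 4, 5], [-1, -2, -3], 3, 2)
# child_a is [1, 2, 3, -3] and child_b is [-1, -2, 4, 5]
```

When a cut point equals the word length, the corresponding tail is empty, so a child may consist of a prefix of one parent only.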

4 Experimental setup

4.1 Hyper-heuristic implementation

As above, our objective is to create a hyper-heuristic that, given the AAG problem (Section 3.1) and a polycyclic group as previously stated, generates an algorithm which solves an acceptable number of instances of the problem. The term “acceptable” in this instance is taken to mean a higher number of instances than the original EA of ref. [10] with $H_2$ inserted (cf. the “$H_2$” column of Tables 3–5). To recap, our hyper-heuristic controls the injection of heuristic chains into an EA in order to determine the best heuristic chain. The initial heuristic chain can be the best heuristic known (i.e. $H_2$) or a random chain. If the initial heuristic chain is random, then the heuristic generator is called. This random chain is set to a random length between 2 and 10. We now present the core algorithmic contribution and how these algorithms are related. Algorithm 1 tests heuristic chains; the parameters used are $H_{\max} = 20$, $N_{\text{train}} = 15$, $N_{\text{test}} = 50$, and $N_{\text{valid}} = 50$.

Algorithm 1: Heuristic generation and testing methodology
Input: Group $G$; parameters: number of training instances $N_{\text{train}}$; number of testing instances $N_{\text{test}}$; number of validation instances $N_{\text{valid}}$; initial heuristic chain; maximum number, $C_{\max}$, of heuristics to generate.
Output: Runtime statistics; best heuristic chain found.
1: $i \leftarrow 0$, $i^{*} \leftarrow 1$
2: while $i \leq C_{\max}$ do
3:   if $i = 1$ then
4:     $C_i \leftarrow$ initial chain
5:   else
6:     Call heuristic chain generator (Algorithm 2), giving chain $C_i$.
7:   end if
8:   Execute the EA, with injected chain $C_i$, on all training instances.
     Get metric $M_{i,\text{train}}$ on training set.
9:   if $i = 1$ then
10:    $M_{\text{train}} \leftarrow M_{i,\text{train}}$, $i^{*} \leftarrow i$
11:  else
12:    if $M_{i,\text{train}} < M_{\text{train}}$ then  ▷ Better chain found for training set; test chain on the testing set.
13:      Execute the EA, with injected chain $C_i$, on all test instances.
         Get metric $M_{i,\text{test}}$.
14:      If $M_{1,\text{test}}$ does not exist then execute the EA with injected chain $C_1$ on all testing instances. Let $M_{\text{test}} \leftarrow M_{1,\text{test}}$.
15:      if $M_{i,\text{test}} < M_{\text{test}}$ then
16:        $i^{*} \leftarrow i$  ▷ A better chain has been found on the testing set.
17:      end if
18:    else
19:      Accept chain $C_i$ (i.e. $i^{*} \leftarrow i$) with probability $p_h$.
         Otherwise rewind chain back to the last best chain.
20:    end if
21:  end if
22: end while
23: if $i^{*} \neq 1$ then
24:   Compare chain $C_{i^{*}}$ with chain $C_1$ on the validation set of instances via execution of the EA with injected chains (i) $C_{i^{*}}$ and (ii) $C_1$.
25:   return timeout and $C_{i^{*}}$. End.
26: end if
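The control flow of Algorithm 1 can be sketched in Python as below. This is one reading of the pseudocode, not the GAP implementation: the EA is hidden behind a caller-supplied `evaluate(chain, split)` returning a metric tuple where lower is better, the chain generator is passed in as `generate`, and the value of `p_h` is a placeholder because the text does not state it. Two further assumptions are made explicit in the comments: the best metrics are updated whenever a better chain is found, and the chain used as the generator's starting point (`base`) is tracked separately from the best chain.

```python
import random

def hyper_heuristic(initial_chain, generate, evaluate, c_max=20, p_h=0.1, seed=0):
    """Return (best chain, validation metrics or None if nothing beat the initial chain)."""
    rng = random.Random(seed)
    examined = [initial_chain]
    best = base = initial_chain
    best_train = evaluate(initial_chain, "train")
    best_test = None                      # computed lazily, as in step 14
    for _ in range(2, c_max + 1):         # chain 1 is the initial chain
        chain = generate(base, examined, rng)
        examined.append(chain)
        m_train = evaluate(chain, "train")
        if m_train < best_train:          # better on training: try the test set
            if best_test is None:
                best_test = evaluate(initial_chain, "test")
            m_test = evaluate(chain, "test")
            if m_test < best_test:        # assumption: update the stored metrics
                best = base = chain
                best_train, best_test = m_train, m_test
        elif rng.random() < p_h:
            base = chain                  # assumption: accepted only as a starting point
        else:
            base = best                   # rewind to the last best chain
    if best == initial_chain:
        return best, None
    return best, (evaluate(best, "valid"), evaluate(initial_chain, "valid"))

# Toy stand-ins: a chain is a tuple, and chains of length 3 are "ideal".
def toy_generate(base, examined, rng):
    return base + (2,)

def toy_evaluate(chain, split):
    return (abs(len(chain) - 3),)

best, valid = hyper_heuristic((2,), toy_generate, toy_evaluate, c_max=6, p_h=0.0)
```

With the toy stand-ins the search settles on a chain of length 3; in the real setting `evaluate` would run the EA with the injected chain over the named instance set.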

The EA referred to in Algorithm 1 is the EA of ref. [10] run on an input collection of instances. The EA parameter values are reduced as in Table 2. Note also that there is a probability, p h , that the current chain will be accepted if it does not perform better than the best chain found (on the training instances) so far.

The group definition of $G$ is a piece of code which simply defines the group, its instance parameters over which the instance will be computed, and the cost functions. Next is the heuristic chain generator, Algorithm 2. If the initial heuristic chain is a random chain, then this random chain is created by appending a given number (here, a random number between two and ten) of simple heuristics randomly chosen from $H_1$–$H_7$. Otherwise, the heuristic generator (Algorithm 2) generates new chains of simple heuristics from the chain given by the current step of Algorithm 1 by a process of insertion, deletion, or substitution at random positions in the heuristic chain. The heuristic is then returned in the form of a series of commands written into a file read by the EA when it is time to execute the chain. Chains not allowed include the set of all chains of the form $H_3^k$ for some $k > 0$ (i.e. a chain consisting solely of deletions) or chains that are identical to those already examined in the hyper-heuristic run. We let $p_i = p_s = 0.4$ and $p_d = 0.2$.

Algorithm 2: Heuristic chain generator
Input: Set of heuristic chains $\mathcal{C} = \{C_1, \ldots, C_k\}$ already examined.
Output: New heuristic chain $C$.
$C \leftarrow C_i$, the heuristic chain given by Algorithm 1.
1: while $C \in \mathcal{C}$ do
     Choose operation at random subject to probabilities $p_i$, $p_s$, $p_d$ (of insertion, substitution and deletion respectively).
     Perform chosen operation on $C$ with a simple heuristic chosen at random from $H_1$–$H_7$ (if not deletion).
2: end while
3: return heuristic chain $C$. End.
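Algorithm 2 is small enough to sketch in full. In the version below a chain is a tuple of heuristic numbers 1 to 7, the probabilities default to the values stated in the text, and the loop also rejects chains made solely of $H_3$, as the text requires. Refusing to delete the last remaining heuristic is an added assumption to avoid empty chains; the function name is hypothetical.

```python
import random

def generate_chain(current, examined, rng, p_i=0.4, p_s=0.4, p_d=0.2, n_heuristics=7):
    """Mutate `current` until it is new and not made solely of deletions (H3)."""
    chain = tuple(current)
    seen = {tuple(c) for c in examined} | {chain}   # forces at least one change
    while chain in seen or all(h == 3 for h in chain):
        op = rng.choices(("insert", "substitute", "delete"),
                         weights=(p_i, p_s, p_d))[0]
        c = list(chain)
        if op == "insert":
            c.insert(rng.randint(0, len(c)), rng.randint(1, n_heuristics))
        elif op == "substitute" and c:
            c[rng.randrange(len(c))] = rng.randint(1, n_heuristics)
        elif op == "delete" and len(c) > 1:         # assumption: never empty a chain
            del c[rng.randrange(len(c))]
        chain = tuple(c)
    return chain

new_chain = generate_chain((2,), [(2,)], random.Random(0))
```

As in the pseudocode, several operations may be applied in succession before a chain that has not yet been examined is reached.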

An instance generator is also used. This creates instances at random, with the random number seed based upon the computer clock. Included are the instance parameters ($N$, $L$, $L_1$, $L_2$, $G$; see Section 3.1), a random word function, and the cost functions of Section 3.2.

4.2 Details of implementation

In contrast with the work of ref. [10] where only the performance of a single proposed algorithm was investigated, in this article we are deviating from this approach and automatically building a family of algorithms. Each algorithm needs to be executed in order to evaluate its performance. Therefore, one of the downsides of this process is that we need to employ a training phase which requires a much longer time than several repetitions of a single algorithm. This then limits us to smaller-sized instances.

In the same spirit, a number of measures were also put into place to reduce processing time. First, an EA population size of 25 was used (with one slave processor being assigned to each population member). In addition, smaller EA iteration limits than those of ref. [10] were set. On the training and testing instances, “maxsteps” is set to 50 for degrees 1, 2, and 3 of the polynomial $f$ defining the number field $K$ (which, of course, defines $G$), and 100 for degrees 5 and 7. On the validation instances, “maxsteps” is set to 1,250 for degrees 1, 2, and 3, and 2,500 for degrees 5 and 7; this had the effect of a small decrease in the success rate of the EA compared to that of ref. [10] (and so the results are not directly comparable). The polynomials $f$ used for the above degrees were $x - 1$, $x^2 - x - 1$, $x^3 - x - 1$, $x^5 - x^3 - 1$, and $x^7 - x^3 - 1$, being consistent with those of refs [10,17].

All instances are run with an initial word length of 10 generators (in EA generation 1) to avoid bias to the insertion operators which would occur with an initial length of 1 (for example). No instances of degree 9 or above were attempted due to the time complexity of computation and reduction of words in the groups concerned (for more details the interested reader should consult ref. [10]). The number of instances used in each phase of the hyper-heuristic was 15 (training), 50 (testing), and 50 (validation). This number of validation instances is generally accepted in the hyper-heuristics community, though it is application-dependent. The number of heuristic chains run by the hyper-heuristic is $H_{\max} = 20$.

All experiments were run on a high-performance cluster containing Intel Xeon E5620 CPU 2.40 GHz processors. The hyper-heuristic was implemented in the GAP language [15], with the Polycyclic [34] package used for computation with polycyclic group elements. The ParGAP [16] package was used to handle MPI communications between processors. Due to the domain, popular hyper-heuristic packages such as Hyflex [35] are not suitable for use because we are using GAP, a specialist group theory language. As above, each experiment was run on 26 cores (1 “master” core to control, and 25 “slave” cores, one for each EA population member). The code of this section is available at https://github.com/MJCraven/Hyperheuristic_group, with the instances available at ref. [36].

5 Experimental results

In this section, hyper-heuristic experiments are run, varying initial input and instance parameters. To recap, the EA with a heuristic chain injected is first executed on the previously detailed 15 training instances. If the performance improves over that of previous heuristic chains, then the EA is run on the testing set (50 random instances). If the performance over this set also improves over that of previous heuristic chains, then the current chain is assigned as the new best chain. This is continued until the end of the run, after which the best chain is validated over the validation set (a distinct set of 50 random instances). For a single hyper-heuristic run of 20 heuristic chains, and assuming at least one better heuristic chain is found, around 500 problem instances are run in total.
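The train, test, and validate loop recapped above can be sketched as follows. This is a minimal illustration rather than the authors' GAP implementation: the callables `propose` and `score` are hypothetical stand-ins (how candidate chains are generated and how the EA is scored are not specified here), a higher score is assumed to be better, and the incumbent's training score is assumed to change only when a chain is accepted.

```python
from typing import Callable, List, Tuple

Chain = List[int]  # e.g. [2, 1, 4] stands for the chain H2 H1 H4


def hyper_heuristic_run(
    initial_chain: Chain,
    propose: Callable[[Chain], Chain],
    score: Callable[[Chain, str], float],
    h_max: int = 20,
) -> Tuple[Chain, float, int]:
    """Hill-climb over chains: train, then test, then validate once at the end.

    `propose` and `score` are hypothetical: `propose` returns a new candidate
    chain, and `score(chain, phase)` runs the EA with the chain injected on
    the instances of that phase and returns a number (higher is better).
    """
    sizes = {"train": 15, "test": 50, "validation": 50}
    best = initial_chain
    best_train = score(best, "train")
    best_test = score(best, "test")
    instances_run = 0  # instances used by candidate chains and final validation

    for _ in range(h_max):
        candidate = propose(best)
        train = score(candidate, "train")
        instances_run += sizes["train"]
        if train <= best_train:
            continue  # no improvement on training: discard the candidate
        test = score(candidate, "test")
        instances_run += sizes["test"]
        if test > best_test:
            # Improved on both sets: the candidate becomes the new best chain.
            best, best_train, best_test = candidate, train, test

    validation = score(best, "validation")
    instances_run += sizes["validation"]
    return best, validation, instances_run
```

Under these assumptions, a run in which three of the 20 candidates reach the testing set uses 20 × 15 + 3 × 50 + 50 = 500 instances, which is in line with the figure of around 500 quoted above.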

5.1 The best simple heuristic

An LBA attack (a hillclimber) was created for each simple heuristic. These attacks were run on a selection of random instances; the percentages of successful runs were 1.7, 51.7, 0, 0, 1.7, and 1.7% for $H_1$–$H_6$, respectively. This indicates that a heuristic on its own, unless it builds appropriate solutions, is unlikely to be successful for a large set of random instances. In this case, $H_2$ seems to be more successful since it builds solutions by gradually increasing solution length. Hence, the hyper-heuristic is initialised with the chain composed solely of a single execution of $H_2$. See Table 2 for algorithm parameter settings.

5.2 Observations on the evolution to build heuristic chains

The following details are presented for each experiment. The first column of Tables 3, 4, and 5 is the degree of the polynomial $f$, a main instance parameter. The second column is the validation set metric (success rate, mean cost from unsuccessful runs, mean number of generations from successful runs to solve the instance) from the EA with the best known heuristic chain (insertion – $H_2$). Note that this is not a hillclimber (LBA) but the prior EA where the number of individuals per population created by heuristics $H_1$–$H_6$ is restricted, and the remaining individuals are created by the applicable chain (a single repetition of $H_2$). The third column is the validation set metric from the EA with the best injected heuristic chain found, and the fourth column is the iteration on which that chain was found. The fifth column gives the chain, where $H_i^k$ refers to $k$ repeated executions of $H_i$. The last column is the number of hyper-heuristic runs taken to find the best heuristic; unsuccessful runs were those for which either no better chain than $H_2$ was found or, more commonly, a better chain (on the grounds of testing and training performance) was found but performed worse than $H_2$ on the validation set. Some observations on the results are noted in the next subsection.
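The chain notation used in the tables can be made concrete with a small sketch. The encoding below is an assumption made for illustration only: a chain is a list of (index, repetitions) pairs, heuristics are applied left to right, and the two toy heuristics in the usage note are hypothetical stand-ins, not the simple heuristics of the article.

```python
from typing import Callable, Dict, List, Tuple

Word = List[int]               # toy encoding: a word as a list of generator indices
Chain = List[Tuple[int, int]]  # [(i, k), ...] meaning H_i executed k times in a row


def expand_chain(chain: Chain) -> List[int]:
    """Flatten [(i, k), ...] into the sequence of heuristic indices to execute."""
    return [i for i, k in chain for _ in range(k)]


def apply_chain(
    chain: Chain,
    heuristics: Dict[int, Callable[[Word], Word]],
    word: Word,
) -> Word:
    """Apply the heuristics named by the chain to `word`, left to right (assumed order)."""
    for i in expand_chain(chain):
        word = heuristics[i](word)
    return word
```

For instance, a chain written as $H_5 H_3^2 H_7 H_2 H_5$ would be encoded as `[(5, 1), (3, 2), (7, 1), (2, 1), (5, 1)]` and expands to the index sequence `[5, 3, 3, 7, 2, 5]`.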

Table 3

Comparison of results on 50 validation instances

| $d$ | Insertion ($H_2$) | GA with chain | Iter | Best chain | # Runs |
|---|---|---|---|---|---|
| 1 | [100%, 0, 7.88] | [100%, 0, 7.62] | 6 | $H_2 H_1 H_4$ | 4 |
| 2 | [100%, 0, 157.08] | [100%, 0, 96.04] | 2 | $H_2 H_7$ | 1 |
| 3 | [100%, 0, 101.84] | [100%, 0, 81.54] | 16 | $H_5 H_3^2 H_7 H_2 H_5$ | 2 |
| 5 | [60%, 299.55, 491.23] | [66%, 329.53, 695.39] | 10 | $H_3 H_7$ | 6 |
| 7 | [32%, 476.44, 785.94] | [42%, 557.90, 854.05] | 6 | $H_6 H_3 H_7$ | 1 |

The parameters used were $N = 20$, $L_1 = 10$, $L_2 = 13$, $L = 5$, as in Section 4.1. Those instances used by the present work are taken from the same distributions as those used by ref. [10].

Table 4

Comparison of results on 50 validation instances

| $d$ | Insertion ($H_2$) | GA with chain | Iter | Best chain | # Runs |
|---|---|---|---|---|---|
| 1 | [100%, 0, 3.92] | [100%, 0, 3.60] | 15 | $H_7 H_1 H_2$ | 12 |
| 2 | [100%, 0, 38.30] | [100%, 0, 32.68] | 18 | $H_2 H_3$ | 1 |
| 3 | [100%, 0, 80] | [100%, 0, 66.60] | 5 | $H_2 H_1 H_2 H_4$ | 1 |
| 5 | [76%, 25.67, 501.47] | [92%, 20.75, 488.26] | 13 | $H_7 H_3 H_2 H_1 H_5 H_4 H_7 H_5$ | 1 |
| 7 | [58%, 37.33, 585.34] | [66%, 41.29, 497.39] | 7 | $H_5 H_2 H_7 H_3^2$ | 6 |

The parameters used were $N = 5$, $L_1 = 5$, $L_2 = 8$, $L = 5$.

Table 5

Comparison of results on 50 validation instances.

| $d$ | Insertion ($H_2$) | GA with chain | Iter | Best chain | # Runs |
|---|---|---|---|---|---|
| 1 | [100%, 0, 7.30] | [100%, 0, 7.02] | 9 | $H_5^4 H_4 H_1$ | 3 |
| 2 | [96%, 35, 141.25] | [98%, 29, 163.51] | 3 | $H_2 H_7$ | 4 |
| 3 | [92%, 37.5, 180.54] | [96%, 37, 160.33] | 20 | $H_3 H_5^2 H_3 H_5 H_3 H_1 H_5 H_6$ | 1 |
| 5 | [52%, 617.17, 577.38] | [58%, 141.33, 888.03] | 8 | $H_6 H_3 H_4 H_1$ | 2 |
| 7 | [12%, 344.64, 947.5] | [18%, 289.76, 1115.89] | 13 | $H_4^2 H_5 H_6 H_7 H_3 H_2$ | 2 |

The parameters used were $N = 5$, $L_1 = 15$, $L_2 = 18$, $L = 5$.

5.3 Observations and discussion of results

Tables 3–5 show that the approach creates heuristic chains which, once injected into the EA, are more successful than the EA of ref. [10]. Since the hyper-heuristic relies on a stochastic algorithm (the EA), some runs are more successful than others. For example, some hyper-heuristic runs may uncover several chains proving more successful than the initial heuristic chain (e.g. Table 4 with $d = 7$). On the other hand, some hyper-heuristic runs may discover no chains at all that are more effective than the initial heuristic (recall this information is recorded in the final column of Tables 3–5). This latter outcome seems to be more common for $d = 1$, where a high percentage of instances is solved by the EA with the initial heuristic $H_2$.

Note, in addition, that for many small $d$ (e.g. $d = 1$ or $d = 2$), all problem instances are solved by the EA with the injected simple heuristic $H_2$. Thus, the only option to improve performance, in the sense it is measured in this work, is to solve those instances in a smaller mean number of generations. For example, the degree $d = 1$ in Table 3 shows that 100% of problem instances are solved by the initial heuristic in a mean of 7.88 generations. This is improved marginally, solving all instances with a mean of 7.62 generations, by the later chain $H_2 H_1 H_4$. This suggests that for larger degrees ($d \geq 5$, for example) there is more “room for improvement” by the hyper-heuristic.
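One way to read the comparison of metric triples described above is lexicographically: a higher success rate wins, ties are broken by a lower mean cost of unsuccessful runs, and remaining ties by a lower mean number of generations. The sketch below encodes that reading. It is an assumption made for illustration, not the performance metric as defined by the authors, and the name `is_better` is hypothetical.

```python
from typing import Tuple

# (success rate in %, mean cost of unsuccessful runs, mean generations of successful runs)
Metric = Tuple[float, float, float]


def is_better(candidate: Metric, incumbent: Metric) -> bool:
    """Assumed lexicographic order: higher success, then lower cost, then fewer generations."""

    def key(m: Metric) -> Tuple[float, float, float]:
        # Negate the success rate so that smaller keys are always better.
        return (-m[0], m[1], m[2])

    return key(candidate) < key(incumbent)
```

With the $d = 1$ row of Table 3, `is_better((100, 0, 7.62), (100, 0, 7.88))` holds because the success rates and costs tie and the mean number of generations is smaller. With the $d = 5$ row, the chain is preferred on success rate alone (66% against 60%), even though its cost and generation count are higher.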

As is often the case with EAs and hyper-heuristics, high-performance computing is an advantage due to the large amounts of time required to solve a large number of instances. The parameter with the largest influence is the degree $d$ (see ref. [10] for further details). All the above results exhibit an improvement over the results of ref. [10] (and so ref. [17]). The best chains found show no evident pattern across degrees, and so it is probable that there do not exist chains that work better for one particular degree.

5.4 Characteristics of the framework

Through experience, and by the above analysis, some characteristics of the framework (in the context of the AAG problem and polycyclic groups defined by a number field) are observed.

First, to the best of the authors’ knowledge, random instances have not been classified in terms of difficulty. For example, an EA that solved instance $A$ of a problem in an average of 3,000 generations (over, say, 10 repetitions) may well solve instance $B$, with the same instance parameters, in 100 generations. That is, for a given set of instance parameters ($N$, $L$, $L_1$, and $L_2$) there is a large variability in difficulty for randomly generated problems. In the experience of the authors, this effect seems to worsen for higher degrees. Recall that $L$ is the key length in the subgroup $A \leq G$. Due to the lengths $L_1$ and $L_2$ (in $G$) of elements in $A \leq G$, the length of the key may be rather large after mapping to its image in $G$. Combined with the relator lengths in the presentations of the groups, this makes problem hardness difficult to classify. This imposes a constraint on the hyper-heuristic, since a consistent measure of performance over a small number of instances is difficult to obtain. Hence, a relatively large number of instances is needed, at least for testing and validation.
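The need for a larger instance set can be illustrated with a standard back-of-envelope calculation. The sketch below assumes, as an idealisation not made in the article, that each randomly drawn instance is solved independently with the same probability $p$, so that the observed success rate over $n$ instances has standard error $\sqrt{p(1-p)/n}$. The function name is hypothetical.

```python
import math


def success_rate_standard_error(p: float, n: int) -> float:
    """Standard error of an observed success rate, assuming n independent
    instances that are each solved with probability p (binomial model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)
```

For a true success rate of 60%, this gives a standard error of roughly 0.13 with 15 instances and roughly 0.07 with 50 instances. Under this simplified model, a difference of a few percentage points between two chains is therefore hard to distinguish from noise on a small set.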

Combinatorial optimisation problems typically have an objective function where, when a small change is made to the input, there is a correspondingly small change in the output value. This is reflected in the so-called “deep-valley hypothesis” [37]. This property is often assumed when metaheuristics are applied, as metaheuristics typically make a small change to the solution in order to bring about a small improvement in the objective values. However, the objective function in this article is, because of the group presentations used, unlikely to display the deep-valley property and this means that the feedback provided by a more rugged “landscape” does not guide the search as efficiently. This is manifested by a heuristic chain having a low success rate on the training instances but also having a high success rate on the testing instances or vice versa.

Hyper-heuristics may be applied to continuous optimisation problems, where real-valued feedback from the objective function may guide the search process. The situation is more complex for the current optimiser since the objective value is discrete: that is, the optimiser has either solved a given instance or it has not. This work goes some way to ameliorate this issue by including the least EA cost reached as part of the performance metric. The hyper-heuristic is hill-climbing in the space of heuristic chains. In the next section, the article is concluded.

6 Concluding remarks

6.1 Summary

This work exhibits the automatic generation of novel heuristic chains to improve an existing EA which has previously been demonstrated to effectively attack a given KEP. That is, this approach is a framework for learning (i.e. generating and testing in a hyper-heuristic setting) cryptanalytic attacks. We are not proposing a single algorithm to tackle this problem. Our stance is distinctly different: we propose a framework to automatically generate algorithms for the attack. One of the advantages of this approach is that it automates the rather mechanical task of generating new attack algorithms for which there are often few design principles to guide us.

This is thus an ideal match for a generative hyper-heuristic approach where novel algorithms can be freely and easily generated. This avoids the task of manually generating algorithms for which we often have scant means of evaluating their effectiveness other than actually testing them out on problems of interest. An evaluation metric is all a hyper-heuristic needs to produce new heuristic chains.

6.2 Contributions

This article makes the following key contributions to the field:

  1. The proposal that hyper-heuristics are a suitable framework in which to generate and test heuristic chains to break a KEP.

  2. The implementation and application of a hyper-heuristic to automatically build chains of simple heuristics to break a given KEP.

  3. The demonstration that chains of simple heuristics trained on one set of problem instances can then generalise to solve a second independent set of problem instances.

Note that we do not claim that we have produced the best possible heuristic, but an algorithmic framework which can generate variations in existing well-performing algorithms.

6.3 Further work

As further work, we would like to generalise the hyper-heuristic framework to work towards proving or disproving security of proposed group-theoretic KEPs. The framework exhibited is expandable, enabling other groups and group-theoretic problems to be used as “plug ins.” Possible other uses could be to show that some proposed KEPs prove resistant to LBA attacks (i.e. KEPs for which the hyper-heuristic does not yield high-performing heuristic chains after many runs). For example, would the conjugacy search problem in finitely generated metabelian groups or generalised metabelian Baumslag–Solitar groups [38] be breakable with the approach (the authors in a preprint suggest that LBA algorithms are ineffective)? Similarly, would the $n$-root and subgroup membership search problems in polycyclic groups [39], or the conjugacy problem and hidden subgroup problem in Engel groups [40], be breakable? Further, by ref. [41], there are open questions related to complexity of some problems in polycyclic groups (power conjugacy problem or geodesic length problem, for example) or other problems that may be used in KEPs. The complexity of the above problems may be analysed using the hyper-heuristic framework, potentially giving further information about the exact solutions of these problems (if they exist) and complementing such key-finding approaches as refs [42] and [43].

It is hoped this work may encourage machine learning and hyper-heuristic approaches in cryptology. For example, many platform groups have algebraic structure which makes it possible to find exact solutions to the given problem. This requires specialist knowledge about those structures. If a new platform group is proposed then the knowledge of that group may be insufficient for cryptographic analysis. Some of this structure, or techniques for such analyses, may be discoverable by our approach and in a faster and more automated way than traditional mathematical methods of discovery. While this technique may seem rudimentary, as the price of human discovery increases with inflation and the cost of computing decreases according to Moore’s law, this type of approach will become more cost effective and beneficial.

Acknowledgements

The authors gratefully acknowledge the Centre for Mathematical Sciences at the University of Plymouth and the Operational Research Group at Queen Mary University of London for their generous research support and encouragement. Thanks also go to the reviewers for their helpful comments.

  1. Funding information: None declared.

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Conflict of interest: The authors state no conflict of interest.

  4. Data availability statement: Locations for the code and data used in this article are specified at the end of Section 4.

References

[1] Anshel I, Anshel M, Fisher B, Goldfeld D. New key agreement protocols in braid group cryptography. In Cryptographers’ Track at the RSA Conference. Berlin Heidelberg: Springer; 2001. p. 13–27. doi:10.1007/3-540-45353-9_2

[2] Anshel I, Anshel M, Goldfeld D. An algebraic method for public-key cryptography. Math Res Lett. 1999;6:287–92. doi:10.4310/MRL.1999.v6.n3.a3

[3] Ko KH, Lee SJ, Cheon JH, Han JW, Kang JS, Park C. New public-key cryptosystem using braid groups. In Annual International Cryptology Conference. Berlin Heidelberg: Springer; 2000. p. 166–83. doi:10.1007/3-540-44598-6_10

[4] Eick B, Kahrobaei D. Polycyclic groups: a new platform for cryptology? arXiv:http://arXiv.org/abs/math/0411077, 2004.

[5] Kahrobaei D, Lam HT. Heisenberg groups as platform for the AAG key-exchange protocol. In 2014 IEEE 22nd International Conference on Network Protocols (ICNP). Piscataway: IEEE; 2014. p. 660–4. doi:10.1109/ICNP.2014.105

[6] Garber D, Kaplan S, Teicher M, Tsaban B, Vishne U. Probabilistic solutions of equations in the braid group. Adv Appl Math. 2005;35:323–34. doi:10.1016/j.aam.2005.03.002

[7] Garber D, Kaplan S, Teicher M, Tsaban B, Vishne U. Length-based conjugacy search in the braid group. Contemporary Math. 2006;418:75. doi:10.1090/conm/418/07947

[8] Franco N, González-Meneses J. Conjugacy problem for braid groups and Garside groups. J Algebra. 2003;266(1):112–32. doi:10.1016/S0021-8693(03)00292-8

[9] Craven MJ, Jimbo HC. Evolutionary algorithm solution of the multiple conjugacy search problem in groups, and its applications to cryptography. Groups Complexity Cryptol. 2012;4(1):135–65. doi:10.1515/gcc-2012-0002

[10] Craven MJ, Robertz D. A parallel evolutionary approach to solving systems of equations in polycyclic groups. Groups Complexity Cryptol. 2016;8(2):109–25. doi:10.1515/gcc-2016-0012

[11] Kotov M, Ushakov A. Analysis of a certain polycyclic-group-based cryptosystem. J Math Cryptol. 2015;9(3):161–7. doi:10.1515/jmc-2015-0013

[12] Myasnikov AD, Ushakov A. Length based attack and braid groups: cryptanalysis of Anshel–Anshel–Goldfeld key exchange protocol. In International Workshop on Public Key Cryptography. Berlin: Springer; 2007. p. 76–88. doi:10.1007/978-3-540-71677-8_6

[13] Myasnikov A, Shpilrain V, Ushakov A. Group-based cryptography. Advanced Courses in Mathematics CRM Barcelona. Berlin Heidelberg: Springer Science and Business Media; 2008.

[14] Ruinskiy D, Shamir A, Tsaban B. Length-based cryptanalysis: The case of Thompson’s group. J Math Cryptol. 2007;1(4):359–72. doi:10.1515/jmc.2007.018

[15] The GAP Group. GAP - Groups, Algorithms, and Programming, version 4.8.7. 2015. http://www.gap-system.org

[16] Cooperman G. ParGAP, version 1.4.0. Available from: http://www.gap-system.org/Packages/pargap.html, 2013.

[17] Garber D, Kahrobaei D, Lam HT. Length-based attacks in polycyclic groups. J Math Cryptol. 2015;9(1):33–43. doi:10.1515/jmc-2014-0003

[18] Woodward JR, Swan J. Automatically designing selection heuristics. In Proceedings of the 13th Annual Conference Companion on Genetic and Evolutionary Computation. New York: ACM; 2011. p. 583–90. doi:10.1145/2001858.2002052

[19] Woodward JR, Swan J. The automatic generation of mutation operators for genetic algorithms. In Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation. New York: ACM; 2012. p. 67–74. doi:10.1145/2330784.2330796

[20] Hughes J, Tannenbaum A. Length-based attacks for certain group based encryption rewriting systems. Workshop SECI02, Tunis, Tunisia. 2002.

[21] Shpilrain V, Zapata G. Using decision problems in public key cryptography. Groups Complexity Cryptol. 2009;1(1):33–49. doi:10.1515/GCC.2009.33

[22] Blaney KR, Nikolaev A. A PTIME solution to the restricted conjugacy problem in generalized Heisenberg groups. Groups Complexity Cryptol. 2016;8(1):69–74. doi:10.1515/gcc-2016-0003

[23] Burke EK, Hyde MR, Kendall G, Ochoa G, Özcan E, Woodward JR. Exploring hyper-heuristic methodologies with genetic programming. In Computational Intelligence. New York: Springer; 2009. p. 177–201. doi:10.1007/978-3-642-01799-5_6

[24] Bai R. A model for fresh produce shelf space allocation and inventory management with freshness condition dependent demand. INFORMS J Comput. 2007;20(1):78–85. doi:10.1287/ijoc.1070.0219

[25] Bilgin B, Özcan E, Korkmaz EE. An experimental study on hyper-heuristics and exam timetabling. In International Conference on the Practice and Theory of Automated Timetabling. Berlin: Springer; 2006. p. 394–412. doi:10.1007/978-3-540-77345-0_25

[26] Ross P, Schulenburg S, Marín-Blázquez JG, Hart E. Hyper-heuristics: learning to combine simple heuristics in bin-packing problems. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation. New York: ACM; 2002. p. 942–8.

[27] Burke EK, Kendall G, Soubeiga E. A tabu-search hyperheuristic for timetabling and rostering. J Heurist. 2003;9(6):451–70. doi:10.1023/B:HEUR.0000012446.94732.b6

[28] Ross P. Hyper-heuristics. In Search methodologies. Berlin: Springer; 2005. p. 529–56. doi:10.1007/0-387-28356-0_17

[29] Burke EK, Gendreau M, Hyde M, Kendall G, Ochoa G, Özcan E, Qu R. Hyper-heuristics: a survey of the state of the art. J Operat Res Soc. 2013;64(12):1695–724. doi:10.1057/jors.2013.71

[30] Burke EK, Hyde MR, Kendall G, Ochoa G, Özcan E, Woodward JR. A classification of hyper-heuristic approaches: revisited. In Handbook of Metaheuristics. New York: Springer; 2019. p. 453–77. doi:10.1007/978-3-319-91086-4_14

[31] Goldman BW, Tauritz DR. Self-configuring crossover. In Proceedings of the 13th Annual Conference Companion on Genetic and Evolutionary Computation. New York: ACM; 2011. p. 575–82. doi:10.1145/2001858.2002051

[32] Hong L, Woodward J, Li J, Özcan E. Automated design of probability distributions as mutation operators for evolutionary programming using genetic programming. In European Conference on Genetic Programming. Berlin Heidelberg: Springer; 2013. p. 85–96. doi:10.1007/978-3-642-37207-0_8

[33] Haraldsson SO, Woodward JR. Automated design of algorithms and genetic improvement: contrast and commonalities. In Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation. New York: ACM; 2014. p. 1373–80. doi:10.1145/2598394.2609874

[34] Eick B, Nickel W, Horn M. Polycyclic, version 2.11. Available from http://www.gap-system.org/Packages/polycyclic.html, 2013.

[35] Ochoa G, Hyde M, Curtois T, Vazquez-Rodriguez J, Walker J, Gendreau M, et al. Hyflex: a benchmark framework for cross-domain heuristic search. In Hao J-K, Middendorf M, editors. European Conference on Evolutionary Computation in Combinatorial Optimisation, LNCS 7245. Berlin: Springer; 2012. p. 136–47. doi:10.1007/978-3-642-29124-1_12

[36] Craven MJ, Woodward JR. Instances for group-theoretic cryptology. 2020. Available at https://pearl.plymouth.ac.uk/handle/10026.1/15752.

[37] Hains DR, Whitley LD, Howe AE. Revisiting the big valley search space structure in the TSP. J Operat Res Soc. 2011;62(2):305–12. doi:10.1057/jors.2010.116

[38] Gryak J, Kahrobaei D, Martinez-Perez C. On the conjugacy problem in certain metabelian groups. Glasgow Math J. 2019;61(2):251–69. doi:10.1017/S0017089518000198

[39] Gryak J, Kahrobaei D. The status of polycyclic group-based cryptography: a survey and open problems. Groups Complexity Cryptol. 2016;8(2):171–86. Preprint version at https://arxiv.org/abs/1607.05819v2. doi:10.1515/gcc-2016-0013

[40] Kahrobaei D, Noce M. Algorithmic problems in Engel groups and cryptographic applications. Int J Group Theory. 2020;8(4):231–50.

[41] Gryak J, Haralick RM, Kahrobaei D. Solving the conjugacy decision problem via machine learning. Exp Math. 2020;29(1):66–78. doi:10.1080/10586458.2018.1434704

[42] Myasnikov A, Roman’kov V. A linear decomposition attack. Groups Complexity Cryptol. 2015;7(1):81–94. doi:10.1515/gcc-2015-0007

[43] Roman’kov V. A nonlinear decomposition attack. Groups Complexity Cryptol. 2016;8(2):197–207. doi:10.1515/gcc-2016-0017

Received: 2021-05-22
Revised: 2021-05-22
Accepted: 2021-09-28
Published Online: 2021-10-26

© 2022 Matthew J. Craven and John R. Woodward, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 15.5.2024 from https://www.degruyter.com/document/doi/10.1515/jmc-2021-0017/html