1 Introduction

Correlation clustering is the problem of clustering the vertices of a graph whose edges are assigned positive-type and negative-type real-valued weights that express, respectively, positive and negative evidence of placing the endpoints of an edge in the same cluster (Bansal et al. 2004; Bonchi et al. 2022). Two formulations of correlation clustering exist: the minimization one (Min-CC) aims at minimizing the sum of the negative-type intra-cluster edge weights plus the sum of the positive-type inter-cluster edge weights; in the maximization counterpart (Max-CC), the objective is dual, i.e., to maximize the sum of the positive-type intra-cluster edge weights plus the sum of the negative-type inter-cluster edge weights. Correlation clustering has been extensively studied from a theoretical point of view, and it has been applied in numerous real-world scenarios (Bonchi et al. 2014; Pandove et al. 2018).

Correlation clustering with unknown edge weights. Traditionally, in correlation clustering it is assumed that the edge weights are all given as input; for instance, they could have been derived from past user-interaction history, crowdsourcing, experimental trials, and so on. This has the disadvantage that clustering can be performed only after all the weights are available, which is infeasible in several real contexts. To overcome this, here we focus for the first time on a correlation-clustering setting where edge-weight assessment is carried out while performing the clustering.

We devise the following scenario. Edge weights are random variables whose probability distributions and means are unknown and do not change during the whole process. An estimate of the mean of each edge-weight distribution is maintained. Initial estimates are randomly generated or computed based on prior knowledge. There are multiple rounds of clustering. A clustering performed at any round gives feedback on how to adjust the mean estimates, so that they improve round after round. The rationale is that, once a clustering is in place, actual interactions among the vertices can be observed, and hence used as real evidence to profitably update the mean estimates. More specifically, in Min-CC (resp. Max-CC) one gets feedback about the negative-type (resp. positive-type) intra-cluster edge weights and the positive-type (resp. negative-type) inter-cluster edge weights. A clustering at every round may be computed by taking into account the current mean estimates (exploitation) based on an (exact or approximate) oracle; alternatively, a clustering can be produced without looking at the mean estimates, so as to get feedback on edge weights for which limited knowledge has been acquired so far (exploration). In our context, alternating between exploiting the oracle with estimated weights to determine a clustering and observing the feedback on the edges of the graph induced by this clustering makes it possible to improve the estimates of the edge weights, since more observed data are collected upon which the estimates are computed.

Both exploitation and exploration have pros and cons. The former yields clusterings that rely on established—but partial—knowledge. The latter allows for expanding the current knowledge, which is supposed to yield better-quality clusterings in the next rounds, but it may also lead to inaccurate clusterings in the first rounds.

Getting the best exploration-exploitation tradeoff is a key desideratum. The effectiveness of such a tradeoff is measured by the (expected) cumulative quality of the clusterings produced in all the rounds. This is the ultimate objective to be optimized, and a major challenge in the design of proper algorithms.

It should be noted that the aforementioned exploration-exploitation tradeoff refers to a reinforcement-learning paradigm, which, in this work, we adopt by resorting to the Combinatorial Multi-Armed Bandit (CMAB) framework (Chen et al. 2016, 2018a; Kveton et al. 2015a, b; Lagrée et al. 2016; Wang and Chen 2017; Xu et al. 2020). The CMAB framework has been contextualized to several specific problems, including influence maximization (Chen et al. 2016; Vaswani and Lakshmanan 2015; Wu et al. 2019), community detection (Mandaglio and Tagarelli 2019a, b), community exploration (Chen et al. 2018b), shortest-path discovery (Talebi et al. 2017), and feature selection (Liu et al. 2021). In our previous work (Mandaglio et al. 2020), we devised non-CMAB algorithms for a correlation-clustering problem variant in which interactions between entities are characterized by known input probability distributions and conditioned by external factors within the environment where the entities interact. Remarkably, none of those settings comes close to the one we consider in this work, i.e., devising a CMAB framework for (correlation) clustering.

Applications. The setting we consider in this work finds application in all those contexts where it is not preferable (or not permitted) to wait until edge weights have been produced before performing a clustering. Rather, it is desired to produce clustering solutions early, learn the weights along the way, and tolerate lower clustering quality in the initial rounds, with improvement as the rounds go by.

For instance, we might consider a team formation scenario, where individuals need to be organized (clustered) into teams (Juárez et al. 2022). Individuals are associated with technical/soft skills which are required for task assignments within the teams. Any two individuals exhibiting a certain skill-level similarity should be assigned to the same team, and conversely to different teams if they are dissimilar to each other; clearly, given the variety of skills and their compatibility levels, the exact degree of matching between two individuals’ skills is not known a priori at the beginning of the team-formation process, and indeed similarities should be learned through team-formation history. In this regard, individuals collaborate with both their teammates and individuals from other teams, for, e.g., general coordination purposes. A desirable goal is to establish teams so as to maximize the overall (i.e., intra-team plus inter-team) similarity between pairs of individuals. This is a problem that can easily be cast as correlation clustering, where vertices correspond to individuals, clusters correspond to teams, and the positive-type and negative-type edge weights correspond to intra-team and inter-team similarities, respectively. Note that empathy-related (i.e., mutable) characteristics between individuals are discarded here, since they may cause drift in the likelihood that two individuals (dis)like each other once they are (temporarily) members of the same team, i.e., edge weights would change through the rounds. In this regard, an analogous scenario is task allocation for robots, each of which is programmed to handle a number of operations. Correlation clustering would be helpful to enable forming coalitions among robots, in order to allocate them to tasks to be completed optimally according to some efficiency requirements.

Two further example scenarios are commercial scheduling (e.g., Bollapragada and Garbiras 2004; Giallombardo et al. 2016) and shelf space allocation (e.g., Hübner et al. 2021). The former consists in optimally assigning a set of commercials to fill in each advertisement slot scheduled by a TV broadcaster, where vertices are commercials and edge weights denote marketing-driven benefits in assigning, resp. separating, any two commercials within the same, resp. to different, slots; edge weights might initially be estimated by accounting for requirements provided by both the brand customers and the TV broadcaster, and then adjusted by observing the feedback provided by (online) market surveys (e.g., delivered to the targeted audience of the TV programme schedule). Shelf space allocation models the dimensioning and positioning of shelf space for allocating selected products based on practical retail requirements, so as to maximize product sales; here, the feedback observed from the sales outcomes, as well as from whether customers welcome the retailer’s choices, would be related to the opportunity of ensuring brand visibility or improving customer satisfaction.

All the aforementioned scenarios correspond to well-known optimization problems in operations research and related fields; they are also key enablers in emerging contexts, such as the development of smart production systems brought by Industry 4.0 (Grillo et al. 2022). However, such problems have not commonly been addressed in terms of correlation clustering, and the few existing exceptions (e.g., Dutta et al. (2019)) are far from the CMAB perspective we study in this work, which is profitably adopted here since correlation-clustering weights are unlikely to be known a priori.

Contributions. The scenario we deal with in this work is a natural reinforcement-learning one, which, to the best of our knowledge, has never been considered in the context of correlation clustering. We tackle it by designing—for the first time—a Combinatorial Multi-Armed Bandit (CMAB) (Chen et al. 2016) framework for correlation clustering. In doing so, we achieve a mix of modeling, algorithmic, technical, and empirical contributions, including principled framework design and problem formulations, design and theoretical analysis of algorithms, tricks to make the algorithms work in practice, and experimental evaluation. In more detail, our main contributions are as follows:

  • We formulate, for the first time, correlation clustering in a CMAB setting, by providing a contextualization of the main ingredients of a typical CMAB framework and CMAB formulations for both Min-CC and Max-CC (Sect. 3). Among other things, a key consequence of this contribution is that it enables the use of general-purpose CMAB approximation algorithms/heuristics for CMAB-based Min-CC/Max-CC with minimal customization effort. In this regard, we show how the popular Combinatorial Upper Confidence Bound (CUCB) method (Chen et al. 2016) can be employed in the context of Max-CC (Appendix A.2).

  • We introduce the Combinatorial Lower Confidence Bound (CLCB) method, which can be viewed as the counterpart of CUCB for minimization problems, and show how to suitably customize it in order to handle Min-CC instances (Sect. 4).

  • The effectiveness of a CMAB algorithm is typically assessed in terms of regret, i.e., a measure of how far the (expected) cumulative quality of the solutions yielded by an algorithm is from the optimal cumulative quality. In this regard, Chen et al. (2016) provide a regret analysis of the CUCB method, which shows that, if the underlying combinatorial-optimization problem satisfies certain properties, CUCB is guaranteed to achieve a regret that is at most logarithmic in the number of clustering rounds, in presence of an approximation oracle. Here, we build upon Chen et al.’s result and show that:

    • Our CMAB formulation of Max-CC satisfies Chen et al.’s properties, thus, CUCB achieves logarithmic regret for Max-CC as well (Appendix A.2.1).

    • We devise a principled regret definition for Min-CC. According to this definition, we also provide a regret analysis that, along the lines of Chen et al.’s analysis for CUCB, proves that CLCB achieves logarithmic regret in the number of clustering rounds for Min-CC in presence of an approximation oracle (Sect. 4.1). Our regret definition and analysis for Min-CC are general enough to be reused in any minimization CMAB problem with an approximation oracle. This is a contribution per se, as, to the best of our knowledge, no regret definitions/analyses for CMAB minimization problems (with approximation oracle) exist in the literature.

  • We further investigate the applicability of the CLCB-like algorithm in practice (Sect. 4.2). A key desideratum in this regard is to employ the traditional Pivot algorithm for Min-CC  (Ailon et al. 2008) as an (approximation) oracle within CLCB, for its efficiency, theoretical as well as empirical effectiveness, and ease of implementation. A major challenge here is that, to achieve its approximation guarantees (and to provide effective solutions in practice too), Pivot needs the input edge weights to satisfy the probability constraint. Unfortunately, the CLCB algorithm does not guarantee the fulfilment of this constraint at every round. Thus, we design novel variants of the basic CLCB where the correlation-clustering instances given as input to Pivot meet (or are close to meeting) the probability constraint.

  • We conduct an extensive evaluation to experimentally test the performance of CMAB correlation-clustering algorithms, including the algorithms devised in this work, as well as (correlation-clustering-customized) popular CMAB heuristics, such as \(\epsilon\)-greedy, pure exploitation, and Combinatorial Thompson Sampling (Wang and Chen 2018) (Sects. 5–7). We consider the Min-CC context only, due to the availability of practical approximation oracles (unlike Max-CC). Results show that CMAB methods achieve superior accuracy over non-CMAB baselines, and accuracy close to that of a reference method that performs correlation clustering with the true edge weights. Also, the per-round runtime of CMAB methods is (at worst) comparable to the runtime of executing a linear-time correlation-clustering algorithm once.

Section 2 discusses background and related work. Section 8 concludes the paper.

2 Background and related work

2.1 Correlation clustering

The minimization (Min-CC) and maximization (Max-CC) formulations of correlation clustering aim at minimizing disagreements and maximizing agreements, respectively. They are formally defined as follows:

Problem 1

(Min-CC  (Ailon et al. 2008)) Given a graph \(G=(V, E)\), and nonnegative weights \(w _{uv}^+\), \(w _{uv}^- \in \mathbb {R}_0^+\) for each edge \((u, v) \in E\), find a clustering \({\mathcal {C}}^*: V \rightarrow {\mathbb {N}}^+\) such that:

$$\begin{aligned} {\mathcal {C}}^* \ = \ {\text {argmin}}_{\mathcal {C}}~~ f_{min} ({\mathcal {C}}) \ = \ {\text {argmin}}_{\mathcal {C}} \sum _{\begin{array}{c} (u, v) \in E,\\ {\mathcal {C}}(u) = {\mathcal {C}}(v) \end{array}} w ^-_{uv} \quad + \!\!\! \sum _{\begin{array}{c} (u, v) \in E, \\ {\mathcal {C}}(u) \ne {\mathcal {C}}(v) \end{array}} w ^+_{uv}. \end{aligned}$$
(1)

Problem 2

(Max-CC  (Ailon et al. 2008)) Given a graph \(G=(V, E)\), and nonnegative weights \(w _{uv}^+\), \(w _{uv}^- \in \mathbb {R}_0^+\) for each edge \((u, v) \in E\), find a clustering \({\mathcal {C}}^*: V \rightarrow {\mathbb {N}}^+\) such that:

$$\begin{aligned} {\mathcal {C}}^* \ = \ {\text {argmax}}_{\mathcal {C}}~~ f_{max} ({\mathcal {C}}) \ = {\text {argmax}}_{\mathcal {C}}\sum _{\begin{array}{c} (u, v) \in E,\\ {\mathcal {C}}(u) = {\mathcal {C}}(v) \end{array}} w ^+_{uv} \quad + \!\!\! \sum _{\begin{array}{c} (u, v) \in E, \\ {\mathcal {C}}(u) \ne {\mathcal {C}}(v) \end{array}} w ^-_{uv}. \end{aligned}$$
(2)

In the above problems, and hereinafter, we let a clustering be represented as a function that expresses cluster membership for the vertices in V.

Min-CC and Max-CC are equivalent in terms of optimality and complexity class [both \({{\textbf{N}}}{{\textbf{P}}}\)-hard  (Bansal et al. 2004; Shamir et al. 2004)], but have different approximation-guarantee properties, with the latter being easier in this regard. On general edge weights, both Min-CC and Max-CC are \(\textbf{APX}\)-hard  (Charikar et al. 2005), with Max-CC admitting constant-factor approximation algorithms (Charikar et al. 2005; Swamy 2004), and with the best known approximation factor for Min-CC being \({\mathcal {O}} (\log |V|)\) (and unlikely to be improvable) (Charikar et al. 2005; Demaine et al. 2006).

When restrictions on weights are imposed, the problems become more tractable. For instance, in the seminal work by Bansal et al. (2004), which requires the input graph to be complete, and the weights to be binary and with exactly one nonzero weight for each weight pair (i.e., \(\forall u,v \in V\), \((w _{uv}^+, w _{uv}^-) \!\in \! \{(0,1),(1,0)\}\)), Max-CC admits a PTAS (Bansal et al. 2004), and Min-CC admits constant-factor approximations (Ailon et al. 2008; Bansal et al. 2004; Charikar et al. 2005; Chawla et al. 2015; van Zuylen and Williamson 2007) [although still remaining \(\textbf{APX}\)-hard  (Charikar et al. 2005)].

Attention has also been devoted to weight bounds that go beyond Bansal et al.’s ones, but are still restrictive enough to allow Min-CC to achieve constant-factor guarantees. These include weights satisfying the probability constraint (i.e., \(w ^+_{uv} + w ^-_{uv} = 1\), \(\forall u,v \in V\)) (Ailon et al. 2008), generalizations of it (i.e., \(\forall u,v \in V\), \(w ^+_{uv} \le 1\), \(w ^-_{uv} \le h\) for some \(h \in [1,+\infty )\), and \(w _{uv}^+ + w _{uv}^- \ge 1\)) (Puleo and Milenkovic 2015), triangle inequality (i.e., \(w ^-_{uz} \le w ^-_{uv} + w ^-_{vz}\), \(\forall u,v,z \in V\)) (Ailon et al. 2008), or global constraints (Mandaglio et al. 2021). The probability constraint is particularly appealing: in fact, under such a constraint, Pivot, a randomized algorithm for Min-CC that is widely recognized for its theoretical guarantees, efficiency, and ease of implementation, achieves a 5-approximation (in expectation) (Ailon et al. 2008). Coupling the probability constraint with the triangle inequality lowers Pivot’s (expected) approximation factor to 2 (Ailon et al. 2008).

Although they consider various types of weights, all the above works still assume that edge weights are available as input. In this work, we go beyond this limiting assumption, and focus on the context where edge weights are not available beforehand, but have to be discovered while performing (multiple rounds of) clustering.

Beyond basic correlation clustering. Several extensions to the basic correlation-clustering formulations have been studied, including constrained/relaxed formulations (e.g., constraining the number/size of clusters, allowing overlapping clusters), and adaptations to more sophisticated types of graph (e.g., bipartite graphs, labeled graphs, multilayer graphs, hypergraphs) or nonconventional computational settings (e.g., online, parallel, streaming). We point the interested reader to Bonchi et al. (2014, 2022); Pandove et al. (2018) for more details on these advanced topics. Here, let us just discuss the problem of query-efficient correlation clustering (Bressan et al. 2019; García-Soriano et al. 2020), which, to our knowledge, is the only correlation-clustering extension that exhibits some (slight) similarity with the setting we study in this work. Query-efficient correlation clustering assumes that edge weights are discovered by querying an oracle, and the goal is to cluster the input graph by using a limited budget of Q queries (\(Q \ll {\mathcal {O}} (|V|^2)\)). Although it is still assumed that edge weights are not available beforehand (as in our setting), query-efficient correlation clustering focuses on a scenario that remains profoundly different from the one tackled in this work. In fact, it considers a hard limit Q on the number of edge weights that can ultimately be discovered, which is a restriction that is not present in our setting. Moreover, the feedback on edge weights is given by an oracle, which provides true edge weights for any query, at any time. Instead, in our setting, the feedback consists of samples of the weight distributions, which are used to update the weight estimates, and is provided by the clustering itself (there is no oracle). Finally, existing approaches to query-efficient correlation clustering (Bressan et al. 2019; García-Soriano et al. 2020) handle binary weights only.

2.2 Combinatorial multi-armed bandit

Combinatorial Multi-Armed Bandit (CMAB) is a popular reinforcement-learning framework to learn how to perform actions by exploring/exploiting the feedback from an environment (Chen et al. 2016). It extends the basic Multi-Armed Bandit (MAB) (Berry and Fristedt 1985) so that the actions to be performed/learned correspond to combinatorial structures (superarms) that are defined on top of simpler, basic actions (base arms). Specifically, a CMAB instance consists of m base arms. Each base arm i is assigned a set \(\{X_{i,t}\mid 1 \le t \le T \}\) of random variables, where T is the number of rounds. The support of each \(X_{i,t}\)—assumed to be [0, 1]—indicates the random “outcome” of playing base arm i in round t. This outcome is interpreted as feedback from the environment and used to carry out the learning process. The random variables \(\lbrace X_{i,t} \rbrace _{t=1}^T\) of the same arm i are independent and identically distributed, according to some unknown distribution with unknown expectation \(\mu _i\). Random variables of different base arms may be dependent or distributed with different laws. Estimates \(\lbrace {\hat{\mu }}_i \rbrace _{i=1}^m\) of the true unknown expectations \(\lbrace \mu _i \rbrace _{i=1}^m\) are kept (and updated) at every round.

A CMAB instance also includes a set \({\mathcal {A}} \subseteq 2^{[m]}\) of possible superarms. \({\mathcal {A}}\) is typically defined as the subset of all subsets of base arms satisfying certain constraints. At each round t, a superarm \(A_t \in {\mathcal {A}}\) is played and the outcomes of the random variables \(X_{j,t}\), for all the base arms \(j \in A_t\), are observed. These outcomes can be used to update the knowledge on the estimates \(\lbrace {\hat{\mu }}_j \rbrace _{j \in A_t}\). Playing a superarm \(A_t\) gives a reward \(R_t(A_t)\), which is a random variable defined as a function of the outcomes of \(A_t\)’s base arms. \(R_t(A_t)\) may simply be a summation \(\sum _{j \in A_t} X_{j,t}\) of the outcomes of \(A_t\)’s base arms, but more complex (possibly nonlinear) rewards are allowed. In any case, it is often assumed that the expectation \({\mathbb {E}}[R_t(A_t)]\) is a function of only \(A_t\)’s base arms and all the \(\lbrace \mu _i \rbrace _{i = 1}^m\) (true) expectations. For minimization problems, the reward can be replaced by a notion \(L_t(A_t)\) of loss. The adaptation is straightforward.

The objective of a CMAB algorithm is to select a superarm to be played at every round, so as to maximize the cumulative expected reward obtained in all the rounds, i.e., \({\mathbb {E}}[\sum _{t = 1}^T R_t(A_t)]\). With this ultimate goal in place, a superarm \(A_t\) can be chosen by either exploiting the knowledge acquired from the outcomes of previous rounds, or exploring arms that have not been played much. Here is the exploration-exploitation tradeoff that usually appears in reinforcement-learning scenarios: a key design principle of any CMAB algorithm consists in deciding to what extent it should pick the arms that have provided good rewards so far (exploitation) or select different arms with the aim of getting even better rewards (exploration).
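As a minimal illustration of this tradeoff, the sketch below implements an \(\epsilon\)-greedy selection rule (one of the heuristics considered later in our experiments); `oracle` and `random_superarm` are placeholder callables of ours, standing for a problem-specific solver and a generator of random feasible superarms, respectively.

```python
import random

def epsilon_greedy_round(estimates, oracle, random_superarm, eps=0.1):
    """One round of an epsilon-greedy CMAB policy (illustrative sketch).

    With probability eps, explore by playing a random feasible superarm;
    otherwise, exploit by asking the oracle for the best superarm
    under the current mean estimates.
    """
    if random.random() < eps:
        return random_superarm()   # exploration
    return oracle(estimates)       # exploitation
```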

As for exploitation-aware superarms, the availability of an oracle is assumed, which computes a superarm based on the current estimates \(\lbrace {\hat{\mu }}_i \rbrace _{i = 1}^m\) of the base-arm expectations and the knowledge it possesses on the specific problem at hand. The oracle can be exact, i.e., it outputs \(A_t^* = {\text {argmax}}_{A \in {\mathcal {A}}} {\mathbb {E}}[{\hat{R}}_t(A)]\), or an \((\alpha ,\beta )\)-approximation one, for some \(\alpha , \beta \le 1\), i.e., it outputs a superarm \(A_t\) such that \(\Pr [{\mathbb {E}}[{\hat{R}}_t(A_t)] \ge \alpha ~{\mathbb {E}}[{\hat{R}}_t(A_t^*)]] \ge \beta\) (where \({\hat{R}}_t(\cdot )\) denotes the reward computed based on \(\lbrace {\hat{\mu }}_i \rbrace _{i = 1}^m\)).

The effectiveness of a (C)MAB algorithm is typically measured in terms of the so-called regret metric, which corresponds to the difference in the cumulative expected reward between always playing the optimal arm (possibly scaled by factors \(\alpha\) and \(\beta\) in case of \((\alpha ,\beta )\)-approximation oracles) and playing arms according to the algorithm. A major theoretical desideratum in this regard consists in providing a suitable regret analysis, which guarantees that the algorithm at hand achieves a certain bounded regret. The seminal work by Chen et al. (2016) shows that it is possible to design CMAB algorithms achieving \({\mathcal {O}} (\log T)\) regret, and that this is a tight bound.

Regret definitions and analyses for CMAB maximization problems exist for both exact and approximation oracles (Chen et al. 2016; Wang and Chen 2017). As for minimization problems, to the best of our knowledge, they have been devised for exact oracles only (Cesa-Bianchi and Lugosi 2012; Talebi et al. 2017). In this work, we provide for the first time a regret analysis for a minimization problem (Min-CC) with approximation oracle. The generality of our regret definition and analysis make us believe that this is a contribution of interest for CMAB minimization problems in general, not only for (correlation) clustering.

3 Problem definition

In this section we provide the details of the proposed contextualization of CMAB to correlation clustering. As a first step, we let the weights \(w _e^+, w _e^-\) of every edge \(e \in E\) be modeled as random variables \(W_e^+, W_e^-\) with [0, 1] support, and mean

$$\begin{aligned} \varvec{\mu } = \lbrace \varvec{\mu }^+, \varvec{\mu }^- \rbrace , \quad \varvec{\mu }^+ \!\!= \lbrace \mu _e^+ \!= {\mathbb {E}}[W_e^+] \rbrace _{e \in E}, \quad \varvec{\mu }^- \!\!= \lbrace \mu _e^- \!= {\mathbb {E}}[W_e^-] \rbrace _{e \in E}. \end{aligned}$$
(3)

All such random variables and their means are assumed to be unknown (as typical in CMAB), and not to change in the various clustering rounds. Any CMAB algorithm keeps estimates of the true means, which are denoted as:

$$\begin{aligned} \hat{\varvec{\mu }} = \lbrace \hat{\varvec{\mu }}^+, \hat{\varvec{\mu }}^- \rbrace , \quad \hat{\varvec{\mu }}^+ \!\!= \lbrace {\hat{\mu }}_e^+ \rbrace _{e \in E}, \quad \hat{\varvec{\mu }}^- \!\!= \lbrace {\hat{\mu }}_e^- \rbrace _{e \in E}. \end{aligned}$$
(4)

Let also every edge \(e = (u,v) \in E\) be represented by a pair of replicas, \(e^{in}\) and \(e^{out}\), which model the fact that e is an intra-cluster or inter-cluster edge (with respect to a given clustering), respectively. Let \({\mathcal {S}}^{in} = \{e^{in} \mid e \in E \}\) and \({\mathcal {S}}^{out} = \{e^{out} \mid e \in E\}\) be the sets of all intra-cluster and inter-cluster edge replicas, respectively. We make the base arms in CMAB correlation clustering correspond to the set \({\mathcal {S}} = {\mathcal {S}}^{in} \cup {\mathcal {S}}^{out}\) of all edge replicas (thus, the number of base arms is \(m = 2|E|\)), and a superarm be identified by a set of base arms that are consistent with the notion of clustering. Formally, a superarm corresponds to a clustering-compliant replica set:

Definition 1

(Clustering-compliant replica set) A set \(S \subseteq {\mathcal {S}}\) of edge replicas is clustering-compliant if (i) for all \(e \in E\), \(S\) does not contain both \(e^{in}\) and \(e^{out}\), and (ii) for all \(e_1 = (x,y), e_2 = (y,z), e_3 = (x,z) \in E\), if \(e_1^{in}, e_2^{in} \in S\), then \(e_3^{in} \in S\).

In the above definition, (i) holds because an edge cannot be both intra-cluster and inter-cluster, while (ii) guarantees the transitive property that if vertices x, y are within the same cluster and y, z are within the same cluster, then x, z must be within the same cluster too. Simply speaking, a superarm corresponds to a clustering. Thus, we hereinafter refer to “superarm” and “clustering” as two equivalent notions.
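As a concrete illustration of Definition 1, the following sketch builds the replica set induced by a given clustering; by construction, the resulting set satisfies conditions (i) and (ii). The data layout (edge list plus a vertex-to-label dictionary) is an assumption made only for this example.

```python
def replica_set(edges, clustering):
    """Map a clustering to its superarm, i.e., a clustering-compliant replica set.

    edges      : iterable of (u, v) pairs
    clustering : dict mapping each vertex to its cluster label
    Returns a set of (u, v, tag) triples, where tag is 'in' or 'out'.
    """
    S = set()
    for (u, v) in edges:
        tag = 'in' if clustering[u] == clustering[v] else 'out'
        S.add((u, v, tag))
    return S
```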

Table 1 Contextualization of CMAB to correlation clustering

The outcome of the base arms that are triggered while playing a superarm depends on the correlation-clustering formulation. In Min-CC, the outcome of every intra-cluster edge replica \(e^{in}\) comes from the corresponding negative-type-weight \(W_e^-\) random variable, while the outcome of every inter-cluster edge replica \(e^{out}\) comes from the corresponding positive-type-weight \(W_e^+\) random variable. The rationale is that, in Min-CC, the clustering quality is measured in terms of the negative-type weight of all intra-cluster edges and the positive-type weight of all the inter-cluster edges. Thus, placing a clustering (i.e., playing a superarm) is expected to give feedback that is consistent with Min-CC ’s objective function: the outcome of \(e^{in}\) (resp. \(e^{out}\)) replicas should be used to update \({\hat{\mu }}_e^-\) (resp. \({\hat{\mu }}_e^+\)). Conversely, in Max-CC, \(e^{in}\) and \(e^{out}\) are assigned (and their outcomes come from) \(W_e^+\) and \(W_e^-\), respectively.

The reward/loss corresponds to the correlation-clustering objective function, hence its definition depends on the correlation-clustering formulation too. Given a superarm \(S\), let \(S ^{in}\) and \(S ^{out}\) denote the intra-cluster and inter-cluster edge replicas in \(S\), respectively. Min-CC utilizes a disagreement-based loss \(d (S)\) defined as:

$$\begin{aligned} \small \textstyle d (S) = \sum _{e \in S ^{in}} W_e^- + \sum _{e \in S ^{out}} W_e^+, \end{aligned}$$
(5)

while Max-CC employs a reward \(a (S)\) defined in terms of agreements as:

$$\begin{aligned} \small \textstyle a (S) = \sum _{e \in S ^{in}} W_e^+ + \sum _{e \in S ^{out}} W_e^-. \end{aligned}$$
(6)

The expectations of \(d (\cdot )\) and \(a (\cdot )\) are as follows (by linearity of the expectation):

$$\begin{aligned} {\bar{d}}_{\varvec{\mu }} (S) = {\mathbb {E}}[d (S)] = \!\!\!\sum _{e \in S ^{in}} \mu _e^- +\!\!\! \sum _{e \in S ^{out}} \mu _e^+, \qquad {\bar{a}}_{\varvec{\mu }} (S) = {\mathbb {E}}[a (S)] = \!\!\!\sum _{e \in S ^{in}} \mu _e^+ + \!\!\!\sum _{e \in S ^{out}} \mu _e^-. \end{aligned}$$
(7)

where the “\({\varvec{\mu }}\)” subscript in \({\bar{d}}_{\varvec{\mu }}\) and \({\bar{a}}_{\varvec{\mu }}\) is to emphasize that those functions depend on the true means \(\varvec{\mu }\). Denoting by \({\mathcal {C}}_{S}\) the clustering corresponding to superarm \(S\), Eq. (7) can alternatively (yet equivalently) be written as:

$$\begin{aligned} {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_{S}) \ = \!\!\!\!\!\sum _{\begin{array}{c} (u, v) \in E,\\ {\mathcal {C}}_{S}(u) = {\mathcal {C}}_{S}(v) \end{array}} \!\!\mu ^-_{uv} \ \ + \!\!\!\! \sum _{\begin{array}{c} (u, v) \in E, \\ {\mathcal {C}}_{S}(u) \ne {\mathcal {C}}_{S}(v) \end{array}} \!\!\mu _{uv}^+, \qquad {\bar{a}}_{\varvec{\mu }} ({\mathcal {C}}_{S}) \ = \!\!\!\!\!\sum _{\begin{array}{c} (u, v) \in E,\\ {\mathcal {C}}_{S}(u) = {\mathcal {C}}_{S}(v) \end{array}} \!\!\mu ^+_{uv} \ \ + \!\!\!\! \sum _{\begin{array}{c} (u, v) \in E, \\ {\mathcal {C}}_{S}(u) \ne {\mathcal {C}}_{S}(v) \end{array}} \!\!\mu _{uv}^-. \end{aligned}$$
(8)

Table 1 summarizes the elements of our CMAB correlation-clustering formulation.
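For concreteness, the sketch below evaluates the expected loss \({\bar{d}}_{\varvec{\mu }}\) and the expected reward \({\bar{a}}_{\varvec{\mu }}\) of Eq. (8) for a given clustering; the dictionaries holding the means, keyed by edge, are an illustrative assumption.

```python
def expected_loss_and_reward(edges, clustering, mu_plus, mu_minus):
    """Compute d_bar (Min-CC expected loss) and a_bar (Max-CC expected reward) as in Eq. (8)."""
    d_bar, a_bar = 0.0, 0.0
    for (u, v) in edges:
        if clustering[u] == clustering[v]:   # intra-cluster edge
            d_bar += mu_minus[(u, v)]
            a_bar += mu_plus[(u, v)]
        else:                                # inter-cluster edge
            d_bar += mu_plus[(u, v)]
            a_bar += mu_minus[(u, v)]
    return d_bar, a_bar
```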

CMAB-Min-CC and CMAB-Max-CC problems. Given a graph \(G = (V,E)\), we perform discrete rounds \(t=1, \ldots , T\), where at each round t, a clustering \({\mathcal {C}}_t\) of the vertices in V is computed and used to update the mean estimates \(\hat{\varvec{\mu }}^+, \hat{\varvec{\mu }}^-\) of the random variables modeling the positive-type and negative-type edge weights, respectively. As discussed above, in Min-CC, the weight of an edge e between vertices within the same cluster (resp. in different clusters) is interpreted as a random sample useful to update \({\hat{\mu }}_e^-\) (resp. \({\hat{\mu }}_e^+\)). In Max-CC, the opposite holds. The ultimate objective is to minimize/maximize the cumulative expected loss/reward of the clusterings yielded in all the rounds. Formally, the problems we tackle in this work are:

Problem 3

(CMAB-Min-CC) Given a graph \(G = (V, E)\) and a number \(T > 0\) of rounds, for every \(t=1,\ldots , T\) find a clustering \({\mathcal {C}}_t:V \rightarrow {\mathbb {N}}^+\) so as to minimize

$$\begin{aligned} \small \textstyle \mathop {\mathrm {{\mathbb {E}}}}\limits \left[ \sum _{t=1}^T {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_t) \right] . \end{aligned}$$
(9)

Problem 4

(CMAB-Max-CC) Given a graph \(G = (V, E)\) and a number \(T > 0\) of rounds, for every \(t=1,\ldots , T\) find a clustering \({\mathcal {C}}_t:V \rightarrow {\mathbb {N}}^+\) so as to maximize

$$\begin{aligned} \textstyle \mathop {\mathrm {{\mathbb {E}}}}\limits \left[ \sum _{t=1}^T {\bar{a}}_{\varvec{\mu }} ({\mathcal {C}}_t) \right] . \end{aligned}$$
(10)

The expectation in Eqs. (9) and (10) is taken over all the random events generating the \({\mathcal {C}}_t\) clusterings (due to, e.g., possible randomization in the oracle that computes the clusterings). There is a further expectation in those equations, which is implicit in the definition of expected loss \({\bar{d}}_{\varvec{\mu }} (\cdot )\) and expected reward \({\bar{a}}_{\varvec{\mu }} (\cdot )\) (see Eq. 8).

As previously discussed, CMAB-Max-CC (resp. CMAB-Min-CC) requires an oracle to solve, for each round, a Max-CC (resp. Min-CC) instance according to the mean estimates \(\hat{\varvec{\mu }}^+, \hat{\varvec{\mu }}^-\). However, oracles available for Max-CC  (Charikar et al. 2005; Swamy 2004) are both inefficient and, more importantly, poorly useful in practice, since they are not able to output more than a fixed number of clusters (i.e., six). This implies that the corresponding CMAB setting (i.e., CMAB-Max-CC) will inherit this issue too, since the clusterings yielded at each round are obtained through these algorithms. This aspect is a showstopper in our context, as we are interested in algorithms that are effective and theoretically solid, yet capable of providing outputs whose quality is recognizable in practice too, not only theoretically. For this reason, we hereinafter focus our attention on algorithms for CMAB-Min-CC only. For completeness, algorithms for CMAB-Max-CC are however presented in Appendix A.2.

4 Algorithms for CMAB-Min-CC

In this section, we present algorithms for CMAB-Min-CC (Problem 3). We first focus on the context of general oracles for Min-CC (Sect. 4.1), and, then, on the case where the employed Min-CC oracles achieve theoretical guarantees only if the input meets certain properties (Sect. 4.2). Finally, we discuss the special case of input edge-weight distributions satisfying specific constraints (Sect. 4.3).

4.1 General Min-CC oracles

The CC-CLCB algorithm. We devise a variant of the so-called Combinatorial Upper Confidence Bound (CUCB) algorithm (Chen et al. 2016), which is an extension of the UCB1 method for MAB (Auer et al. 2002). CUCB keeps, along with the estimates of the means of the base-arm random variables, confidence intervals within which the true means fall with overwhelming probability, and plays superarms based on the upper bounds of those intervals. Our proposed variant, termed Combinatorial Lower Confidence Bound (CLCB), is tailored for minimization problems but follows the principles of CUCB: it maintains confidence intervals within which the true means fall with high probability, but, unlike CUCB, it plays superarms based on the confidence-interval lower bounds.

Our customization of CLCB to Min-CC is termed CC-CLCB and outlined as Algorithm 1. CC-CLCB keeps track of the mean estimates \(\hat{\varvec{\mu }} = \lbrace \hat{\varvec{\mu }}^+, \hat{\varvec{\mu }}^- \rbrace\) (Eq. 4), and of the number \(T_e^+\) (resp. \(T_e^-\)) of times a sample from \(W_e^+\) (resp. \(W_e^-\)) random variable has been observed until the current round, for all \(e \in E\). At the beginning, \(\forall e \in E: T_e^+ = T^-_e = 0\), and \(\hat{\varvec{\mu }}\) are initialized, e.g., randomly or based on prior domain knowledge (Line 1). In every round t, the current mean estimates are adjusted with a term \(\rho ^\pm _e\) (defined based on Chernoff-Hoeffding bounds (Auer et al. 2002; Chen et al. 2016)), so as to foster, to some extent, the exploration of less often played base arms (Line 3). This leads to the adjusted means \(\lbrace {\widetilde{\mu }}^+_e, {\widetilde{\mu }}^-_e\rbrace _{e \in E}\) (Line 4), which are interpreted as positive-type and negative-type edge weights of a correlation-clustering instance, respectively, and are fed as input (along with G) to an oracle \({\textbf{O}}\) that computes a Min-CC solution \({\mathcal {C}}_t\) (Line 5). \({\mathcal {C}}_t\) is used as a feedback to update the mean estimates (Sect. 3, Table 1). Specifically, the weight of each intra-cluster (resp. inter-cluster) edge e is interpreted as a sample of \(W_e^-\) (resp. \(W_e^+\)), and is used to update \({\hat{\mu }}^-_e\), \(T_e^-\) (resp. \({\hat{\mu }}^+_e\), \(T_e^+\)). \({\hat{\mu }}^+_e\) and \({\hat{\mu }}^-_e\) are updated so as to be equal to the average of the samples from \(W_e^+\) and \(W_e^-\) observed so far, respectively (Lines 6–11).

Algorithm 1: The CC-CLCB algorithm (pseudocode)
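To complement the pseudocode, the following is a minimal Python sketch of the main loop of CC-CLCB, under a few assumptions of ours: the oracle is a callable `oracle(edges, w_plus, w_minus)` returning a clustering as a vertex-to-label dictionary, the environment is a callable `observe(e, kind)` returning a sample in [0, 1] from \(W_e^+\) or \(W_e^-\), and the adjusted means are obtained by subtracting the Chernoff-Hoeffding term from the current estimates (clipped at 0), following the lower-confidence-bound principle described above.

```python
import math
import random

def cc_clcb(edges, oracle, observe, T):
    """Illustrative sketch of CC-CLCB (Algorithm 1), not the authors' exact code."""
    # Line 1: initialize counters and mean estimates (here: randomly).
    mu_hat = {(kind, e): random.random() for e in edges for kind in ('plus', 'minus')}
    counts = {key: 0 for key in mu_hat}  # T_e^+ and T_e^-
    for t in range(1, T + 1):
        # Lines 3-4: adjusted (lower-confidence-bound) means; never-played arms
        # get the maximal exploration incentive (adjusted weight 0).
        def adjusted(key):
            if counts[key] == 0:
                return 0.0
            rho = math.sqrt(3 * math.log(t) / (2 * counts[key]))
            return max(0.0, mu_hat[key] - rho)
        w_plus = {e: adjusted(('plus', e)) for e in edges}
        w_minus = {e: adjusted(('minus', e)) for e in edges}
        # Line 5: exploitation step via the Min-CC oracle.
        clustering = oracle(edges, w_plus, w_minus)
        # Lines 6-11: feedback; intra-cluster edges reveal W_e^-, inter-cluster edges W_e^+.
        for (u, v) in edges:
            kind = 'minus' if clustering[u] == clustering[v] else 'plus'
            key = (kind, (u, v))
            x = observe((u, v), kind)
            counts[key] += 1
            mu_hat[key] += (x - mu_hat[key]) / counts[key]  # running average of samples
    return mu_hat
```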

Regret analysis of CC-CLCB. As correlation clustering is \({{\textbf{N}}}{{\textbf{P}}}\)-hard, it is unlikely that CC-CLCB can be equipped with an exact oracle \({\textbf{O}}\) for Min-CC running in polynomial time. Hence, in analyzing the theoretical guarantees of CC-CLCB, we consider the case where \({\textbf{O}}\) is a Min-CC\((\alpha ,\beta )\)-approximation oracle:

Definition 2

(Min-CC-\((\alpha , \beta )\)-approximation oracle) Given a Min-CC instance \(I \!=\! \langle (V,E), \lbrace \!(w _e^+\!,w _e^-)\! \rbrace _{e \in E}\rangle\), let \({\mathcal {C}}^*_I\) be the optimal solution to I. Given \(\alpha , \beta \in (0, 1]\), an algorithm for Min-CC is a Min-CC-\((\alpha , \beta )\)-approximation oracle if, for every input I, it yields a solution \({\mathcal {C}}\) such that \(\Pr [f_{min} ({\mathcal {C}}) \le \frac{1}{\alpha }~f_{min} ({\mathcal {C}}^*_I)] \ge \beta\) (where \(f_{min} (\cdot )\) is Min-CC ’s objective function, Eq. (1)).

The condition in Definition 2 for recognizing \({\textbf{O}}\) as a Min-CC-\((\alpha , \beta )\)-approximation oracle needs to hold on every Min-CC instance that is given as input to \({\textbf{O}}\) at each round. Hence, the condition has to hold with respect to the mean estimates, not the true means. Similarly to the maximization counterpart, existing Min-CC algorithms achieving \({\mathcal {O}} (\log |V|)\) guarantees in expectation (Charikar et al. 2005; Demaine et al. 2006) can be employed as Min-CC-\((\alpha , \beta )\)-approximation oracles. More details are in Appendix A.1.

We introduce a notion of \((\alpha ,\beta )\)-approximation regret, which can be viewed as the minimization counterpart of the traditional one defined in Chen et al. (2016) and used in maximization problems. Applied to the Min-CC context, this measure is defined as follows:

Definition 3

(Min-CC-\((\alpha , \beta )\)-approximation regret) Let \({\mathcal {C}}^*_{I}\) be the clustering minimizing the expected loss \({\bar{d}}_{\varvec{\mu }} (\cdot )\) (Eq. 7) on a CMAB-Min-CC instance I (w.r.t. the true \(\varvec{\mu }\) means, Eq. 3), let \({\mathcal {M}} = \max _{{\mathcal {C}} \in {\textbf{C}}(I)} {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}})\) (where \({\textbf{C}}(I)\) is the set of all clusterings of I), and let \(\lbrace {\mathcal {C}}_t\rbrace _{t=1}^T\) be the clusterings output by an algorithm \({\textbf{A}}\) run on I. The Min-CC-\((\alpha , \beta )\)-approximation regret of \({\textbf{A}}\) is

$$\begin{aligned} \textstyle Reg^{{\textbf{A}}}_{\varvec{\mu }, \alpha , \beta }(T) = \mathop {\mathrm {{\mathbb {E}}}}\limits \left[ \sum _{t=1}^T {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_t) \right] - T\left[ \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}) + ({\mathcal {M}} - \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}))(1 -\beta ) \right] . \end{aligned}$$
(11)

The rationale of the above definition is as follows. First, since the focus is on a minimization problem, the lower the probability \(\beta\) of success, the higher the loss value to compare with. Moreover, to take into account possible divergences of the approximation oracle from the optimum, and recalling that we deal with losses, not rewards, we add an extra term to the \(\frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I})\) loss that “interpolates” between the highest probability \(\beta =1\) (thus, we compare with \(\frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I})\)) and the worst probability \(\beta =0\) (thus, we compare with the maximum loss value \({\mathcal {M}}\)). Note that the \(T\left[ \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}) + ({\mathcal {M}} - \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}))(1 -\beta ) \right]\) term in \(Reg^{{\textbf{A}}}_{\varvec{\mu }, \alpha , \beta }(T)\) is used as a comparison for the (expected) performance \(\mathop {\mathrm {{\mathbb {E}}}}\limits \left[ \sum _{t=1}^T {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_t) \right]\) achieved by the CMAB method at hand in the various rounds. It is defined by noticing that, in every round \(t = 1, \ldots , T\), a Min-CC-\((\alpha , \beta )\)-approximation oracle yields, with probability (at least) \(\beta\), a solution whose \({\bar{d}}_{\varvec{\mu }} (\cdot )\) value is at most \(\frac{1}{\alpha }\) times the optimum (i.e., \(\frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*)\)), and, with probability (at most) \(1 - \beta\), a solution whose \({\bar{d}}_{\varvec{\mu }} (\cdot )\) value is more than \(\frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*)\). In the latter case, consistently with the regret definition in maximization problems (Chen et al. 2016), we assume that the \({\bar{d}}_{\varvec{\mu }} (\cdot )\) value of the yielded solutions is equal to an upper bound \(UB = {\mathcal {M}}\) on \({\bar{d}}_{\varvec{\mu }}\). More precisely:

$$\begin{aligned}&\small \textstyle T \left[ \frac{1}{\alpha }~\beta ~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) + (1 - \beta )UB \right] \ = \ T \left[ \frac{1}{\alpha }~\beta ~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) + (1 - \beta ){\mathcal {M}} \right] \ = \ T \left[ \frac{1}{\alpha }~\beta ~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) + {\mathcal {M}} - \beta ~{\mathcal {M}} \right. \\&\small \textstyle \left. + \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) - \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) \right] \ = \ T \left[ \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) + \left( {\mathcal {M}} - \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}) \right) (1 -\beta ) \right] . \end{aligned}$$

The comparison term in \(Reg^{{\textbf{A}}}_{\varvec{\mu }, \alpha , \beta }(T)\) is pessimistic in assuming that when the \((\alpha , \beta )\)-approximation oracle does not achieve approximation guarantees it yields solutions whose \({\bar{d}}_{\varvec{\mu }} (\cdot )\) value is equal to the upper bound \({\mathcal {M}}\). However, note that this happens with probability \(1 - \beta\). In our context, \(1 -\beta\) is in the order of \(|V|^{-c}\) (cf. Appendix A.1), with c set to 1 in our experiments (cf. Sect. 5). This means that the pessimistic assumption arises just in a tiny minority \(|V|^{-1}~T\) of the rounds. Also, the comparison term still adopts the optimistic assumption that the true \(\varvec{\mu }\) weights are known, while they are actually not for the CMAB method that is being evaluated in terms of \(Reg^{{\textbf{A}}}_{\varvec{\mu }, \alpha , \beta }(T)\).

As typically required in (C)MAB, the above regret is consistent with the definition of cumulative expected reward/loss at hand (i.e., Eq. 9, in this case). Thus, minimizing that regret corresponds to solving CMAB-Min-CC (Problem 3). A key theoretical desideratum in CMAB (and online-learning settings in general) is having a regret bounded by some function that is sublinear in the number of rounds. This is motivated by the fact that the overall objective is typically a summation over the number of rounds, thus a regret growing (at least) linearly in the number of rounds is considered a straightforward result that any algorithm can easily achieve.

As shown in the next theorem, CC-CLCB achieves a regret bound that is logarithmic in the number of rounds:

Theorem 1

Given \(\alpha , \beta \in (0,1]\), the Min-CC-\((\alpha , \beta )\)-approximation regret (Definition 3) of the CC-CLCB algorithm (Alg. 1), when equipped with a Min-CC-\((\alpha , \beta )\)-approximation oracle \({\textbf{O}}\) (Definition 2), is upper-bounded by a function that is \({\mathcal {O}} (\log T)\).

Proof

(sketch) The proof relies on the following main result: the \({\bar{d}}_{\varvec{\mu }} (\cdot )\) function (Eq. 8) satisfies the properties of monotonicity and 1-norm bounded smoothness. This triggers a (rather long and complex) chain of further results along the lines of those derived in Wang and Chen (2017) for the regret analysis of algorithms for CMAB maximization problems. The last of these results establishes the desired logarithmic regret bound. A more detailed proof is reported in Appendix A.3. \(\square\)

4.2 Min-CC oracles requiring the probability constraint

The CC-CLCB algorithm makes no assumptions on the input graph or edge-weight distributions. Thus, to achieve regret guarantees, CC-CLCB needs a Min-CC oracle whose approximation guarantees hold in general, without requiring restrictions on the input. As said above, algorithms of this kind exist in the context of Min-CC (Charikar et al. 2005; Demaine et al. 2006), but they suffer from issues such as limited efficiency and nontrivial implementation (they need to solve a linear program of size \(\varOmega (|V|^3)\)), and a non-constant (\({\mathcal {O}} (\log |V|)\)) approximation factor. A much better option would be to resort to the well-established Pivot  (Ailon et al. 2008), which is efficient (it takes linear time), easy to implement (it just randomly picks a vertex u, builds a cluster composed of u and all the remaining vertices connected to u by an edge whose positive-type weight is no less than the negative-type one, and iterates on the vertices left), and achieves constant-factor approximation. Unfortunately, the (expected factor-5) guarantees of Pivot hold only if the input graph is complete and the edge weights satisfy the probability constraint, i.e., \(w ^+_{uv} + w ^-_{uv} = 1\), \(\forall u,v \in V\). For this reason, here we focus on the design of heuristic variants of CC-CLCB that favor the fulfilment of the probability constraint on the Min-CC instances to be processed by the oracle. The rationale is that the closer a Min-CC instance is to meeting the probability constraint, the closer Pivot is to its “theoretical comfort zone”, and thus the better it is expected to perform.
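For reference, here is a minimal sketch of Pivot as recalled above; the only assumption is the data layout (weights stored in dictionaries keyed by unordered vertex pairs), while the iteration over the remaining vertices reflects the standard formulation of Ailon et al. (2008).

```python
import random

def pivot(vertices, w_plus, w_minus):
    """Randomized Pivot algorithm for Min-CC (sketch).

    vertices         : list of vertices (the instance is assumed complete)
    w_plus, w_minus  : dicts mapping frozenset({u, v}) to nonnegative weights
    Returns a dict mapping each vertex to a cluster label.
    """
    remaining = set(vertices)
    clustering, label = {}, 0
    while remaining:
        u = random.choice(tuple(remaining))  # random pivot
        cluster = {u}
        for v in remaining - {u}:
            e = frozenset((u, v))
            # Cluster v with the pivot if the positive-type weight is
            # no less than the negative-type one.
            if w_plus.get(e, 0.0) >= w_minus.get(e, 0.0):
                cluster.add(v)
        for v in cluster:
            clustering[v] = label
        remaining -= cluster
        label += 1
    return clustering
```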

The PC+Exp-CLCB algorithm. Our first proposal in this regard is the PC+Exp-CLCB algorithm (where “PC+Exp” means “probability constraint + exploration”). This algorithm, outlined as Algorithm 2, follows the same scheme as CC-CLCB, but it computes \(\lbrace {\widetilde{\mu }}^+_{uv}, {\widetilde{\mu }}^-_{uv}\rbrace _{u,v \in V}\) adjusted means so as to simultaneously favor some exploration and make the resulting Min-CC instance satisfy the probability constraint.

Algorithm 2: The PC+Exp-CLCB algorithm (pseudocode)
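The exact adjustment rule is given in Algorithm 2; purely as an illustration of the idea (and not necessarily the rule adopted by PC+Exp-CLCB), one could add the exploration bonuses to the current estimates and then rescale each pair so that the adjusted weights of every edge sum to 1:

```python
def pc_exp_adjust(mu_plus_hat, mu_minus_hat, bonus_plus, bonus_minus):
    """Hypothetical adjustment: exploration bonus followed by per-edge
    normalization enforcing the probability constraint (illustrative only)."""
    adj_plus, adj_minus = {}, {}
    for e in mu_plus_hat:
        p = mu_plus_hat[e] + bonus_plus[e]
        m = mu_minus_hat[e] + bonus_minus[e]
        total = p + m
        if total > 0:
            adj_plus[e], adj_minus[e] = p / total, m / total
        else:
            adj_plus[e] = adj_minus[e] = 0.5  # uninformative fallback
    return adj_plus, adj_minus
```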

The Global-CLCB algorithm. As our second variant of CC-CLCB, we devise an algorithm, dubbed Global-CLCB, which, at each round, builds Min-CC instances that are as close as possible to meeting a global constraint on the edge weights similar to the one defined in Mandaglio et al. (2021). The fulfilment of this global constraint makes the probability-constraint-aware approximation guarantees still hold even if the probability constraint is locally violated. Global-CLCB mainly relies on the following result:

Theorem 2

Let \(I = \langle G = (V,E), \lbrace {\widetilde{\mu }}^+_{uv} \rbrace _{u,v \in V}, \lbrace {\widetilde{\mu }}^-_{uv} \rbrace _{u,v \in V} \rangle\) be a Min-CC instance. If \(\left( {\begin{array}{c}|V|\\ 2\end{array}}\right) ^{-1}\sum _{u, v \in V} ( {\widetilde{\mu }}^+_{uv} + {\widetilde{\mu }}^-_{uv} ) \ge 1\), then any Min-CC algorithm (e.g., Pivot) achieving (expected) factor-\(\varepsilon\) approximation in presence of the probability constraint achieves (expected) factor-\(\varepsilon\) approximation on I too.

Proof

(sketch) The result here is a special case of the one originally proved in Theorem 1 in Mandaglio et al. (2021), specifically arising for \(\varDelta _{max} = 1\). Therefore, the proof herein is exactly the same as the one of Theorem 1 in Mandaglio et al. (2021), with the only straightforward exception of replacing \(\varDelta _{max}\) with the constant 1. \(\square\)

Global-CLCB attempts to compute \(\lbrace {\widetilde{\mu }}^+_{uv}, {\widetilde{\mu }}^-_{uv}\rbrace _{u,v \in V}\) adjusted means that are as close as possible to satisfying the condition of Theorem 2. Global-CLCB is the same as CC-CLCB, except for their respective Line 4. A detailed pseudocode of Global-CLCB is reported in Algorithm 3. We point out that CC-CLCB’s regret analysis does not hold for PC+Exp-CLCB or Global-CLCB. Deriving theoretical regret guarantees for these (or similar) heuristics is a challenging open question that we defer to future work.

Algorithm 3: The Global-CLCB algorithm (pseudocode)
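The condition of Theorem 2 is easy to check on a candidate instance; a small sketch follows (with weights stored in dictionaries keyed by unordered vertex pairs, and missing pairs counting as 0).

```python
from itertools import combinations

def satisfies_global_constraint(vertices, mu_plus, mu_minus):
    """Check the condition of Theorem 2: the average of (mu+_uv + mu-_uv)
    over all vertex pairs must be at least 1."""
    pairs = list(combinations(vertices, 2))
    total = sum(mu_plus.get(frozenset(p), 0.0) + mu_minus.get(frozenset(p), 0.0)
                for p in pairs)
    return total / len(pairs) >= 1.0
```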

4.3 Special edge-weight distributions

An interesting special input is the one of symmetric edge-weight distributions:

Definition 4

(Symmetric distributions) [0, 1]-support random variables \(W^+_e\), \(W^-_e\) have symmetric distributions if and only if \(W^+_{e}(x) = W^-_{e}(1-x)\), for all \(x \in [0,1]\).

Conceptually, this is like assuming that if a similarity equal to x holds for any two vertices, a \((1-x)\) distance implicitly holds for the same vertices as well.

CMAB-correlation-clustering instances where symmetry holds for all edge-weight distributions are easier to solve. In fact, symmetry in the distributions makes the instance at hand a full-information bandit setting: observing a sample \(x \sim W^+_{e}\) is equivalent to observing a sample \((1-x) \sim W^-_{e}\), for all \(e \in E\). This corresponds to having an outcome revealed for all the base arms, regardless of the superarm (clustering) played. In this case, therefore, exploration is meaningless. Rather, a full-exploitation strategy is worth adopting, whereby, in each round, a clustering is computed by considering solely the current mean estimates. This strategy achieves a regret bound that is constant in the number of rounds, as stated by Theorem 3:

Theorem 3

Given \(\alpha , \beta \in (0,1]\), the Min-CC-\((\alpha , \beta )\)-approximation regret (Definition 3) of a full-exploitation strategy run on a CMAB-Min-CC instance where all edge-weight distributions are symmetric, and equipped with a Min-CC-\((\alpha , \beta )\)-approximation oracle (Definition 2), is upper-bounded by a function of T that is \({\mathcal {O}} (1)\).

Proof

(sketch) The full-information bandit setting allows for simplifying some intermediate math in the regret analysis of a non-full-information setting (Theorem 1). These simplifications ultimately lead to a \({\mathcal {O}} (1)\) regret bound. A detailed proof and a pseudocode of the full-exploitation strategy are in Appendix A.4. \(\square\)
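In practice, the full-information update enabled by symmetric distributions is straightforward: each observed sample updates both estimates of an edge, regardless of the clustering played. A minimal sketch (variable names are illustrative):

```python
def full_information_update(mu_hat_plus, mu_hat_minus, count, e, x_minus):
    """Under symmetric distributions, a sample x_minus ~ W_e^- also reveals
    (1 - x_minus) as a sample of W_e^+, so both running averages are updated."""
    count[e] += 1
    mu_hat_minus[e] += (x_minus - mu_hat_minus[e]) / count[e]
    mu_hat_plus[e] += ((1.0 - x_minus) - mu_hat_plus[e]) / count[e]
```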

Fig. 1

Example CMAB-Min-CC solutions obtained by \(\epsilon\)-greedy on HighlandTribes, and by CC-CLCB on Contiguous-USA, over \(T=200\) rounds, using the linear-programming method in Charikar et al. (2005) as an oracle. Vertex colors correspond to cluster memberships, while edges are colored black, resp. red, if there is an edge-level agreement, resp. disagreement, between the true edge weights and their estimated values; by edge-level agreement, we mean that the ordering between the positive-type weight and the negative-type weight of an edge is the same on the true weights and on their estimates. Values within parentheses refer to NMI (with arithmetic mean normalization) between the CMAB-Min-CC solution by the oracle over the graph with mean estimates and the corresponding solution over the graph with true means

4.4 Visualization example

Figure 1 provides a visualization of the HighlandTribes and Contiguous-USA graphs, used as cases in point (cf. Sect. 5), and their CMAB-Min-CC clusterings produced by \(\epsilon\)-greedy and CC-CLCB, respectively, using the same oracle in both cases. In particular, we show the outcomes obtained by using the correlation-clustering linear-programming method in Charikar et al. (2005) as an oracle. Our goal here is to provide empirical evidence of the significance of the CMAB-Min-CC setting and of the effectiveness of the CMAB-Min-CC methods. To this purpose, we show three snapshots of execution on each graph, namely at the initial, middle, and final rounds of a method. Besides visualizing the cluster memberships of vertices—note that vertices of one cluster share the same color at any round, but color memberships may change across rounds—we also use black edges and red edges to distinguish between edge-level agreements and disagreements, respectively, which denote whether or not the ordering between the positive-type weight and the negative-type weight of an edge is the same on the true weights and on their estimates; formally, an edge (u, v) is colored black if \((\mu _{uv}^+> \mu _{uv}^- \wedge {\hat{\mu }}_{uv}^+ > {\hat{\mu }}_{uv}^-) \vee (\mu _{uv}^+ \le \mu _{uv}^- \wedge {\hat{\mu }}_{uv}^+ \le {\hat{\mu }}_{uv}^-)\), and red otherwise.

Two major remarks stand out by looking at the plots for each graph. First, the similarity, measured in terms of normalized mutual information (NMI), between the CMAB-Min-CC solution produced by the oracle over the graph with mean estimates and the corresponding solution over the graph with true means significantly improves as more rounds are carried out; in particular, as shown for HighlandTribes (plots (a-c)), already after a few early rounds, NMI approaches the maximum value reached at the final round. Second, the number of edge-level agreements also rapidly increases after a few rounds, until only a few disagreements are left at the final round.

5 Experimental methodology

Data. We consider ten publicly-available real-world graphs, as summarized in Table 2. Each of the bottom five networks corresponds to the flattening of a network originally represented as a set of snapshot graphs (Galimberti et al. 2020) (i.e., an edge between u and v exists in the flattened network if u and v were linked in at least one snapshot).

Edge weight distributions. The random variables \(W_e^+, W_e^-\) modeling the positive-type and negative-type edge weights in a Min-CC instance are assumed to follow a Bernoulli distribution, whose means are generated according to three schemes.

In the first two schemes, termed \(R\text{-}wd\) and \(PC\text{-}wd\), the original—possibly incomplete—network topology of the underlying input graph is maintained, meaning that \(\mu _e^+ \!= {\mathbb {E}}[W_e^+] = \mu _e^- \!= {\mathbb {E}}[W_e^-] = 0\), for all \(e \notin E\). As for the means \(\mu _e^+\), \(\mu _e^-\) for each \(e \in E\), \(R\text{-}wd\) samples both \(\mu _e^+\) and \(\mu _e^-\) uniformly at random from the [0, 1] interval, independently of one another, i.e., \(\mu ^+_{e}, \mu ^-_{e} \sim Uniform(0, 1)\), for all \(e \in E\). On the other hand, \(PC\text{-}wd\) ensures that the probability constraint holds on the generated means, which corresponds to first sampling \(\mu ^+_{e} \sim Uniform(0, 1)\), and then setting \(\mu ^-_{e} = 1-\mu ^+_{e}\), for all \(e \in E\). As a result, for both \(R\text{-}wd\) and \(PC\text{-}wd\), \(\mu _e^+, \mu _e^- \in [0,1]\), while the samples observed from the edge-weight distributions at each CMAB round lie in \(\{0,1\}\), for all \(e \in E\). In particular, samples \(W_e^+ \!=\! 1\) and \(W_e^+ \!=\! 0\) (resp. \(W_e^- \!=\! 1\) and \(W_e^- \!=\! 0\)) are observed with probability \(\mu _e^+\) and \(1\!-\! \mu _e^+\) (resp. \(\mu _e^-\) and \(1\!-\! \mu _e^-\)), respectively.

The third scheme assumes that the actual network topology imposes a binary, mutually exclusive setting for each pair of vertices, i.e., \(\mu ^+_{uv} \!=\! 1, \mu ^-_{uv} \!=\! 0\), if \((u,v) \in E\), and \(\mu ^+_{uv} \!=\! 0, \mu ^-_{uv} \!=\! 1\), if \((u, v) \notin E\). Since this setting leads to a new complete graph, the scheme is referred to as \(C\text{-}wd\), and it will be considered only for the smaller datasets, as it is computationally infeasible to handle complete versions of the larger datasets. As \(\mu ^+_{uv}, \mu ^-_{uv} \in \{0,1\}\), for all \(u,v \in V\), the underlying \(W_{uv}^+, W_{uv}^-\) distributions are actually degenerate, and every sample observed in a CMAB round from \(W_{uv}^+\) (resp., \(W_{uv}^-\)) will be equal to 1 if \(\mu ^+_{uv}= 1\) (resp., \(\mu _{uv}^- = 1\)), and to 0 if \(\mu ^+_{uv}= 0\) (resp., \(\mu _{uv}^- = 0\)).
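To make the three schemes concrete, the sketch below generates the true means and the per-round Bernoulli samples; the data layout (dictionaries keyed by unordered vertex pairs) is an illustrative assumption.

```python
import random

def generate_means(edges, all_pairs, scheme):
    """Generate true means for the R-wd, PC-wd, and C-wd schemes described above.

    edges     : set of frozenset({u, v}) pairs present in the original graph
    all_pairs : iterable of frozenset pairs over V (used only by C-wd)
    Returns (mu_plus, mu_minus) dicts; pairs absent from the dicts have mean 0.
    """
    mu_plus, mu_minus = {}, {}
    if scheme == 'R-wd':        # independent Uniform(0,1) means on existing edges
        for e in edges:
            mu_plus[e], mu_minus[e] = random.random(), random.random()
    elif scheme == 'PC-wd':     # probability constraint: mu+ + mu- = 1
        for e in edges:
            mu_plus[e] = random.random()
            mu_minus[e] = 1.0 - mu_plus[e]
    elif scheme == 'C-wd':      # degenerate, complete-graph setting
        for p in all_pairs:
            mu_plus[p] = 1.0 if p in edges else 0.0
            mu_minus[p] = 1.0 - mu_plus[p]
    return mu_plus, mu_minus

def bernoulli_sample(mu):
    """Per-round feedback: a Bernoulli sample with the given mean."""
    return 1 if random.random() < mu else 0
```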

Table 2 Main characteristics of the real-world datasets used in our evaluation

Assessment criteria. The means \(\mu _{e}^+\), \(\mu _{e}^-\) generated via the above schemes correspond to the true correlation-clustering edge weights, which are unknown to any CMAB method. They are used to evaluate the quality of the clusterings yielded in the various CMAB rounds via the average expected normalized cumulative Min-CC loss, computed up to each round t:

$$\begin{aligned} f^{(t)} = \frac{1}{t} \sum _{i=1}^t {\mathbb {E}}\left[ \frac{{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_i)}{U} \right] , \end{aligned}$$
(12)

where \({\mathcal {C}}_i\) is the clustering of the i-th round, \(U = \sum _{u,v \in V} \max \{\mu _{uv}^+, \mu _{uv}^-\}\) is a normalization constant (equal to an upper bound on the Min-CC objective-function value, so that \({\bar{d}}_{\varvec{\mu }}({\mathcal {C}}_i)/U \!\in \! [0,1]\)), and the \({\mathbb {E}}[\cdot ]\) expectation is computed by averaging the \({\bar{d}}_{\varvec{\mu }} (\cdot )\) values obtained over all the runs of the randomized Min-CC oracles (see below). Note that Eq. (12) is a shorter and normalized version of Eq. (11). In fact, Eq. (12) retains only the first term of Eq. (11), as the second term is common to all the methods (under the same oracle). It is also normalized, so that results on graphs of different size are more easily comparable to each other.
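As a concrete reading of Eq. (12), the following sketch (hypothetical names) computes the normalized loss \({\bar{d}}_{\varvec{\mu }}({\mathcal {C}})/U\) of a single clustering under the true means; \(f^{(t)}\) is then the running average of these values over rounds and oracle runs.

```python
def min_cc_loss(clustering, means):
    """Min-CC objective under the true means: negative-type means summed over
    intra-cluster pairs plus positive-type means summed over inter-cluster pairs.
    clustering : dict vertex -> cluster id
    means      : dict pair -> (mu_plus, mu_minus), for all pairs with nonzero means"""
    loss = 0.0
    for (u, v), (mu_plus, mu_minus) in means.items():
        loss += mu_minus if clustering[u] == clustering[v] else mu_plus
    return loss

def normalized_loss(clustering, means):
    """bar{d}_mu(C) / U, with U = sum over pairs of max(mu_plus, mu_minus)."""
    U = sum(max(mp, mn) for mp, mn in means.values())
    return min_cc_loss(clustering, means) / U
```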

As a second assessment criterion, we consider the error of the \({\hat{\mu }}_{e,t}^+\), \({\hat{\mu }}_{e,t}^-\) weight estimates at each round t, which is measured in terms of relative error norm as

$$\begin{aligned} ren^{(t)} = \sqrt{ \frac{\sum _{e \in E} ( {\mu }_e^+ - {\hat{\mu }}_{e,t}^+ )^2 + \sum _{e \in E} ({\mu }_e^- - {\hat{\mu }}_{e,t}^- )^2 }{\sum _{e \in E}({\mu }_e^+)^2 + \sum _{e \in E}({\mu }_e^-)^2} }. \end{aligned}$$
(13)
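The relative error norm of Eq. (13) amounts to a few lines of NumPy; the array names below are hypothetical and assumed to be aligned on the edges of E.

```python
import numpy as np

def relative_error_norm(mu_plus, mu_minus, est_plus, est_minus):
    """Eq. (13): relative error norm of the weight estimates at a given round t."""
    num = np.sum((mu_plus - est_plus) ** 2) + np.sum((mu_minus - est_minus) ** 2)
    den = np.sum(mu_plus ** 2) + np.sum(mu_minus ** 2)
    return np.sqrt(num / den)
```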

For both \(f^{(t)}\) and \(ren^{(t)}\), lower values correspond to better performance. Our main focus is on the \(f^{(T)}\), \(ren^{(T)}\) values at the final round \(t = T\), as they give compact yet general evidence of the overall performance of a method. However, we also assess the statistical significance of the results (Sect. 6.5), and analyze the trends of the performance over the various rounds (Sect. 7).

In this view, in the result tables presented in Sect. 6, we report \(f^{(T)}\) and \(ren^{(T)}\) values. Moreover, for the CMAB methods only, we also provide the growth rates, i.e., the relative change (in percentage) between the initial and the final round over the span T:

$$\begin{aligned} gr^{\%}_{_{\!\!f}} = \left( \frac{{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_T)}{{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_1)} -1 \right) \times 100, \qquad gr^{\%}_{_{\!\!ren}} = \left( \frac{ren^{(T)}}{ren^{(1)}} -1 \right) \times 100. \end{aligned}$$
(14)

Finally, we are also interested in assessing the running time of the various tested methods (Sect. 6.4).

Methods. We consider methods falling into four categories: (i) CMAB-Min-CC methods adopting the CLCB paradigm, (ii) classic general CMAB heuristics that, in this context, are customized to work for CMAB-Min-CC, (iii) baselines that do not follow the CMAB paradigm, and (iv) a reference method that performs clustering by utilizing the true edge weights. More specifically:

  1. (i)

    As CLCB-based methods, we include CC-CLCB (Algorithm 1), PC+Exp-CLCB (Algorithm 2), and Global-CLCB (Algorithm 3). Moreover, for both CC-CLCB and Global-CLCB, we also consider the CC-CLCB-m and Global-CLCB-m variants, which are less biased towards exploration. Specifically, following Wang and Chen (2018), CC-CLCB-m and Global-CLCB-m utilize uncertainty terms defined as \(\rho _{e}^\pm = \sqrt{\ln t/2T^\pm _e}\) (instead of \(\rho _{e}^\pm = \sqrt{3 \ln t / 2T^\pm _e}\)).

  2. (ii)

    As CMAB heuristics, we consider the well-established \(\epsilon\)-greedy, pure exploitation (PE), and Combinatorial Thompson Sampling (CTS) (Wang and Chen 2018). As for \(\epsilon\)-greedy, we consider both a fixed exploration rate, set to 0.1, and an adaptive exploration rate, set proportional to \(t^{-1}\) at each round t. These variants are dubbed EG-fixed and EG, respectively.

  3. (iii)

    As far as non-CMAB baselines are concerned, the idea is to set both types of unknown edge weights based on the topological affinity of any two vertices’ neighborhoods, and then run a Min-CC algorithm [specifically, Pivot  (Ailon et al. 2008) in most experiments, and the linear-programming approach dubbed LP+R  (Charikar et al. 2005) in the experiment in Sect. 6.3] on such an input, employing no weight-learning strategy. More precisely, we resort to two well-known topological similarity measures, namely the Jaccard index and the Adamic-Adar index, to set the positive-type weights: for each \((u,v) \in E\), \(w _{uv}^+ = |N(u) \cap N(v)|/|N(u) \cup N(v)|\) using Jaccard, or \(w _{uv}^+ = | N(u) \cap N(v)|^{-1} \sum _{z \in N(u) \cap N(v)} (\log |N(z)|)^{-1}\) using (normalized) Adamic-Adar, where N(u) is the set of u’s neighbors. The negative-type weights are then derived in such a way that the probability constraint holds, i.e., \(w_{uv}^- = 1 - w _{uv}^+\) (a sketch of this weight initialization is reported right after this list).

  4. (iv)

    As a reference method, we consider clustering with the actual (i.e., true) edge weights via a state-of-the-art Min-CC algorithm (i.e., Pivot  (Ailon et al. 2008) in most experiments, and LP+R  (Charikar et al. 2005) in the experiment in Sect. 6.3). This method is termed Actual-weight.
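As anticipated in (iii), the sketch below illustrates how the non-CMAB baselines initialize the edge weights from topological similarity (here with NetworkX; function names are hypothetical); the resulting weights are then passed, unchanged, to the chosen Min-CC algorithm.

```python
import math
import networkx as nx

def baseline_weights(G, measure="jaccard"):
    """Set (w_plus, w_minus) per edge of an undirected NetworkX graph G, with no
    weight learning: w_plus from neighborhood similarity, w_minus = 1 - w_plus."""
    weights = {}
    for u, v in G.edges():
        Nu, Nv = set(G.neighbors(u)), set(G.neighbors(v))
        common = Nu & Nv
        if measure == "jaccard":
            w_plus = len(common) / len(Nu | Nv)
        else:  # normalized Adamic-Adar (common neighbors have degree >= 2, so log > 0)
            w_plus = (sum(1.0 / math.log(G.degree(z)) for z in common) / len(common)
                      if common else 0.0)
        weights[(u, v)] = (w_plus, 1.0 - w_plus)  # probability constraint holds
    return weights

# Example on a small graph (Zachary's karate club, shipped with NetworkX):
w = baseline_weights(nx.karate_club_graph(), measure="jaccard")
```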

Unless otherwise specified, all the CMAB methods are assumed to be equipped with the Pivot algorithm (Ailon et al. 2008) as an oracle for Min-CC. Pivot is used as a reference oracle because it is more usable in practice, due to its efficiency, approximation guarantees, and ease of implementation. However, we also carry out an experiment to evaluate the impact of a different oracle, specifically the LP+R algorithm (Charikar et al. 2005) (Sect. 6.3). As LP+R takes \(\varOmega (|V|^3)\) time just to build the linear program, this experiment is performed on the smaller datasets only.

Since the chosen Min-CC oracles are randomized algorithms, for every experiment, we perform \(\log _{2}|V|\) independent runs of the selected oracle per CMAB round (setting \(\delta =1, c=1\), cf. Appendix A.1), and take the best solution in terms of Min-CC objective with respect to the current weight estimates. In all the experiments, the number T of CMAB rounds is set to 500, while the number of runs of the Min-CC oracle for every round is set to 10.
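For concreteness, the following sketch shows the per-round use of a randomized Pivot-style oracle on the current estimates, together with the best-of-several-runs selection described above; it follows the co-clustering rule recalled in Sect. 6.2 (two vertices are grouped iff the positive-type estimate exceeds the negative-type one), and all names are illustrative rather than our actual implementation. The `loss_fn` argument can be, e.g., the `min_cc_loss` function sketched earlier, applied to the current estimates instead of the true means.

```python
import random

def pivot(vertices, est, rng=random):
    """One run of a Pivot-style oracle on the current weight estimates.
    est : dict pair -> (est_plus, est_minus); missing pairs are treated as (0, 0).
    Returns a dict vertex -> cluster id."""
    order = list(vertices)
    rng.shuffle(order)                      # random pivot order
    clustering, cid = {}, 0
    for p in order:
        if p in clustering:
            continue
        clustering[p] = cid                 # p becomes the pivot of a new cluster
        for v in order:
            if v in clustering:
                continue
            ep, en = est.get((p, v), est.get((v, p), (0.0, 0.0)))
            if ep > en:                     # co-cluster v with the pivot
                clustering[v] = cid
        cid += 1
    return clustering

def clustering_for_round(vertices, est, n_runs, loss_fn):
    """Run the randomized oracle n_runs times and keep the best clustering
    w.r.t. the Min-CC objective computed on the current estimates."""
    runs = (pivot(vertices, est) for _ in range(n_runs))
    return min(runs, key=lambda C: loss_fn(C, est))
```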

Evaluation goals. As this is the first work that investigates Min-CC in a CMAB setting, our experiments are not really intended to assess the superiority of some proposed method(s) over the state of the art. Rather, our main objective here is to provide a comparative evaluation of a variety of CMAB heuristics, approximation algorithms, and heuristic variants of approximation algorithms in the context of CMAB-Min-CC, and derive experimental insights on the peculiarities of the various tested methods. Specifically, the main goals of our experimental evaluation are as follows:

  • Assess the performance of the CMAB methods (CC-CLCB, EG, EG-fixed, PE, CTS) in terms of \(f^{(T)}\) and \(ren^{(T)}\), and compare them to non-CMAB baselines (Adamic-Adar, Jaccard) and the reference Actual-weight method (Sect. 6.1).

  • Compare the performance of the various CLCB variants (CC-CLCB, CC-CLCB-m, PC+Exp-CLCB, Global-CLCB, Global-CLCB-m) to each other, in terms of \(f^{(T)}\) and \(ren^{(T)}\) (Sect. 6.2).

  • Evaluate the impact of varying the Min-CC oracle on the performance of the various CMAB methods (Sect. 6.3).

  • Evaluate the efficiency of all the selected methods (Sect. 6.4).

  • Perform a statistical significance analysis of the reported results (Sect. 6.5).

Further characterization. We also analyze the number of output clusters and the stability of the performance over the various rounds and runs of the tested methods (Sect. 7). This analysis is intended not as a performance assessment, but rather as additional insight to better characterize the tested methods.

Implementation and testing environment. All the tested methods are implemented in Python 3.8, with some of them using external libraries. In particular, LP+R adopts the PuLP library for linear programming, while Adamic-Adar and Jaccard use, respectively, the NetworkX and python-igraph libraries to compute the topological similarity scores. All the experiments are carried out on the Cresco6 cluster, a high-performance computing system running Linux CentOS 7.4 and consisting of 434 nodes, each equipped with two Intel(R) Xeon(R) Platinum 8160 CPUs @2.10GHz (24 cores each) and 192 GB RAM.

Table 3 Performance in terms of \(f^{(T)}\) (Eq. 12) and (for the CMAB methods) \(gr^{\%}_{_{\!\!f}}\) (Eq. 14)
Table 4 Performance in terms of \(ren^{(T)}\) (Eq. 13) and (for the CMAB methods) \(gr^{\%}_{_{\!\!ren}}\) (Eq. 14)

6 Results

6.1 Performance of the CMAB methods

Quality of the clusterings (Table 3). As a first general remark, the non-CMAB baselines (Adamic-Adar, Jaccard) achieve the worst performance in all the datasets and weight settings, while Actual-weight is always the best method, with only a couple of exceptions. This was expected, as the non-CMAB baselines employ no strategy to learn the true weights, whereas Actual-weight operates on the true weights. Importantly, in most cases, the CMAB methods (CC-CLCB, EG, EG-fixed, PE, CTS) perform comparably or close to Actual-weight. The loss values of all the CMAB methods follow a decreasing trend over the rounds, as witnessed by the negative growth rates (and better shown in Fig. 2, Sect. 7). This was expected, since the CMAB algorithms learn how to cluster the vertices over time. In general, all the CMAB algorithms converge to solutions with a lower growth rate in the \(PC\text{-}wd\) setting than in the \(R\text{-}wd\) setting. Also, with the exception of HighlandTribes, the difference in the best loss scores is higher in the \(PC\text{-}wd\) setting than in the \(R\text{-}wd\) one. This complies with the fact that the probability constraint leads to an easier Min-CC clustering task.

Focusing on the CMAB methods, the best performance corresponds to PE. This can be explained since (i) the Min-CC oracle therein used (i.e., Pivot) is a randomized algorithm, thus, even with a pure-exploitation bandit strategy, some implicit exploration occurs; (ii) due to the peculiarity of our problem, each super arm admits feedback from half the total number of arms, thus a bandit strategy with minimal exploration would likely perform better in the long run. CC-CLCB exhibits very good performance: it is comparable or close to the best methods in most datasets and weight settings, achieving a maximum and average difference in loss with respect to the best performer(s), over all the configurations, of 0.038 and 0.014, respectively.

Quality of the learned edge weights (Table 4). A first general observation is that the weight estimates of all the CMAB methods improve as the rounds progress, and the relative error goes down over time, leading to a negative growth rate. This is consistent with the clustering improvement over the rounds observed in Table 3. As expected, the non-CMAB baselines yield the highest error values, while Actual-weight clearly achieves zero error everywhere. Among the CMAB methods, EG and EG-fixed yield the most accurate estimates in the \(C\text{-}wd\) weight setting. In the \(R\text{-}wd\) and \(PC\text{-}wd\) settings, EG-fixed is (comparable to) the best performer on the smaller datasets (Karate, Dolphins, Zebra, HighlandTribes, Contiguous-USA), while on the bigger datasets, CTS is (comparable to) the best method. CC-CLCB achieves the best performance on three datasets (Zebra, Last.fm, PrimarySchool) for the \(R\text{-}wd\) and \(PC\text{-}wd\) distributions. Importantly, for some methods, a good/bad performance on the weight-estimation task does not necessarily translate into an equally good/bad performance in the clustering results discussed above. This has a twofold motivation: (1) the underlying oracle is not an exact algorithm for Min-CC, thus clustering with weight estimates may lead to clusterings that are of better quality when evaluated in terms of the actual weights, and (2) CMAB methods like CC-CLCB adopt exploration strategies that perturb the current weight estimates before giving them to the oracle, which corresponds to performing clustering with weights that are actually different from the estimated ones.

Table 5 Performance in terms of \(f^{(T)}\) (Eq. 12) and \(gr^{\%}_{_{\!\!f}}\) (Eq. 14), for the CLCB methods (equipped with Pivot as a Min-CC oracle)

6.2 Performance of the CLCB-based CMAB methods

Quality of the clusterings (Table 5). In general, we observe that all the CLCB variants perform rather closely to each other in all the configurations. Deepening the analysis, CC-CLCB-m and Global-CLCB-m are the best methods (in all the datasets but one) in the \(R\text{-}wd\) weight setting. Conversely, in the \(PC\text{-}wd\) and \(C\text{-}wd\) settings, the best method is PC+Exp-CLCB in most cases: specifically, it is the best performer on all the datasets (though on par with CC-CLCB-m and Global-CLCB-m on the larger ones) in \(PC\text{-}wd\), and on three out of five datasets in \(C\text{-}wd\). Also, Global-CLCB performs better in the \(PC\text{-}wd\) and \(C\text{-}wd\) settings than in the \(R\text{-}wd\) one. These findings comply with the design principles of the CLCB variants that favor the fulfilment of the probability constraint on the Min-CC instances to be processed by the underlying oracle (cf. Sect. 4.2), which clearly benefit from settings like \(PC\text{-}wd\) and \(C\text{-}wd\), where the probability constraint actually holds.

Interestingly, CC-CLCB and Global-CLCB achieve the same results in all the configurations (and the same holds for CC-CLCB-m vs. Global-CLCB-m). This can be explained since CC-CLCB and Global-CLCB (and, likewise, CC-CLCB-m and Global-CLCB-m) compute adjusted weight estimates (Line 4 in Algorithms 1 and 3) such that the ordering between the positive-type weight estimate and the negative-type weight estimate is likely to be the same for both algorithms. In other words, although CC-CLCB and Global-CLCB may compute different actual values of those weight estimates, the two algorithms are mostly consistent in yielding a positive-type weight estimate that is higher/lower than the negative-type one. This leads to very similar clusterings yielded by the Pivot Min-CC oracle in every run and every round of both CC-CLCB and Global-CLCB, as Pivot places any two vertices in the same cluster by solely checking whether the positive-type weight on the edge between those vertices is higher than the negative-type one, without looking at the specific values of those weights.

Table 6 Performance in terms of \(ren^{(T)}\) (Eq. 13) and \(gr^{\%}_{_{\!\!ren}}\) (Eq. 14), for the CLCB methods (equipped with Pivot as a Min-CC oracle)

Quality of the learned edge weights (Table 6). In terms of edge weights, the picture in the \(PC\text{-}wd\) and \(C\text{-}wd\) settings is roughly consistent with what is observed in terms of clustering quality. Some differences arise in the \(R\text{-}wd\) setting, where, unlike the clustering-quality criterion, CC-CLCB-m and Global-CLCB-m are the best methods only in a few configurations (being outperformed mostly by PC+Exp-CLCB).

As another interesting observation, here some differences arise between CC-CLCB and Global-CLCB, and between CC-CLCB-m and Global-CLCB-m. This confirms the argument discussed above, i.e., that those methods achieve the same clustering results even though they can learn different weight estimates.

6.3 Varying the Min-CC oracle

Table 7 shows the performance of all the competing methods when using LP+R as a Min-CC oracle (instead of Pivot). Here, we also show the relative difference (in percentage) between the score with LP+R and the corresponding score with Pivot. Thus, the more positive (resp. negative) such a relative difference, the worse (resp. better) the performance of LP+R with respect to Pivot. The general trend in terms of clustering quality (Table 7a) is that LP+R leads to an increase (resp. decrease) in performance in the \(R\text{-}wd\) and \(PC\text{-}wd\) settings (resp. \(C\text{-}wd\) setting). This is likely due to the fact that \(R\text{-}wd\) and \(PC\text{-}wd\) are more challenging than \(C\text{-}wd\), as it is well-known that Min-CC is easier on complete-graph input instances (Charikar et al. 2005). In fact, LP+R provides approximation guarantees at each round, regardless of the weights given in input to the oracle. Conversely, Pivot provides quality guarantees only if the probability constraint holds on the given input, which is not necessarily the case in a generic round t. The \(C\text{-}wd\) setting (i.e., complete graph with probability constraint) corresponds to the most favorable scenario for Pivot to provide approximation guarantees.

In terms of learned edge weights (Table 7b), the advantage of using LP+R is less evident. A reason might lie in the different random choices made by the two algorithms (i.e., choosing the node around which a cluster is built in Pivot, and integer rounding in LP+R): the random choices of Pivot likely lead to more exploration, hence a better chance to discover weights close to the true ones.

Table 7 Performance with LP+R as a Min-CC oracle, and relative difference (in percentage) between the score achieved with LP+R and the corresponding score achieved with Pivot
Table 8 Running times (in secs.) on the larger datasets. Results correspond to average runtime performances in the \(R\text{-}wd\) and \(PC\text{-}wd\) distribution settings and Pivot as a Min-CC oracle
Table 9 Running times (in secs.)

6.4 Efficiency

Table 8 shows the runtimes of the tested methods on the larger datasets, averaged over the various runs and over the \(R\text{-}wd\) and \(PC\text{-}wd\) weight settings. Although all the CMAB methods are roughly comparable with each other, CTS is the slowest, as it involves additional sampling operations with respect to the other methods.

The CMAB methods take seconds on the smaller Last.fm and PrimarySchool datasets, around one hour on ProsperLoans, and up to 3–5 h on the largest datasets, i.e., Wikipedia and DBLP. In general, however, we can conclude that all the CMAB methods are rather efficient. Even the highest runtimes on Wikipedia and DBLP are not a concern, considering that such datasets have around 10 M edges and, more importantly, that the reported runtimes are cumulative over all the 500 CMAB rounds. In fact, the highest per-round runtime of a CMAB method is always, at worst, comparable to the runtime of Actual-weight, which performs Min-CC clustering just once. In most cases, it is even lower, likely because the time of the round-independent steps is amortized over the various rounds.

Further results are shown in Table 9, which includes the use of both oracles on the smallest datasets in our collection, for all weight settings. As can be noted from the table, the above qualitative remarks on the relative differences between the methods remain equally evident.

6.5 Statistical significance

Here we present a further step of analysis to assess the statistical significance of the performance of the CMAB-Min-CC methods CC-CLCB, Global-CLCB, PC+Exp-CLCB, CTS, and EG, when equipped with Pivot as a Min-CC oracle.

To this purpose, we resorted to Friedman’s test. We designed it by considering all the methods, all the datasets, and all the weight settings in one single test. More specifically, we organized the data into a matrix with 5 columns (treatments) corresponding to the methods, and 250 rows (blocks) corresponding to the combinations of runs (10), datasets (10), and weight settings (R-wd and PC-wd available for all 10 datasets, and C-wd available for 5 datasets), where each cell measures the average expected normalized cumulative loss (i.e., \(f^{(T)}\), Eq. 12) obtained by a particular method at the last round (\(T=500\)) on a particular configuration of run, dataset, and weight setting. (Note that each run corresponds to a different fixed seed for handling computation randomness.)

Our Friedman’s test results indicate that there are significant differences (\(\chi ^2(4) = 158.1\), p-value \(< 2.2\)E-16) in the average expected normalized cumulative losses across run/dataset/weight-setting blocks based on the methods, i.e., the methods have different effects on the average expected normalized cumulative loss obtained on each run/dataset/weight-setting combination.

We also computed Kendall’s coefficient of concordance (Kendall’s W) to measure the effect size (degree of difference) for Friedman’s test. From the result above, Kendall’s W is 0.304, which indicates an effect size at the boundary between “small” and “moderate” effects, based on Cohen’s interpretation guidelines (Tomczak and Tomczak 2014).

Since Friedman’s test is an omnibus test, in order to know which methods are significantly different, we carried out Nemenyi’s all-pairs test as a post-hoc test for pairwise comparisons of methods, where the Bonferroni correction was used to adjust the p-values for multiple hypothesis testing at a 5% cut-off. Results show p-values in the range \((10^{-14}, 10^{-4})\) (i.e., significant differences) for all pairs but CC-CLCB vs. Global-CLCB. The lack of statistical difference between CC-CLCB and Global-CLCB is not surprising: in fact, in Sect. 6.2, we already noticed and explained why CC-CLCB and Global-CLCB achieve the same performance in terms of \(f^{(T)}\), in all the datasets and weight settings.
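The significance analysis can be reproduced along the following lines, assuming SciPy and the scikit-posthocs package and a hypothetical file holding the \(250 \times 5\) matrix of \(f^{(T)}\) scores described above; the Bonferroni adjustment mentioned above would then be applied to the resulting p-value matrix.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# Rows: 250 run/dataset/weight-setting blocks; columns: the 5 methods
# (CC-CLCB, Global-CLCB, PC+Exp-CLCB, CTS, EG), each cell holding f^(T) at T = 500.
scores = np.loadtxt("fT_blocks_by_method.csv", delimiter=",")   # hypothetical file

# Friedman's test across the 5 treatments (methods)
chi2, p_value = friedmanchisquare(*[scores[:, j] for j in range(scores.shape[1])])

# Kendall's W (effect size): W = chi2 / (n_blocks * (n_methods - 1))
n_blocks, n_methods = scores.shape
kendall_w = chi2 / (n_blocks * (n_methods - 1))

# Nemenyi all-pairs post-hoc test: returns a methods-by-methods matrix of p-values
posthoc_p = sp.posthoc_nemenyi_friedman(scores)

print(f"Friedman chi2 = {chi2:.1f}, p = {p_value:.2e}, Kendall's W = {kendall_w:.3f}")
```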

7 Additional experiments

Table 10 Number of clusters for \(R\text{-}wd\) weight setting: average over all rounds and runs, and (for the CMAB methods) relative (percentage) difference between the last and first round
Table 11 Number of clusters for \(PC\text{-}wd\) weight setting: average over all rounds and runs, and (for the CMAB methods) relative (percentage) difference between the last and first round
Table 12 Number of clusters for \(C\text{-}wd\) weight setting: average over all rounds and runs, and (for the CMAB methods) relative (percentage) difference between the last and first round

Clustering size. Tables 10, 11 and 12 show the number of clusters yielded by the tested methods, averaged over all the CMAB rounds and runs of the Min-CC oracle. For the CMAB methods, we also provide the difference (in percentage) between the number of clusters at the final round and the number of clusters at the first round (both averaged over the runs of the Min-CC oracle). By inspecting such results, we notice that, in the cases of the \(R\text{-}wd\) and \(PC\text{-}wd\) weight distributions, the use of the LP+R oracle generally corresponds to fewer clusters compared to the Pivot oracle; some exceptions are observed for small-world datasets (e.g., Zebra, HighlandTribes) by most methods, especially with \(R\text{-}wd\). The \(PC\text{-}wd\) setting mostly leads to fewer clusters than \(R\text{-}wd\). Conversely, with the \(C\text{-}wd\) setting, the LP+R oracle consistently leads to a much larger number of clusters (at least double in many cases) than Pivot.

Moreover, we observe that the non-CMAB methods (i.e., Adamic-Adar and Jaccard) produce a relatively small number of clusters only when the characteristics of the input dataset are those typical of a small-world network; in Last.fm and PrimarySchool, for instance, the number of clusters is as high as about 90% and 97% of the vertex-set size, respectively. This is not surprising, as the adopted approaches of (CMAB) correlation clustering are not designed to optimize some criterion function defined on topological properties at the meso- and macroscopic level (e.g., modularity), which results in a need for refining the clustering solutions through a cluster-aggregation stage.

Fig. 2

Performance in terms of \(f^{(t)}\) (Eq. 12), over a number \(t = 1,\ldots , 400\) of rounds, for the larger datasets, and \(R\text{-}wd\) and \(PC\text{-}wd\) weight distributions. All the CMAB methods are equipped with Pivot as a Min-CC oracle

Fig. 3

Performance in terms of \(f^{(t)}\) (Eq. 12), over a number \(t = 1,\ldots ,400\) of rounds (iterations), for the larger datasets, and \(R\text{-}wd\) and \(PC\text{-}wd\) weight distributions

Performance over the CMAB rounds. Figures 2 and 3 illustrate the performance of the tested methods over the various CMAB rounds t, in terms of average expected normalized cumulative Min-CC loss \(f^{(t)}\) (Eq. 12). As expected, the CMAB methods mostly exhibit a decreasing trend, with a decrease in loss scores that is more pronounced in the first rounds and progressively vanishes as the rounds go on, indicating convergence of the weight-learning process (and, thus, of the clustering quality too). A few exceptions to this strictly monotonically decreasing trend arise (e.g., with some CLCB-based methods in Last.fm \(PC\text{-}wd\), ProsperLoans \(R\text{-}wd\), ProsperLoans \(PC\text{-}wd\), DBLP \(PC\text{-}wd\)). However, the minimum of the \(f^{(t)}\) function in all those exceptional cases is only slightly lower than the value of \(f^{(t)}\) at convergence (i.e., the difference is less than 0.004). Thus, recalling also that \(f^{(t)}\) is an average of all the losses computed up to round t, we can conclude that those non-monotonic trends actually correspond to the normal fluctuations of the loss values in the first CMAB rounds, when not enough knowledge on the actual edge weights has yet been acquired to obtain stable clustering quality.

Table 13 Coefficient of variation of \(f^{(T)}\) (Eq. 12) over all the CMAB rounds and runs of the Min-CC oracle, for \(R\text{-}wd\) weight setting
Table 14 Coefficient of variation of \(f^{(T)}\) (Eq. 12) over all the CMAB rounds and runs of the Min-CC oracle, for \(PC\text{-}wd\) weight setting
Table 15 Coefficient of variation of \(f^{(T)}\) (Eq. 12) over all the CMAB rounds and runs of the Min-CC oracle, for \(C\text{-}wd\) weight setting
Table 16 Coefficient of variation of \(ren^{(T)}\) (Eq. 13) over all the CMAB rounds and runs of the Min-CC oracle, for \(R\text{-}wd\) weight setting
Table 17 Coefficient of variation of \(ren^{(T)}\) (Eq. 13) over all the CMAB rounds and runs of the Min-CC oracle, for \(PC\text{-}wd\) weight setting
Table 18 Coefficient of variation of \(ren^{(T)}\) (Eq. 13) over all the CMAB rounds and runs of the Min-CC oracle, for \(C\text{-}wd\) weight setting

Stability over the CMAB rounds. Tables 13, 14, 15, 16, 17 and 18 show the coefficient of variation (i.e., the ratio between standard deviation and mean) of the scores of the tested methods in terms of the \(f^{(T)}\) (Eq. 12) and \(ren^{(T)}\) (Eq. 13) criteria, respectively. It can be observed that the coefficients of variation of \(f^{(T)}\) are typically very small for all the methods: they mostly lie in \([10^{-3}, 10^{-2}]\) for the smaller datasets, and in \([10^{-6}, 10^{-4}]\) for the larger datasets, with only very few exceptions. In terms of \(ren^{(T)}\), the coefficients of variation in the \(R\text{-}wd\) and \(PC\text{-}wd\) weight settings are higher (especially in the smaller datasets), but they still remain rather small. In the \(C\text{-}wd\) setting, they are instead mostly equal or very close to zero. Therefore, as a general conclusion, we can state that the various tested methods exhibit high stability over the CMAB rounds and runs of the Min-CC oracle.

8 Conclusion

We have focused on the novel setting of correlation clustering where edge weights are unknown and need to be discovered while performing multiple rounds of clustering. We have provided a Combinatorial Multi-Armed Bandit (CMAB) framework for correlation clustering, algorithms for it, analyses of the theoretical guarantees of these algorithms, more practical heuristics, and extensive experiments.

In the future, we plan to investigate the theoretical properties of our heuristics, advanced CMAB settings, and clustering problems other than correlation clustering.

For reproducibility purposes, we make source code and data available at: https://github.com/Ralyhu/CMAB-CC, and http://people.dimes.unical.it/andreatagarelli/CMAB-CC/.