1 Introduction

Correlation clustering is the problem of clustering the vertices of a graph whose edges are assigned positive-type and negative-type real-valued weights that express, respectively, positive and negative evidence of placing the endpoints of an edge in the same cluster (Bansal et al. 2004; Bonchi et al. 2022). Two formulations of correlation clustering exist: the minimization one (Min-CC) aims at minimizing the sum of the negative-type intra-cluster edge weights plus the sum of the positive-type inter-cluster edge weights; in the maximization counterpart (Max-CC), the objective is dual, i.e., to maximize the sum of the positive-type intra-cluster edge weights plus the sum of the negative-type inter-cluster edge weights. Correlation clustering has been extensively studied from a theoretical point of view, and it has been applied in numerous real-world scenarios (Bonchi et al. 2014; Pandove et al. 2018).

Correlation clustering with unknown edge weights. Traditionally, in correlation clustering it is assumed that the edge weights are all given as input; for instance, they could have been derived from past user-interaction history, crowdsourcing, experimental trials, and so on. This has the disadvantage that clustering can be performed only after all the weights are available, which is infeasible in several real contexts. To overcome this, here we focus for the first time on a correlation-clustering setting where edge-weight assessment is carried out while performing the clustering.

We devise the following scenario. Edge weights are random variables whose probability distributions and means are unknown and do not change during the whole process. An estimate of the mean of each edge-weight distribution is maintained. Initial estimates are randomly generated or computed based on prior knowledge. There are multiple rounds of clustering. A clustering performed at any round gives feedback on how to adjust the mean estimates, so that they improve round after round. The rationale is that, once a clustering is in place, actual interactions among the vertices can be observed, and hence used as real evidence to profitably update the mean estimates. More specifically, in Min-CC (resp. Max-CC) one gets feedback about the negative-type (resp. positive-type) intra-cluster edge weights and the positive-type (resp. negative-type) inter-cluster edge weights. A clustering at every round may be computed by taking into account the current mean estimates (exploitation) based on an (exact or approximate) oracle; alternatively, a clustering can be produced without looking at the mean estimates, so as to get feedback on edge weights for which limited knowledge has been acquired so far (exploration). In our context, alternating between exploiting the oracle with estimated weights to determine a clustering and observing the feedback on the edges of the graph induced by this clustering makes it possible to improve the estimates of the edge weights, since more observed data are collected upon which the estimates are computed.

Both exploitation and exploration have pros and cons. The former yields clusterings that rely on established—but partial—knowledge. The latter allows for expanding the current knowledge, which is supposed to yield better-quality clusterings in the next rounds, but it may also lead to inaccurate clusterings in the first rounds.

Getting the best exploration-exploitation tradeoff is a key desideratum. The effectiveness of such a tradeoff is measured by the (expected) cumulative quality of the clusterings produced in all the rounds. This is the ultimate objective to be optimized, and a major challenge in the design of proper algorithms.

It should be noted that the aforementioned exploration-exploitation tradeoff refers to a reinforcement-learning paradigm, which, in this work, we adopt by resorting to the Combinatorial Multi-Armed Bandit (CMAB) framework (Chen et al. 2016, 2018a; Kveton et al. 2015a, b; Lagrée et al. 2016; Wang and Chen 2017; Xu et al. 2020). The CMAB framework has been contextualized to several specific problems, including influence maximization (Chen et al. 2016; Vaswani and Lakshmanan 2015; Wu et al. 2019), community detection (Mandaglio and Tagarelli 2019a, b), community exploration (Chen et al. 2018b), shortest-path discovery (Talebi et al. 2017), and feature selection (Liu et al. 2021). In our previous work (Mandaglio et al. 2020), we devised non-CMAB algorithms for a correlation-clustering problem variant in which interactions between entities are characterized by known input probability distributions and conditioned by external factors within the environment where the entities interact. Remarkably, none of those settings comes close to the one we consider in this work, i.e., devising a CMAB framework for (correlation) clustering.

Applications. The setting we consider in this work finds application in all those contexts where it is not preferable (or not permitted) to wait until edge weights have been produced before performing a clustering. Rather, it is desired to produce clustering solutions early, learn the weights along the way, and tolerate lower clustering quality in the initial rounds, with improvement as the rounds go by.

For instance, we might consider a team formation scenario, where individuals need to be organized (clustered) into teams (Juárez et al. 2022). Individuals are associated with technical/soft skills which are required for task assignments within the teams. Any two individuals exhibiting a certain skill-level similarity should be assigned to the same team, and conversely to different teams if they are dissimilar to each other; clearly, given the variety of skills and their compatibility levels, the exact degree of matching between two individuals’ skills is not known a priori at the beginning of the team-formation process, and indeed similarities should be learned through team-formation history. In this regard, individuals collaborate with both their teammates and individuals from other teams, for, e.g., general coordination purposes. A desirable goal is to establish teams so as to maximize the overall (i.e., intra-team plus inter-team) similarity between pairs of individuals. This is a problem that can easily be cast as correlation clustering, where vertices correspond to individuals, clusters correspond to teams, and the positive-type and negative-type edge weights correspond to intra-team and inter-team similarities, respectively. Note that empathy-related (i.e., mutable) characteristics between individuals are discarded here, since they may cause drift in the likelihood that two individuals (dis)like each other once they are (temporarily) members of the same team, i.e., edge weights would change through the rounds. In this regard, an analogous scenario is task allocation for robots, each of which is programmed to handle a number of operations. Correlation clustering would be helpful to enable forming coalitions among robots, in order to allocate them to tasks to be completed optimally according to some efficiency requirements.

Two further example scenarios are commercial scheduling (e.g., Bollapragada and Garbiras 2004; Giallombardo et al. 2016) and shelf space allocation (e.g., Hübner et al. 2021). The former consists in optimally assigning a set of commercials to fill in each advertisement slot scheduled by a TV broadcaster, where vertices are commercials and edge weights denote marketing-driven benefits in assigning, resp. separating, any two commercials within the same, resp. to different, slots; edge weights might initially be estimated by accounting for requirements provided by both the brand customers and the TV broadcaster, and then adjusted by observing the feedback provided by (online) market surveys (e.g., delivered to the targeted audience of the TV programme schedule). Shelf space allocation models the dimensioning and positioning of shelf space for allocating selected products based on practical retail requirements, so as to maximize product sales; here, the feedback observed from the sales outcomes, as well as from whether customers welcome the retailer’s choices, would be related to the opportunity of ensuring brand visibility or improving customer satisfaction.

All the aforementioned scenarios correspond to well-known optimization problems in operations research and related fields; they are also key enablers in emerging contexts, such as the development of smart production systems brought by Industry 4.0 (Grillo et al. 2022). However, such problems have not commonly been addressed in terms of correlation clustering, and the few existing exceptions (e.g., Dutta et al. (2019)) are far from the CMAB perspective we study in this work, which is profitably adopted here since correlation-clustering weights are unlikely to be known a priori.

Contributions. The scenario we deal with in this work is a natural reinforcement-learning one, which, to the best of our knowledge, has never been considered in the context of correlation clustering. We tackle it by designing—for the first time—a Combinatorial Multi-Armed Bandit (CMAB) (Chen et al. 2016) framework for correlation clustering. In doing so, we achieve a mix of modeling, algorithmic, technical, and empirical contributions, including principled framework design and problem formulations, design and theoretical analysis of algorithms, tricks to make the algorithms work in practice, and experimental evaluation. In more detail, our main contributions are as follows:

  • We formulate, for the first time, correlation clustering in a CMAB setting, by providing a contextualization of the main ingredients of a typical CMAB framework and CMAB formulations for both Min-CC and Max-CC (Sect. 3). Among other things, a key consequence of this contribution is that it enables the use of general-purpose CMAB approximation algorithms/heuristics for CMAB-based Min-CC/Max-CC with minimal customization effort. In this regard, we show how the popular Combinatorial Upper Confidence Bound (CUCB) method (Chen et al. 2016) can be employed in the context of Max-CC (Appendix A.2).

  • We introduce the Combinatorial Lower Confidence Bound (CLCB) method, which can be viewed as the counterpart of CUCB for minimization problems, and show how to suitably customize it in order to handle Min-CC instances (Sect. 4).

  • The effectiveness of a CMAB algorithm is typically assessed in terms of regret, i.e., a measure of how far the (expected) cumulative quality of the solutions yielded by an algorithm is from the optimal cumulative quality. In this regard, Chen et al. (2016) provide a regret analysis of the CUCB method, which shows that, if the underlying combinatorial-optimization problem satisfies certain properties, CUCB is guaranteed to achieve a regret that is at most logarithmic in the number of clustering rounds, in presence of an approximation oracle. Here, we build upon Chen et al.’s result and show that:

    • Our CMAB formulation of Max-CC satisfies Chen et al.’s properties, thus, CUCB achieves logarithmic regret for Max-CC as well (Appendix A.2.1).

    • We devise a principled regret definition for Min-CC. According to this definition, we also provide a regret analysis that, along the lines of Chen et al.’s analysis for CUCB, proves that CLCB achieves logarithmic regret in the number of clustering rounds for Min-CC in presence of an approximation oracle (Sect. 4.1). Our regret definition and analysis for Min-CC are general enough to be reused in any minimization CMAB problem with an approximation oracle. This is a contribution per se, as, to the best of our knowledge, no regret definitions/analyses for CMAB minimization problems (with approximation oracle) exist in the literature.

  • We further investigate the applicability of the CLCB-like algorithm in practice (Sect. 4.2). A key desideratum in this regard is to employ the traditional Pivot algorithm for Min-CC  (Ailon et al. 2008) as an (approximation) oracle within CLCB, for its efficiency, theoretical as well as empirical effectiveness, and ease of implementation. A major challenge here is that, to achieve its approximation guarantees (and to provide effective solutions in practice too), Pivot needs the input edge weights to satisfy the probability constraint. Unfortunately, the CLCB algorithm does not guarantee the fulfilment of this constraint at every round. Thus, we design novel variants of the basic CLCB where the correlation-clustering instances given as input to Pivot meet (or are close to meeting) the probability constraint.

  • We conduct an extensive evaluation to experimentally test the performance of CMAB correlation-clustering algorithms, including the algorithms devised in this work, as well as (correlation-clustering-customized) popular CMAB heuristics, such as \(\epsilon\)-greedy, pure exploitation, and Combinatorial Thompson Sampling (Wang and Chen 2018) (Sects. 5–7). We consider the Min-CC context only, due to the availability of practical approximation oracles (unlike Max-CC). Results show that CMAB methods achieve superior accuracy over non-CMAB baselines, and accuracy close to that of a reference method that performs correlation clustering with the true edge weights. Also, the per-round runtime of CMAB methods is (at worst) comparable to the runtime of executing a linear-time correlation-clustering algorithm once.

Section 2 discusses background and related work. Section 8 concludes the paper.

2 Background and related work

2.1 Correlation clustering

The minimization (Min-CC) and maximization (Max-CC) formulations of correlation clustering aim at minimizing disagreements and maximizing agreements, respectively. They are formally defined as follows:

Problem 1

(Min-CC  (Ailon et al. 2008)) Given a graph \(G=(V, E)\), and nonnegative weights \(w _{uv}^+\), \(w _{uv}^- \in \mathbb {R}_0^+\) for each edge \((u, v) \in E\), find a clustering \({\mathcal {C}}^*: V \rightarrow {\mathbb {N}}^+\) such that:

$$\begin{aligned} {\mathcal {C}}^* \ = \ {\text {argmin}}_{\mathcal {C}}~~ f_{min} ({\mathcal {C}}) \ = \ {\text {argmin}}_{\mathcal {C}} \sum _{\begin{array}{c} (u, v) \in E,\\ {\mathcal {C}}(u) = {\mathcal {C}}(v) \end{array}} w ^-_{uv} \quad + \!\!\! \sum _{\begin{array}{c} (u, v) \in E, \\ {\mathcal {C}}(u) \ne {\mathcal {C}}(v) \end{array}} w ^+_{uv}. \end{aligned}$$
(1)

Problem 2

(Max-CC  (Ailon et al. 2008)) Given a graph \(G=(V, E)\), and nonnegative weights \(w _{uv}^+\), \(w _{uv}^- \in \mathbb {R}_0^+\) for each edge \((u, v) \in E\), find a clustering \({\mathcal {C}}^*: V \rightarrow {\mathbb {N}}^+\) such that:

$$\begin{aligned} {\mathcal {C}}^* \ = \ {\text {argmax}}_{\mathcal {C}}~~ f_{max} ({\mathcal {C}}) \ = {\text {argmax}}_{\mathcal {C}}\sum _{\begin{array}{c} (u, v) \in E,\\ {\mathcal {C}}(u) = {\mathcal {C}}(v) \end{array}} w ^+_{uv} \quad + \!\!\! \sum _{\begin{array}{c} (u, v) \in E, \\ {\mathcal {C}}(u) \ne {\mathcal {C}}(v) \end{array}} w ^-_{uv}. \end{aligned}$$
(2)

In the above problems, and hereinafter, we let a clustering be represented as a function that expresses cluster membership for the vertices in V.

Min-CC and Max-CC are equivalent in terms of optimality and complexity class [both \({{\textbf{N}}}{{\textbf{P}}}\)-hard  (Bansal et al. 2004; Shamir et al. 2004)], but have different approximation-guarantee properties, with the latter being easier in this regard. On general edge weights, both Min-CC and Max-CC are \(\textbf{APX}\)-hard  (Charikar et al. 2005), with Max-CC admitting constant-factor approximation algorithms (Charikar et al. 2005; Swamy 2004), and with the best known approximation factor for Min-CC being \({\mathcal {O}} (\log |V|)\) (and unlikely to be improvable) (Charikar et al. 2005; Demaine et al. 2006).

When restrictions on weights are imposed, the problems become more tractable. For instance, in the seminal work by Bansal et al. (2004), which requires the input graph to be complete, and the weights to be binary and with exactly one nonzero weight for each weight pair (i.e., \(\forall u,v \in V\), \((w _{uv}^+, w _{uv}^-) \!\in \! \{(0,1),(1,0)\}\)), Max-CC admits a PTAS (Bansal et al. 2004), and Min-CC admits constant-factor approximations (Ailon et al. 2008; Bansal et al. 2004; Charikar et al. 2005; Chawla et al. 2015; van Zuylen and Williamson 2007) [although still remaining \(\textbf{APX}\)-hard  (Charikar et al. 2005)].

Attention has also been devoted to weight bounds that go beyond Bansal et al.’s ones, but are still restrictive enough to allow Min-CC to achieve constant-factor guarantees. These include weights satisfying the probability constraint (i.e., \(w ^+_{uv} + w ^-_{uv} = 1\), \(\forall u,v \in V\)) (Ailon et al. 2008), generalizations of it (i.e., \(\forall u,v \in V\), \(w ^+_{uv} \le 1\), \(w ^-_{uv} \le h\) for some \(h \in [1,+\infty )\), and \(w _{uv}^+ + w _{uv}^- \ge 1\)) (Puleo and Milenkovic 2015), triangle inequality (i.e., \(w ^-_{uz} \le w ^-_{uv} + w ^-_{vz}\), \(\forall u,v,z \in V\)) (Ailon et al. 2008), or global constraints (Mandaglio et al. 2021). The probability constraint is particularly appealing: in fact, under such a constraint, Pivot, a randomized algorithm for Min-CC that is widely recognized for its theoretical guarantees, efficiency, and ease of implementation, achieves a 5-approximation (in expectation) (Ailon et al. 2008). Coupling the probability constraint with the triangle inequality lowers Pivot’s (expected) approximation factor to 2 (Ailon et al. 2008).

Although they consider various types of weights, all the above works still assume that edge weights are available as input. In this work, we go beyond this limiting assumption, and focus on the context where edge weights are not available beforehand, but have to be discovered while performing (multiple rounds of) clustering.

Beyond basic correlation clustering. Several extensions to the basic correlation-clustering formulations have been studied, including constrained/relaxed formulations (e.g., constraining the number/size of clusters, allowing overlapping clusters), and adaptations to more sophisticated types of graph (e.g., bipartite graphs, labeled graphs, multilayer graphs, hypergraphs) or nonconventional computational settings (e.g., online, parallel, streaming). We point the interested reader to Bonchi et al. (2014, 2022); Pandove et al. (2018) for more details on these advanced topics. Here, let us just discuss the problem of query-efficient correlation clustering (Bressan et al. 2019; García-Soriano et al. 2020), which, to our knowledge, is the only correlation-clustering extension that exhibits some (slight) similarity with the setting we study in this work. Query-efficient correlation clustering assumes that edge weights are discovered by querying an oracle, and the goal is to cluster the input graph by using a limited budget of Q queries (\(Q \ll {\mathcal {O}} (|V|^2)\)). Although it is still assumed that edge weights are not available beforehand (as in our setting), query-efficient correlation clustering focuses on a scenario that remains profoundly different from the one tackled in this work. In fact, it considers a hard limit Q on the number of edge weights that can ultimately be discovered, which is a restriction that is not present in our setting. Moreover, the feedback on edge weights is given by an oracle, which provides true edge weights for any query, at any time. Instead, in our setting, the feedback consists of samples of the weight distributions, which are used to update the weight estimates, and is provided by the clustering itself (there is no oracle). Finally, existing approaches to query-efficient correlation clustering (Bressan et al. 2019; García-Soriano et al. 2020) handle binary weights only.

2.2 Combinatorial multi-armed bandit

Combinatorial Multi-Armed Bandit (CMAB) is a popular reinforcement-learning framework to learn how to perform actions by exploring/exploiting the feedback from an environment (Chen et al. 2016). It extends the basic Multi-Armed Bandit (MAB) (Berry and Fristedt 1985) so that the actions to be performed/learned correspond to combinatorial structures (superarms) that are defined on top of simpler, basic actions (base arms). Specifically, a CMAB instance consists of m base arms. Each base arm i is assigned a set \(\{X_{i,t}\mid 1 \le t \le T \}\) of random variables, where T is the number of rounds. The support of each \(X_{i,t}\)—assumed to be [0, 1]—indicates the random “outcome” of playing base arm i in round t. This outcome is interpreted as feedback from the environment and used to carry out the learning process. The random variables \(\lbrace X_{i,t} \rbrace _{t=1}^T\) of the same arm i are independent and identically distributed, according to some unknown distribution with unknown expectation \(\mu _i\). Random variables of different base arms may be dependent or distributed with different laws. Estimates \(\lbrace {\hat{\mu }}_i \rbrace _{i=1}^m\) of the true unknown expectations \(\lbrace \mu _i \rbrace _{i=1}^m\) are kept (and updated) at every round.

A CMAB instance also includes a set \({\mathcal {A}} \subseteq 2^{[m]}\) of possible superarms. \({\mathcal {A}}\) is typically defined as the subset of all subsets of base arms satisfying certain constraints. At each round t, a superarm \(A_t \in {\mathcal {A}}\) is played and the outcomes of the random variables \(X_{j,t}\), for all the base arms \(j \in A_t\), are observed. These outcomes can be used to update the knowledge on the estimates \(\lbrace {\hat{\mu }}_j \rbrace _{j \in A_t}\). Playing a superarm \(A_t\) gives a reward \(R_t(A_t)\), which is a random variable defined as a function of the outcomes of \(A_t\)’s base arms. \(R_t(A_t)\) may simply be a summation \(\sum _{j \in A_t} X_{j,t}\) of the outcomes of \(A_t\)’s base arms, but more complex (possibly nonlinear) rewards are allowed. In any case, it is often assumed that the expectation \({\mathbb {E}}[R_t(A_t)]\) is a function of only \(A_t\)’s base arms and all the \(\lbrace \mu _i \rbrace _{i = 1}^m\) (true) expectations. For minimization problems, the reward can be replaced by a notion \(L_t(A_t)\) of loss. The adaptation is straightforward.

The objective of a CMAB algorithm is to select a superarm to be played at every round, so as to maximize the cumulative expected reward obtained in all the rounds, i.e., \({\mathbb {E}}[\sum _{t = 1}^T R_t(A_t)]\). With this ultimate goal in place, a superarm \(A_t\) can be chosen by either exploiting the knowledge acquired from the outcomes of previous rounds, or exploring arms that have not been played much. Here is the exploration-exploitation tradeoff that usually appears in reinforcement-learning scenarios: a key design principle of any CMAB algorithm consists in deciding to what extent it should pick the arms that have provided good rewards so far (exploitation) or select different arms with the aim of getting even better rewards (exploration).
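As a minimal illustration of this tradeoff, the sketch below implements an \(\epsilon\)-greedy selection rule (one of the heuristics considered later in our experiments); `oracle` and `random_superarm` are placeholder callables of ours, standing for a problem-specific solver and a generator of random feasible superarms, respectively.

```python
import random

def epsilon_greedy_round(estimates, oracle, random_superarm, eps=0.1):
    """One round of an epsilon-greedy CMAB policy (illustrative sketch).

    With probability eps, explore by playing a random feasible superarm;
    otherwise, exploit by asking the oracle for the best superarm
    under the current mean estimates.
    """
    if random.random() < eps:
        return random_superarm()   # exploration
    return oracle(estimates)       # exploitation
```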

As for exploitation-aware superarms, the availability of an oracle is assumed, which computes a superarm based on the current estimates \(\lbrace {\hat{\mu }}_i \rbrace _{i = 1}^m\) of the base-arm expectations and the knowledge it possesses on the specific problem at hand. The oracle can be exact, i.e., it outputs \(A_t^* = {\text {argmax}}_{A \in {\mathcal {A}}} {\mathbb {E}}[{\hat{R}}_t(A)]\), or an \((\alpha ,\beta )\)-approximation one, for some \(\alpha , \beta \le 1\), i.e., it outputs a superarm \(A_t\) such that \(\Pr [{\mathbb {E}}[{\hat{R}}_t(A_t)] \ge \alpha ~{\mathbb {E}}[{\hat{R}}_t(A_t^*)]] \ge \beta\) (where \({\hat{R}}_t(\cdot )\) denotes the reward computed based on \(\lbrace {\hat{\mu }}_i \rbrace _{i = 1}^m\)).

The effectiveness of a (C)MAB algorithm is typically measured in terms of the so-called regret metric, which corresponds to the difference in the cumulative expected reward between always playing the optimal arm (possibly scaled by factors \(\alpha\) and \(\beta\) in case of \((\alpha ,\beta )\)-approximation oracles) and playing arms according to the algorithm. A major theoretical desideratum in this regard consists in providing a suitable regret analysis, which guarantees that the algorithm at hand achieves a certain bounded regret. The seminal work by Chen et al. (2016) shows that it is possible to design CMAB algorithms achieving \({\mathcal {O}} (\log T)\) regret, and that this is a tight bound.

Regret definitions and analyses for CMAB maximization problems exist for both exact and approximation oracles (Chen et al. 2016; Wang and Chen 2017). As for minimization problems, to the best of our knowledge, they have been devised for exact oracles only (Cesa-Bianchi and Lugosi 2012; Talebi et al. 2017). In this work, we provide for the first time a regret analysis for a minimization problem (Min-CC) with approximation oracle. The generality of our regret definition and analysis make us believe that this is a contribution of interest for CMAB minimization problems in general, not only for (correlation) clustering.

3 Problem definition

In this section we provide the details of the proposed contextualization of CMAB to correlation clustering. As a first step, we let the weights \(w _e^+, w _e^-\) of every edge \(e \in E\) be modeled as random variables \(W_e^+, W_e^-\) with [0, 1] support, and mean

$$\begin{aligned} \varvec{\mu } = \lbrace \varvec{\mu }^+, \varvec{\mu }^- \rbrace , \quad \varvec{\mu }^+ \!\!= \lbrace \mu _e^+ \!= {\mathbb {E}}[W_e^+] \rbrace _{e \in E}, \quad \varvec{\mu }^- \!\!= \lbrace \mu _e^- \!= {\mathbb {E}}[W_e^-] \rbrace _{e \in E}. \end{aligned}$$
(3)

All such random variables and their means are assumed to be unknown (as typical in CMAB), and not to change in the various clustering rounds. Any CMAB algorithm keeps estimates of the true means, which are denoted as:

$$\begin{aligned} \hat{\varvec{\mu }} = \lbrace \hat{\varvec{\mu }}^+, \hat{\varvec{\mu }}^- \rbrace , \quad \hat{\varvec{\mu }}^+ \!\!= \lbrace {\hat{\mu }}_e^+ \rbrace _{e \in E}, \quad \hat{\varvec{\mu }}^- \!\!= \lbrace {\hat{\mu }}_e^- \rbrace _{e \in E}. \end{aligned}$$
(4)

Let also every edge \(e = (u,v) \in E\) be represented by a pair of replicas, \(e^{in}\) and \(e^{out}\), which model the fact that e is an intra-cluster or inter-cluster edge (with respect to a given clustering), respectively. Let \({\mathcal {S}}^{in} = \{e^{in} \mid e \in E \}\) and \({\mathcal {S}}^{out} = \{e^{out} \mid e \in E\}\) be the sets of all intra-cluster and inter-cluster edge replicas, respectively. We make the base arms in CMAB correlation clustering correspond to the set \({\mathcal {S}} = {\mathcal {S}}^{in} \cup {\mathcal {S}}^{out}\) of all edge replicas (thus, the number of base arms is \(m = 2|E|\)), and a superarm be identified by a set of base arms that are consistent with the notion of clustering. Formally, a superarm corresponds to a clustering-compliant replica set:

Definition 1

(Clustering-compliant replica set) A set \(S \subseteq {\mathcal {S}}\) of edge replicas is clustering-compliant if (i) for all \(e \in E\), \(S\) does not contain both \(e^{in}\) and \(e^{out}\), and (ii) for all \(e_1 = (x,y), e_2 = (y,z), e_3 = (x,z) \in E\), if \(e_1^{in}, e_2^{in} \in S\), then \(e_3^{in} \in S\).

In the above definition, (i) holds because an edge cannot be both intra-cluster and inter-cluster, while (ii) guarantees the transitive property that if vertices x, y are within the same cluster and y, z are within the same cluster, then x, z must be within the same cluster too. Simply speaking, a superarm corresponds to a clustering. Thus, we hereinafter refer to “superarm” and “clustering” as two equivalent notions.
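As a concrete illustration of Definition 1, the following sketch builds the replica set induced by a given clustering; by construction, the resulting set satisfies conditions (i) and (ii). The data layout (edge list plus a vertex-to-label dictionary) is an assumption made only for this example.

```python
def replica_set(edges, clustering):
    """Map a clustering to its superarm, i.e., a clustering-compliant replica set.

    edges      : iterable of (u, v) pairs
    clustering : dict mapping each vertex to its cluster label
    Returns a set of (u, v, tag) triples, where tag is 'in' or 'out'.
    """
    S = set()
    for (u, v) in edges:
        tag = 'in' if clustering[u] == clustering[v] else 'out'
        S.add((u, v, tag))
    return S
```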

Table 1 Contextualization of CMAB to correlation clustering

The outcome of the base arms that are triggered while playing a superarm depends on the correlation-clustering formulation. In Min-CC, the outcome of every intra-cluster edge replica \(e^{in}\) comes from the corresponding negative-type-weight \(W_e^-\) random variable, while the outcome of every inter-cluster edge replica \(e^{out}\) comes from the corresponding positive-type-weight \(W_e^+\) random variable. The rationale is that, in Min-CC, the clustering quality is measured in terms of the negative-type weight of all intra-cluster edges and the positive-type weight of all the inter-cluster edges. Thus, placing a clustering (i.e., playing a superarm) is expected to give feedback that is consistent with Min-CC ’s objective function: the outcome of \(e^{in}\) (resp. \(e^{out}\)) replicas should be used to update \({\hat{\mu }}_e^-\) (resp. \({\hat{\mu }}_e^+\)). Conversely, in Max-CC, \(e^{in}\) and \(e^{out}\) are assigned (and their outcomes come from) \(W_e^+\) and \(W_e^-\), respectively.

The reward/loss corresponds to the correlation-clustering objective function, hence its definition depends on the correlation-clustering formulation too. Given a superarm \(S\), let \(S ^{in}\) and \(S ^{out}\) denote the intra-cluster and inter-cluster edge replicas in \(S\), respectively. Min-CC utilizes a disagreement-based loss \(d (S)\) defined as:

$$\begin{aligned} \small \textstyle d (S) = \sum _{e \in S ^{in}} W_e^- + \sum _{e \in S ^{out}} W_e^+, \end{aligned}$$
(5)

while Max-CC employs a reward \(a (S)\) defined in terms of agreements as:

$$\begin{aligned} \small \textstyle a (S) = \sum _{e \in S ^{in}} W_e^+ + \sum _{e \in S ^{out}} W_e^-. \end{aligned}$$
(6)

The expectations of \(d (\cdot )\) and \(a (\cdot )\) are as follows (by linearity of the expectation):

$$\begin{aligned} {\bar{d}}_{\varvec{\mu }} (S) = {\mathbb {E}}[d (S)] = \!\!\!\sum _{e \in S ^{in}} \mu _e^- +\!\!\! \sum _{e \in S ^{out}} \mu _e^+, \qquad {\bar{a}}_{\varvec{\mu }} (S) = {\mathbb {E}}[a (S)] = \!\!\!\sum _{e \in S ^{in}} \mu _e^+ + \!\!\!\sum _{e \in S ^{out}} \mu _e^-. \end{aligned}$$
(7)

where the “\({\varvec{\mu }}\)” subscript in \({\bar{d}}_{\varvec{\mu }}\) and \({\bar{a}}_{\varvec{\mu }}\) is to emphasize that those functions depend on the true means \(\varvec{\mu }\). Denoting by \({\mathcal {C}}_{S}\) the clustering corresponding to superarm \(S\), Eq. (7) can alternatively (yet equivalently) be written as:

$$\begin{aligned} {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_{S}) \ = \!\!\!\!\!\sum _{\begin{array}{c} (u, v) \in E,\\ {\mathcal {C}}_{S}(u) = {\mathcal {C}}_{S}(v) \end{array}} \!\!\mu ^-_{uv} \ \ + \!\!\!\! \sum _{\begin{array}{c} (u, v) \in E, \\ {\mathcal {C}}_{S}(u) \ne {\mathcal {C}}_{S}(v) \end{array}} \!\!\mu _{uv}^+, \qquad {\bar{a}}_{\varvec{\mu }} ({\mathcal {C}}_{S}) \ = \!\!\!\!\!\sum _{\begin{array}{c} (u, v) \in E,\\ {\mathcal {C}}_{S}(u) = {\mathcal {C}}_{S}(v) \end{array}} \!\!\mu ^+_{uv} \ \ + \!\!\!\! \sum _{\begin{array}{c} (u, v) \in E, \\ {\mathcal {C}}_{S}(u) \ne {\mathcal {C}}_{S}(v) \end{array}} \!\!\mu _{uv}^-. \end{aligned}$$
(8)

Table 1 summarizes the elements of our CMAB correlation-clustering formulation.
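For concreteness, the sketch below evaluates the expected loss \({\bar{d}}_{\varvec{\mu }}\) and the expected reward \({\bar{a}}_{\varvec{\mu }}\) of Eq. (8) for a given clustering; the dictionaries holding the means, keyed by edge, are an illustrative assumption.

```python
def expected_loss_and_reward(edges, clustering, mu_plus, mu_minus):
    """Compute d_bar (Min-CC expected loss) and a_bar (Max-CC expected reward) as in Eq. (8)."""
    d_bar, a_bar = 0.0, 0.0
    for (u, v) in edges:
        if clustering[u] == clustering[v]:   # intra-cluster edge
            d_bar += mu_minus[(u, v)]
            a_bar += mu_plus[(u, v)]
        else:                                # inter-cluster edge
            d_bar += mu_plus[(u, v)]
            a_bar += mu_minus[(u, v)]
    return d_bar, a_bar
```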

CMAB-Min-CC and CMAB-Max-CC problems. Given a graph \(G = (V,E)\), we perform discrete rounds \(t=1, \ldots , T\), where at each round t, a clustering \({\mathcal {C}}_t\) of the vertices in V is computed and used to update the mean estimates \(\hat{\varvec{\mu }}^+, \hat{\varvec{\mu }}^-\) of the random variables modeling the positive-type and negative-type edge weights, respectively. As discussed above, in Min-CC, the weight of an edge e between vertices within the same cluster (resp. in different clusters) is interpreted as a random sample useful to update \({\hat{\mu }}_e^-\) (resp. \({\hat{\mu }}_e^+\)). In Max-CC, the opposite holds. The ultimate objective is to minimize/maximize the cumulative expected loss/reward of the clusterings yielded in all the rounds. Formally, the problems we tackle in this work are:

Problem 3

(CMAB-Min-CC) Given a graph \(G = (V, E)\) and a number \(T > 0\) of rounds, for every \(t=1,\ldots , T\) find a clustering \({\mathcal {C}}_t:V \rightarrow {\mathbb {N}}^+\) so as to minimize

$$\begin{aligned} \small \textstyle \mathop {\mathrm {{\mathbb {E}}}}\limits \left[ \sum _{t=1}^T {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_t) \right] . \end{aligned}$$
(9)

Problem 4

(CMAB-Max-CC) Given a graph \(G = (V, E)\) and a number \(T > 0\) of rounds, for every \(t=1,\ldots , T\) find a clustering \({\mathcal {C}}_t:V \rightarrow {\mathbb {N}}^+\) so as to maximize

$$\begin{aligned} \textstyle \mathop {\mathrm {{\mathbb {E}}}}\limits \left[ \sum _{t=1}^T {\bar{a}}_{\varvec{\mu }} ({\mathcal {C}}_t) \right] . \end{aligned}$$
(10)

The expectation in Eqs. (9) and (10) is taken over all the random events generating the \({\mathcal {C}}_t\) clusterings (due to, e.g., possible randomization in the oracle that computes the clusterings). There is a further expectation in those equations, which is implicit in the definition of expected loss \({\bar{d}}_{\varvec{\mu }} (\cdot )\) and expected reward \({\bar{a}}_{\varvec{\mu }} (\cdot )\) (see Eq. 8).

As previously discussed, CMAB-Max-CC (resp. CMAB-Min-CC) requires an oracle to solve, for each round, a Max-CC (resp. Min-CC) instance according to the mean estimates \(\hat{\varvec{\mu }}^+, \hat{\varvec{\mu }}^-\). However, oracles available for Max-CC  (Charikar et al. 2005; Swamy 2004) are both inefficient and, more importantly, poorly useful in practice, since they are not able to output more than a fixed number of clusters (i.e., six). This implies that the corresponding CMAB setting (i.e., CMAB-Max-CC) will inherit this issue too, since the clusterings yielded at each round are obtained through these algorithms. This aspect is a showstopper in our context, as we are interested in algorithms that are effective and theoretically solid, yet capable of providing outputs whose quality is recognizable in practice too, not only theoretically. For this reason, we hereinafter focus our attention on algorithms for CMAB-Min-CC only. For completeness, algorithms for CMAB-Max-CC are however presented in Appendix A.2.

4 Algorithms for CMAB-Min-CC

In this section, we present algorithms for CMAB-Min-CC (Problem 3). We first focus on the context of general oracles for Min-CC (Sect. 4.1), and, then, on the case where the employed Min-CC oracles achieve theoretical guarantees only if the input meets certain properties (Sect. 4.2). Finally, we discuss the special case of input edge-weight distributions satisfying specific constraints (Sect. 4.3).

4.1 General Min-CC oracles

The CC-CLCB algorithm. We devise a variant of the so-called Combinatorial Upper Confidence Bound (CUCB) algorithm (Chen et al. 2016), which is an extension of the UCB1 method for MAB (Auer et al. 2002). CUCB keeps, along with the estimates of the means of the base-arm random variables, confidence intervals within which the true means fall with overwhelming probability, and plays superarms based on the upper bounds of those intervals. Our proposed variant, termed Combinatorial Lower Confidence Bound (CLCB), is tailored for minimization problems but follows the principles of CUCB: it maintains confidence intervals within which the true means fall with high probability, but, unlike CUCB, it plays superarms based on the confidence-interval lower bounds.

Our customization of CLCB to Min-CC is termed CC-CLCB and outlined as Algorithm 1. CC-CLCB keeps track of the mean estimates \(\hat{\varvec{\mu }} = \lbrace \hat{\varvec{\mu }}^+, \hat{\varvec{\mu }}^- \rbrace\) (Eq. 4), and of the number \(T_e^+\) (resp. \(T_e^-\)) of times a sample from \(W_e^+\) (resp. \(W_e^-\)) random variable has been observed until the current round, for all \(e \in E\). At the beginning, \(\forall e \in E: T_e^+ = T^-_e = 0\), and \(\hat{\varvec{\mu }}\) are initialized, e.g., randomly or based on prior domain knowledge (Line 1). In every round t, the current mean estimates are adjusted with a term \(\rho ^\pm _e\) (defined based on Chernoff-Hoeffding bounds (Auer et al. 2002; Chen et al. 2016)), so as to foster, to some extent, the exploration of less often played base arms (Line 3). This leads to the adjusted means \(\lbrace {\widetilde{\mu }}^+_e, {\widetilde{\mu }}^-_e\rbrace _{e \in E}\) (Line 4), which are interpreted as positive-type and negative-type edge weights of a correlation-clustering instance, respectively, and are fed as input (along with G) to an oracle \({\textbf{O}}\) that computes a Min-CC solution \({\mathcal {C}}_t\) (Line 5). \({\mathcal {C}}_t\) is used as a feedback to update the mean estimates (Sect. 3, Table 1). Specifically, the weight of each intra-cluster (resp. inter-cluster) edge e is interpreted as a sample of \(W_e^-\) (resp. \(W_e^+\)), and is used to update \({\hat{\mu }}^-_e\), \(T_e^-\) (resp. \({\hat{\mu }}^+_e\), \(T_e^+\)). \({\hat{\mu }}^+_e\) and \({\hat{\mu }}^-_e\) are updated so as to be equal to the average of the samples from \(W_e^+\) and \(W_e^-\) observed so far, respectively (Lines 6–11).

Algorithm 1: The CC-CLCB algorithm (pseudocode)
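To complement the pseudocode, the following is a minimal Python sketch of the main loop of CC-CLCB, under a few assumptions of ours: the oracle is a callable `oracle(edges, w_plus, w_minus)` returning a clustering as a vertex-to-label dictionary, the environment is a callable `observe(e, kind)` returning a sample in [0, 1] from \(W_e^+\) or \(W_e^-\), and the adjusted means are obtained by subtracting the Chernoff-Hoeffding term from the current estimates (clipped at 0), following the lower-confidence-bound principle described above.

```python
import math
import random

def cc_clcb(edges, oracle, observe, T):
    """Illustrative sketch of CC-CLCB (Algorithm 1), not the authors' exact code."""
    # Line 1: initialize counters and mean estimates (here: randomly).
    mu_hat = {(kind, e): random.random() for e in edges for kind in ('plus', 'minus')}
    counts = {key: 0 for key in mu_hat}  # T_e^+ and T_e^-
    for t in range(1, T + 1):
        # Lines 3-4: adjusted (lower-confidence-bound) means; never-played arms
        # get the maximal exploration incentive (adjusted weight 0).
        def adjusted(key):
            if counts[key] == 0:
                return 0.0
            rho = math.sqrt(3 * math.log(t) / (2 * counts[key]))
            return max(0.0, mu_hat[key] - rho)
        w_plus = {e: adjusted(('plus', e)) for e in edges}
        w_minus = {e: adjusted(('minus', e)) for e in edges}
        # Line 5: exploitation step via the Min-CC oracle.
        clustering = oracle(edges, w_plus, w_minus)
        # Lines 6-11: feedback; intra-cluster edges reveal W_e^-, inter-cluster edges W_e^+.
        for (u, v) in edges:
            kind = 'minus' if clustering[u] == clustering[v] else 'plus'
            key = (kind, (u, v))
            x = observe((u, v), kind)
            counts[key] += 1
            mu_hat[key] += (x - mu_hat[key]) / counts[key]  # running average of samples
    return mu_hat
```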

Regret analysis of CC-CLCB. As correlation clustering is \({{\textbf{N}}}{{\textbf{P}}}\)-hard, it is unlikely that CC-CLCB can be equipped with an exact oracle \({\textbf{O}}\) for Min-CC running in polynomial time. Hence, in analyzing the theoretical guarantees of CC-CLCB, we consider the case where \({\textbf{O}}\) is a Min-CC\((\alpha ,\beta )\)-approximation oracle:

Definition 2

(Min-CC-\((\alpha , \beta )\)-approximation oracle) Given a Min-CC instance \(I \!=\! \langle (V,E), \lbrace \!(w _e^+\!,w _e^-)\! \rbrace _{e \in E}\rangle\), let \({\mathcal {C}}^*_I\) be the optimal solution to I. Given \(\alpha , \beta \in (0, 1]\), an algorithm for Min-CC is a Min-CC-\((\alpha , \beta )\)-approximation oracle if, for every input I, it yields a solution \({\mathcal {C}}\) such that \(\Pr [f_{min} ({\mathcal {C}}) \le \frac{1}{\alpha }~f_{min} ({\mathcal {C}}^*_I)] \ge \beta\) (where \(f_{min} (\cdot )\) is Min-CC ’s objective function, Eq. (1)).

The condition in Definition 2 for recognizing \({\textbf{O}}\) as a Min-CC-\((\alpha , \beta )\)-approximation oracle needs to hold on every Min-CC instance that is given as input to \({\textbf{O}}\) at each round. Hence, the condition has to hold with respect to the mean estimates, not the true means. Similarly to the maximization counterpart, existing Min-CC algorithms achieving \({\mathcal {O}} (\log |V|)\) guarantees in expectation (Charikar et al. 2005; Demaine et al. 2006) can be employed as Min-CC-\((\alpha , \beta )\)-approximation oracles. More details are in Appendix A.1.

We introduce a notion of \((\alpha ,\beta )\)-approximation regret, which can be viewed as the minimization counterpart of the traditional one defined in Chen et al. (2016) and used in maximization problems. Applied to the Min-CC context, this measure is defined as follows:

Definition 3

(Min-CC-\((\alpha , \beta )\)-approximation regret) Let \({\mathcal {C}}^*_{I}\) be the clustering minimizing the expected loss \({\bar{d}}_{\varvec{\mu }} (\cdot )\) (Eq. 7) on a CMAB-Min-CC instance I (w.r.t. the true \(\varvec{\mu }\) means, Eq. 3), let \({\mathcal {M}} = \max _{{\mathcal {C}} \in {\textbf{C}}(I)} {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}})\) (where \({\textbf{C}}(I)\) is the set of all clusterings of I), and let \(\lbrace {\mathcal {C}}_t\rbrace _{t=1}^T\) be the clusterings output by an algorithm \({\textbf{A}}\) run on I. The Min-CC-\((\alpha , \beta )\)-approximation regret of \({\textbf{A}}\) is

$$\begin{aligned} \textstyle Reg^{{\textbf{A}}}_{\varvec{\mu }, \alpha , \beta }(T) = \mathop {\mathrm {{\mathbb {E}}}}\limits \left[ \sum _{t=1}^T {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_t) \right] - T\left[ \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}) + ({\mathcal {M}} - \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}))(1 -\beta ) \right] . \end{aligned}$$
(11)

The rationale of the above definition is as follows. First, since the focus is on a minimization problem, the lower the probability \(\beta\) of success, the higher the loss value to compare with. Moreover, to take into account possible divergences of the approximation oracle from the optimum, and recalling that we deal with losses, not rewards, we add an extra term to the \(\frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I})\) loss that “interpolates” between the highest probability \(\beta =1\) (thus, we compare with \(\frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I})\)) and the worst probability \(\beta =0\) (thus, we compare with the maximum loss value \({\mathcal {M}}\)). Note that the \(T\left[ \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}) + ({\mathcal {M}} - \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}))(1 -\beta ) \right]\) term in \(Reg^{{\textbf{A}}}_{\varvec{\mu }, \alpha , \beta }(T)\) is used as a comparison for the (expected) performance \(\mathop {\mathrm {{\mathbb {E}}}}\limits \left[ \sum _{t=1}^T {\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_t) \right]\) achieved by the CMAB method at hand in the various rounds. It is defined by noticing that, in every round \(t = 1, \ldots , T\), a Min-CC-\((\alpha , \beta )\)-approximation oracle yields, with probability (at least) \(\beta\), a solution whose \({\bar{d}}_{\varvec{\mu }} (\cdot )\) value is at most \(\frac{1}{\alpha }\) times the optimum (i.e., \(\frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*)\)), and, with probability (at most) \(1 - \beta\), a solution whose \({\bar{d}}_{\varvec{\mu }} (\cdot )\) value is more than \(\frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*)\). In the latter case, consistently with the regret definition in maximization problems (Chen et al. 2016), we assume that the \({\bar{d}}_{\varvec{\mu }} (\cdot )\) value of the yielded solutions is equal to an upper bound \(UB = {\mathcal {M}}\) on \({\bar{d}}_{\varvec{\mu }}\). More precisely:

$$\begin{aligned}&\small \textstyle T \left[ \frac{1}{\alpha }~\beta ~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) + (1 - \beta )UB \right] \ = \ T \left[ \frac{1}{\alpha }~\beta ~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) + (1 - \beta ){\mathcal {M}} \right] \ = \ T \left[ \frac{1}{\alpha }~\beta ~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) + {\mathcal {M}} - \beta ~{\mathcal {M}} \right. \\&\small \textstyle \left. + \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) - \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) \right] \ = \ T \left[ \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_I^*) + \left( {\mathcal {M}} - \frac{1}{\alpha }~{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}^*_{I}) \right) (1 -\beta ) \right] . \end{aligned}$$

The comparison term in \(Reg^{{\textbf{A}}}_{\varvec{\mu }, \alpha , \beta }(T)\) is pessimistic in assuming that when the \((\alpha , \beta )\)-approximation oracle does not achieve approximation guarantees it yields solutions whose \({\bar{d}}_{\varvec{\mu }} (\cdot )\) value is equal to the upper bound \({\mathcal {M}}\). However, note that this happens with probability \(1 - \beta\). In our context, \(1 -\beta\) is in the order of \(|V|^{-c}\) (cf. Appendix A.1), with c set to 1 in our experiments (cf. Sect. 5). This means that the pessimistic assumption arises just in a tiny minority \(|V|^{-1}~T\) of the rounds. Also, the comparison term still adopts the optimistic assumption that the true \(\varvec{\mu }\) weights are known, while they are actually not for the CMAB method that is being evaluated in terms of \(Reg^{{\textbf{A}}}_{\varvec{\mu }, \alpha , \beta }(T)\).

As typically required in (C)MAB, the above regret is consistent with the definition of cumulative expected reward/loss at hand (i.e., Eq. 9, in this case). Thus, minimizing that regret corresponds to solving CMAB-Min-CC (Problem 3). A key theoretical desideratum in CMAB (and online-learning settings in general) is having a regret bounded by some function that is sublinear in the number of rounds. This is motivated by the fact that the overall objective is typically a summation over the number of rounds, thus a regret growing (at least) linearly in the number of rounds is considered a straightforward result that any algorithm can easily achieve.

As shown in the next theorem, CC-CLCB achieves a regret bound that is logarithmic in the number of rounds:

Theorem 1

Given \(\alpha , \beta \in (0,1]\), the Min-CC-\((\alpha , \beta )\)-approximation regret (Definition 3) of the CC-CLCB algorithm (Alg. 1), when equipped with a Min-CC-\((\alpha , \beta )\)-approximation oracle \({\textbf{O}}\) (Definition 2), is upper-bounded by a function that is \({\mathcal {O}} (\log T)\).

Proof

(sketch) The proof relies on the following main result: the \({\bar{d}}_{\varvec{\mu }} (\cdot )\) function (Eq. 8) satisfies the properties of monotonicity and 1-norm bounded smoothness. This triggers a (rather long and complex) chain of further results along the lines of those derived in Wang and Chen (2017) for the regret analysis of algorithms for CMAB maximization problems. The last of these results establishes the desired logarithmic regret bound. A more detailed proof is reported in Appendix A.3. \(\square\)

4.2 Min-CC oracles requiring the probability constraint

The CC-CLCB algorithm makes no assumptions on the input graph or edge-weight distributions. Thus, to achieve regret guarantees, CC-CLCB needs a Min-CC oracle whose approximation guarantees hold in general, without requiring restrictions on the input. As said above, algorithms of this kind exist in the context of Min-CC (Charikar et al. 2005; Demaine et al. 2006), but they suffer from issues such as limited efficiency and nontrivial implementation (they need to solve a linear program of size \(\varOmega (|V|^3)\)), and a non-constant (\({\mathcal {O}} (\log |V|)\)) approximation factor. A much better option would be to resort to the well-established Pivot  (Ailon et al. 2008), which is efficient (it takes linear time), easy to implement (it just randomly picks a vertex u, builds a cluster composed of u and all the remaining vertices connected to u by an edge whose positive-type weight is no less than the negative-type one, and iterates on the vertices left), and achieves constant-factor approximation. Unfortunately, the (expected factor-5) guarantees of Pivot hold only if the input graph is complete and the edge weights satisfy the probability constraint, i.e., \(w ^+_{uv} + w ^-_{uv} = 1\), \(\forall u,v \in V\). For this reason, here we focus on the design of heuristic variants of CC-CLCB that favor the fulfilment of the probability constraint on the Min-CC instances to be processed by the oracle. The rationale is that the closer a Min-CC instance is to meeting the probability constraint, the closer Pivot is to its “theoretical comfort zone”, and thus the better it is expected to perform.
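For reference, here is a minimal sketch of Pivot as recalled above; the only assumption is the data layout (weights stored in dictionaries keyed by unordered vertex pairs), while the iteration over the remaining vertices reflects the standard formulation of Ailon et al. (2008).

```python
import random

def pivot(vertices, w_plus, w_minus):
    """Randomized Pivot algorithm for Min-CC (sketch).

    vertices         : list of vertices (the instance is assumed complete)
    w_plus, w_minus  : dicts mapping frozenset({u, v}) to nonnegative weights
    Returns a dict mapping each vertex to a cluster label.
    """
    remaining = set(vertices)
    clustering, label = {}, 0
    while remaining:
        u = random.choice(tuple(remaining))  # random pivot
        cluster = {u}
        for v in remaining - {u}:
            e = frozenset((u, v))
            # Cluster v with the pivot if the positive-type weight is
            # no less than the negative-type one.
            if w_plus.get(e, 0.0) >= w_minus.get(e, 0.0):
                cluster.add(v)
        for v in cluster:
            clustering[v] = label
        remaining -= cluster
        label += 1
    return clustering
```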

The PC+Exp-CLCB algorithm. Our first proposal in this regard is the PC+Exp-CLCB algorithm (where “PC+Exp” means “probability constraint + exploration”). This algorithm, outlined as Algorithm 2, follows the same scheme as CC-CLCB, but it computes \(\lbrace {\widetilde{\mu }}^+_{uv}, {\widetilde{\mu }}^-_{uv}\rbrace _{u,v \in V}\) adjusted means so as to simultaneously favor some exploration and make the resulting Min-CC instance satisfy the probability constraint.

Algorithm 2: The PC+Exp-CLCB algorithm (pseudocode)
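The exact adjustment rule is given in Algorithm 2; purely as an illustration of the idea (and not necessarily the rule adopted by PC+Exp-CLCB), one could add the exploration bonuses to the current estimates and then rescale each pair so that the adjusted weights of every edge sum to 1:

```python
def pc_exp_adjust(mu_plus_hat, mu_minus_hat, bonus_plus, bonus_minus):
    """Hypothetical adjustment: exploration bonus followed by per-edge
    normalization enforcing the probability constraint (illustrative only)."""
    adj_plus, adj_minus = {}, {}
    for e in mu_plus_hat:
        p = mu_plus_hat[e] + bonus_plus[e]
        m = mu_minus_hat[e] + bonus_minus[e]
        total = p + m
        if total > 0:
            adj_plus[e], adj_minus[e] = p / total, m / total
        else:
            adj_plus[e] = adj_minus[e] = 0.5  # uninformative fallback
    return adj_plus, adj_minus
```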

The Global-CLCB algorithm. As our second variant of CC-CLCB, we devise an algorithm, dubbed Global-CLCB, which, at each round, builds Min-CC instances that are as close as possible to meeting a global constraint on the edge weights similar to the one defined in Mandaglio et al. (2021). The fulfilment of this global constraint makes the probability-constraint-aware approximation guarantees still hold even if the probability constraint is locally violated. Global-CLCB mainly relies on the following result:

Theorem 2

Let \(I = \langle G = (V,E), \lbrace {\widetilde{\mu }}^+_{uv} \rbrace _{u,v \in V}, \lbrace {\widetilde{\mu }}^-_{uv} \rbrace _{u,v \in V} \rangle\) be a Min-CC instance. If \(\left( {\begin{array}{c}|V|\\ 2\end{array}}\right) ^{-1}\sum _{u, v \in V} ( {\widetilde{\mu }}^+_{uv} + {\widetilde{\mu }}^-_{uv} ) \ge 1\), then any Min-CC algorithm (e.g., Pivot) achieving (expected) factor-\(\varepsilon\) approximation in presence of the probability constraint achieves (expected) factor-\(\varepsilon\) approximation on I too.

Proof

(sketch) The result here is a special case of the one originally proved in Theorem 1 in Mandaglio et al. (2021), specifically arising for \(\varDelta _{max} = 1\). Therefore, the proof herein is exactly the same as the one of Theorem 1 in Mandaglio et al. (2021), with the only straightforward exception of replacing \(\varDelta _{max}\) with the constant 1. \(\square\)

Global-CLCB attempts to compute \(\lbrace {\widetilde{\mu }}^+_{uv}, {\widetilde{\mu }}^-_{uv}\rbrace _{u,v \in V}\) adjusted means that are as close as possible to satisfying the condition of Theorem 2. Global-CLCB is the same as CC-CLCB, except for their respective Line 4. A detailed pseudocode of Global-CLCB is reported in Algorithm 3. We point out that CC-CLCB’s regret analysis does not hold for PC+Exp-CLCB or Global-CLCB. Deriving theoretical regret guarantees for these (or similar) heuristics is a challenging open question that we defer to future work.

Algorithm 3: The Global-CLCB algorithm (pseudocode)
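The condition of Theorem 2 is easy to check on a candidate instance; a small sketch follows (with weights stored in dictionaries keyed by unordered vertex pairs, and missing pairs counting as 0).

```python
from itertools import combinations

def satisfies_global_constraint(vertices, mu_plus, mu_minus):
    """Check the condition of Theorem 2: the average of (mu+_uv + mu-_uv)
    over all vertex pairs must be at least 1."""
    pairs = list(combinations(vertices, 2))
    total = sum(mu_plus.get(frozenset(p), 0.0) + mu_minus.get(frozenset(p), 0.0)
                for p in pairs)
    return total / len(pairs) >= 1.0
```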

4.3 Special edge-weight distributions

An interesting special input is the one of symmetric edge-weight distributions:

Definition 4

(Symmetric distributions) [0, 1]-support random variables \(W^+_e\), \(W^-_e\) have symmetric distributions if and only if \(W^+_{e}(x) = W^-_{e}(1-x)\), for all \(x \in [0,1]\).

Conceptually, this is like assuming that if a similarity equal to x holds for any two vertices, a \((1-x)\) distance implicitly holds for the same vertices as well.

CMAB-correlation-clustering instances where symmetry holds for all edge-weight distributions are easier to solve. In fact, symmetry in the distributions makes the instance at hand a full-information bandit setting: observing a sample \(x \sim W^+_{e}\) is equivalent to observing a sample \((1-x) \sim W^-_{e}\), for all \(e \in E\). This corresponds to having an outcome revealed for all the base arms, regardless of the superarm (clustering) played. In this case, therefore, exploration is meaningless. Rather, a full-exploitation strategy is worth adopting, whereby, in each round, a clustering is computed by considering solely the current mean estimates. This strategy achieves a regret bound that is constant in the number of rounds, as stated by Theorem 3:

Theorem 3

Given \(\alpha , \beta \in (0,1]\), the Min-CC-\((\alpha , \beta )\)-approximation regret (Definition 3) of a full-exploitation strategy run on a CMAB-Min-CC instance where all edge-weight distributions are symmetric, and equipped with a Min-CC-\((\alpha , \beta )\)-approximation oracle (Definition 2), is upper-bounded by a function of T that is \({\mathcal {O}} (1)\).

Proof

(sketch) The full-information bandit setting allows for simplifying some intermediate math in the regret analysis of a non-full-information setting (Theorem 1). These simplifications ultimately lead to a \({\mathcal {O}} (1)\) regret bound. A detailed proof and a pseudocode of the full-exploitation strategy are in Appendix A.4. \(\square\)
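In practice, the full-information update enabled by symmetric distributions is straightforward: each observed sample updates both estimates of an edge, regardless of the clustering played. A minimal sketch (variable names are illustrative):

```python
def full_information_update(mu_hat_plus, mu_hat_minus, count, e, x_minus):
    """Under symmetric distributions, a sample x_minus ~ W_e^- also reveals
    (1 - x_minus) as a sample of W_e^+, so both running averages are updated."""
    count[e] += 1
    mu_hat_minus[e] += (x_minus - mu_hat_minus[e]) / count[e]
    mu_hat_plus[e] += ((1.0 - x_minus) - mu_hat_plus[e]) / count[e]
```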

Fig. 1

Example CMAB-Min-CC solutions obtained by \(\epsilon\)-greedy on HighlandTribes, and by CC-CLCB on Contiguous-USA, over \(T=200\) rounds, using the linear-programming method in Charikar et al. (2005) as an oracle. Vertex colors correspond to cluster memberships, while edges are colored black, resp. red, if there is an edge-level agreement, resp. disagreement, between the true edge weights and their estimated values; by edge-level agreement, we mean that the ordering between the positive-type weight and the negative-type weight of an edge is the same on the true weights and on their estimates. Values within parentheses refer to NMI (with arithmetic mean normalization) between the CMAB-Min-CC solution by the oracle over the graph with mean estimates and the corresponding solution over the graph with true means

4.4 Visualization example

Figure 1 provides a visualization of the HighlandTribes and Contiguous-USA graphs, used as cases in point (cf. Sect. 5), and their CMAB-Min-CC clusterings produced by \(\epsilon\)-greedy and CC-CLCB, respectively, using the same oracle in both cases. In particular, we show the outcomes obtained by using the correlation-clustering linear-programming method in Charikar et al. (2005) as an oracle. Our goal here is to provide empirical evidence of the significance of the CMAB-Min-CC setting and of the effectiveness of the CMAB-Min-CC methods. To this purpose, we show three snapshots of execution on each graph, namely at the initial, middle, and final rounds of a method. Besides visualizing the cluster memberships of vertices—note that vertices of one cluster share the same color at any round, but color memberships may change across rounds—we also use black edges and red edges to distinguish between edge-level agreements and disagreements, respectively, which denote whether or not the ordering between the positive-type weight and the negative-type weight of an edge is the same on the true weights and on their estimates; formally, an edge (u, v) is colored black if \((\mu _{uv}^+> \mu _{uv}^- \wedge {\hat{\mu }}_{uv}^+ > {\hat{\mu }}_{uv}^-) \vee (\mu _{uv}^+ \le \mu _{uv}^- \wedge {\hat{\mu }}_{uv}^+ \le {\hat{\mu }}_{uv}^-)\), and red otherwise.

Two major remarks stand out by looking at the plots for each graph. First, the similarity, measured in terms of normalized mutual information (NMI), between the CMAB-Min-CC solution produced by the oracle over the graph with mean estimates and the corresponding solution over the graph with true means significantly improves as more rounds are carried out; in particular, as shown for HighlandTribes (plots (a-c)), already after a few early rounds, NMI approaches the maximum value reached at the final round. Second, the number of edge-level agreements also rapidly increases after a few rounds, until only a few disagreements are left at the final round.

5 Experimental methodology

Data. We consider ten publicly-available real-world graphs, as summarized in Table 2. Each of the bottom five networks corresponds to the flattening of a network originally represented as a set of snapshot graphs (Galimberti et al. 2020) (i.e., an edge between u and v exists in the flattened network if u and v were linked in at least one snapshot).

Edge weight distributions. The random variables \(W_e^+, W_e^-\) modeling the positive-type and negative-type edge weights in a Min-CC instance are assumed to follow a Bernoulli distribution, whose means are generated according to three schemes.

In the first two schemes, termed \(R\text{-}wd\) and \(PC\text{-}wd\), the original—possibly incomplete—network topology of the underlying input graph is maintained, meaning that \(\mu _e^+ \!= {\mathbb {E}}[W_e^+] = \mu _e^- \!= {\mathbb {E}}[W_e^-] = 0\), for all \(e \notin E\). As for the means \(\mu _e^+\), \(\mu _e^-\) for each \(e \in E\), \(R\text{-}wd\) samples both \(\mu _e^+\) and \(\mu _e^-\) uniformly at random from the [0, 1] interval, independently of one another, i.e., \(\mu ^+_{e}, \mu ^-_{e} \sim Uniform(0, 1)\), for all \(e \in E\). On the other hand, \(PC\text{-}wd\) ensures that the probability constraint holds on the generated means, which corresponds to first sampling \(\mu ^+_{e} \sim Uniform(0, 1)\), and then setting \(\mu ^-_{e} = 1-\mu ^+_{e}\), for all \(e \in E\). As a result, for both \(R\text{-}wd\) and \(PC\text{-}wd\), \(\mu _e^+, \mu _e^- \in [0,1]\), while the samples observed from the edge-weight distributions at each CMAB round lie in \(\{0,1\}\), for all \(e \in E\). In particular, samples \(W_e^+ \!=\! 1\) and \(W_e^+ \!=\! 0\) (resp. \(W_e^- \!=\! 1\) and \(W_e^- \!=\! 0\)) are observed with probability \(\mu _e^+\) and \(1\!-\! \mu _e^+\) (resp. \(\mu _e^-\) and \(1\!-\! \mu _e^-\)), respectively.

The third scheme assumes that the actual network topology imposes a binary, mutually exclusive setting for each pair of vertices, i.e., \(\mu ^+_{uv} \!=\! 1, \mu ^-_{uv} \!=\! 0\), if \((u,v) \in E\), and \(\mu ^+_{uv} \!=\! 0, \mu ^-_{uv} \!=\! 1\), if \((u, v) \notin E\). Since this setting leads to a new complete graph, the scheme is referred to as \(C\text{-}wd\), and it will be considered only for the smaller datasets, as it is computationally infeasible to handle complete versions of the larger datasets. As \(\mu ^+_{uv}, \mu ^-_{uv} \in \{0,1\}\), for all \(u,v \in V\), the underlying \(W_{uv}^+, W_{uv}^-\) distributions are actually degenerate, and every sample observed in a CMAB round from \(W_{uv}^+\) (resp., \(W_{uv}^-\)) will be equal to 1 if \(\mu ^+_{uv}= 1\) (resp., \(\mu _{uv}^- = 1\)), and to 0 if \(\mu ^+_{uv}= 0\) (resp., \(\mu _{uv}^- = 0\)).
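To make the three schemes concrete, the sketch below generates the true means and the per-round Bernoulli samples; the data layout (dictionaries keyed by unordered vertex pairs) is an illustrative assumption.

```python
import random

def generate_means(edges, all_pairs, scheme):
    """Generate true means for the R-wd, PC-wd, and C-wd schemes described above.

    edges     : set of frozenset({u, v}) pairs present in the original graph
    all_pairs : iterable of frozenset pairs over V (used only by C-wd)
    Returns (mu_plus, mu_minus) dicts; pairs absent from the dicts have mean 0.
    """
    mu_plus, mu_minus = {}, {}
    if scheme == 'R-wd':        # independent Uniform(0,1) means on existing edges
        for e in edges:
            mu_plus[e], mu_minus[e] = random.random(), random.random()
    elif scheme == 'PC-wd':     # probability constraint: mu+ + mu- = 1
        for e in edges:
            mu_plus[e] = random.random()
            mu_minus[e] = 1.0 - mu_plus[e]
    elif scheme == 'C-wd':      # degenerate, complete-graph setting
        for p in all_pairs:
            mu_plus[p] = 1.0 if p in edges else 0.0
            mu_minus[p] = 1.0 - mu_plus[p]
    return mu_plus, mu_minus

def bernoulli_sample(mu):
    """Per-round feedback: a Bernoulli sample with the given mean."""
    return 1 if random.random() < mu else 0
```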

Table 2 Main characteristics of the real-world datasets used in our evaluation

Assessment criteria. The means \(\mu _{e}^+\), \(\mu _{e}^-\) generated via the above schemes correspond to the true correlation-clustering edge weights, which are unknown to any CMAB method. They are used to evaluate the quality of the clusterings yielded in the various CMAB rounds via the average expected normalized cumulative Min-CC loss, computed up to each round t:

$$\begin{aligned} f^{(t)} = \frac{1}{t} \sum _{i=1}^t {\mathbb {E}}\left[ \frac{{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_i)}{U} \right] , \end{aligned}$$
(12)

where \({\mathcal {C}}_i\) is the clustering of the i-th round, \(U = \sum _{u,v \in V} \max \{\mu _{uv}^+, \mu _{uv}^-\}\) is a normalization constant (equal to an upper bound on the Min-CC objective-function value, so that \({\bar{d}}_{\varvec{\mu }}({\mathcal {C}}_i)/U \!\in \! [0,1]\)), and the \({\mathbb {E}}[\cdot ]\) expectation is computed by averaging the \({\bar{d}}_{\varvec{\mu }} (\cdot )\) values obtained over all the runs of the randomized Min-CC oracles (see below). Note that Eq. (12) is a shorter and normalized version of Eq. (11). In fact, Eq. (12) retains only the first term of Eq. (11), as the second term is common to all the methods (under the same oracle). It is also normalized, so that results on graphs of different size are more easily comparable to each other.
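As a concrete reading of Eq. (12), the following sketch (hypothetical names) computes the normalized loss \({\bar{d}}_{\varvec{\mu }}({\mathcal {C}})/U\) of a single clustering under the true means; \(f^{(t)}\) is then the running average of these values over rounds and oracle runs.

```python
def min_cc_loss(clustering, means):
    """Min-CC objective under the true means: negative-type means summed over
    intra-cluster pairs plus positive-type means summed over inter-cluster pairs.
    clustering : dict vertex -> cluster id
    means      : dict pair -> (mu_plus, mu_minus), for all pairs with nonzero means"""
    loss = 0.0
    for (u, v), (mu_plus, mu_minus) in means.items():
        loss += mu_minus if clustering[u] == clustering[v] else mu_plus
    return loss

def normalized_loss(clustering, means):
    """bar{d}_mu(C) / U, with U = sum over pairs of max(mu_plus, mu_minus)."""
    U = sum(max(mp, mn) for mp, mn in means.values())
    return min_cc_loss(clustering, means) / U
```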

As a second assessment criterion, we consider the error of the \({\hat{\mu }}_{e,t}^+\), \({\hat{\mu }}_{e,t}^-\) weight estimates at each round t, which is measured in terms of relative error norm as

$$\begin{aligned} ren^{(t)} = \sqrt{ \frac{\sum _{e \in E} ( {\mu }_e^+ - {\hat{\mu }}_{e,t}^+ )^2 + \sum _{e \in E} ({\mu }_e^- - {\hat{\mu }}_{e,t}^- )^2 }{\sum _{e \in E}({\mu }_e^+)^2 + \sum _{e \in E}({\mu }_e^-)^2} }. \end{aligned}$$
(13)
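The relative error norm of Eq. (13) amounts to a few lines of NumPy; the array names below are hypothetical and assumed to be aligned on the edges of E.

```python
import numpy as np

def relative_error_norm(mu_plus, mu_minus, est_plus, est_minus):
    """Eq. (13): relative error norm of the weight estimates at a given round t."""
    num = np.sum((mu_plus - est_plus) ** 2) + np.sum((mu_minus - est_minus) ** 2)
    den = np.sum(mu_plus ** 2) + np.sum(mu_minus ** 2)
    return np.sqrt(num / den)
```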

For both \(f^{(t)}\) and \(ren^{(t)}\), lower values correspond to better performance. Our main focus is on the \(f^{(T)}\), \(ren^{(T)}\) values at the final round \(t = T\), as they give compact yet general evidence of the overall performance of a method. However, we also assess the statistical significance of the results (Sect. 6.5), and analyze the trends of the performance over the various rounds (Sect. 7).

In this view, in the result tables presented in Sect. 6, we report \(f^{(T)}\) and \(ren^{(T)}\) values. Moreover, for the CMAB methods only, we also provide the growth rates, i.e., the relative change (in percentage) between the initial and the final round over the span T:

$$\begin{aligned} gr^{\%}_{_{\!\!f}} = \left( \frac{{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_T)}{{\bar{d}}_{\varvec{\mu }} ({\mathcal {C}}_1)} -1 \right) \times 100, \qquad gr^{\%}_{_{\!\!ren}} = \left( \frac{ren^{(T)}}{ren^{(1)}} -1 \right) \times 100. \end{aligned}$$
(14)

Finally, we are also interested in assessing the running time of the various tested methods (Sect. 6.4).

Methods. We consider methods falling into four categories: (i) CMAB-Min-CC methods adopting the CLCB paradigm, (ii) classic general CMAB heuristics that, in this context, are customized to work for CMAB-Min-CC, (iii) baselines that do not follow the CMAB paradigm, and (iv) a reference method that performs clustering by utilizing the true edge weights. More specifically:

  1. (i)

    As CLCB-based methods, we include CC-CLCB (Algorithm 1), PC+Exp-CLCB (Algorithm 2), and Global-CLCB (Algorithm 3). Moreover, for both CC-CLCB and Global-CLCB, we also consider the CC-CLCB-m and Global-CLCB-m variants, which are less biased towards exploration. Specifically, following Wang and Chen (2018), CC-CLCB-m and Global-CLCB-m utilize uncertainty terms defined as \(\rho _{e}^\pm = \sqrt{\ln t/2T^\pm _e}\) (instead of \(\rho _{e}^\pm = \sqrt{3 \ln t / 2T^\pm _e}\)).

  2. (ii)

    As CMAB heuristics, we consider the well-established \(\epsilon\)-greedy, pure exploitation (PE), and Combinatorial Thompson Sampling (CTS) (Wang and Chen 2018). As for \(\epsilon\)-greedy, we consider both a fixed exploration rate, set to 0.1, and an adaptive exploration rate, set proportional to \(t^{-1}\) at each round t. These variants are dubbed EG-fixed and EG, respectively.

  3. (iii)

    As far as non-CMAB baselines are concerned, the idea is to set both types of unknown edge weights based on the topological affinity of any two vertices’ neighborhoods, and then run a Min-CC algorithm [specifically, Pivot  (Ailon et al. 2008) in most experiments, and the linear-programming approach dubbed LP+R  (Charikar et al. 2005) in the experiment in Sect. 6.3] on such an input, employing no weight-learning strategy. More precisely, we resort to two well-known topological similarity measures, namely the Jaccard index and the Adamic-Adar index, to set the positive-type weights: for each \((u,v) \in E\), \(w _{uv}^+ = |N(u) \cap N(v)|/|N(u) \cup N(v)|\) using Jaccard, or \(w _{uv}^+ = | N(u) \cap N(v)|^{-1} \sum _{z \in N(u) \cap N(v)} (\log |N(z)|)^{-1}\) using (normalized) Adamic-Adar, where N(u) is the set of u’s neighbors. The negative-type weights are then derived in such a way that the probability constraint holds, i.e., \(w_{uv}^- = 1 - w _{uv}^+\) (a sketch of this weight initialization is reported right after this list).

  4. (iv)

    As a reference method, we consider clustering with the actual (i.e., true) edge weights via a state-of-the-art Min-CC algorithm (i.e., Pivot  (Ailon et al. 2008) in most experiments, and LP+R  (Charikar et al. 2005) in the experiment in Sect. 6.3). This method is termed Actual-weight.
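As anticipated in (iii), the sketch below illustrates how the non-CMAB baselines initialize the edge weights from topological similarity (here with NetworkX; function names are hypothetical); the resulting weights are then passed, unchanged, to the chosen Min-CC algorithm.

```python
import math
import networkx as nx

def baseline_weights(G, measure="jaccard"):
    """Set (w_plus, w_minus) per edge of an undirected NetworkX graph G, with no
    weight learning: w_plus from neighborhood similarity, w_minus = 1 - w_plus."""
    weights = {}
    for u, v in G.edges():
        Nu, Nv = set(G.neighbors(u)), set(G.neighbors(v))
        common = Nu & Nv
        if measure == "jaccard":
            w_plus = len(common) / len(Nu | Nv)
        else:  # normalized Adamic-Adar (common neighbors have degree >= 2, so log > 0)
            w_plus = (sum(1.0 / math.log(G.degree(z)) for z in common) / len(common)
                      if common else 0.0)
        weights[(u, v)] = (w_plus, 1.0 - w_plus)  # probability constraint holds
    return weights

# Example on a small graph (Zachary's karate club, shipped with NetworkX):
w = baseline_weights(nx.karate_club_graph(), measure="jaccard")
```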

Unless otherwise specified, all the CMAB methods are assumed to be equipped with the Pivot algorithm (Ailon et al. 2008) as an oracle for Min-CC. Pivot is used as a reference oracle because it is more usable in practice, due to its efficiency, approximation guarantees, and ease of implementation. However, we also carry out an experiment to evaluate the impact of a different oracle, specifically the LP+R algorithm (Charikar et al. 2005) (Sect. 6.3). As LP+R takes \(\varOmega (|V|^3)\) time just to build the linear program, this experiment is performed on the smaller datasets only.

Since the chosen Min-CC oracles are randomized algorithms, for every experiment, we perform \(\log _{2}|V|\) independent runs of the selected oracle per CMAB round (setting \(\delta =1, c=1\), cf. Appendix A.1), and take the best solution in terms of Min-CC objective with respect to the current weight estimates. In all the experiments, the number T of CMAB rounds is set to 500, while the number of runs of the Min-CC oracle for every round is set to 10.
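For concreteness, the following sketch shows the per-round use of a randomized Pivot-style oracle on the current estimates, together with the best-of-several-runs selection described above; it follows the co-clustering rule recalled in Sect. 6.2 (two vertices are grouped iff the positive-type estimate exceeds the negative-type one), and all names are illustrative rather than our actual implementation. The `loss_fn` argument can be, e.g., the `min_cc_loss` function sketched earlier, applied to the current estimates instead of the true means.

```python
import random

def pivot(vertices, est, rng=random):
    """One run of a Pivot-style oracle on the current weight estimates.
    est : dict pair -> (est_plus, est_minus); missing pairs are treated as (0, 0).
    Returns a dict vertex -> cluster id."""
    order = list(vertices)
    rng.shuffle(order)                      # random pivot order
    clustering, cid = {}, 0
    for p in order:
        if p in clustering:
            continue
        clustering[p] = cid                 # p becomes the pivot of a new cluster
        for v in order:
            if v in clustering:
                continue
            ep, en = est.get((p, v), est.get((v, p), (0.0, 0.0)))
            if ep > en:                     # co-cluster v with the pivot
                clustering[v] = cid
        cid += 1
    return clustering

def clustering_for_round(vertices, est, n_runs, loss_fn):
    """Run the randomized oracle n_runs times and keep the best clustering
    w.r.t. the Min-CC objective computed on the current estimates."""
    runs = (pivot(vertices, est) for _ in range(n_runs))
    return min(runs, key=lambda C: loss_fn(C, est))
```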

Evaluation goals. As this is the first work that investigates Min-CC in a CMAB setting, our experiments are not really intended to assess the superiority of some proposed method(s) over the state of the art. Rather, our main objective here is to provide a comparative evaluation of a variety of CMAB heuristics, approximation algorithms, and heuristic variants of approximation algorithms in the context of CMAB-Min-CC, and derive experimental insights on the peculiarities of the various tested methods. Specifically, the main goals of our experimental evaluation are as follows:

  • Assess the performance of the CMAB methods (CC-CLCB, EG, EG-fixed, PE, CTS) in terms of \(f^{(T)}\) and \(ren^{(T)}\), and compare them to non-CMAB baselines (Adamic-Adar, Jaccard) and the reference Actual-weight method (Sect. 6.1).

  • Compare the performance of the various CLCB variants (CC-CLCB, CC-CLCB-m, PC+Exp-CLCB, Global-CLCB, Global-CLCB-m) to each other, in terms of \(f^{(T)}\) and \(ren^{(T)}\) (Sect. 6.2).

  • Evaluate the impact of varying the Min-CC oracle on the performance of the various CMAB methods (Sect. 6.3).

  • Evaluate the efficiency of all the selected methods (Sect. 6.4).

  • Perform a statistical significance analysis of the reported results (Sect. 6.5).

Further characterization. We also analyze the number of output clusters and the stability of the performance over the various rounds and runs of the tested methods (Sect. 7). This analysis is intended not as a performance assessment, but rather as additional insight to better characterize the tested methods.

Implementation and testing environment. All the tested methods are implemented in Python 3.8, with some of them using external libraries. In particular, LP+R adopts the PuLP library for linear programming, while Adamic-Adar and Jaccard use, respectively, the NetworkX and python-igraph libraries to compute the topological similarity scores. All the experiments are carried out on the Cresco6 cluster, a high-performance computing system running Linux CentOS 7.4 and consisting of 434 nodes, each equipped with two Intel(R) Xeon(R) Platinum 8160 CPUs @2.10GHz (24 cores each) and 192 GB RAM.

Table 3 Performance in terms of \(f^{(T)}\) (Eq. 12) and (for the CMAB methods) \(gr^{\%}_{_{\!\!f}}\) (Eq. 14)
Table 4 Performance in terms of \(ren^{(T)}\) (Eq. 13) and (for the CMAB methods) \(gr^{\%}_{_{\!\!ren}}\) (Eq. 14)

6 Results

6.1 Performance of the CMAB methods

Quality of the clusterings (Table 3). As a first general remark, the non-CMAB baselines (Adamic-Adar, Jaccard) achieve the worst performance in all the datasets and weight settings, while Actual-weight is always the best method, with only a couple of exceptions. This was expected, as the non-CMAB baselines employ no strategy to learn the true weights, whereas Actual-weight operates on the true weights. Importantly, in most cases, the CMAB methods (CC-CLCB, EG, EG-fixed, PE, CTS) perform comparably or close to Actual-weight. The loss values of all the CMAB methods follow a decreasing trend over the rounds, as witnessed by the negative growth rates (and better shown in Fig. 2, Sect. 7). This was expected, since the CMAB algorithms learn how to cluster the vertices over time. In general, all the CMAB algorithms converge to solutions with a lower growth rate in the \(PC\text{-}wd\) setting than in the \(R\text{-}wd\) setting. Also, with the exception of HighlandTribes, the difference in the best loss scores is higher in the \(PC\text{-}wd\) setting than in the \(R\text{-}wd\) one. This complies with the fact that the probability constraint leads to an easier Min-CC clustering task.

Focusing on the CMAB methods, the best performance corresponds to PE. This can be explained since (i) the Min-CC oracle therein used (i.e., Pivot) is a randomized algorithm, thus, even with a pure-exploitation bandit strategy, some implicit exploration occurs; (ii) due to the peculiarity of our problem, each super arm admits feedback from half the total number of arms, thus a bandit strategy with minimal exploration would likely perform better in the long run. CC-CLCB exhibits very good performance: it is comparable or close to the best methods in most datasets and weight settings, achieving a maximum and average difference in loss with respect to the best performer(s), over all the configurations, of 0.038 and 0.014, respectively.

Quality of the learned edge weights (Table 4). A first general observation is that the weight estimates of all the CMAB methods improve as the rounds progress, and the relative error goes down over time, leading to a negative growth rate. This is consistent with the clustering improvement over the rounds observed in Table 3. As expected, the non-CMAB baselines yield the highest error values, while Actual-weight clearly achieves zero error everywhere. Among the CMAB methods, EG and EG-fixed yield the most accurate estimates in the \(C\text{-}wd\) weight setting. In the \(R\text{-}wd\) and \(PC\text{-}wd\) settings, EG-fixed is (comparable to) the best performer on the smaller datasets (Karate, Dolphins, Zebra, HighlandTribes, Contiguous-USA), while on the bigger datasets, CTS is (comparable to) the best method. CC-CLCB achieves the best performance on three datasets (Zebra, Last.fm, PrimarySchool) for the \(R\text{-}wd\) and \(PC\text{-}wd\) distributions. Importantly, for some methods, a good/bad performance on the weight-estimation task does not necessarily translate into an equally good/bad performance in the clustering results discussed above. This has a twofold motivation: (1) the underlying oracle is not an exact algorithm for Min-CC, thus clustering with weight estimates may lead to clusterings that are of better quality when evaluated in terms of the actual weights, and (2) CMAB methods like CC-CLCB adopt exploration strategies that perturb the current weight estimates before giving them to the oracle, which corresponds to performing clustering with weights that are actually different from the estimated ones.

Table 5 Performance in terms of \(f^{(T)}\) (Eq. 12) and \(gr^{\%}_{_{\!\!f}}\) (Eq. 14), for the CLCB methods (equipped with Pivot as a Min-CC oracle)

6.2 Performance of the CLCB-based CMAB methods

Quality of the clusterings (Table 5). In general, we observe that all the CLCB variants perform rather closely to each other in all the configurations. Deepening the analysis, CC-CLCB-m and Global-CLCB-m are the best methods (in all the datasets but one) in the \(R\text{-}wd\) weight setting. Conversely, in the \(PC\text{-}wd\) and \(C\text{-}wd\) settings, the best method is PC+Exp-CLCB in most cases: specifically, it is the best performer on all the datasets (though on par with CC-CLCB-m and Global-CLCB-m on the larger ones) in \(PC\text{-}wd\), and on three out of five datasets in \(C\text{-}wd\). Also, Global-CLCB performs better in the \(PC\text{-}wd\) and \(C\text{-}wd\) settings than in the \(R\text{-}wd\) one. These findings comply with the design principles of the CLCB variants that favor the fulfilment of the probability constraint on the Min-CC instances to be processed by the underlying oracle (cf. Sect. 4.2), which clearly benefit from settings like \(PC\text{-}wd\) and \(C\text{-}wd\), where the probability constraint actually holds.

Interestingly, CC-CLCB and Global-CLCB achieve the same results in all the configurations (and the same holds for CC-CLCB-m vs. Global-CLCB-m). This can be explained since CC-CLCB and Global-CLCB (and, likewise, CC-CLCB-m and Global-CLCB-m) compute adjusted weight estimates (Line 4 in Algorithms 1 and 3) such that the ordering between the positive-type weight estimate and the negative-type weight estimate is likely to be the same for both algorithms. In other words, although CC-CLCB and Global-CLCB may compute different actual values of those weight estimates, the two algorithms are mostly consistent in yielding a positive-type weight estimate that is higher/lower than the negative-type one. This leads to very similar clusterings yielded by the Pivot Min-CC oracle in every run and every round of both CC-CLCB and Global-CLCB, as Pivot places any two vertices in the same cluster by solely checking whether the positive-type weight on the edge between those vertices is higher than the negative-type one, without looking at the specific values of those weights.

Table 6 Performance in terms of \(ren^{(T)}\) (Eq. 13) and \(gr^{\%}_{_{\!\!ren}}\) (Eq. 14), for the CLCB methods (equipped with Pivot as a Min-CC oracle)

Quality of the learned edge weights (Table 6). In terms of edge weights, the picture in the \(PC\text{-}wd\) and \(C\text{-}wd\) settings is roughly consistent with what is observed in terms of clustering quality. Some differences arise in the \(R\text{-}wd\) setting, where, unlike the clustering-quality criterion, CC-CLCB-m and Global-CLCB-m are the best methods only in a few configurations (being outperformed mostly by PC+Exp-CLCB).

As another interesting observation, here some differences arise between CC-CLCB and Global-CLCB, and between CC-CLCB-m and Global-CLCB-m. This confirms the argument discussed above, i.e., that those methods achieve the same clustering results even though they can learn different weight estimates.

6.3 Varying the Min-CC oracle

Table 7 shows the performance of all the competing methods when using LP+R as a Min-CC oracle (instead of Pivot). Here, we also show the relative difference (in percentage) between the score with LP+R and the corresponding score with Pivot. Thus, the more positive (resp. negative) such a relative difference, the worse (resp. better) the performance of LP+R with respect to Pivot. The general trend in terms of clustering quality (Table 7a) is that LP+R leads to an increase (resp. decrease) in performance in the \(R\text{-}wd\) and \(PC\text{-}wd\) settings (resp. \(C\text{-}wd\) setting). This is likely due to the fact that \(R\text{-}wd\) and \(PC\text{-}wd\) are more challenging than \(C\text{-}wd\), as it is well-known that Min-CC is easier on complete-graph input instances (Charikar et al. 2005). In fact, LP+R provides approximation guarantees at each round, regardless of the weights given in input to the oracle. Conversely, Pivot provides quality guarantees only if the probability constraint holds on the given input, which is not necessarily the case in a generic round t. The \(C\text{-}wd\) setting (i.e., complete graph with probability constraint) corresponds to the most favorable scenario for Pivot to provide approximation guarantees.

In terms of learned edge weights (Table 7b), the advantage of using LP+R is less evident. A reason might lie in the different random choices made by the two algorithms (i.e., choosing the node around which a cluster is built in Pivot, and integer rounding in LP+R): the random choices of Pivot likely lead to more exploration, hence a better chance to discover weights close to the true ones.

Table 7 Performance with LP+R as a Min-CC oracle, and relative difference (in percentage) between the score achieved with LP+R and the corresponding score achieved with Pivot
Table 8 Running times (in secs.) on the larger datasets. Results correspond to average runtime performances in the \(R\text{-}wd\) and \(PC\text{-}wd\) distribution settings and Pivot as a Min-CC oracle
Table 9 Running times (in secs.)

6.4 Efficiency

Table 8 shows the runtimes of the tested methods on the larger datasets, averaged over the various runs and over the \(R\text{-}wd\) and \(PC\text{-}wd\) weight settings. Although all the CMAB methods are roughly comparable with each other, CTS is the slowest, as it involves additional sampling operations with respect to the other methods.

The CMAB methods take seconds on the smaller Last.fm and PrimarySchool datasets, around one hour on ProsperLoans, and up to 3–5 h on the largest datasets, i.e., Wikipedia and DBLP. In general, however, we can conclude that all the CMAB methods are rather efficient. Even the highest runtimes on Wikipedia and DBLP are not a concern, considering that such datasets have around 10 M edges and, more importantly, that the reported runtimes are cumulative over all the 500 CMAB rounds. In fact, the highest per-round runtime of a CMAB method is always, at worst, comparable to the runtime of Actual-weight, which performs Min-CC clustering just once. In most cases, it is even lower, likely because the time of the round-independent steps is amortized over the various rounds.

Further results are shown in Table 9, which includes the use of both oracles on the smallest datasets in our collection, for all weight settings. As can be noted from the table, the above qualitative remarks on the relative differences between the methods remain equally evident.

6.5 Statistical significance

Here we present a further step of analysis to assess the statistical significance of the performance of the CMAB-Min-CC methods CC-CLCB, Global-CLCB, PC+Exp-CLCB, CTS, and EG, when equipped with Pivot as a Min-CC oracle.

To this purpose, we resorted to Friedman’s test. We designed it by considering all the methods, all the datasets, and all the weight settings in one single test. More specifically, we organized the data into a matrix with 5 columns (treatments) corresponding to the methods, and 250 rows (blocks) corresponding to the combinations of runs (10), datasets (10), and weight settings (R-wd and PC-wd available for all 10 datasets, and C-wd available for 5 datasets), where each cell measures the average expected normalized cumulative loss (i.e., \(f^{(T)}\), Eq. 12) obtained by a particular method at the last round (\(T=500\)) on a particular configuration of run, dataset, and weight setting. (Note that each run corresponds to a different fixed seed for handling computation randomness.)

Our Friedman’s test results indicate that there are significant differences (\(\chi ^2(4) = 158.1\), p-value \(< 2.2\)E-16) in the average expected normalized cumulative losses across run/dataset/weight-setting blocks based on the methods, i.e., the methods have different effects on the average expected normalized cumulative loss obtained on each run/dataset/weight-setting combination.

We also computed Kendall’s coefficient of concordance (Kendall’s W) to measure the effect size (degree of difference) for Friedman’s test. From the result above, Kendall’s W is 0.304, which indicates an effect size at the boundary between “small” and “moderate” effects, based on Cohen’s interpretation guidelines (Tomczak and Tomczak 2014).

Since Friedman’s test is an omnibus test, in order to know which methods are significantly different, we carried out Nemenyi’s all-pairs test as a post-hoc test for pairwise comparisons of methods, where the Bonferroni correction was used to adjust the p-values for multiple hypothesis testing at a 5% cut-off. Results show p-values in the range \((10^{-14}, 10^{-4})\) (i.e., significant differences) for all pairs but CC-CLCB vs. Global-CLCB. The lack of statistical difference between CC-CLCB and Global-CLCB is not surprising: in fact, in Sect. 6.2, we already noticed and explained why CC-CLCB and Global-CLCB achieve the same performance in terms of \(f^{(T)}\), in all the datasets and weight settings.
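The significance analysis can be reproduced along the following lines, assuming SciPy and the scikit-posthocs package and a hypothetical file holding the \(250 \times 5\) matrix of \(f^{(T)}\) scores described above; the Bonferroni adjustment mentioned above would then be applied to the resulting p-value matrix.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# Rows: 250 run/dataset/weight-setting blocks; columns: the 5 methods
# (CC-CLCB, Global-CLCB, PC+Exp-CLCB, CTS, EG), each cell holding f^(T) at T = 500.
scores = np.loadtxt("fT_blocks_by_method.csv", delimiter=",")   # hypothetical file

# Friedman's test across the 5 treatments (methods)
chi2, p_value = friedmanchisquare(*[scores[:, j] for j in range(scores.shape[1])])

# Kendall's W (effect size): W = chi2 / (n_blocks * (n_methods - 1))
n_blocks, n_methods = scores.shape
kendall_w = chi2 / (n_blocks * (n_methods - 1))

# Nemenyi all-pairs post-hoc test: returns a methods-by-methods matrix of p-values
posthoc_p = sp.posthoc_nemenyi_friedman(scores)

print(f"Friedman chi2 = {chi2:.1f}, p = {p_value:.2e}, Kendall's W = {kendall_w:.3f}")
```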

7 Additional experiments

Table 10 Number of clusters for \(R\text{-}wd\) weight setting: average over all rounds and runs, and (for the CMAB methods) relative (percentage) difference between the last and first round
Table 11 Number of clusters for \(PC\text{-}wd\) weight setting: average over all rounds and runs, and (for the CMAB methods) relative (percentage) difference between the last and first round
Table 12 Number of clusters for \(C\text{-}wd\) weight setting: average over all rounds and runs, and (for the CMAB methods) relative (percentage) difference between the last and first round

Clustering size. Tables 10, 11 and 12 show the number of clusters yielded by the tested methods, averaged over all the CMAB rounds and runs of the Min-CC oracle. For the CMAB methods, we also provide the difference (in percentage) between the number of clusters at the final round and the number of clusters at the first round (both averaged over the runs of the Min-CC oracle). By inspecting such results, we notice that, in the cases of the \(R\text{-}wd\) and \(PC\text{-}wd\) weight distributions, the use of the LP+R oracle generally corresponds to fewer clusters compared to the Pivot oracle; some exceptions are observed for small-world datasets (e.g., Zebra, HighlandTribes) by most methods, especially with \(R\text{-}wd\). The \(PC\text{-}wd\) setting mostly leads to fewer clusters than \(R\text{-}wd\). Conversely, with the \(C\text{-}wd\) setting, the LP+R oracle consistently leads to a much larger number of clusters (at least double in many cases) than Pivot.

Moreover, we observe that the non-CMAB methods (i.e., Adamic-Adar and Jaccard) produce a relatively small number of clusters only when the characteristics of the input dataset are those typical of a small-world network; in Last.fm and PrimarySchool, for instance, the number of clusters is as high as about 90% and 97% of the vertex-set size, respectively. This is not surprising, as the adopted approaches of (CMAB) correlation clustering are not designed to optimize some criterion function defined on topological properties at the meso- and macroscopic level (e.g., modularity), which results in a need for refining the clustering solutions through a cluster-aggregation stage.

Fig. 2

Performance in terms of \(f^{(t)}\) (Eq. 12), over a number \(t = 1,\ldots , 400\) of rounds, for the larger datasets, and \(R\text{-}wd\) and \(PC\text{-}wd\) weight distributions. All the CMAB methods are equipped with Pivot as a Min-CC oracle

Fig. 3

Performance in terms of \(f^{(t)}\) (Eq. 12), over a number \(t = 1,\ldots ,400\) of rounds (iterations), for the larger datasets, and \(R\text{-}wd\) and \(PC\text{-}wd\) weight distributions

Performance over the CMAB rounds. Figures 2 and 3 illustrate the performance of the tested methods over the various CMAB rounds t, in terms of average expected normalized cumulative Min-CC loss \(f^{(t)}\) (Eq. 12). As expected, the CMAB methods mostly exhibit a decreasing trend, with a decrease in loss scores that is more pronounced in the first rounds and progressively vanishes as the rounds go on, indicating convergence of the weight-learning process (and, thus, of the clustering quality too). A few exceptions to this strictly monotonically decreasing trend arise (e.g., with some CLCB-based methods in Last.fm \(PC\text{-}wd\), ProsperLoans \(R\text{-}wd\), ProsperLoans \(PC\text{-}wd\), DBLP \(PC\text{-}wd\)). However, the minimum of the \(f^{(t)}\) function in all those exceptional cases is only slightly lower than the value of \(f^{(t)}\) at convergence (i.e., the difference is less than 0.004). Thus, recalling also that \(f^{(t)}\) is an average of all the losses computed up to round t, we can conclude that those non-monotonic trends actually correspond to the normal fluctuations of the loss values in the first CMAB rounds, when not enough knowledge on the actual edge weights has yet been acquired to obtain stable clustering quality.

Table 13 Coefficient of variation of \(f^{(T)}\) (Eq. 12) over all the CMAB rounds and runs of the Min-CC oracle, for \(R\text{-}wd\) weight setting
Table 14 Coefficient of variation of \(f^{(T)}\) (Eq. 12) over all the CMAB rounds and runs of the Min-CC oracle, for \(PC\text{-}wd\) weight setting
Table 15 Coefficient of variation of \(f^{(T)}\) (Eq. 12) over all the CMAB rounds and runs of the Min-CC oracle, for \(C\text{-}wd\) weight setting
Table 16 Coefficient of variation of \(ren^{(T)}\) (Eq. 13) over all the CMAB rounds and runs of the Min-CC oracle, for \(R\text{-}wd\) weight setting
Table 17 Coefficient of variation of \(ren^{(T)}\) (Eq. 13) over all the CMAB rounds and runs of the Min-CC oracle, for \(PC\text{-}wd\) weight setting
Table 18 Coefficient of variation of \(ren^{(T)}\) (Eq. 13) over all the CMAB rounds and runs of the Min-CC oracle, for \(C\text{-}wd\) weight setting

Stability over the CMAB rounds. Tables 13, 14, 15, 16, 17 and 18 show the coefficient of variation (i.e., the ratio between standard deviation and mean) of the scores of the tested methods in terms of the \(f^{(T)}\) (Eq. 12) and \(ren^{(T)}\) (Eq. 13) criteria, respectively. It can be observed that the coefficients of variation of \(f^{(T)}\) are typically very small for all the methods: they mostly lie in \([10^{-3}, 10^{-2}]\) for the smaller datasets, and in \([10^{-6}, 10^{-4}]\) for the larger datasets, with only very few exceptions. In terms of \(ren^{(T)}\), the coefficients of variation in the \(R\text{-}wd\) and \(PC\text{-}wd\) weight settings are higher (especially in the smaller datasets), but they still remain rather small. In the \(C\text{-}wd\) setting, they are instead mostly equal or very close to zero. Therefore, as a general conclusion, we can state that the various tested methods exhibit high stability over the CMAB rounds and runs of the Min-CC oracle.

8 Conclusion

We have focused on the novel setting of correlation clustering where edge weights are unknown and need to be discovered while performing multiple rounds of clustering. We have provided a Combinatorial Multi-Armed Bandit (CMAB) framework for correlation clustering, algorithms for it, analyses of the theoretical guarantees of these algorithms, more practical heuristics, and extensive experiments.

In the future, we plan to investigate the theoretical properties of our heuristics, advanced CMAB settings, and clustering problems other than correlation clustering.

For reproducibility purposes, we make source code and data available at: https://github.com/Ralyhu/CMAB-CC, and http://people.dimes.unical.it/andreatagarelli/CMAB-CC/.