Globally optimal and scalable $N$-way matching of astronomy catalogs

Building on previous Bayesian approaches, we introduce a novel formulation of probabilistic cross-identification, where detections are directly associated with (hypothesized) astronomical objects in a globally optimal way. We show that this new method scales better for processing multiple catalogs than enumerating all possible candidates, especially in the limit of crowded fields, which is the most challenging observational regime for new-generation astronomy experiments such as the Rubin Observatory Legacy Survey of Space and Time (LSST). Here we study simulated catalogs where the ground truth is known and report on the statistical and computational performance of the method. The paper is accompanied by a public software tool to perform globally optimal catalog matching based on directional data.

Several approaches have been proposed over the years to combine observations across telescopes and epochs. In the era of time-domain astronomy, where dozens or hundreds of observation epochs are available, these problems are more important than ever. In particular, combining catalogs has been a central issue: detections in separate exposures are matched by identifying the ones that correspond to the same celestial object. Several tools were developed to provide solutions to the cross-matching problem, such as TOPCAT (Taylor 2005) and CDS XMatch (Pineau et al. 2011; Boch et al. 2012). However, they do not consider the statistical aspect of the problem.
This cross-identification problem was successfully addressed using Bayesian hypothesis testing by Budavári & Szalay (2008), whose methodology was implemented in the latest version of the SkyQuery service (Budavári et al. 2013), which is now part of the SciServer Science Platform (Taghizadeh-Popp et al. 2020). The Bayesian formalism and the combinatorial nature of the problem are discussed in a review by Budavári & Loredo (2015). The first solution came from Budavári & Basu (2016), who formulated the matching problem as a search for globally optimal associations using combinatorial optimization, where the marginal likelihood of the entire matched catalog is maximized, and used the Hungarian algorithm (Munkres 1957) to solve it. After that proof of concept was developed for two catalogs, Shi et al. (2019) extended the algorithm to handle multiple catalogs using Integer Linear Programming, or ILP for short. For simplicity, we will refer to this method as CanILP, as it enumerates all possible candidate associations and uses ILP to choose the best valid subset. As we will discuss later, the method suggested in Shi et al. (2019) does not scale well with a large number of catalogs. This scaling problem is also observed in Pineau et al. (2017), where the authors estimate the probability, for all combinations of sources, that a tuple of sources from different catalogs corresponds to the same object. The exhaustive search results in an exponential growth in the number of possible tuples as the number of catalogs increases. They also note that this approach is not feasible in practice for more than 9 catalogs.
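To make the two-catalog case concrete, the assignment problem solved by the Hungarian algorithm can be sketched in a few lines. The sketch below is illustrative only and is not the SkyQuery or CanILP implementation: it assumes flat-sky geometry, identical astrometric errors, and hypothetical toy coordinates, and it uses SciPy's linear_sum_assignment as the assignment solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

ARCSEC = np.radians(1.0 / 3600.0)   # radians per arcsecond
SIGMA = 1.0                         # assumed astrometric error (arcsec)
KAPPA = 1.0 / (SIGMA * ARCSEC) ** 2 # common concentration parameter

def neg_log_bayes_pair(psi2):
    """-ln B for a 2-way association with separation psi (psi2 in rad^2),
    from the small-angle all-sky formula of Budavari & Szalay (2008)
    with equal concentrations kappa."""
    return -(np.log(2.0) + 2 * np.log(KAPPA) - np.log(2 * KAPPA)
             - KAPPA * psi2 / 4.0)

# hypothetical flat-sky positions (arcsec) for two 3-source catalogs
cat1 = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
cat2 = cat1 + np.array([0.4, -0.3])  # second epoch, slightly offset

# matrix of squared angular separations (rad^2) and the -ln B cost matrix
psi2 = np.sum((cat1[:, None, :] - cat2[None, :, :]) ** 2, axis=-1) * ARCSEC**2
cost = neg_log_bayes_pair(psi2)      # minimizing total -ln B maximizes product of B
row, col = linear_sum_assignment(cost)
```

Minimizing the summed −ln B over one-to-one assignments maximizes the marginal likelihood of the matched catalog; a full treatment along the lines of Budavári & Basu (2016) would also allow orphans, which this sketch omits.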
In this paper, we improve on the previous studies by introducing a novel formulation, hereafter referred to as DirILP, where we use ILP to directly assign detections to hypothesized objects. Section 2 describes the new approach, Section 3 illustrates how the new method scales better with the number of input catalogs, Section 4 discusses a public software tool to solve the catalog matching problem, and Section 5 concludes the study.

To quantify the associations among independent detections, a relatively recent approach uses a hierarchical Bayesian formalism. Suppose there are C catalogs, indexed by c ∈ {1, . . . , C}, with catalog c containing N_c sources. Let D_ic denote the measurements for source i in catalog c, hereafter referred to by the tuple (i, c). Associated with any (i, c) measurement is a likelihood function ℓ_ic(ω) = p(D_ic | ω) for the unknown true direction ω, which captures the astrometric uncertainty. While other object properties, such as brightness or colors, could also be considered in general, here we focus on directional matching only.
Here we adopt the definition for matching used in previous papers (e.g., Budavári & Szalay 2008; Budavári & Loredo 2015; Budavári & Basu 2016; Shi et al. 2019), where one tests whether two or more detections correspond to the same physical object. That said, it is possible to define the "match" hypothesis such that detections are not of the "same" object but instead just "part of" another, e.g., an optical galaxy in an X-ray cluster, or a star in a blend of two. In theory such scenarios can be accommodated (by changing the marginal likelihood calculations, see below), but the computational requirements might increase, and the interpretation of the matched catalog would be more difficult.¹ Our association approach described below is flexible and will yield matches based on the underlying model.
Associations are created by grouping all sources in all catalogs such that each belongs to exactly one group. Mathematically, a partition P is created of the data set D, the union of all sources in all catalogs, where each subset corresponds to a single celestial object. The number of subsets in the partition constitutes the number of hypothesized objects N_obj, an unknown but bounded quantity that is less than or equal to the number of all sources. Typically it is much less, as equality would mean that every source is in fact a separate object altogether. We can index every object by an integer o ∈ {1, . . . , N_obj}. Let S_o be the set of sources (i, c) associated with object o, and C_o the list of catalogs containing sources associated with object o. Following Budavári & Loredo (2015), the likelihood of a partition P of all sources, a collection of the S_o subsets, is a product of conditionally independent terms,

L(P) = ∏_{o=1}^{N_obj} M_o,

¹ Further complications emerge when different blends are to be matched, in which case one considers whether the detections "share" components, e.g., a common star in two different blends.
where the marginal likelihood M_o for the association corresponding to object o is

M_o = ∫ ρ_{C_o}(ω) ∏_{(i,c)∈S_o} ℓ_ic(ω) dω.

Here ρ_{C_o}(ω) is the prior probability density function of the object direction producing sources within (the footprint of) every catalog in the set C_o. This notation enables the treatment of catalogs with different sky coverage, whose angular selection function would enter the prior on the latent direction ω. Technically, ρ_{C_o}(ω) is not simply the angular selection function of the intersection area of the catalogs, because it also incorporates the astrometric uncertainty: there is a nonzero probability of observing a source within a given footprint even if its true direction is outside the field of view. This effect is negligible if the field of view is large in comparison to its boundary blurred by the astrometric uncertainty, which is the case for typical observations and surveys, but it would not apply, for example, if a catalog were an aggregation of disjoint sky patches comparable in size to the point-spread function. Alternatively, one can define the marginal likelihood for the non-association hypothesis, which assumes that every source in S_o is a separate object on its own,

M^NA_o = ∏_{(i,c)∈S_o} ∫ ρ_c(ω) ℓ_ic(ω) dω,

where ρ_c(ω) is the prior probability density function of direction for sources in catalog c. This hypothesis serves as a natural comparison, with which it is useful to introduce the Bayes factor as the ratio of the marginal likelihoods of these two cases,

B_o = M_o / M^NA_o.

The Bayes factor B_o takes values larger than 1 when the association of the sources in S_o is more likely than the alternative; B_o < 1 favors separate objects. We note that M^NA_o is simply a product over all sources in all catalogs, independent of the partition P, and hence constant. Consequently, the maximization of the likelihood L(P) is equivalent to maximizing the product of the Bayes factors ∏_o B_o.
As customary, we work with a summary of the raw imaging data D_ic for each detection (i, c): the measured direction x_ic, which is essentially the intensity-weighted pixel direction. To calculate the Bayes factors B_o with these measurements, we specify a distribution for the member likelihood function ℓ_ic(ω), i.e., the astrometric uncertainty. To describe directional uncertainty in the observations, the spherical analog of the Gaussian is often assumed, the Fisher (1953) distribution,

ℓ_ic(ω) = κ_ic / (4π sinh κ_ic) · exp(κ_ic ω·x_ic),

where the observed direction x_ic and the true direction ω are both 3D unit vectors. The latter is the mode of the distribution, and κ_ic is a concentration parameter. When κ_ic ≫ 1, the Fisher distribution approximates a Gaussian distribution with standard deviation (in radians) σ_ic for each coordinate, with κ_ic = 1/σ²_ic, and the (all-sky) Bayes factor can be calculated analytically as shown in Budavári & Szalay (2008) as follows,

ln B_o = (|S_o| − 1) ln 2 + ∑_{(i,c)∈S_o} ln κ_ic − ln ∑_{(i,c)∈S_o} κ_ic − ( ∑ κ_ic κ_i'c' ψ²_{ic,i'c'} ) / ( 2 ∑_{(i,c)∈S_o} κ_ic ),   (6)

where the pair sum runs over all pairs of sources (i, c) and (i', c') in the subset S_o and ψ_{ic,i'c'} is the (small) angle between the directions for sources (i, c) and (i', c').
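For reference, eq. (6) can be evaluated directly from unit vectors and concentration parameters. The snippet below is a minimal sketch (the function name and interface are ours, not the accompanying tool's); it uses the chord length |x_i − x_j| as the small-angle approximation of ψ.

```python
import numpy as np

def log_bayes_factor(x, kappa):
    """ln B for associating n detections with unit vectors x (n, 3) and
    concentrations kappa (n,), in the small-angle limit of eq. (6):
    ln B = (n-1) ln 2 + sum ln kappa_i - ln(sum kappa_i)
           - sum_{i<j} kappa_i kappa_j psi_ij^2 / (2 sum kappa_i)."""
    x = np.asarray(x, float)
    kappa = np.asarray(kappa, float)
    n = len(kappa)
    K = kappa.sum()
    # squared chord length |x_i - x_j|^2 = 2 - 2 cos(psi) ~ psi^2 for small psi
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    pair_term = np.triu(np.outer(kappa, kappa) * d2, k=1).sum() / (2.0 * K)
    return (n - 1) * np.log(2.0) + np.log(kappa).sum() - np.log(K) - pair_term

# sanity check: two coincident detections with equal kappa give ln B = ln kappa
lnB = log_bayes_factor([[0, 0, 1], [0, 0, 1]], [1e10, 1e10])
```

For two coincident detections with equal κ the expression collapses to ln B = ln κ, a useful sanity check on any implementation.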
The next section discusses how to find the globally optimal associations, i.e., the partition P that maximizes the L(P) likelihood function by optimizing the Bayes factors B_o using integer linear programming.

CanILP: Optimal Selection of Candidates
First we summarize the previous approach introduced by Shi et al. (2019). Given a data set D of all (i, c) pairs, for every catalog c and source i in catalog c, we introduce a binary variable x_T taking values in {0, 1} for each nonempty subset T ⊆ D, with the interpretation that x_T = 1 indicates that the subset T is included in the partition. To ensure the validity of the partition, we require

∑_{T∋(i,c)} x_T = 1

for every element (i, c) ∈ D. This forces every source (i, c) to be included in exactly one subset of the partition. However, note that for an orphan o, B_o = 1. Hence, these coefficients do not contribute to the objective function, and we can simply remove those subsets T that have |T| = 1 and relax the above constraint to ∑_{T∋(i,c)} x_T ≤ 1 for every element (i, c) ∈ D. In the final solution, if a source (i, c) does not appear in any selected subset T, we treat it as an orphan. For example, in Figure 1, Source (2,1) is not included in any subset T, so, in the solution, we include it as an orphan. By defining w_T = − ln B_T for every subset T, the final integer linear program can be written as follows,

minimize ∑_T w_T x_T   subject to   ∑_{T∋(i,c)} x_T ≤ 1 for all (i, c) ∈ D,   x_T ∈ {0, 1}.

Note that the formulation above can be used to solve the matching problem given any number of catalogs C, but it requires an enumeration of all possible candidate associations, which can be numerous. This requires the calculation of combinatorially many w_T and the introduction of the corresponding binary variables x_T, which quickly becomes prohibitively expensive for many catalogs. Shi et al. (2019) demonstrated their approach on three catalogs with identical astrometric uncertainty. Here we extend their work and describe a novel approach that can be used efficiently with many catalogs.
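As an illustration, the CanILP program above can be assembled end to end for a Figure 1-sized toy problem. This is our own sketch, not the authors' notebook: positions are hypothetical flat-sky coordinates in arcseconds, the astrometric error is identical for all detections, and the ILP is solved with SciPy's milp (SciPy >= 1.9) instead of Gurobi.

```python
import itertools
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

ARCSEC = np.radians(1.0 / 3600.0)        # radians per arcsecond
SIGMA = 1.0                              # assumed astrometric error (arcsec)
KAPPA = 1.0 / (SIGMA * ARCSEC) ** 2      # common concentration parameter

# Hypothetical flat-sky positions (arcsec) mimicking the 5 sources of Figure 1.
catalogs = [
    [(0.0, 0.0), (30.0, 30.0)],          # catalog 1: sources (1,1) and (2,1)
    [(3.0, 3.0)],                        # catalog 2: source (1,2)
    [(3.2, 3.1), (0.3, 0.2)],            # catalog 3: sources (1,3) and (2,3)
]

def neg_log_bayes(points):
    """w_T = -ln B_T for equal kappas (small-angle all-sky formula)."""
    pts = np.asarray(points, float)
    n = len(pts)
    psi2 = sum(np.sum((p - q) ** 2) for p, q in itertools.combinations(pts, 2))
    psi2 *= ARCSEC ** 2                  # arcsec^2 -> rad^2
    return -(n - 1) * np.log(2 * KAPPA) + np.log(n) + KAPPA * psi2 / (2 * n)

# Enumerate candidate subsets T: at most one source per catalog, |T| >= 2.
sources = [(i, c) for c, cat in enumerate(catalogs) for i in range(len(cat))]
options = [list(range(len(cat))) + [None] for cat in catalogs]
candidates = [[(i, c) for c, i in enumerate(pick) if i is not None]
              for pick in itertools.product(*options)]
candidates = [T for T in candidates if len(T) >= 2]

w = np.array([neg_log_bayes([catalogs[c][i] for i, c in T]) for T in candidates])
A = np.zeros((len(sources), len(candidates)))    # source-vs-subset incidence
for j, T in enumerate(candidates):
    for s in T:
        A[sources.index(s), j] = 1.0

# minimize sum_T w_T x_T  s.t. each source is used at most once, x_T binary
res = milp(c=w, integrality=np.ones_like(w),
           bounds=Bounds(0, 1), constraints=LinearConstraint(A, 0, 1))
chosen = [candidates[j] for j in np.flatnonzero(res.x > 0.5)]
# sources not covered by any chosen subset are reported as orphans
```

For these toy positions the selected subsets pair (1,1) with (2,3) and (1,2) with (1,3), leaving (2,1) as an orphan, mirroring the solution shown in Figure 1.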

DirILP: Optimal Direct Associations
The key idea is to introduce variables that directly assign the detections to hypothesized objects instead of simply switching on and off some previously enumerated candidate associations in the final matched catalog. The objective B_o is the same, but expressing it with the new variables is significantly more complicated than before. In the process one needs to introduce several additional (sets of) auxiliary variables to linearize the problem. In the case of homoscedasticity, when the astrometric uncertainties are the same for all detections, the linearization is relatively straightforward, but further modeling tricks are required in the general setting. In the following sections these two cases are introduced along with the variables needed to model and solve the global association problem. Further details are provided in the appendix about the derivation of the general heteroscedastic formalism.

Homoscedasticity
For simplicity, we first discuss the special case where the astrometric uncertainty of each detection is the same, i.e., σ_ic = σ for every source (i, c). Given a data set D, let N be the total number of detections in all catalogs considered. The number of astronomical objects these represent will be at most N, corresponding to the hypothesis that every detection comes from a different object. Our goal is to find a mapping that matches each source to one (and only one) object. This association between a source and an object means that the source is an observation of that object in the sky. Naturally, multiple sources are expected to be assigned to the same object, which represents the hypothesis that all of these sources are observations of that same object. To capture the matching between a source (i, c) and an object o, we introduce binary variables {x^o_ic}, where x^o_ic = 1 if the (i, c) detection is associated with object o, and 0 otherwise. Figure 2 illustrates how this approach works. For example, the arrow from Source (2,1) to Object 1 representing an association means that x^1_21 = 1. Similarly, x^3_11 = 0 means no association, hence there is no arrow between the corresponding entries. A partition P can now be represented as a set {S_o : o = 1, . . . , N}, where S_o is the subset of sources assigned to object o, i.e.,

S_o = {(i, c) : x^o_ic = 1}.

If, for a given index o, x^o_ic = 0 for all (i, c) sources, then S_o = ∅ is empty, which means object o is not needed for that particular partition.

Figure 1. An illustration of CanILP. As can be seen on the left side, we assume there are 2 detections from Catalog 1 (Sources (1,1) and (2,1)), 1 detection from Catalog 2 (Source (1,2)) and 2 detections from Catalog 3 (Sources (1,3) and (2,3)). In CanILP, we list all candidates for possible associations across independent detections, which are shown on the right side. These are the x_T in the formulation. We then find the combination of subsets that maximizes the overall likelihood.
Recall that the goal is to maximize the product of Bayes factors B_o (or, equivalently, to minimize ∑_o − ln B_o) corresponding to these associations. Given a nonempty association S_o, assuming κ_ic = κ for every source (i, c), eq. (6) gives us

− ln B_o = −(|S_o| − 1) ln(2κ) + ln|S_o| + (κ / (2|S_o|)) ∑ ψ²_{ic,i'c'},

where the sum runs over all pairs of sources (i, c) and (i', c') in S_o. We want to find the partition P that minimizes ∑_o − ln B_o. Notice that, as of now, there are still several nonlinear terms in − ln B_o, so it is not yet a linear objective. To make use of the ILP method, we first need to rewrite it as a linear function. To do that, we introduce the following variables, defined for each index k ∈ {0, . . . , C}, with C representing the total number of catalogs:

z^o_k = 1 if |S_o| = k, and z^o_k = 0 otherwise.

This variable captures the number of sources matched to object o, i.e., the cardinality of the subset S_o. When z^o_k = 1, there are k hypothesized observations of object o in the data. In addition, notice that at most one of the z^o_k, k = 0, . . . , C, can be 1. We also introduce

y^o_{ic,i'c'} = x^o_ic · x^o_{i'c'},

an indicator variable that checks whether the sources (i, c) and (i', c') belong to the same object o.

Figure 2. An illustration of DirILP. As in Figure 1, assume there are 2 detections from Catalog 1 (Sources (1,1) and (2,1)), 1 detection from Catalog 2 (Source (1,2)) and 2 detections from Catalog 3 (Sources (1,3) and (2,3)). In this case, the output of DirILP indicates that Sources (1,1) and (2,3) belong to the same object, that Sources (1,2) and (1,3) belong to the same object, and that Source (2,1) is an orphan. Notice that it is acceptable for an object to not have any source associated with it. The solution given by DirILP is {{(1, 1), (2, 3)}, {(1, 2), (1, 3)}, {(2, 1)}}, which is the same as the one given by CanILP in Figure 1.
In particular, y^o_{ic,i'c'} = 1 indicates the hypothesis that sources (i, c) and (i', c') are observations of object o. We then have

∑ ψ²_{ic,i'c'} y^o_{ic,i'c'} = ∑_{(i,c),(i',c')∈S_o} ψ²_{ic,i'c'},

as desired, where the summation on the right goes over all pairs (i, c) and (i', c') in S_o. On the other hand, when z^o_0 = 1, no detection is assigned to object o, so this term should contribute nothing to the objective function. Next, we introduce

p^o = ln(2κ) (1 − |S_o|) if S_o ≠ ∅, and p^o = 0 otherwise,   (19)

which captures the first term of − ln B_o, and

t^o = (κ / (2k)) ∑ ψ²_{ic,i'c'} y^o_{ic,i'c'} if z^o_k = 1 for some k ≥ 1, and t^o = 0 otherwise,

which captures the last term. Finally, we linearize the term ln|S_o| by breaking the natural log function into finitely many affine linear pieces. We first introduce constants a_1, a_2, · · · , a_C, where a_1 = 0 and a_p = ln(p) − ln(p−1), for p = 2, · · · , C. Then for each object o, we define binary variables w^o_1 ≥ w^o_2 ≥ · · · ≥ w^o_C with w^o_p = 1 exactly when |S_o| ≥ p, so that ∑_{p=1}^{C} w^o_p = |S_o|. For example, when |S_o| = 3,

∑_{p=1}^{C} a_p w^o_p = a_1 + a_2 + a_3 = (0) + (ln 2 − ln 1) + (ln 3 − ln 2) = ln 3,

which is exactly ln|S_o|. Our objective function now becomes

minimize ∑_o ( p^o + ∑_{p=1}^{C} a_p w^o_p + t^o ),

which is linear in the variables p^o, w^o_p and t^o. As can be seen in the definitions of these variables, there are certain relationships that still need to be modeled using linear constraints. The full ILP formulation is given in Appendix A with detailed explanations for how the constraints model the relationships between the variables x^o_ic, y^o_{ic,i'c'}, z^o_k, p^o, w^o_p, and t^o.
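The piecewise-linear encoding of ln|S_o| is easy to verify numerically. The sketch below is our own illustration (the function name is hypothetical): it builds the constants a_p and the ordered indicators w_p for an assumed C and checks that the telescoping sum reproduces ln|S_o|.

```python
import math

C = 10  # number of catalogs (illustrative choice)
# a_1 = 0 and a_p = ln(p) - ln(p-1) for p = 2..C
a = [0.0] + [math.log(p) - math.log(p - 1) for p in range(2, C + 1)]

def log_cardinality(size):
    """Telescoping sum a_1 w_1 + ... + a_C w_C with w_p = 1 iff size >= p."""
    w = [1 if p <= size else 0 for p in range(1, C + 1)]
    return sum(ap * wp for ap, wp in zip(a, w))
```

With |S_o| = 3 this reproduces the ln 3 example from the text, and the identity holds for every cardinality up to C because the differences of logarithms telescope.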

Heteroscedasticity
We can also remove the assumption that every source has the same measure of uncertainty κ_ic. From eq. (6), we have

− ln B_o = −(|S_o| − 1) ln 2 − ∑ ln κ_ic + ln ∑ κ_ic + ( ∑ κ_ic κ_i'c' ψ²_{ic,i'c'} ) / ( 2 ∑ κ_ic ),   (22)

where all the summations run over all (i, c) and (i', c') in S_o. We use x^o_ic, z^o_k, and y^o_{ic,i'c'} as defined in the special case of Section 2.2. We also introduce new variables to convert eq. (22) into a linear function.
We first linearize the term ln ∑_{(i,c)∈S_o} κ_ic using the same trick as when we linearized ln|S_o| in Section 2.2. We introduce constants b_min ≡ b_1 < b_2 < b_3 < · · ·, where b_min = ln min_{(i,c)∈D} κ_ic and b_max = ln ∑_{(i,c)∈D} κ_ic. Now, if we set an error threshold ε, the constants are defined as b_p = b_min + (p − 1)ε, up to the smallest index P for which b_P ≥ b_max. Then for each object o, we define binary variables χ^o_1 ≥ χ^o_2 ≥ · · · ≥ χ^o_P and impose the constraint

exp(b_1) + ∑_{p=2}^{P} [exp(b_p) − exp(b_{p−1})] χ^o_p ≥ ∑_{(i,c)∈S_o} κ_ic.

Using the new variables, we approximate ln ∑_{(i,c)∈S_o} κ_ic by b_1 + ε ∑_{p=2}^{P} χ^o_p. To illustrate how the χ^o_p variables work, let us assume that, after looking at the data, we determine that b_min = 29 and b_max = 33. If we let ε = 0.5, then the constants b_p take the values {29, 29.5, · · · , 32.5, 33}. Now suppose there are 3 sources matched to an object o with associated κ_ic values of 5 × 10^12, 8 × 10^12, and 10^13. Then the true value of ln ∑_{(i,c)∈S_o} κ_ic is ln(2.3 × 10^13), which evaluates to 30.77. With the defined variables, the solution given by the ILP sets χ^o_2 = · · · = χ^o_5 = 1, because the telescoping sum

exp(29) + [exp(29.5) − exp(29)] + [exp(30) − exp(29.5)] + · · · + [exp(31) − exp(30.5)] = exp(31) > 2.3 × 10^13

satisfies the constraint. Notice that additionally setting the variables χ^o_6, · · · , χ^o_9 to 1 would also satisfy the constraint; however, since we model our problem with a minimization objective, the optimal solution forces ∑_p χ^o_p to be as small as possible. Finally, notice that in this case the value of b_1 + ε ∑_p χ^o_p, which is used to approximate ln ∑_{(i,c)∈S_o} κ_ic, is 31, which is close to the true value of 30.77.
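The worked example above can be reproduced in a few lines. The sketch below is ours, using the numbers from the text; it mimics what the minimization does by finding the smallest grid point whose exponential covers the sum of concentrations.

```python
import math

eps = 0.5
b_min, b_max = 29.0, 33.0
P = int(round((b_max - b_min) / eps)) + 1
b = [b_min + p * eps for p in range(P)]   # grid 29.0, 29.5, ..., 33.0

kappas = [5e12, 8e12, 1e13]               # concentrations of the matched sources
total = sum(kappas)                       # 2.3e13, with ln(total) ~ 30.77

# The ordered chi_p's and the telescoping constraint
#   exp(b_1) + sum_p [exp(b_p) - exp(b_{p-1})] chi_p >= total
# combined with minimization pick the smallest grid point b_k with
# exp(b_k) >= total, so ln(total) is approximated by b_1 + eps * (# of chi_p = 1).
m = next(k for k in range(P) if math.exp(b[k]) >= total)
approx = b[0] + eps * m
```

Here approx evaluates to 31, matching the text: exp(30.5) ≈ 1.8 × 10^13 falls short of 2.3 × 10^13 while exp(31) ≈ 2.9 × 10^13 covers it.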
Next, we will linearize the last term in eq. (22) by first introducing the constants

c_min = min_{(i,c)∈D} κ_ic and c_max = ∑_{(i,c)∈D} κ_ic.

Then, by rounding these two values to the nearest 100, we can introduce grid points 0 ≡ c_0, c_1, c_2, · · · , c_Q, where c_1 is c_min rounded to the nearest 100, c_Q is c_max rounded to the nearest 100, and for all i ≥ 2, c_i = c_1 + 100(i − 1). We then introduce

u^o_k = 1 if [∑_{(i,c)∈S_o} κ_ic]_100 = c_k, and u^o_k = 0 otherwise,

where k ranges over {0, 1, . . . , Q} and the operator [·]_100 is defined as rounding to the nearest 100. This variable attempts to approximate ∑_{(i,c)∈S_o} κ_ic, which appears in the denominator of the last term of eq. (22). The variables p^o and t^o are also very similar to their definitions in Section 2.2; however, we need to slightly modify them as follows:

t^o = (1 / (2 c_k)) ∑ κ_ic κ_i'c' ψ²_{ic,i'c'} y^o_{ic,i'c'} if u^o_k = 1 for some k ∈ {1, 2, · · · , Q},   (32)

and t^o = 0 otherwise.
The reasoning for defining t^o this way is that if u^o_0 = 1, then [∑_{(i,c)∈S_o} κ_ic]_100 = c_0 = 0. This happens only when x^o_ic = 0 for all (i, c), i.e., no sources are matched to object o. Hence, t^o should not contribute to the objective function, hence the value of 0. On the other hand, if u^o_k = 1 for some k > 0, then by the definition of u^o_k, c_k is the best approximation to ∑_{(i,c)∈S_o} κ_ic. Thus, (32) holds.
In addition, we modify p^o, defined in eq. (19), as follows,

p^o = ln(2) (1 − |S_o|) if S_o ≠ ∅, and p^o = 0 otherwise.   (34)

This variable serves a similar function as in the homoscedastic case, which is to capture the first term in eq. (22).
The objective function can now be written as

minimize ∑_o ( p^o − ∑_{(i,c)} ln(κ_ic) x^o_ic + b_1 (1 − z^o_0) + ε ∑_{p=2}^{P} χ^o_p + t^o ),

which is linear in all the variables involved.
There are certain relationships that still need to be modeled using linear constraints because ILP formulations only take linear constraints. The full ILP formulation is given in Appendix B, with detailed explanations for how the constraints model the relationships between the variables x^o_ic, y^o_{ic,i'c'}, z^o_k, χ^o_p, u^o_k, p^o, and t^o.

MOCK OBJECTS AND SIMULATIONS
We consider the idealized case where all the catalogs capture the same astronomical properties of objects in the sky, i.e., they detect the same set of objects. As we generate 100 objects and assume there are C distinct catalogs, we expect to see 100 × C sources and 100 C-way association sets. We will now show the catalog matching results using both of our approaches. The ILPs in both approaches are solved using Gurobi, a commercial optimization solver (Gurobi Optimization 2020).
3.1. Homoscedasticity: identical κ_ic = 1/σ²

Observe that for the CanILP formulation in Section 2.1, we need to list all the possible valid subsets T ⊆ D. We could do this by sequentially adding catalogs one by one and considering sources from the new catalog. However, this evaluates to 101^C − 1 subsets, which is exponential in the number of catalogs C. Hence, we first try to reduce the number of possible subsets by observing that sources that are far away from each other cannot come from the same object, so we can impose distance constraints on the sources that are put into the same candidate association set. In doing so, we should be careful not to discard potentially true associations: two distant sources from the first 2 catalogs might not form a 2-way match on their own, but if the third catalog contains a source lying midway between them, the 3 sources together might still form a 3-way match.
That being said, this suggests an idea for dividing the whole region of the sky that is of interest into different islands where the sources are clustered together, so that instead of solving one big problem, we can break it into smaller problems and make use of parallel computing. Essentially, we first apply a single-linkage clustering algorithm to our data set, which is done using the DBSCAN algorithm with parameters min_samples = 2 and eps = 5 × max_{(i,c)∈D} σ_ic. With the chosen parameters, we are essentially performing the friends-of-friends algorithm. It turns out that for our simulation, most of these islands consist of only 1 source from each catalog. Hence, from now on, we will show the results for this scenario of having 1 source from each catalog. This situation is not peculiar to our simulation but is, in fact, observed in real data sets from multiple visits of the same part of the sky by the same telescope. The analysis for multiple sources per catalog will be discussed later. As can be seen in Figure 3, even though we are able to handle more than 3 catalogs, the maximum number of catalogs we can analyze in a day is 20. Note that, similar to Shi et al. (2019), we do not include any pruning procedures, such as those described in Kunszt et al. (2001); Gorski et al. (2005); Gray et al. (2007); Lee & Budavári (2017). These pruning procedures might speed up the matching, but in our experience the complexity of the problem remains exponential in the number of catalogs. The next paragraph discusses how far we can get using the DirILP formulation.
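The island-finding step does not require any particular library; a friends-of-friends grouping is just the set of connected components of the "closer than the linking length" graph. The sketch below is a self-contained stand-in for the DBSCAN call (min_samples = 2, eps = linking length, with singletons kept as their own islands), using a small union-find; it is quadratic in the number of detections and meant only to illustrate the decomposition.

```python
import numpy as np

def islands(points, linking_length):
    """Friends-of-friends grouping: detections closer than linking_length are
    'friends'; islands are the connected components, found with union-find.
    Equivalent to single-linkage clustering cut at the linking length."""
    pts = np.asarray(points, float)
    n = len(pts)
    parent = list(range(n))

    def find(i):                      # find root with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):                # union all pairs within the linking length
        for j in range(i + 1, n):
            if np.sum((pts[i] - pts[j]) ** 2) < linking_length ** 2:
                parent[find(i)] = find(j)

    groups = {}                       # collect members by root
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# two tight clusters and one isolated detection -> three islands
labs = islands([(0, 0), (0.5, 0), (10, 10), (10.2, 10), (50, 50)], 5.0)
```

Each island can then be matched independently, which is what enables the parallel decomposition described above.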
DirILP formulation analysis. - The main drawback of the previous approach is that the process of creating potential subsets T is exponential in the number of catalogs. Even if we restrict attention to a single island, the number of nonempty subsets in such an island is still 2^C − 1, so creating the variables for the ILP takes a tremendous amount of time.
The DirILP formulation attempts to fix that problem by reducing the time complexity of creating the variables for the ILP to one that is polynomial in the total number of sources. However, since the catalog matching problem is intrinsically difficult, we still have to tackle the exponential complexity of the problem somewhere else: it appears in the time needed to solve the ILP. We believe that with advances in the field of integer linear programming, the Gurobi solver will be able to solve this problem more efficiently. Using DirILP, we are able to tackle up to 60 catalogs. The comparison of the total running time between CanILP and DirILP is shown in Figure 3. In addition, we also include the setup time and optimization time for each formulation in Figures 4 and 5. Moreover, by including some heuristics, such as imposing a time limit between incumbent solutions on the Gurobi solver, we are able to push DirILP further to handle 160 catalogs.
Finally, it is important to note that the associations given by CanILP and DirILP are the same, and they match the ground truth perfectly. Hence, there is no difference in the accuracy of the matching between the two approaches; they differ only in running time.

General case: different κ ic for every detection
For the general case, both approaches still give all correct associations that match the ground truth. However, as in the special case, DirILP is still more efficient at solving the matching problem than CanILP, as shown in Figure 6. We should point out that even though in this general setting, the optimal value found in DirILP is just an approximation of the Bayes factor associated with the ground-truth matching, the values are still quite close to each other. More importantly, the associations obtained from DirILP still match the ground-truth associations.

Multiple sources per catalog in each island
Recall that in the previous sections, we assumed that in each island there is only one detection from each catalog, which is a reasonable assumption in many real-life situations. In this section, we discuss scenarios where the uncertainty σ_ic is large or the source density is very high. These scenarios result in islands that may contain multiple detections from the same catalog. It turns out that in our simulation, CanILP and DirILP still give the correct associations under this scenario. However, both methods run much slower than in the previous scenario and are able to handle only about half as many catalogs with the same settings on the algorithms. Figure 9 shows the total running time for both CanILP and DirILP when there are 2 detections from each catalog in an island; one can see how much worse the problem becomes as the number of detections from each catalog grows.

IMPLEMENTATION AND SOFTWARE
The CanILP and DirILP algorithms are implemented in several Jupyter notebooks. They share a common structure: in the first part, we create a simulation with different catalogs and a number of mock objects on each of the catalogs. Next, we run the DBSCAN algorithm to output the different islands, or clusters of detections. Again, as mentioned in Section 3.1, with our chosen parameters this is similar to executing a friends-of-friends algorithm. We pick DBSCAN because of its well-developed Python implementation.
After running the clustering algorithm, we apply CanILP and DirILP to solve the catalog matching problem in each island. The optimization problems in these modules are solved using Gurobi (Gurobi Optimization 2020).
In addition, we employ several (optional) heuristics in the DirILP algorithm to speed up the catalog matching procedure. First, we force any 2 sources that are more than 8σ away from each other to belong to separate objects. Second, we restrict sources that are within 0.1σ of each other to belong to the same object. Finally, we set a MIP gap (optimality gap) of 0.5% to prevent Gurobi from taking too long to verify optimality. In our experiments, we have found that running the algorithm with these heuristics gives the same results but is about 10 times faster. The notebooks can be found on GitHub at https://github.com/tunguyen52/Nway-matching.
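The first two heuristics amount to precomputing must-link and cannot-link pairs before the ILP is built. The helper below is a hypothetical sketch of that preprocessing step (the function name and interface are ours, not the notebooks'); in the ILP, a cannot-link pair (i, j) becomes x^o_i + x^o_j ≤ 1 for every object o, and a must-link pair becomes x^o_i = x^o_j for every o.

```python
import numpy as np

def heuristic_pairs(coords, sigma, far=8.0, near=0.1):
    """Index pairs for the two DirILP speed-up heuristics: pairs farther apart
    than far*sigma are forced into different objects (cannot-link), and pairs
    closer than near*sigma are forced into the same object (must-link).
    coords are flat-sky positions in the same units as sigma."""
    pts = np.asarray(coords, float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    iu, ju = np.triu_indices(len(pts), k=1)      # all unordered pairs i < j
    must = [(int(i), int(j)) for i, j in zip(iu, ju) if d[i, j] < near * sigma]
    cannot = [(int(i), int(j)) for i, j in zip(iu, ju) if d[i, j] > far * sigma]
    return must, cannot

# three detections: two nearly coincident, one far away (sigma = 1 arcsec)
must, cannot = heuristic_pairs([(0.0, 0.0), (0.05, 0.0), (20.0, 0.0)], 1.0)
```

Fixing these pairs up front removes binary variables and branching decisions from the solver, which is where the reported order-of-magnitude speed-up plausibly comes from.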

SUMMARY AND FUTURE WORK
We have shown how the CanILP approach of Shi et al. (2019) and the new DirILP formulation solve for a globally optimal matched catalog in crowded fields where a greedy approach is not sufficient. The former enumerates the candidate associations and picks the optimal combination of those; the latter introduces variables to directly assign sources to objects. The new DirILP formulation is superior in the sense that it scales better to a large number of catalogs, i.e., produces results in less time. This comes at a price in the complexity of the algorithm, especially in the case of heteroscedasticity, when the catalogs have different astrometric uncertainties.
In fact, DirILP only outperforms the previous method when many catalogs are to be crossmatched. We recommend the simpler CanILP approach for a small number of catalogs, where the combinatorial explosion is not as severe. In our experiments, this crossover threshold appears to be at around 12 visits or catalogs, beyond which the DirILP method is faster.
Both of these methods optimize the same objective and yield the best possible catalog matching result in the likelihood sense. Currently, no prior on the partition is imposed in our study or in the accompanying software. While placing priors on the partition might seem complicated, certain simple priors can easily be expressed, such as those that depend only on the number of objects in the matched catalog. This is a possible direction to explore in the future.
Additional future work includes testing the software and its performance on imaging surveys, such as multiple visits of the Hyper Suprime-Cam Subaru Strategic Program, whose data collection resembles future observations of the LSST.
This material is based upon work supported by the National Science Foundation under Grant No. 1909709, 1814778, 1452820 and 1934979. T.B. gratefully acknowledges the aforementioned AST/AAG grants from NSF and funding from NASA via awards STScI-49721 and STScI-52333 under NAS5-26555. T.N. and A.B. gratefully acknowledge support from the aforementioned NSF grant and ONR grant N000141812096. A.B. is also grateful for support from the aforementioned NSF grant and AFOSR grant FA95502010341. The authors thank the anonymous referee for the careful review and the thoughtful comments.
The next equation ensures that every source (i, c) belongs to exactly one subset:

∑_{o=1}^{N} x^o_ic = 1, ∀(i, c) ∈ D. (A3)

The following equation imposes that every subset takes no more than 1 source from each catalog:

∑_i x^o_ic ≤ 1, ∀o ∈ {1, 2, ..., N}, ∀c ∈ {1, . . . , C}. (A4)

The following set of constraints on y^o_{ic,i'c'} is an implementation of the definition of y^o_{ic,i'c'} in Section 2.2, which requires y^o_{ic,i'c'} = 1 exactly when x^o_ic = x^o_{i'c'} = 1:

y^o_{ic,i'c'} ≤ x^o_ic,   y^o_{ic,i'c'} ≤ x^o_{i'c'},   y^o_{ic,i'c'} ≥ x^o_ic + x^o_{i'c'} − 1,

for all (i, c) ≠ (i', c') and for all o.
Since the cardinality of any subset from a partition P is between 0 and C, the following equations state that exactly one of the z^o_k takes the value 1 and tie it to the cardinality of S_o:

∑_{k=0}^{C} z^o_k = 1 and ∑_{(i,c)∈D} x^o_ic = ∑_{k=0}^{C} k z^o_k, ∀o.
The next constraints implement the definition of w^o_p as described in Section 2.2:

∑_{p=1}^{C} w^o_p = ∑_{(i,c)∈D} x^o_ic and w^o_p ≥ w^o_{p+1}, ∀p ∈ {1, . . . , C − 1}, ∀o.
With the specific choice of the constant M defined below, the equation that follows becomes redundant when z^o_k = 0, since the RHS is then negative and t^o ≥ 0 becomes the enforcing constraint; when z^o_k = 1, the minimization forces t^o to be equal to the first term of the RHS. Taking

M = (κ/2) ∑_{(i,c)≠(i',c')} ψ²_{ic,i'c'},

an upper bound on the pair term, we impose t^o ≥ 0 and

t^o ≥ (κ / (2k)) ∑_{(i,c)≠(i',c')} ψ²_{ic,i'c'} y^o_{ic,i'c'} − M (1 − z^o_k)

for all k ∈ {1, 2, · · · , C} and for all o. Finally, the last equation ensures that for an empty subset S_o, p^o = 0, hence contributing nothing to the objective:

p^o ≥ ln(2κ) (1 − z^o_0 − ∑_{p=1}^{C} w^o_p), ∀o.

This is because when z^o_0 = 1 (nothing is assigned to subset S_o), w^o_p = 0, ∀p, so the RHS is 0; as we are minimizing the objective function with respect to p^o, p^o will be set to 0. On the other hand, when z^o_0 = 0, the constraint becomes p^o ≥ ln(2κ)(1 − ∑_p w^o_p) and again, since we are minimizing, p^o will equal this value.
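The role of the big-M constant is easy to check numerically: as long as M is at least as large as the largest value the pair term can take, the constraint is vacuous for every k with z^o_k = 0. A minimal sketch with made-up numbers:

```python
kappa = 1.0e10
psi2 = [1.0e-11, 2.0e-11, 3.0e-11]      # squared pair separations (rad^2), made up
pair_sum = kappa / 2.0 * sum(psi2)      # largest possible value of the pair term
M = pair_sum                            # big-M: an upper bound on the pair term

for k in range(1, 4):
    rhs_when_z_is_0 = pair_sum / k - M  # constraint RHS in the worst case, z_k = 0
    assert rhs_when_z_is_0 <= 0         # t >= (negative), so t >= 0 is what binds
```

Since the y variables can only shrink the pair term below this worst case, the same bound holds for every feasible assignment.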

B. DIRILP FORMULATION -GENERAL CASE
Below, we give the ILP formulation for DirILP when κ_ic is different for distinct sources (i, c). Some of these constraints are similar to the special case, so we only give explanations for the new constraints, which are shown after the ILP formulation. We follow the notation introduced in Section 2.2.2. In particular, recall the constants c_0, c_1, . . . , c_Q designed to model, up to the nearest 100, the sums of subsets of the uncertainties κ_ic; the associated decision variables u^o_k; and the decision variables χ^o_p used to model the logarithms of such sums.
The objective in the general case is