Foraging with MUSHROOMS: A Mixed-Integer Linear Programming Scheduler for Multimessenger Target of Opportunity Searches with the Zwicky Transient Facility

Electromagnetic follow-up of gravitational wave detections is very resource intensive, taking up hours of limited observation time on dozens of telescopes. Creating more efficient schedules for follow-up will lead to a commensurate increase in counterpart location efficiency without using more telescope time. Widely used in operations research and telescope scheduling, mixed integer linear programming (MILP) is a strong candidate to produce these higher-efficiency schedules, as it can make use of powerful commercial solvers that find globally optimal solutions to provided problems . We detail a new target of opportunity scheduling algorithm designed with Zwicky Transient Facility in mind that uses mixed integer linear programming. We compare its performance to \texttt{gwemopt}, the tuned heuristic scheduler used by the Zwicky Transient Facility and other facilities during the third LIGO-Virgo gravitational wave observing run. This new algorithm uses variable-length observing blocks to enforce cadence requirements and ensure field observability, along with having a secondary optimization step to minimize slew time. \blue{We show that by employing a hybrid method utilizing both this scheduler and \texttt{gwemopt}, the previous scheduler used, in concert, we can achieve an average improvement in detection efficiency of 3\%-11\% over \texttt{gwemopt} alone} for a simulated binary neutron star merger data set consistent with LIGO-Virgo's third observing run, highlighting the potential of mixed integer target of opportunity schedulers for future multimessenger follow-up surveys.


INTRODUCTION
The detection of GW170817  in August 2017 signaled the beginning of a new era of multimessenger astronomy, promising advances in r-process nucleosynthesis (e.g. Chornock et al. 2017;Coulter et al. 2017;Cowperthwaite et al. 2017;Pian et al. 2017), the neutron star equation of state (e.g. Abbott et al. 2018;Radice et al. 2018;Coughlin et al. 2019Dietrich et al. 2020a), and the value of the Hubble constant (e.g. Hotokezaka et al. 2019;Dietrich et al. 2020a). This was thanks to the detection of GW170817 ) and its electromagnetic counterparts: a kilonova (ultraviolet/optical/near-IR emission generated by the radioactive decay of rprocess elements) (e.g. Kasliwal et al. 2017;Evans et al. 2017;Kilpatrick et al. 2017;Pian et al. 2017;Shappee et al. 2017;Smartt et al. 2017), a short gamma ray burst (e.g. Goldstein et al. 2017), and an afterglow (e.g. Hallinan et al. 2017;Troja et al. 2017). However, since then, no further electromagnetic counterparts to gravitational-wave detections have been confirmed, despite several other detected binary neutron star and neutron star black hole mergers during LIGO-Virgo's third observing run (Abbott et al. 2021). This can mostly be explained by the localization areas of neutron star containing mergers being much larger than expected, (e.g. Petrov et al. 2021), making efficient observation planning all the more important. With, on average, a much larger area than previously thought to search, there are many more choices for potential schedules, but it will take an optimal scheduler to maximize scientific output.
Fundamentally, telescope scheduling software determines which fields to observe in what order, subject to environmental and programmatic constraints. In the case of the follow-up of large sky localizations produced by gravitational-wave (e.g. Coughlin et al. 2019a;Anand et al. 2020) or gamma-ray burst (e.g. Coughlin et al. 2019b; Ahumada et al. 2021) events with wide field of view surveys such as the Zwicky Transient Facility (Bellm et al. 2019b;Graham et al. 2019;Dekany et al. 2020;Masci et al. 2018), the goal is usually to maximize an objective function, which is typically taken to be the integral of the probability skymap over the combined footprint of all the observations, although other choices are possible . These observations should also be completed in the minimal amount of time, as many different science programs time share on the same telescope, and therefore any time saved can be utilized by other science programs (Bellm et al. 2019c).
While, in principle, this could be done manually, schedules designed this way are labor-intensive and suboptimal, and it is unclear how the heuristics translate to survey effectiveness. Another common approach is the use of "Greedy" algorithms, which compute a metric or score for each possible target, select the target with the current highest value, observe it, and then repeat the process. This is a ubiquitous approach, adapted by commonly used packages ranging from Astroplan (Morris et al. 2018) to gwemopt (Coughlin et al. , 2019cAlmualla et al. 2020) for a variety of purposes. Unfortunately, due to the inability to plan ahead, the fields chosen are not optimal; instead, an optimal schedule not only accounts for the current possible observations but also for past and potential future observations to maximize the scientific output from those observations, such as the Zwicky Transient Facility's need for a minimum 30-minute cadence when searching for transients to rule out asteroids.
Unfortunately, the scheduling problem is NP-complete and the number of observing sequences is combinatorially large. A well-known model that can make these problems computationally tractable is the use of Integer Linear Programming (ILP); ILP problems have variables which take only discrete integer values, linear objective functions, and linear constraints. In the following, we will also use mixed ILP, which can include some non-integral variables. One popular application of this in astronomy is within the Las Cumbres Observatory (LCO) 1 scheduler, who operate a network of identical imagers and spectrographs; their scheduler (Lampoudi et al. 2015) uses ILP to maximize the total number of observations obtained, weighted by the priority assigned to them by the Time Allocation Committee (TAC). ALMA solves a similar ILP model to maximize TAC-assigned scientific priorities, program completion, and telescope utilization (Solar et al. 2016). ZTF's timeallocation scheduler uses ILP to solve both programlevel and global scheduling constraints and optimally or-1 https://lco.global/ der individual observational blocks (Bellm et al. 2019b), however, the schedulers used to plan within each observation block do not all use ILP, such as the greedy scheduler gwemopt. Bellm et al. 2019a, a paper whose authorship spans many surveys and open-source astronomy software producers, advocate for community emphasis on the use of quantitative objective functions and ILP-based scheduling approaches to address the rapid proliferation of instruments, many of which will benefit from coordination.
In this paper, we introduce MUSHROOMS (Milp-Using ScHeduleR Of sky lOcalization MapS), a MILPbased scheduler for multimessenger follow-up with wide field-of-view surveys. We structure the paper as follows. We start by describing requirements faced by ZTF gravitational-wave follow-up observations in section 2. We then introduce MUSHROOMS and describe the scheduling algorithm in section 3, laying out its MILP formalism and the reasoning behind certain design decisions. In section 4, we use MUSHROOMS to schedule simulated skymaps based on LIGO-Virgo's third observing run and characterize its run time, efficiency and other relevant metrics, and summarize our results and future outlook in section 5.

OBSERVING REQUIREMENTS
Multimessenger astronomy supplements electromagnetic observations with observations using other information carriers such as gravitational waves or neutrinos. Since 2018, ZTF has been used for target of opportunity, multimessenger follow-up searches, both searching for the sources of gravitational-wave detections during the third LIGO-VIRGO observing run (Coughlin et al. 2019a;Anand et al. 2020;Kasliwal et al. 2020), and gamma-ray bursts from detectors like the Fermi gamma-ray burst monitor (Coughlin et al. 2019b;Ahumada et al. 2021). However, there are a number of factors one has to consider when designing schedules for such systems, both in terms of general observational requirements for ground-based surveys and certain ZTF-specific restrictions or demands. For example, targets are only observable at night and when they are above a minimum altitude from horizontal (i.e., below a minimum airmass). There are also common sense constraints: for example, the scheduler cannot schedule more than one field observation at the same time, and it must restrict observations to the window of time available for observing. In addition, there are also limits imposed by the telescope and observing system itself, such as slew speed.
There are also a number of multimessenger transient follow-up restrictions that must be accounted for. For TSP stands for travelling salesperson example, for the ZTF gravitational wave follow-up program, there is a 30-minute cadence requirement, observing once in r-and g-bands, which serves to both eliminate asteroids and gain color information about detected transients. This requirement imposes not only a limit on the return time, but also must account for the filter exchange time within ZTF, which is 2 minutes long. Another special feature of ZTF follow-up is that the system uses a fixed grid of reference images, a preset selection of a limited number of telescope pointings to choose from.
In addition to the requirements, the goal is to limit the total amount of time required for these observations through the selection of an objective function, whose choice we will describe below.

SCHEDULING ALGORITHM AND MILP FORMULATION
Due to the design of the ZTF survey and data system, ZTF has a fixed grid of 1778 telescope pointings. Given a probability density map in right ascension and declination, a span of time t 0 to t o + T , and a fixed exposure time ∆t, the goal is to produce a schedule that meets the observing requirements laid out in Sec 2 by selecting a set of fields S to observe and arranging them in time (with repeats). The objective is to maximize the total probability density contained in the area observed by at least 1 field in S minus a penalty factor proportional to the amount of fields observed with proportionality constant p. We maximize the probability density contained because the overall goal of this scheduler is to identify new transients that could potentially be gravitational wave source for follow-up observation. This is done by comparing observations to reference images to find new sources, so maximizing the probability density observed in theory maximizes the probability of detecting the source for follow-up. We introduced the penalty factor p with the other survey priorities in mind. It allows the user to restrict the search to only fields that introduce more than p to the total probability observed. The expected range of p is [0, 0.02], though it has to be manually selected for each skymap. While not making it impractical for use in scheduling, using it does require some extra effort from the user to determine the trade off between observing time and detection probability best for their situation. This creates shorter duration schedules that only target the highest probability fields and are less intrusive to the other programs; with a value of zero for p, the schedule will fill all available time.
MUSHROOMS (Parazin 2022) is a python-based mixed-integer scheduler that uses the commercially available software Gurobi, which is free with an academic license. When the network of ground based gravitational-wave detectors localize a new event, they release a probability map of where in the sky the source is most likely located (Singer & Price 2016), which MUSHROOMS takes as one of its inputs. An example schedule overlaid on its corresponding probability map is shown in Fig. 2. For each given source localization probability map, referred to as a skymap from here on, the MUSHROOMS algorithm works in a three-step process: a preliminary pruning step, a block-division step, and an observation sequencing step. A flowchart illustrating the whole algorithm can be seen in Fig. 1.
In the pruning step, MUSHROOMS reduces the field grid to a user-provided number of fields using a max weighted coverage algorithm (Nemhauser et al. 1978). This is done to reduce run time during the block-division step.
In the next step, the block-division step, a number of observing blocks are constructed out of the field shortlist produced by the pruning step. MUSHROOMS defines an observing block by a start time, an expected end time, and a collection of fields that are visible for the entire expected duration. To observe each block, all the fields within it are observed in same filter, a filter change is executed, and the fields are observed in another filter. These blocks have a minimum size which depends on the given exposure time and ensures that there are at least 30 minutes of observations between field re-observations. MUSHROOMS calculates the expected block length using an average slew time assumption of 10 seconds since the order of observations, which determines the actual slew time for each block, is not found until the next step. We use this block-division heuristic rather than giving the scheduler complete freedom to order fields and filter changes as it sees fit due to the computational complexity of complete freedom, which would require orders of magnitude more time to run.
To minimize slew time within each block, MUSH-ROOMS calculates the slew times from each field within a block to all other fields in that block using a travelling salesperson (TSP) algorithm to find the order of observations that will minimize slew time within each block.
Finally, MUSHROOMS post-processes the schedule to ensure that it is valid and satisfies all the requirements laid out in Appendix 2. Because the block-division algorithm uses a fixed slew time, if one of the blocks has an average slew time higher than that, the block will run longer than expected, and all subsequent blocks will have to be delayed to avoid scheduling 2 observations at the same time. There is an edge case where this delay means MUSHROOMS schedules a field for observation when it is (barely) below the visibility requirement and cannot be observed. In this scenario, we automatically re-run the schedule utilizing a gap parameter to add more time between the offending blocks. However, in the 951 simulations utilized for this paper (see below), it did not occur once.
The variables, constraints and objective function behind MUSHROOMS are laid out explicitly in Appendix A. This algorithm is a modification of the classic max weighted coverage problem (Nemhauser et al. 1978), Figure 3. Run time of block-division algorithm. The large number of schedules clustered at 500 seconds is a result of setting a 500-second time limit for this step of optimization; all 500 second run times are when the MILP solver used would converge quickly on a high-quality solution but then spend the duration trying to lower the optimality gap.
with additional constraints to allow for the creation of valid observing blocks, as well as an additional (optional) penalty factor p in the objective function.

SIMULATED OBSERVING PLANS
To assess the efficiency of the generated schedules, we ran both the MUSHROOMS and gwemopt algorithm on 951 simulated Binary Neutron Star detections consistent with the third LIGO-Virgo observing run (O3) from When performing the block-division algorithm, we recorded the run time of all 951 schedules, which can be seen in Fig. 3. The mean and median run times for this step were 107 and 15 seconds respectively. The large difference between the mean and median can be attributed to the 140 schedules that took the entire time limit of 500 seconds to complete. In these cases, the solver would quickly converge on a high-quality solution, but would then spend the rest of the time limit attempting to narrow the optimality gap. This 500-second time limit was chosen because when developing MUSHROOMS it was observed that most schedules which converged in a reasonable amount of time did so before 500 seconds, though it could easily be lowered to even 100 seconds without a substantial decrease in efficiency. Only an an additional 64 schedules would be truncated and use the near-optimal candidate solutions instead of a solution proven to be globally optimal. Additionally, with a maximum number of 6 blocks (and thus a maximum of 6 filter changes in a night), the average number of filter changes scheduled was 4.7.

Comparisons to gwemopt
For all 951 skymaps we first measured the total probability density observed by each schedule, hereafter referred to as "probability coverage." MUSHROOMS saw an average probability coverage of 0.418, while gwemopt had an average probability coverage of 0.387, an 8.0% increase in probability coverage, however MUSHROOMS' schedules had an average runtime of 23700 seconds, while gwemopt had an average runtime of 16800 seconds, a 41.5% increase in runtime. This is because MUSH-ROOMS was run with p = 0, meaning it filled all available time, while gwemopt has some logic to stop when as it gets diminishing returns by adding more fields to observe. To make a more equal comparison, we focused on the skymaps where MUSHROOMS and gwemopt made no more than 6 additional observations compared to the other. This value was chosen because it kept the difference in the average run time of the schedules low, while still including a large amount of skymaps. The average run times were 22900 seconds for MUSHROOMS and 22800 for gwemopt over this subset of 329 simulated events. An important note here is that there is selection bias here towards skymaps with a greater 90% credible area, since they are the ones that gwemopt usually makes longer schedules for, as well as more well-localized schedules which, due to the event time and location, MUSH-ROOMS and gwemopt both filled almost all available time. A frequency histogram comparing the area distributions can be seen in figure 4.
For this subset, MUSHROOMS has an average probability coverage of 0.353, while gwemopt has an average probability coverage of 0.333, only a 5.8% improvement for MUSHROOMS. For a skymap-by-skymap comparison, figure 5 is a scatter plot comparing the probability coverages achieved by MUSHROOMS and gwemopt over this subset.
An important note, however, is that MUSHROOMS does not always out-perform gwemopt, even if the solution was not truncated by the 500-second time limit. This is because, even though the solution found is an optimal solution for MUSHROOMS' block-division heuristic, it may not be a globally optimal schedule. The design of MUSHROOMS forces the solutions to take on a certain format with observing blocks that are repeated in two different filters. This means the problem is (comparably) easy to implement using MILP and runs quickly, but if the best possible schedule does not fit such a format, MUSHROOMS cannot produce it. gwemopt has more freedom in ordering filter changes and block observations, meaning it can sometimes surpass MUSH-ROOMS, even with a greedy algorithm. Producing a more complex MILP formulation that lacks these restrictions and can always surpass gwemopt is an avenue for future research.
For now, this means we can use a hybrid scheduler to achieve better results than either MUSHROOMS or gwemopt alone. By running both MUSHROOMS and gwemopt and using the schedule with a higher probability coverage, we can get an average coverage of 0.360, an 8.1% improvement over gwemopt alone and a 2.1% improvement over just MUSHROOMS.

Detection efficiency Characterizations
Probability coverage, however, is not equivalent the actual performance a schedule will have, since it fails to capture the difficulties in identifying a kilonova, even if the field containing it is observed, a kilonova might not be detected due to it being too dim to significantly differ from the reference. As fast-fading transients, kilonova vary in magnitude significantly even over the 24 hours both schedules were allotted to search, meaning that the order of observations has a significant impact on the schedule's quality not captured by probability cov-erage. To address this, following Petrov et al. 2021, we characterized the resulting schedules' efficiencies with gwemopt's simulation and injection recovery suite for two different kilonova light curve models, this is done by injecting 10,000 kilonovae into the sky following the skymap's probability distribution, and each schedule's efficiency is the proportion of those kilonova that ZTF would have been able to detect following each schedule. The light curve models for the kilonovae used here, an optimistic and a conservative model, were generated by by the radiative transfer code POSSIS (Bulla 2019) and summarized in Dietrich et al. 2020b. For details about the physical properties of each light curve, see Table 1.
Tables 2 and 3 compare the efficiencies of MUSH-ROOMS, gwemopt and the hybrid implemenation of the two, for all skymaps (table 2) and for just the ones where both schedules are of a similar length (table 3). Due to the large number of simulations, the monte carlo uncertainty in these values is negligible. In both cases, the hybrid method out-performs both schedulers acting on their own, with an efficiency increase of about 11.5% (11.1%) for a conservative (optimistic) light curve in the subset where both schedules are the same length compared to just using gwemopt alone. Figure 6 compares the 90% credible area of each skymap to the percent improvement in efficiency that would result from utilizing the hybrid method as opposed to gwemopt to produce a schedule for it. The selection bias against more-well localized skymaps is clear, as well as MUSHROOMS' comparative weakness at scheduling for these more localized detections. Only 37 out of 97 (38.1%) of detections below 1000 deg 2 were improved upon by mushrooms, while 131 out of 232 (56.5%) of detections above 1000 deg 2 were improved upon by MUSHROOMS.
Because these smaller localizations where MUSH-ROOMS does worse make up a larger proportion of the total observations, this means that in actual use employing a hybrid MUSHROOMS-gwemopt strategy will result in less than a 11.5% (11.1%) improvement in detection efficiency for a conservative (optimistic) light curve. Additionally, since the hybrid strategy will never do worse than gwemopt alone, for a worst-case scenario, where MUSHROOMS is better for none of the remaining 622 skymaps when schedule lengths are equal, that will result in a minimum 3.1% (3.2%) efficiency increase for a conservative (optimistic) light curve, establishing an upper and lower bound of 11% and 3% respectively on the potential performance improvement of this applying hybrid method in real observing scenarios. Figure 6. Percent improvement in detection efficiency by utilizing the hybrid scheduling method. A 0% improvement means that gwemopt performed better than or equal to MUSHROOMS for that skymap. The three outliers at above 100% improvement are schedules that had low efficiencies when run with gwemopt and a small absolute efficiency increase from MUSHROOMS resulted in a large relative increase In this paper, we presented a novel scheduling algorithm for scheduling wide field-of-view survey follow-up for multimessenger events, outlined its MILP formulation, and compared its performance to gwemopt, the Target of Opportunity scheduler used by ZTF and other surveys in recent observing runs. We focused on the MUSHROOMS block-division algorithm, outlining the parameters, decision variables, objective function and constraints used to define this problem. Fundamentally, the block division algorithm is an alteration of a max weighted coverage problem, but instead of simply choos-ing a certain number of fields to look at, the algorithm assigns fields to variable-length blocks that are under further constraints to ensure all fields within them are observable and that no two blocks overlap. We include an additional optional penalty factor introduced into the objective function which allows for one to only observe fields that add enough probability coverage to overcome the penalty factor, leading to shorter schedules that infringe less on other programs. We also introduce a postprocessing step to check for block overlaps that could be introduced by the fixed slew time approximation.
Next, we compared MUSHROOMS to gwemopt, with MUSHROOMS achieving similar efficiencies gwemopt for both light curve models used. We showed that the differing strengths and MUSHROOMS and gwemopt mean when used in concert, one is able to achieve efficiencies 3% to 11% higher than either gwemopt alone.
The algorithm behind MUSHROOMS is a comparatively straightforward one, designed to quickly run on everyday computer hardware while still producing efficient schedules, and it already is able to increase the detection efficiency when used alongside the previous greedy scheduler. This shows the potential of using mixed-integer linear programming for scheduling multimessenger target of opportunity follow-up, and for observational scheduling as a whole, but also that there is significant room for improvement in MUSHROOMS or another Mixed-Integer scheduler, since problem formulation's rigidity in its schedules means it can still be out-done by gwemopt for some schedules.
There are a number of planned improvements for MILP schedulers for multimessenger follow-up. For example, MUSHROOMS does not account for the moon distance and lunar phase when scheduling observations. MUSHROOMS also does not have a straightforward way to respond to weather and other unexpected events. Currently, one would have to edit the input skymap, setting probabilities associated with affected healpix to zero before renormalizing and inputting it to MUSH-ROOMS. Improving it to account for both of those is important future work. Potentially more importantly, it treats the source as having constant flux for the duration of the schedule, which is not correct for the fast transient kilonova models considered here from (Dietrich et al. 2020b). The most straightforward way to address this issue would be to accept a desired light curve model as an additional parameter and make an alteration to the objective function of the block division step, using that model to alter the weight of each pixel by when it is observed, such as multiplying the weight associated with that pixel by the ratio of the light curve magnitude at the observation time to the maximum magnitude. As the complete field order is not determined until the travelling salesman problem step, one may have to use an approximation of when each pixel is observed, such as the midpoint of the first block to observe a given pixel.
The block-division formulation, while a useful heuristic for limited time and computing power, has some limitations, especially when variable exposure times are desired. Producing a model that is not constrained by blocks and can jointly be optimized over the selection and ordering of all fields, subject only to the observing and time constraints, would lead to more efficient schedules. Also allowing the model to vary the exposure times of individual observations would lead to higher chances of detection because it would adjust for time and position dependent sky background. However, both improvements are much more computationally complex, and will require much greater optimization and application of high-level operations research techniques. Using the experience gained from working on this project, among others, several authors of this paper have begun development on a more general multi-facility observation scheduling toolkit which will add those considerations into its problem formulation.  ∀i, j, U i ≥ B i,j If a block makes at least 1 observation, it is being used (A17) ∀i > 0, U i ≤ U i−1 All unused blocks are at the end of the night (A18) ∀l, i∈S l ,j B i,j ≥ y l A pixel is observed if it is in any observed field (A19) ∀i, j t o,i + 2 · (t exp + t slew ) A block's end time must be before the observability end time of all fields within it (A21) A block's start time must be after the observability start time of all fields within it (A23) A block's start time must be after the previous block finishes (A25)