Random Popular Matchings with Incomplete Preference Lists

Given a set $A$ of $n$ people and a set $B$ of $m \geq n$ items, with each person having a list that ranks his/her preferred items in order of preference, we want to match every person with a unique item. A matching $M$ is called popular if for any other matching $M'$, the number of people who prefer $M$ to $M'$ is not less than the number of those who prefer $M'$ to $M$. For given $n$ and $m$, consider the probability of existence of a popular matching when each person's preference list is independently and uniformly generated at random. Previously, Mahdian showed that when people's preference lists are strict (containing no ties) and complete (containing all items in $B$), if $\alpha = m/n>\alpha_*$, where $\alpha_* \approx 1.42$ is the root of equation $x^2 = e^{1/x}$, then a popular matching exists with probability $1-o(1)$; and if $\alpha<\alpha_*$, then a popular matching exists with probability $o(1)$, i.e. a phase transition occurs at $\alpha_*$. In this paper, we investigate phase transitions in the case that people's preference lists are strict but not complete. We show that in the case where every person has a preference list with length of a constant $k \geq 4$, a similar phase transition occurs at $\alpha_k$, where $\alpha_k \geq 1$ is the root of equation $x e^{-1/2x} = 1-(1-e^{-1/x})^{k-1}$.


Introduction
A simple problem of matching people with items, with each person having a list that ranks his/her preferred items, models many important real-world situations such as the assignment of graduates to training positions [9], families to government-subsidized housing [20], and DVDs to subscribers [13]. The main goal in such problems is to find an "optimal" matching in each situation. Various definitions of optimality have been proposed. The least restrictive one is Pareto optimality [1,2,17]. A matching $M$ is Pareto optimal if there is no other matching $M'$ such that at least one person prefers $M'$ to $M$ but no one prefers $M$ to $M'$. Other, stronger definitions include rank-maximality [10] (allocating the maximum number of people to their first choices, then the maximum number to their second choices, and so on) and popularity [3,7], defined below.

Popular Matching
Consider a set A of n people and a set B of m ≥ n items, with α = m/n ≥ 1. Each person in A has a preference list that ranks some items in B in order of preference. A preference list is strict if it does not contain ties, and is complete if it contains all items in B. Each person can only be matched with an item in his/her preference list, and each item can be matched with at most one person.
For a matching $M$, a person $a \in A$, and an item $b \in B$, let $M(a)$ denote the item matched with $a$, and $M(b)$ denote the person matched with $b$ (for convenience, let $M(a)$ be null for an unmatched person $a$). Let $r_a(b)$ denote the rank of item $b$ in $a$'s preference list, with the most preferred item having rank 1, the second most preferred item having rank 2, and so on (for convenience, let $r_a(\text{null}) = \infty$). A person $a$ prefers a matching $M$ to a matching $M'$ if $r_a(M(a)) < r_a(M'(a))$, and $M$ is popular if, for any other matching $M'$, the number of people who prefer $M$ to $M'$ is not less than the number of those who prefer $M'$ to $M$.
A probabilistic variant of this problem, the Random Popular Matching Problem (rpmp), studies the probability that a popular matching exists in a random instance when given $n$ and $m$, and each person's preference list is defined independently by selecting the first item $b_1 \in B$ uniformly at random, the second item $b_2 \in B \setminus \{b_1\}$ uniformly at random, the third item $b_3 \in B \setminus \{b_1, b_2\}$ uniformly at random, and so on.
Example 1. Consider the following instance with three people $a_1, a_2, a_3$ and three items $b_1, b_2, b_3$, with everyone having the same preferences.
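The definition of popularity can be checked by brute force on very small instances. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and the shared ranking $b_1, b_2, b_3$ used for an instance like Example 1 is an assumption. It enumerates every (possibly partial) matching and keeps those that no other matching beats in a head-to-head vote.

```python
from itertools import permutations

def rank(prefs, a, item):
    """1-based rank of item in person a's list; unmatched (None) or unlisted ranks as infinity."""
    return prefs[a].index(item) + 1 if item in prefs[a] else float("inf")

def all_matchings(prefs, items):
    """Yield each matching as a tuple whose a-th entry is person a's item, or None if unmatched."""
    n = len(prefs)
    slots = list(items) + [None] * n  # one None per person allows anyone to stay unmatched
    seen = set()
    for perm in permutations(slots, n):
        acceptable = all(x is None or x in prefs[a] for a, x in enumerate(perm))
        if acceptable and perm not in seen:
            seen.add(perm)
            yield perm

def more_popular(prefs, m1, m2):
    """True if strictly more people prefer m1 to m2 than prefer m2 to m1."""
    votes = 0
    for a in range(len(prefs)):
        r1, r2 = rank(prefs, a, m1[a]), rank(prefs, a, m2[a])
        votes += (r1 < r2) - (r2 < r1)
    return votes > 0

def popular_matchings(prefs, items):
    """All matchings that are not beaten by any other matching (exponential time)."""
    ms = list(all_matchings(prefs, items))
    return [m for m in ms if not any(more_popular(prefs, other, m) for other in ms)]

# Assumed instance: three people who all rank b1 > b2 > b3.
same = [["b1", "b2", "b3"]] * 3
print(popular_matchings(same, ["b1", "b2", "b3"]))
```

Under that assumed ranking the list comes back empty: any full matching is beaten by the cyclic shift that promotes two people and demotes one. The enumeration is only meant for a handful of people and items.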

Related Work
The concept of popularity of a matching was first introduced by Gärdenfors [7] in the context of the Stable Marriage Problem. Abraham et al. [3] presented the first polynomial time algorithm to find a popular matching in a given instance, or to report that none exists. Later, Mestre [16] generalized that algorithm to the case where people are given different voting weights. Manlove and Sng [14] presented an algorithm to determine whether a popular matching exists in a setting known as the Capacitated House Allocation Problem, which allows an item to be matched with more than one person. The notion of popularity also applies when the preference lists are two-sided (matching people with people), both in a bipartite graph (Marriage Problem) and in a general graph (Roommates Problem). Biró et al. [5] developed an algorithm to test popularity of a matching in these two settings, and proved that determining whether a popular matching exists in these settings is NP-hard when ties are allowed.
Because a popular matching does not always exist, McCutchen [15] introduced two measures of "unpopularity" of a matching, the unpopularity factor and the unpopularity margin, and showed that finding a matching that minimizes either measure is NP-hard. Huang et al. [8] later gave algorithms to find a matching with bounded values of these measures in certain instances. Kavitha et al. [12] introduced the concept of a mixed matching, which is a probability distribution over matchings, and proved that a mixed matching that is popular always exists.
For rpmp in the case with strict and complete preference lists, Mahdian [13] proved that if $\alpha = m/n > \alpha_*$, where $\alpha_* \approx 1.42$ is the root of the equation $x^2 = e^{1/x}$, then a popular matching exists with high ($1 - o(1)$) probability in a random instance. On the other hand, if $\alpha < \alpha_*$, a popular matching exists with low ($o(1)$) probability. The point $\alpha = \alpha_*$ can be regarded as a phase transition point, at which the probability rises from asymptotically zero to asymptotically one. Itoh and Watanabe [11] later studied the weighted case where each person has weight either $w_1$ or $w_2$, with $w_1 \geq 2w_2$, and found a phase transition at $\alpha = \Theta(n^{1/3})$.

Our Contribution
The rpmp in the case where preference lists are strict but not complete, with every person's preference list having the same constant length $k$, was simulated by Abraham et al. [3], and Mahdian [13] conjectured that the phase transition shifts by an amount exponentially small in $k$. However, the exact phase transition point, or whether it exists at all, had not been established. In this paper, we study this case and prove a phase transition at $\alpha = \alpha_k$, where $\alpha_k \geq 1$ is the root of the equation $x e^{-1/2x} = 1 - (1 - e^{-1/x})^{k-1}$. In particular, we prove that for $k \geq 4$, if $\alpha > \alpha_k$, then a popular matching exists with high probability; and if $\alpha < \alpha_k$, then a popular matching exists with low probability. For $k \leq 3$, where the equation does not have a solution in $[1, \infty)$, a popular matching exists with high probability for any value of $\alpha \geq 1$, without a phase transition.

Preliminaries
For convenience, for each person $a \in A$ we append a unique auxiliary last resort item $\ell_a$ to the end of $a$'s preference list ($\ell_a$ has lower preference than all other items in the list). By introducing the last resort items, we can assume that every person is matched because we can simply match any unmatched person $a$ with $\ell_a$. Note that these last resort items are not in $B$ and do not count toward $m$, the total number of "real items." Also, let $L = \{\ell_a \mid a \in A\}$ be the set of all last resort items.
For each person $a \in A$, let $f(a)$ denote the item at the top of $a$'s preference list. Let $F$ be the set of all items $b \in B$ such that there exists a person $a' \in A$ with $f(a') = b$, and let $S = B - F$. Then, for each person $a \in A$, let $s(a)$ denote the highest ranked item in $a$'s preference list that is not in $F$. Note that $s(a)$ is well-defined for every $a \in A$ because of the existence of last resort items.
We say that a matching $M$ is A-perfect if every person $a \in A$ is matched with either $f(a)$ or $s(a)$. Abraham et al. [3] proved the following lemma, which holds for any instance with strict (not necessarily complete) preference lists.
Lemma 1. [3] In an instance with strict preference lists, a popular matching exists if and only if an A-perfect matching exists.
The proof of Lemma 1 first shows that a matching M is popular if and only if M is an A-perfect matching such that every item in F is matched in M . This equivalence implies the forward direction of the lemma. On the other hand, if an A-perfect matching M exists in an instance, the proof shows that we can modify M to make every item in F matched, hence implying the backward direction of the lemma.
It is worth noting another useful lemma about independent and uniform selection of items at random proved by Mahdian [13], which will be used throughout this paper.

Complete Preference Lists Setting
We first consider the setting that every person's preference list is strict and complete. Note that when m > n and the preference lists are complete, the last resort items are not necessary.
From a given instance, we construct a top-choice graph, a bipartite graph with parts $B' = B$ and $S' = S$ such that each person $a \in A$ corresponds to an edge connecting $f(a) \in B'$ and $s(a) \in S'$. Note that multiple edges are allowed in this graph. Previously, Mahdian [13] proved the following lemma.
Lemma 3. [13] In an instance with strict and complete preference lists, an A-perfect matching exists if and only if its top-choice graph does not contain a complex component (a connected component with more than one cycle).
By Lemmas 1 and 3, the problem of determining whether a popular matching exists is equivalent to determining whether the top-choice graph contains a complex component. However, the difficulty is that the number of vertices in the randomly generated top-choice graph is not fixed. Therefore, a random bipartite graph $G(x, y, z)$ with a fixed number of vertices is defined as follows to approximate the top-choice graph.
Definition 1. $G(x, y, z)$ is a bipartite graph with parts $V$ having $x$ vertices and $U$ having $y$ vertices, and with $z$ edges, each of which is selected independently and uniformly at random (with replacement) from the set of all possible edges between a vertex in $V$ and a vertex in $U$.
This auxiliary graph has properties closely related to the top-choice graph. Mahdian [13] then proved that if $\alpha > \alpha_* \approx 1.42$, then $G(m, h, n)$ contains a complex component with low probability for any integer $h \in [e^{-1/\alpha} m - m^{2/3}, e^{-1/\alpha} m + m^{2/3}]$, and used those properties to conclude that the top-choice graph also contains a complex component with low probability, hence a popular matching exists with high probability.
Theorem 1. [13] In a random instance with strict and complete preference lists, if $\alpha > \alpha_*$, where $\alpha_* \approx 1.42$ is the solution of the equation $x^2 e^{-1/x} = 1$, then a popular matching exists with probability $1 - o(1)$.
Theorem 1 serves as an upper bound of the phase transition point in the case of strict and complete preference lists. On the other hand, the following lower bound was also proposed by Mahdian [13] along with a sketch of the proof, although the fully detailed proof was not given.
Theorem 2. [13] In a random instance with strict and complete preference lists, if $\alpha < \alpha_*$, then a popular matching exists with probability $o(1)$.
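The reduction used in this section, from popular matchings to complex components of the top-choice graph, can be sketched in a few lines. The sketch assumes that a complex component is a connected component with more than one cycle, which for a connected multigraph is the same as having more edges than vertices. The helper names and the `("last", a)` placeholder for a last resort item are hypothetical.

```python
def top_choice_edges(prefs):
    """One edge (f(a), s(a)) per person; ("last", a) is used if every listed item is in F."""
    first_choices = {p[0] for p in prefs}  # the set F
    return [(p[0], next((b for b in p if b not in first_choices), ("last", a)))
            for a, p in enumerate(prefs)]

def has_complex_component(edges):
    """True if some connected component has more edges than vertices (multi-edges allowed)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    vertex_count, edge_count = {}, {}
    for v in list(parent):
        r = find(v)
        vertex_count[r] = vertex_count.get(r, 0) + 1
    for u, _ in edges:
        r = find(u)
        edge_count[r] = edge_count.get(r, 0) + 1
    return any(edge_count[r] > vertex_count[r] for r in edge_count)

# Three identical lists give three parallel edges between items 0 and 1.
print(has_complex_component(top_choice_edges([[0, 1, 2]] * 3)))
```

Counting edges against vertices per component avoids any explicit cycle search, and isolated items can be ignored because they never form a component with an edge.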

Incomplete Preference Lists Setting
The previous section shows known results in the setting where preference lists are strict and complete. However, preference lists in many real-world situations are not complete, as people may regard only some items as acceptable. In the setting where the preference lists are strict but not complete, we will consider the case where every person's preference list has the same constant length $k \leq m$ (not counting the last resort item). Such an instance is called an instance with k-incomplete preference lists.
Definition 2. For a positive integer $k \leq m$, a random instance with strict and k-incomplete preference lists is an instance with each person's preference list chosen independently and uniformly at random from the set of all $\frac{m!}{(m-k)!}$ possible $k$-permutations of the $m$ items in $B$.
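A random instance in the sense of Definition 2 is straightforward to sample. The sketch below is an illustration with hypothetical names; it relies on the fact that Python's `random.sample` returns its result in selection order, so each of the $k$-permutations of the $m$ items is equally likely.

```python
import random

def random_instance(n, m, k, rng=random):
    """Draw n strict preference lists, each a uniformly random k-permutation of range(m)."""
    if not 1 <= k <= m:
        raise ValueError("need 1 <= k <= m")
    return [rng.sample(range(m), k) for _ in range(n)]

# Example: 5 people, 8 items, lists of length 3, with a fixed seed for reproducibility.
instance = random_instance(5, 8, 3, random.Random(0))
for person, prefs in enumerate(instance):
    print(person, prefs)
```

Passing a seeded `random.Random` keeps experiments reproducible; items are represented simply as the integers `0` to `m - 1`.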
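Recall that, for each person $a \in A$, $s(a)$ is the highest ranked item in $a$'s preference list that is not in $F$. The main difference from the complete preference lists setting is that in the incomplete preference lists setting, $s(a)$ can be either a real item or the last resort item $\ell_a$. For each person $a \in A$, let $P_a$ be the set of items in $a$'s preference list (not including the last resort item $\ell_a$). We then define $A_1 = \{a \in A \mid P_a \subseteq F\}$ and $A_2 = \{a \in A \mid P_a \not\subseteq F\}$, so that $s(a) = \ell_a$ exactly when $a \in A_1$.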

Top-Choice Graph
Analogously to the complete preference lists setting, we define the top-choice graph of an instance with strict and k-incomplete preference lists to be a bipartite graph with parts $B' = B$ and $S' \cup L'$, where $S' = S$ and $L' = L$. Each person $a \in A_2$ corresponds to an edge connecting $f(a) \in B'$ and $s(a) \in S'$. We call these edges normal edges. Each person $a \in A_1$ corresponds to an edge connecting $f(a) \in B'$ and $s(a) = \ell_a \in L'$. We call these edges last resort edges.
Although the statement of Lemma 3 proved by Mahdian [13] is for the complete preference lists setting, exactly the same proof applies to the incomplete preference lists setting as well. The proof first shows that an A-perfect matching exists if and only if each edge in the top-choice graph can be oriented such that each vertex has at most one incoming edge (because if an A-perfect matching $M$ exists, we can orient each edge corresponding to $a \in A$ toward the endpoint corresponding to $M(a)$, and vice versa). Then, the proof shows that for any undirected graph $H$, each edge of $H$ can be oriented in such a manner if and only if $H$ does not have a complex component. Thus we can conclude the following lemma.
Lemma 4. In an instance with strict and k-incomplete preference lists, an A-perfect matching exists if and only if its top-choice graph does not contain a complex component.
In contrast to the complete preference lists setting, the top-choice graph in the incomplete preference lists setting has two types of edges (normal edges and last resort edges) with different distributions, and thus cannot be approximated by $G(x, y, z)$ defined in the previous section. Therefore, we have to construct another auxiliary graph $G'(x, y, z_1, z_2)$ as follows.
This graph has $z_1 + z_2$ edges. Each of the first $z_1$ edges is selected independently and uniformly at random (with replacement) from the set of all possible edges between a vertex in $V$ and a vertex in $U$. Then, each of the next $z_2$ edges is constructed by the following procedure: uniformly select a vertex $v_i$ from $V$ at random (with replacement); then, uniformly select a vertex $u'_j$ that has not been selected before from $U'$ at random (without replacement) and construct an edge $(v_i, u'_j)$.
The intuition behind $G'(x, y, z_1, z_2)$ is that we imitate the distribution of the top-choice graph in the incomplete preference lists setting, with $V$, $U$, and $U'$ corresponding to $B'$, $S'$, and $L'$, respectively, and the first $z_1$ edges and the next $z_2$ edges corresponding to normal edges and last resort edges, respectively.
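The edge-generation procedure for $G'(x, y, z_1, z_2)$ described above can be sketched as follows. The function name and vertex labels are hypothetical, and the sketch assumes that $U'$ has exactly $z_2$ vertices, which is enough for the draws without replacement; any further vertices of $U'$ would be isolated and would not change the component structure.

```python
import random

def sample_g_prime(x, y, z1, z2, rng=random):
    """Return (normal_edges, last_resort_edges) of one sample of G'(x, y, z1, z2).

    Vertices are labelled ("V", i), ("U", j) and ("U'", j).
    Assumption of this sketch: U' has exactly z2 vertices.
    """
    # First z1 edges: uniform over V x U, with replacement (multi-edges possible).
    normal = [(("V", rng.randrange(x)), ("U", rng.randrange(y))) for _ in range(z1)]
    # Next z2 edges: uniform vertex of V, paired with a not-yet-used vertex of U'.
    fresh = rng.sample(range(z2), z2)
    last_resort = [(("V", rng.randrange(x)), ("U'", j)) for j in fresh]
    return normal, last_resort

normal, last_resort = sample_g_prime(x=6, y=3, z1=4, z2=2, rng=random.Random(0))
print(normal)
print(last_resort)
```

Because every vertex of $U'$ is used at most once, each last resort edge hangs off the rest of the graph as a pendant edge.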
Similarly to the complete preference lists setting, this auxiliary graph has properties closely related to the top-choice graph in incomplete preference lists setting, as shown in the following lemma.
From Chebyshev's inequality, we have as desired.

Size of A 2
Since our top-choice graph has two types of edges with different distributions, we first want to bound the number of edges of each type. Note that the top-choice graph has $|A_2|$ normal edges and $|A_1|$ last resort edges, so the problem is equivalent to bounding the size of $A_2$. First, we will prove the next two lemmas, which will be used to bound the ratio $|A_2|/n$.
Lemma 6. In a random instance with strict and k-incomplete preference lists,
Proof. Let $c_1 > 0$ be any constant. From Lemma 2 with $y = n$ and $z = m$, we have Therefore, from (1) and (2) we can conclude that
Lemma 7. In a random instance with strict and k-incomplete preference lists, holds for any $a \in A$ for sufficiently large $m$, given any constant $c_2 > 0$.
Proof. If $k = 1$, then we have $P_a \subseteq F$ for every $a \in A$, which means $\Pr[a \in A_2] = 0$ and thus the lemma holds. From now on, we will consider the case that $k \geq 2$. Let $c_2 > 0$ be any constant. We can select a sufficiently small $c_1$ (e.g. Note that $a \in A_1$ if and only if $P_a - \{f(a)\} \subseteq F$. Consider the process in which we first independently and uniformly select the first-choice item of every person in $A$ from the set $B$ at random, creating the set $F$. Suppose that $|F| = q$ for some fixed integer $q \in I$. Then, for each $a \in A$, we uniformly select the remaining $k - 1$ items in $a$'s preference list one by one from the remaining $m - 1$ items. Since $\binom{q-1}{k-1} / \binom{m-1}{k-1}$ converges to $\left(\frac{q}{m}\right)^{k-1}$ when $m$ increases to infinity for every $q \in I$, it is sufficient to consider $\Pr[a \in A_1 \mid |F| = q] = \left(\frac{q}{m}\right)^{k-1}$.

Now consider
For the lower bound of $\Pr[a \in A_1]$, we have where the last inequality follows from (3) where the last inequality follows from (4). Thus, we can conclude that $\Pr[a \in A_1] < (1 - e^{-1/\alpha})^{k-1} + c_2$ for sufficiently large $m$. Therefore, which is equivalent to
Finally, the following lemma shows that the ratio $|A_2|/n$ lies around the constant $1 - (1 - e^{-1/\alpha})^{k-1}$ with high probability.
Lemma 8. In a random instance with strict and k-incomplete preference lists, with probability $1 - o(1)$ for any constant $c_3 > 0$.
Proof. If $k = 1$, then we have $P_a \subseteq F$ for every $a \in A$, which means $|A_2| = 0$ and thus the lemma holds. From now on, we will consider the case that $k \geq 2$. Let $c_3 > 0$ be any constant. We can select a sufficiently small $c_2$ such that $c_2(1 + (1 - e^{-1/\alpha})^{k-1} + c_2) < c_3$ and thus From Lemma 7, we have for sufficiently large $m$.
For each $a \in A$, define an indicator random variable $X_a$ such that $X_a = 1$ if $a \in A_2$ and $X_a = 0$ otherwise. Note that $|A_2| = \sum_{a \in A} X_a$. From (7), we have for each $a \in A$, and from the linearity of expectation we also have Since $X_a$ and $X_{a'}$ are independent for any pair of distinct $a, a' \in A$, we have Then, from Chebyshev's inequality and (8) we have Therefore, from (5), (6), and (8) we can conclude that
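The concentration of $|A_2|/n$ around $1 - (1 - e^{-1/\alpha})^{k-1}$ can be observed empirically. The following Monte Carlo sketch is an illustration only and is not part of the proof; the function names, the seed, and the instance sizes are arbitrary choices. It takes $A_2$ to be the set of people whose list contains at least one item outside $F$.

```python
import math
import random

def a2_fraction(n, m, k, rng):
    """Sample one random k-incomplete instance and return |A_2| / n."""
    prefs = [rng.sample(range(m), k) for _ in range(n)]
    first_choices = {p[0] for p in prefs}  # the set F
    in_a2 = sum(1 for p in prefs if any(b not in first_choices for b in p))
    return in_a2 / n

def predicted_fraction(alpha, k):
    """The constant 1 - (1 - exp(-1/alpha))**(k-1) around which |A_2|/n concentrates."""
    return 1 - (1 - math.exp(-1 / alpha)) ** (k - 1)

rng = random.Random(2024)
n, m, k = 2000, 3000, 4  # alpha = m / n = 1.5
print("empirical:", a2_fraction(n, m, k, rng))
print("predicted:", predicted_fraction(m / n, k))
```

For moderate $n$ the two numbers should already be close, and increasing $n$ and $m$ at a fixed ratio $\alpha$ tightens the agreement.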

Main Results
For each value of $k$, we want to find a phase transition point $\alpha_k$ such that if $\alpha > \alpha_k$, then a popular matching exists with high probability; and if $\alpha < \alpha_k$, then a popular matching exists with low probability. We do so by proving the upper bound and lower bound separately.
Proof. By the definition of $G'(m, h, \beta n, (1 - \beta)n)$, each vertex in $U'$ has degree at most one, thus removing $U'$ does not affect the existence of a complex component. Moreover, the graph $G'(m, h, \beta n, (1 - \beta)n)$ with part $U'$ removed has exactly the same distribution as $G(m, h, \beta n)$ given in Definition 1. Therefore, it is sufficient to consider the graph $G(m, h, \beta n)$ instead.

Upper Bound
Using the same technique as in Mahdian's proof of [13,Lemma 4], define a minimal bad graph to be two vertices joined by three vertex-disjoint paths, or two vertex-disjoint cycles joined by a path which is also vertex-disjoint from the two cycles except at both endpoints (the path can be degenerate, which is the only exception that the two cycles share a vertex). Note that any proper subgraph of a minimal bad graph does not contain a complex component, and every graph that contains a complex component must contain a minimal bad graph as a subgraph.
Let $X$ and $Y$ be subsets of vertices of $G(m, h, \beta n)$ in $V$ and $U$, respectively. Define $BAD_{X,Y}$ to be the event that $X \cup Y$ contains a minimal bad graph as a spanning subgraph. Then, let $p_1 = |X|$, $p_2 = |Y|$, and $p = p_1 + p_2$. Observe that $BAD_{X,Y}$ can occur only when $|p_1 - p_2| \leq 1$, so $p_1, p_2 \geq \frac{p-1}{2}$. Also, there are at most $2p^2$ non-isomorphic minimal bad graphs with $p_1$ vertices in $V$ and $p_2$ vertices in $U$, with each of them having $p_1! \, p_2!$ ways to arrange the vertices, and there is at most $(p+1)! \binom{\beta n}{p+1} \left(\frac{1}{mh}\right)^{p+1}$ probability that all $p + 1$ edges of each graph are selected in our random procedure. By the union bound, the probability of $BAD_{X,Y}$ is at most Again, by the union bound, the probability that at least one $BAD_{X,Y}$ occurs is at most By the assumption, we have $\alpha^2 e^{-1/\alpha} > \beta^2$, so $\frac{\alpha^2}{\beta^2}(e^{-1/\alpha} - m^{-1/3}) > 1$ for sufficiently large $m$, hence the above sum converges. Therefore, the probability that at least one $BAD_{X,Y}$ happens is at most $O(1/n)$.
We can now prove the following theorem, which serves as an upper bound of $\alpha_k$.
Theorem 3. In a random instance with strict and k-incomplete preference lists, if $\alpha e^{-1/2\alpha} > 1 - (1 - e^{-1/\alpha})^{k-1}$, then a popular matching exists with probability $1 - o(1)$.
probability $1 - o(1)$. Moreover, we have $\beta = \frac{t}{n} < \alpha e^{-1/2\alpha}$ for any integer $t \in J_1$. Define $E_1$ to be the event that a popular matching exists in a random instance. First, consider the probability of $E_1$ conditioned on $|A_2| = t$ for each fixed integer $t \in J_1$. By Lemmas 5 and 9, the top-choice graph contains a complex component with probability $O(n^{-1/3}) = o(1)$. Therefore, from Lemmas 1 and 4 we can conclude that a popular matching exists with probability $1 - o(1)$, i.e. $\Pr[E_1 \mid |A_2| = t] = 1 - o(1)$ for every fixed integer $t \in J_1$. So Hence, a popular matching exists with probability $1 - o(1)$.
Proof. Again, by the same reasoning as in the proof of Lemma 9, we can consider the graph $G(m, h, \beta n)$ instead of $G'(m, h, \beta n, (1 - \beta)n)$, but now we are interested in the event that $G(m, h, \beta n)$ does not contain a complex component.

Lower Bound
Since $\alpha e^{-1/2\alpha} < \beta$, for sufficiently small $\epsilon > 0$, we still have $\alpha e^{-1/2\alpha} < (1 - \epsilon)^{3/2} \beta$. Consider the random bipartite graph $G(m, h, (1 - \epsilon)\beta n)$ with parts $V$ having $m$ vertices and $U$ having $h$ vertices. For each vertex $v$, let the random variable $r_v$ be the degree of $v$. Since there are $(1 - \epsilon)\beta n$ edges in the graph, the expected value of $r_v$ for each $v \in V$ is for sufficiently large $m$, the expected value of $r_v$ for each $v \in U$ is for sufficiently large $m$. Furthermore, each $r_v$ has a binomial distribution, which converges to a Poisson distribution when $m$ increases to infinity. The graph can be viewed as a special case of an inhomogeneous random graph [6,19]. With the assumption that $c_1 c_2 > (1 - \epsilon)^3 \frac{\beta^2}{\alpha^2 e^{-1/\alpha}} > 1$, we can conclude that the graph contains a giant component (a component containing a constant fraction of the vertices of the entire graph) with probability $1 - O(1/n)$, where the explanation is given in Appendix B.
Finally, consider the construction of $G(m, h, \beta n)$ by putting $\epsilon \beta n$ more random edges into $G(m, h, (1 - \epsilon)\beta n)$. If two of those edges land in the giant component $C$, a complex component will be created. Since $C$ has size a constant fraction of $m$, each edge has a constant probability of landing in $C$, so the probability that at most one edge will land in $C$ is exponentially low. Therefore, $G(m, h, \beta n)$ does not contain a complex component with probability at most $O(1/n)$.
We can now prove the following theorem, which serves as a lower bound of $\alpha_k$.
Theorem 4. In a random instance with strict and k-incomplete preference lists, if $\alpha e^{-1/2\alpha} < 1 - (1 - e^{-1/\alpha})^{k-1}$, then a popular matching exists with probability $o(1)$.
Proof. As in the proof of Theorem 3, we can select a small enough $\delta_2 > 0$ such that We have $\frac{|A_2|}{n} \in J_2$ with probability $1 - o(1)$ and $\beta = \frac{t}{n} > \alpha e^{-1/2\alpha}$ for any integer $t \in J_2$. Now we define $E_2$ to be the event that a popular matching does not exist in a random instance. By the same reasoning as in the proof of Theorem 3, we can prove that $\Pr[E_2 \mid |A_2| = t] = 1 - o(1)$ for every fixed $t \in J_2$ and reach the analogous conclusion that $\Pr[E_2] = 1 - o(1)$.
Theorem 5. In a random instance with strict and k-incomplete preference lists with $k \geq 4$, if $\alpha > \alpha_k$, where $\alpha_k \geq 1$ is the root of the equation $x e^{-1/2x} = 1 - (1 - e^{-1/x})^{k-1}$, then a popular matching exists with probability $1 - o(1)$; and if $\alpha < \alpha_k$, then a popular matching exists with probability $o(1)$. In such a random instance with $k \leq 3$, a popular matching exists with probability $1 - o(1)$ for any value of $\alpha \geq 1$.
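The threshold in Theorem 5 has no closed form, but it can be approximated numerically. The sketch below is an illustration with hypothetical function names. It reads the exponent $-1/2x$ as $-1/(2x)$, and it assumes that the difference between the two sides of the equation changes sign exactly once on $[1, 2]$, so that bisection applies.

```python
import math

def gap(x, k):
    """Left side minus right side of x*exp(-1/(2x)) = 1 - (1 - exp(-1/x))**(k-1)."""
    return x * math.exp(-1 / (2 * x)) - (1 - (1 - math.exp(-1 / x)) ** (k - 1))

def alpha_k(k, hi=2.0, tol=1e-12):
    """Approximate the root in [1, hi] by bisection; None if gap(1, k) >= 0 (no root assumed)."""
    lo = 1.0
    if gap(lo, k) >= 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid, k) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for k in (3, 4, 5, 10, 50):
    print(k, alpha_k(k))
```

The upper end `hi=2.0` works because the left side already exceeds 1 at $x = 2$ while the right side never does. For $k \leq 3$ the function returns `None`, matching the statement that the equation has no solution in $[1, \infty)$.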

Conclusion and Future Work
For each value of $k \geq 4$, the phase transition occurs at the root $\alpha_k \geq 1$ of the equation $x e^{-1/2x} = 1 - (1 - e^{-1/x})^{k-1}$, as shown in Figure 1. Note that as $k$ increases, the right-hand side of the equation converges to 1, hence $\alpha_k$ converges to Mahdian's value of $\alpha_* \approx 1.42$ in the case with complete preference lists.
Remark. For each person $a$, as the length of $P_a$ increases, the probability that $P_a \not\subseteq F$ and thus $a \in A_2$ also increases, and so do the expected size of $A_2$ and the phase transition point. Therefore, in the case where the lengths of people's preference lists are fixed but not equal (e.g. half of the people have preference lists of length $k_1$, and the other half have lists of length $k_2$), the phase transition will occur between $\alpha_{k_{\min}}$ and $\alpha_{k_{\max}}$, where $k_{\min}$ and $k_{\max}$ are the shortest and longest lengths of people's preference lists, respectively.
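The convergence of $\alpha_k$ to $\alpha_*$ noted at the start of this section can be made explicit with a short calculation. The following is a worked sketch, reading the exponent $-1/2x$ as $-1/(2x)$: the right-hand side tends to 1, and the limiting equation is Mahdian's equation after squaring.

```latex
\begin{align*}
\lim_{k \to \infty} \Bigl( 1 - \bigl(1 - e^{-1/x}\bigr)^{k-1} \Bigr) &= 1
  && \text{since } 0 < 1 - e^{-1/x} < 1 \text{ for } x \geq 1, \\
x e^{-1/(2x)} = 1
  \;&\Longleftrightarrow\; x^2 e^{-1/x} = 1
  \;\Longleftrightarrow\; x^2 = e^{1/x}
  && \text{for } x > 0 .
\end{align*}
```

The last equation is the one whose root is $\alpha_* \approx 1.42$ in the complete preference lists setting.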
In many real-world situations, ties can and are likely to occur in people's preference lists. The rpmp in the case with ties allowed was mentioned by Mahdian [13] and simulated by Abraham et al. [3] using a parameter $t$ to denote the probability that each entry in a preference list is tied with the previous entry. Intuitively, and as also confirmed by the experimental results of [3], when ties are very likely to occur ($t$ is very close to 1), a popular matching is likely to exist even when $\alpha = 1$. However, the transition point for each value of $t$ has not yet been found. A possible direction for future work is to study the transition point in this case for each value of $t$, both with complete and incomplete preference lists. Another interesting generalization of rpmp is the Capacitated House Allocation Problem, where each item can be matched with more than one person. A possible direction for future work is to find the transition point in the most basic case where every item has the same capacity $c$.
which has the largest eigenvalue $\|T_\kappa\| = \sqrt{c_1 c_2} > 1$. This is a necessary and sufficient condition to conclude that $G(m, h, (1 - \epsilon)\beta n)$ contains a giant component with $1 - o(1)$ probability [6,19]. In fact, by giving a precise bound in each step of [6], it is possible to show that the probability is greater than $1 - O(1/n)$ as desired.
Alternatively, we hereby show a direct proof of the bipartite case by approximating the construction of the graph with a Galton-Watson branching process similar to that in the proof of existence of a giant component in the Erdős-Rényi graph in [4, pp.182-192].
The Galton-Watson branching process is a process that generates a random graph in a breadth-first search tree manner when given a starting vertex and a distribution of the degree of each vertex. The process begins when the starting vertex spawns a number of children which are put in the queue in some order. Then, the first vertex in the queue also spawns children which are put at the end of the queue by the same manner, and so on. The process may stop at some point when the queue becomes empty, or otherwise continues indefinitely.
Consider the construction of $G(m, h, (1 - \epsilon)\beta n)$ with parts $V$ and $U$ starting at a vertex and discovering new vertices in a breadth-first search tree manner. We approximate it with the Galton-Watson branching process. Let $T$ be the size of the process ($T = \infty$ if the process continues forever). Let $z_1$ and $z_2$ be the probabilities that $T < \infty$ when starting the process at a vertex in $V$ and $U$, respectively. Also, let $Z_1$ and $Z_2$ be the number of children the root has when starting the process at a vertex in $V$ and $U$, respectively.
Given that the root has i children, in order for the branching process to be finite, all of the i branches must be finite, so we get the equations.
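A numerical illustration of this finiteness condition follows. It is a sketch under an explicit assumption that is not taken from the text above: the offspring counts $Z_1$ and $Z_2$ are treated as Poisson with means $c_1$ and $c_2$, in which case conditioning on the number of children gives $z_1 = e^{c_1 (z_2 - 1)}$ and $z_2 = e^{c_2 (z_1 - 1)}$. The function name is hypothetical. Iterating from $(0, 0)$ converges to the smallest solution, which is the pair of extinction probabilities.

```python
import math

def extinction_probabilities(c1, c2, iterations=10_000):
    """Smallest fixed point of z1 = exp(c1*(z2-1)), z2 = exp(c2*(z1-1)).

    Assumes Poisson(c1) children for a V-vertex and Poisson(c2) for a U-vertex.
    """
    z1 = z2 = 0.0
    for _ in range(iterations):
        z1, z2 = math.exp(c1 * (z2 - 1)), math.exp(c2 * (z1 - 1))
    return z1, z2

for c1, c2 in [(2.0, 2.0), (4.0, 0.5), (0.5, 0.5)]:
    z1, z2 = extinction_probabilities(c1, c2)
    print(f"c1*c2 = {c1 * c2:.2f}: z1 = {z1:.4f}, z2 = {z2:.4f}")
```

When $c_1 c_2 > 1$ the iteration settles strictly below 1, so the process survives with constant probability; when $c_1 c_2 < 1$ it converges to 1 and the process dies out almost surely.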
Finally, when we perform the Galton-Watson branching process at a vertex in $G(m, h, (1 - \epsilon)\beta n)$, there is a constant probability that the process will continue indefinitely, thus creating a giant component. Otherwise, with probability $1 - O(1/n^2)$ we will create a component with size smaller than $k_1 \log n$, so we can remove that component from the graph and then repeat the process starting at a new vertex. After repeating this process a logarithmic number of times, we have removed only $O(\log^2 n)$ vertices from the graph, which does not affect the constant $y = \Pr[T = \infty]$, so the probability that none of these attempts produces a giant component is at most $O(1/n)$. Therefore, $G(m, h, (1 - \epsilon)\beta n)$ contains a giant component with probability $1 - O(1/n)$.