Unbalanced Three-Phase Distribution Grid Topology Estimation and Bus Phase Identification

There is an increasing need for monitoring and controlling uncertainties brought by distributed energy resources in distribution grids. For such goal, accurate three-phase topology is the basis for correlating and exterminating measurements in unbalanced distribution networks. Unfortunately, such topology knowledge is often unavailable due to limited investment, especially for secondary distribution grids. Also, the bus phase connectivity information is inaccurate due to human errors or outdated records. For this challenge, we utilize smart meter data at different phases for an information-theoretic approach to learn the structures. Specifically, we convert the system of three unbalanced phasors into symmetrical components, namely the positive, negative, and zero sequences. Then, we prove that Chow-Liu algorithm can find the optimal topology by utilizing power flow equation and the conditional independence relationships implied by the radial three-phase structure of distribution grids with the presence of incorrect bus phase labels. At last, by utilizing Carson's equation, we prove that the bus phase connection can be correctly identified using voltage measurements. For validation, we extensively simulate on IEEE $37$- and $123$-bus systems using real data from PG\&E, ADRES Project, and Pecan Street. We observe that the algorithm is highly accurate for finding three-phase topology in distribution grids even with strong load unbalancing condition and DERs. This ensures close monitoring and controlling DERs in distribution grids.


I. INTRODUCTION
The power distribution system is currently undergoing a dramatic transformation in both form and function. A largescale deployment of technologies such as rooftop solar, electric vehicles (EVs), and smart home management systems have the potential to offer cheaper, cleaner and more controllable energy to the customers. On the other hand, integrating these resources has been proven to be nontrivial, largely because of their inherent uncertainty and distributed nature.
For example, even a small-scale of DERs can affect the stability of the local distribution grid [1]. Such a problem will be aggravated by the unbalance situation in the distribution grid especially when uneven DER deployment happens. Furthermore, the more frequent bi-directional power flows easily leave the existing monitoring system with passive protective devices, insufficient for robust grid operation. In addition to the static connectivity, mobile components, such as EVs, can further jeopardize the grid due to their frequent plug-in [2]. Therefore, we need carefully designed three-phase grid monitoring tools for islanding and line work hazards in system operation with deep and uneven DER penetration. For such monitoring, grid topology information is a prerequisite.
For topology estimation, the transmission grid assumes a prior grid knowledge, which needs limited error correction. Also, it is assumed that infrequent reconfiguration happens, Y identifiable by generalized state estimation [3]- [5]. Unfortunately, such assumptions do not hold in medium-and lowvoltage distribution grids, where topology can change relatively more frequently with limited sensing devices. Furthermore, many urban distribution lines have been underground for decades, making prior knowledge of topology suspicious and expensive to verify [6].
For distribution grid topology identification, many methods have been proposed in recent years [7]- [11]. Unfortunately, most of them focus on the balanced systems or single phase systems. For utility practice, distribution grids are usually considered unbalanced and have multiple phases when power supplies to buildings and residential houses. Such unbalanced multi-phase systems include three-phase distributions and twophase distributions. One reason is that the loads connected at different phases are unbalanced due to the uneven growth in each feeder territory [12], [13]. For example, surveyed by the American National Standards Institute (ANSI), 2% distribution grids in the USA have a significant undesirable degree of unbalance [14], [15].
In addition to the load unbalance problem in existing grids, such issue will become more frequent in future distribution grids due to growing renewable penetration. For example, the unbalance of the three-phase system appears more often because the installations and operations of many DER devices are not fully controlled by the utilities. This fact makes the requirement of balanced grids in previous works invalid. One way to address this problem is to apply the algorithms developed for single-phase or steady-state system for estimating connectivity of each phase. Then, one can merge the single-phase topology results to recover the three-phase system topology. However, this approach performs poorly in unbalanced three-phase systems due to the power line coupling between multi-phase buses. Finally, for many utilities, as high as 10% phase labeling are incorrect or unknown because of human errors or outdated records. This error makes identifying new topology based on existing methods not sound anymore.
For resolving the problems above, we propose a datadriven method that utilizes the smart meter data in different phases to estimate the topology of three-phase distribution grid systems. Two-phase and one-phase systems are special cases of our consideration. Building on the probabilistic graphical model formulation of distribution grids [16], we first expand from single-phase representation to three-phase systems with incorrect phase labels. In such model, a node represents the three-phase bus voltages and an edge between nodes indicates the statistical dependency among three-phase bus voltage measurements. Subsequently, we convert the system of three unbalanced phasors to three symmetrical components, namely the positive, negative, and zero sequences. Then, we prove that the Chow-Liu algorithm can identify the correct three-phase topology by utilizing power flow equation and the conditional independence relationships implied by the radial three-phase structure of distribution grids. As a highlight, the proposed method is robust to incorrect phase label, which is a big problem in distribution grid operation. This is due to the label-invariant property of mutual information. Another major contribution we have is on phase identification. Specifically, we propose a data-driven approach to identify correct bus phase connection by utilizing Carson's equation [17], which is employed for deriving the primitive phase impedances of different lines. Finally, since phasor measurement units (PMUs) have not been widely deployed, we propose a refined algorithm that only needs the voltage magnitudes for topology estimation.
The performance of the proposed method is verified by simulations on the IEEE 37-bus and 123-bus distribution test cases [18]. We used 3 different PG&E residential household data sets in North California, ADRES project data set [19], [20] that contains 30 houses load profiles in Upper-Austria, and Pecan Street data set, which contains load data of 345 houses with PV panels in Austin, Taxes. Simulations are conducted via GridLAB-D, an open source distribution grid simulator [21] for three-phase systems. Simulation results show that, provided with hourly measurements, the proposed algorithm can perfectly estimate the topology of three-phase distribution grids.
The rest of the paper is organized as follows: Section II introduces the modeling of the three-phase distribution system and the problem of data-driven topology estimation. In Section III, we firstly prove that topology estimation problem of three-phase distribution grid can be solved as a mutual information maximization problem and propose an algorithm to solve such a maximization problem in three-phase setup. Also, we propose a method to identify bus phase connection. In Section IV, to address the unbalance in distribution grids, we transform an unbalanced distribution grid to a symmetric system using sequence component frame and prove that the mutual information approach can still apply to grid topology estimation and phase identification. Section V evaluates the performance of our method using IEEE test cases and real data collected from different regions. Section VI concludes this paper.
II. THREE-PHASE DISTRIBUTION GRID MODELING AND PROBLEM FORMULATION In order to better formulate the topology estimation problem, we need to describe the three-phase distribution network topology and data. A distribution grid is modeled by a graph G = (M, E), where the vertex set M = {0, 1, 2, · · · , M } represents the set of buses and the unidirectional edge set E = {(i, k), i, k ∈ M} represents the branches. The branch between two buses is not necessary to be three-phase. In the distribution grid, bus 0 is the substation with a fixed voltage and is the root of the tree graph. M + denotes the set of buses excluding the substation, i.e., M + = M\{0}. If bus i and bus k are connected, i.e., (i, k) ∈ E, and bus i is closer to the root (substation) than bus k, bus i is the parent of bus k and bus k is the child of i. We use pa(i) to denote the parent bus of bus i. The root has no parent and all other buses in M + have exactly one parent. We use C(i) to denote the set of child buses of bus i and use S(i) = C(pa(i))\{i} to denote the set of sibling buses of bus i.
Let a, b, and c denote the three phases of the distribution grid. The vector V abc denote the vectors of current injections and injected complex powers at bus i, respectively. If a bus is only connected with one or two phases, the quantities of the missing phase are zeros. For example, if bus i does not have phase c, V c i = 0, I c i = 0, and S c i = 0. For convenience, we use V i , I i , and S i as the general notation of multi-phase quantities at bus i.
If bus i and bus k are connected, i.e., (i, k) ∈ E, the relationship between their nodal voltages and current injections can be expressed as follows [22], [23]: where Y ik ∈ C 3×3 denotes the admittance submatrix between bus i and bus k and B i,shunt ∈ C 3×3 denotes the shunt capacitance at bus i. In a three-phase system, Y ik is not diagonal. The voltages at different phases are coupled. As we will show in Section V, this coupling property in the threephase systems leads the existing steady-state methods to have poor performance in the unbalanced three-phase systems. Since B i,shunt is relatively small in distribution grids [24], we assume B i,shunt = 0. In the formulation above, the effect of the neural wire is merged into the three-phase wires by applying Kron's reduction [23]. If bus i and bus k are not connected, Y ik = 0.
Here, we do not need to assume the line to be transposed.
denotes the complex voltage measurement on phase φ at time n and j = √ −1. The magnitude |v φ i [n]| ∈ R is in volt and the phase angle θ φ i [n] ∈ R is in degree. All measurements are assumed to be noiseless at first. In Section V, we will validate the proposed algorithm with noisy measurements.
With the modeling above, the multiphase distribution grid topology estimation and bus phase identification problem is defined as • Problem: data-driven multiphase distribution grid topology and bus phase estimation using voltage measurements • Given: the time-series voltage measurements with unknown bus phase labels v i [n], n = 1, · · · , N, i ∈ M + • Find: the unknown grid topology E and bus phase φ.

III. THREE-PHASE DISTRIBUTION GRID TOPOLOGY ESTIMATION AND BUS PHASE IDENTIFICATION
In today's distribution grid, the smart meters and micro phase measurement unit (µPMU) have been installed and provide highly accurate historical data about three-phase system operation. However, a practical issue that exists in many distribution grids, especially the low-voltage distribution grids, is that the smart meter phase connectivity information is inaccurate. In some countries, about 10% phase labels in low-voltage distribution grids are incorrect. Also, the phase connectivity can change over time when new customers and DER devices are connected to the grid [25]. As the correct phase connectivity information is critical to distribution grid planning, we want to estimate grid topology and identify phase connection at the same time. In this section, we will firstly extend our previous work [16] to estimate the topology of three-phase system with incorrect phase labels. Then, we will propose a method to identify phase connectivity by utilizing the statistical relationship between voltage measurements. The process of estimating topology and bus phase is summarized in Fig. 1.
The end-user measurements are time-series data. One way to represent these data is using a probability distribution. If we model the nodal three-phase voltage vector V i as a random vector, the joint distribution of voltage measurement P (V M + )  is P (V 1 )P (V 2 |V 1 ) · · · P (V M |V 1 , · · · , V M−1 ). Bus 0 is omitted because it is the slack bus with a fixed voltage.
In many previous works of distribution grid topology estimation [7], [9], [10], [16], only the steady-state voltages are required. However, with the presence of incorrect or unknown phase labels, we need to utilize all three phases voltage measurements for topology estimation. In the later part of this section, we will show that our method is invariant to phase labels and therefore, can estimate topology with incorrect or unknown phase labels.
In many medium-and low-voltage distribution grids, the distribution of voltage is irregular. To better formulate the topology estimation problem, we use the incremental change of measurements in this paper [26]. At bus i, the incremental change of voltage is ∆v i for n ≥ 2. When n = 1, ∆v i [1] = 0. By using the incremental change ∆V, the joint probability becomes P (∆V M + ) = P (∆V 1 )P (∆V 2 |∆V 1 ) · · · ×P (∆V M |∆V 1 , · · · , ∆V M−1 ).
Since the nodal voltages are modeled as random vectors, the graph G becomes a probabilistic graphical model with a tree structure. In a graphical model, the vertex represents a random vector (e.g., ∆V i ) and the edge between two vertices indicates the statistical dependency between bus voltages. Therefore, estimating distribution grid topology is equivalent to recovering the radial structure of the graphical model G.
In the steady-state distribution grid, the nodal voltages only have statistical dependency with the nodal voltages of their parent bus [16]. Therefore, we extend such dependency in the steady-state system to approximate the three-phase system joint probability P (∆V M + ) as If (1) holds, finding the structure of G is equivalent to finding the parent of each bus for the same phase. Before proving the approximation in (1) holds with equality, we will make two assumptions and justify these assumptions using real data.

Assumption 1. In the three-phase distribution gird,
• the incremental change of the current injection ∆I at each non-slack bus is independent, i.e., ∆I i ⊥ ∆I k for all i = k. • the incremental changes of the current injection ∆I and bus voltage ∆V at each bus follow Gaussian distribution with zero means and non-zero covariances. Fig. 2 shows the pairwise mutual information of the incremental changes of bus current injection using the real data from PG&E. The mutual information I(X, Y) is a measure of the statistical dependence between two random vectors X and Y. When the mutual information is zero, these two random vectors are independent, i.e., X ⊥ Y [27]. As shown in Fig. 2, most pairs of ∆I have small values. Therefore, we can assume that the current injections are independent with some approximation errors. This assumption has also been adopted in other works, such as [7] and [9]. To further validate the independence of ∆I, we plot the average auto-correlation of current injection increment of PG&E data in IEEE 123-bus system in Fig. 3. The error bar is one standard deviation. We can observe that the auto-correlation of ∆I drops significantly as the lag increases. This observation justifies that the current injection increment is approximately independent over time.  In the existing distribution grid topology estimation works, both injected power increment independence and injected current increment independence are used. In [28], the authors use the real data to show that these two assumptions are equivalent in distribution grids. In Fig. 2, we illustrate that the mutual information of pairwise power injection increment ∆S and that of pairwise current injection increment ∆I. Both histograms are similar. In this paper, we prefer the assumption of current injection independence because it simplifies the proof of following theorems and lemmas.
With Assumption 1, we will use a two-stage proof to show that P ( In the first stage, we will prove that P (  Fig. 4. An example of an 8-bus three-phase system. A node represents a bus, which can be single-phase or multi-phase. An edge represents a branch between two buses. The branch is unnecessary to be three-phase. Bus 0 is the substation (root).

Lemma 1.
If the incremental change of current injection at each bus is approximately independent (i.e., ∆I i ⊥ ∆I k for i = k), given the incremental voltage changes of bus i's parent (∆V pa(i) ), grandparent (∆V pa(pa(i)) ), and siblings (∆V S(i) ), the incremental voltage changes of bus i and the buses that are not below bus i are conditionally independent, i.e., We will use a simple example to demonstrate Lemma 1. A formal proof is given in Appendix section VII-A. For the example system in Fig. 4, the nodal admittance equation is there is no branch between bus i and k.
For a non-leaf bus, bus 2, given ∆V 3 = ∆v 3 and ∆V 1 = ∆v 1 , we have the following equations: (8) Given ∆I 1 and ∆I 6 are independent, according to (6) and (7), ∆V 2 and ∆V 6 are conditionally independent. Similarly, ∆V 2 and ∆V 6 are conditionally independent given ∆I 1 and ∆I 7 are independent. Our conclusion in Lemma 1 is similar to the results in [29].
Assumption 2. In a distribution grid, the mutual information between ∆V i and its parent ∆V pa(i) is much larger than the mutual information between ∆V i and ∆V pa(pa(i)) and ∆V S(i) . Assumption 2 is inspired by the observation from real data. In Fig. 5, we plot the mutual information of voltage increments between each bus pair in IEEE 123-bus distribution system using the real load data from PG&E. The distribution grid configuration and simulation setup are described in Section V. In Fig. 5, the color in a square represents the mutual information of voltage increments of two buses. As discussed in [27], if the voltage increments of two buses are independent, their mutual information is zero (dark color). In Fig. 5, the circle refers to the bus neighbors (e.g. parent bus) and the crossing indicates the two-step neighbors (grandparent bus and sibling buses). If a square without any marker, it means the pair of buses is more than two-step away. As illustrated in Fig. 5, the mutual information between the voltages of two-step neighbors is higher than the mutual information of other bus pairs, but it is still much lower than the mutual information between two neighbors. The diagonal bus pairs have the highest mutual information because it is the selfinformation. With Assumption 2, we can simplify P (∆V M+ ) to depend on the voltages of parent buses. In Section V, we use numerical simulation to demonstrate that the approximation does not degrade the performance of topology and bus phase connectivity estimation.

Lemma 2. Given the incremental voltage changes of bus i in a multi-phase distribution grid, if the incremental change of current injection at each bus is approximately independent
, the incremental voltage changes of every pair of bus i's children are conditionally independent, i.e., ∆V k ⊥ ∆V l |∆V i for k, l ∈ C(i) and k = l.
With Lemma 2, (1) holds with equality, i.e., P ( . Therefore, finding the distribution grid topology is equivalent to finding the parent of each bus. In next subsection, we will propose an information theoretical approach to estimate three-phase distribution grid topology with incorrect bus phase labels.

A. An Information Theoretical Approach to Estimate Threephase Distribution Grid Topology
In order to find the parent of each bus, we can minimize the Kullback-Leibler divergence [27] of P (∆V M + ) and where Θ denotes the collection of parent bus index of every bus, i.e., Θ = {pa(1), · · · , pa(M )}, P denotes the joint distribution of all voltages, and Q denotes the distribution of voltage vectors with tree structure. When two distributions are identical, the KL divergence is zero. Therefore, as shown in Lemma 2, if we find a distribution Q(∆V M + ; Θ) that is identical to P (∆V M + ), Θ contains the parent bus index of every bus i. The associated structure of P CL (∆V M + ) = Q(∆V M + ; Θ) is the estimated topology of distribution grid. In Lemma 3, we will prove that (9) can be efficiently solved by utilizing the radial structure of distribution grids. In the following context, Θ i and pa(i) are used interchangeably.
Lemma 3. In a radial distribution grid, finding the topology is equivalent to solving the following optimization problem: where I (∆V i ; ∆V Θ i ) denotes the mutual information.
The proof is in Appendix section VII-B. With such a lemma, we can use a mutual information-based maximum weight spanning tree algorithm, well-known as Chow-Liu algorithm [30], to find Θ and identify the three-phase distribution grid topology. This algorithm has been applied to single-phase system in [16]. In Theorem 1, we will prove that Chow-Liu algorithm can be applied to three-phase systems.
Theorem 1. In a radial three-phase distribution grid, the mutual information-based maximum weight spanning tree algorithm (Chow-Liu algorithm) estimates the best-fitted topology.
Proof: In this proof, we will show that the mutual information between connected buses is higher than those without a connection. If bus i is the parent of bus k and bus l and k = l, by utilizing the chain rule property of the mutual information [27], the joint mutual information is expressed as . Due to the fact that mutual information is always non-negative, I(∆V k ; ∆V i ) ≥ I(∆V k ; ∆V l ). Therefore, the mutual information between connected buses is larger than the mutual information between not connected buses. Then, by using the mutual information as the weight, the maximum weight spanning tree algorithm (Chow-Liu algorithm) solves (10) and estimates the distribution grid topology [16], [30].
To compute mutual information I(∆V i ; ∆V k ), we can avoid integration by utilizing entropy, i.e., where H(∆V i ) denotes the entropy of ∆V i and H(∆V i ; ∆V k ) denotes the cross-entropy of ∆V i and ∆V k . An advantage using (11) is that many distributions have a closed form of entropy. In Assumption 1, the incremental changes of voltages in distribution grid are assumed to follow Gaussian distribution approximately. Thus, if we assume ∆V i follows a multivariate Gaussian distribution, its entropy is where r denotes the dimension of the random vector ∆V i and Cov denotes the covariance. In some systems, the bus may not have all three phases. In this case, we need to exclude the disconnected phases in the computation of entropy. As mentioned previously, a practical issue of using smart meter is that the phase connectivity information is usually inaccurate or missing. In some countries, about 10% phase labels in low-voltage distribution grids are incorrect or unknown. Also, the phase connectivity can change over time when new customers and DER devices are connected to the grid [25]. One may apply existing methods [25], [31], [32] to identify phase connectivity before estimating topology. Fortunately, the proposed topology estimation method does not require this preprocessing and is invariant to incorrect phase labels. Specifically, when voltage phase is incorrectly labeled, the elements in the random variables in ∆V i are permuted. This permutation does not affect the computation of det Cov(∆V i ), thus, does not change the values of H(∆V i ) and H(∆V i ; ∆V k ). Therefore, the mutual information I(∆V i ; ∆V k ) is the same even the bus labels are incorrect. In Section V-C, we use numerical examples to show that our algorithm can recover the topology perfectly with the presence incorrect phase labels. The proposed algorithm for three-phase distribution grid topology estimation is summarized in Algorithm 1. To efficiently build the maximum weight spanning tree (Steps 6 -16), we can use the well-known Kruskal's minimum weight spanning tree algorithm [33], [34], which has a running time of O(M log M ) for a radial distribution network with M buses.

B. Bus Phase Identification
In the previous section, we have proved that even with incorrect phase labels, we can still estimate the three-phase distribution grid topology. In many field applications, accurate grid topology is not sufficient. The correct information of bus phase is also critical in grid planning and operation. In this subsection, we will propose a data-driven method to identify bus phase information.  Proof: Using the modified Carson's equation [17], the self impedance z aa and mutual impedances z ab and z ac of a three-phase power line can be computed as follows: z aa = r ik + 0.095 + j0.121 × H a ik Ω/miles, z ab = 0.095 + j0.121 × H ab ik Ω/miles, z ac = 0.095 + j0.121 × H ac ik Ω/miles, where H a ik , H ab ik , and H ac ik are constants, and r a ik is resistance of branch i − k in Ω/miles. In a distribution grid, the resistance is usually larger than reactance. Therefore, we can approximate that z aa ≃ r a ik + 0.095 and z ab ≃ z ac ≃ 0.095. For bus i and bus k, the voltages and currents can be expressed as  where Z mn = z mn × l and l is the line length. The equations above can be simplified to The phase voltages at the two ends of a branch are in a linear relationship. Therefore, their correlation is the largest. Table. I shows the corrections among bus 64, 65, and 66 in IEEE 123-bus system. Bus 64 and 65 are connected on phase b. Phase 65 and 66 are connected on phase c. We can observe that the correlation between bus 64 and 65 on phase b is much larger than other pairs. Similar observation holds for bus 65 and 66. Therefore, to identify bus phases in a distribution grid, we can apply the correlation check from the substation of the radial network down to the leaf buses. The reason is that we usually have reliable information about the bus labels of substation. Then, following the path of the estimated grid topology, we can find the phase of each bus. Note that, the metering device installed at each bus can provide the number of phases at each bus. Therefore, we can apply the method to all types of bus.

C. Topology Estimation using Voltage Magnitude Only
In distribution grids, voltage angles θ are hard to acquire because PMUs are not widely available. However, we can still find the distribution grid topology when only the voltage magnitudes |∆V| are available. Using chain rule, the mutual information I(∆V i ; ∆V k ) can be decomposed as In Fig. 7, we empirically plot the pairwise mutual information of term A, B, C using the IEEE 123-bus system and the real data from PG&E. We can see that the values of term A is much larger than term B and term C. The reason is that the changes of voltage angles are relatively small in distribution grids and thus, contain less information than voltage magnitudes. Therefore, based on our empirical observation, we can use I(|∆V i |; |∆V k |) to estimate distribution grid topology. Many smart meters deployed can measure the voltages of all three phases. Therefore, we can still apply the proposed algorithm when only smart meter measurements are available.

IV. UNBALANCED THREE-PHASE DISTRIBUTION GRID TOPOLOGY ESTIMATION WITH INCORRECT PHASE LABELS
The results in the previous section illustrate the topology estimation for balanced three-phase systems with incorrect phase labels. However, it is not directly expendable to unbalanced three-phase systems. As shown in Fig. 6, the voltages and currents are coupled cross different phases. Also, the unbalanced loads on each phase lead to the voltages angles are not separated by 2π/3. To address these issues, we transform the grid using the sequence component framework. As a highlight, we do not require to know the correct phase labels. The voltage phasor is decomposed into three balanced phasors known as the positive sequence, the negative sequence, and the zero sequence. The three-phase voltage ∆V i in phase frame can be decomposed as follows: where h = exp (j2π/3) and h 2 = exp (−j2π/3), V p a , V n a , V z a denote the positive-sequence, negative-sequence, and zerosequence voltage on phase a. ∆V pnz a is called the sequence voltage of phase a. Since each sequence component system is balanced, the sequence component voltages of phase b and phase c are the phase shifts of voltage on phase a, ∆V pnz a . Thus, we do not need to compute them. In the following text, we will use ∆V pnz i to denote the sequence voltage vector of bus i on phase a. Since the smart meters can provide multi-phase measurements, we usually know the three-phase voltage measurements. Therefore, the sequence voltages can be computed as follows: where the operator H denotes the Hermitian transpose. The same transformation can also be applied to the three-phase current phasors and admittance matrix, i.e., ∆I pnz We want to highlight that the transformation above is applied to the three-phase voltage phasors, current phasors, and admittance matrix at a particular bus, not the entire system. Therefore, if two buses are not connected, e.g., Y ik = 0, Y pnz ik = 0. Therefore, finding topology in a phase frame is equivalent to finding topology in a sequence component frame. To apply the mutual information-based maximum weight spanning tree algorithm (Chow-Liu algorithm), we want to prove that the joint probability P (∆V pnz M + ) can be factorized as M i=1 P (∆V pnz i |∆V pnz pa(i) ). Lemma 5 (Data Processing Inequality [27]). If random vectors X, Y, Z forms a Markov Chain, i.e., X → Y → Z, I(X; Y) ≥ I(X; Z). Also, for the function of Y, g(Y), I(X; Y) ≥ I(X; g(Y)). Lemma 6. Consider a multi-phase distribution grid and assume that the current injection increment at each bus is approximately independent, e.g., ∆I i ⊥ ∆I k for i = k. Proof: Firstly, we want to prove that when the current injection increment at each bus is independent in phase frame, this independence also holds in sequence component frame. There are multiple ways to prove it. Here, we will use an information theoretical approach.
When two random vectors are independent, their mutual information is zero [27], e.g., ; ∆I pnz k ) = 0 due to the non-negativity of mutual information. Therefore, we prove that if the current injections are independent in phase frame, they are also independent in sequence component frame.
Next, we want to show that the conditional independence of nodal voltages holds in sequence component frame. We will use the example in Fig. 4 again to illustrate it. In sequence component frame, the nodal equation of the system in Fig. 4 is Y pnz there is no branch between bus i and k. This equation is in the same format as (III). Since we have proved that ∆I pnz i ⊥ ∆I pnz k for all i = k, we can use the same method in the proof of Lemma 1 and Lemma 2 to show the conditional independence of nodal voltages in sequence component frame.
With Lemma 6, we prove that the conditional independence holds in the sequence component frame as well, e.g., Since the sequence component system is a balanced three-phase system, we can apply the Chow-Liu algorithm to estimate topology in the sequence component frame.

Theorem 2.
In an unbalanced radial distribution grid, the topology can be estimated by solving the following problem .

Also, the mutual information-based maximum weight spanning tree algorithm (Chow-Liu algorithm) solves the problem above.
The proof of Theorem 2 is omitted here because it is similar to the proofs of Lemma 3 and Theorem 1. The topology estimation algorithm for unbalanced three-phase distribution grid is summarized in Algorithm 2. The grid topology is identical in phase frame and in sequence component frame. Therefore, once the unbalanced grid topology is estimated, we can apply Lemma 4 to identify the phases of all buses. The simulations are implemented on the IEEE PES distribution networks for IEEE 37-bus and 123-bus (Fig. 8) networks [18]. Both networks are three-phase systems. In each network, the feeder or substation is selected as the slack bus (bus 0). The historical data have been preprocessed by the GridLAB-D [21], an open source simulator for distribution grid. To simulate the power system behavior in a practical pattern, we use the load profile from PG&E in the subsequent simulation. This profile contains anonymized and secure hourly smart meter readings over 110, 000 PG&E residential customers for a period of one year spanning from 2011 to 2012. Since both 37-bus and 123bus systems are primary distribution grids, the real power at each bus is an aggregation of 10 − 100 customers. The load buses in both systems are unnecessary to be three-phase. The details of bus and branch phases are given in [18].

V. SIMULATIONS AND NUMERICAL RESULTS
The PG&E data set does not contain the reactive power. The reactive power q φ i [n] on phase φ of bus i at time n is computed according to a random lagging power factor pf φ i [n], which follows a uniform distribution, e.g., pf φ i [n] ∼ Unif(0.8, 0.95). To obtain voltage time-series, i.e., v abc i [n], we run a power flow to generate the hourly states of the power system over a year. N = 8760 measurements are obtained at each bus. The loads attached to each phase are unequal. Hence, the systems are unbalanced. Fig. 9 and Fig. 10 show the hourly aggregated real powers on each phase in 37-bus and 123-bus systems. Although each phase has the similar pattern over time, the magnitudes of real powers are different on each phase. Therefore, the testing systems are unbalanced.

A. Distribution Grid Topology Estimation Error Rate
In this section, we discuss the performance on grid topology estimation. We use the error rate (ER) as the performance evaluation metric, which is defined as where E denotes the edge set estimates, |E| is the size of E, and I (.) is the indicator function. The first and second terms represent the number of falsely estimated branches and the number of missing branches, respectively. Table II summarizes the topology estimation error rates of unbalanced three-phase 37-and 123-bus systems. When phase angle data are available, our algorithm perfectly estimates the grid topology. Today, the PMUs have not been widely deployed. When only voltage magnitudes are available, our algorithm can still estimate the grid topology perfectly.
We also compare the proposed algorithm with a modified steady-state topology estimation in [16]. Specifically, we apply the steady-state topology estimator to each phase individually. Then, we combine the single-phase estimates to produce the three-phase system topology. As shown in Table II, the modified steady-state method has worse performance than the proposed algorithm. The key reason is that the modified steadystate method does not consider the voltage coupling across phases.

B. Distribution Grids with DER Integration
The penetration of DERs has grown significantly during last decade and will keep increasing in the future. As discussed earlier, the high penetration of DER will lead to a deeply unbalanced distribution grid. To evaluate the proposed algorithm with integrated DERs, we install several rooftop photovoltaic (PV) systems in the distribution networks. The profile of hourly power generation is obtained from NREL PVWatts Calculator, an online simulator that estimates the PV power generation based on weather history of PG&E service zone and the physical parameters of a 5kW PV panel in residential levels [35]. The power factor is fixed as 0.90 lagging, which satisfies the regulation of many U.S. utilities [36] and IEEE standard [37]. Totally, 20% of residents are installed with PV panels. The error rates of grid topology estimation with the rooftop PVs integration are presented in Table III. Our algorithm does not have any performance degradation with DER integration. Also, the modified steady-state method still performs worse than the proposed method. Compared with the systems without DER, the modified steady-state method has performance degradation.
In order to further validate the proposed algorithm, we progressively increase the DER penetration level from 0% to 100%. In Fig. 11, we can observe that, besides 60% penetration of DERs, the error rates do not change with the growth of DER installation rate, which highlights the reliability of the proposed algorithm.

C. Distribution Grids with Incorrect Phase Labels
As we discussed in the previous section, in some distribution grids, up to 10% of the phase labels are incorrect or unknown. Therefore, in this section, we validate our algorithm on the 123-bus system with incorrect phase labels. To simulate the incorrect phase labels, we randomly pick up several buses and switch their phase a voltage measurements to either phase b voltage data or phase c voltage data. Table IV shows the error rates with different percentages of incorrect phase labels using voltage magnitude only. We observe that our algorithm is insensitive to incorrect bus phase label. As we discussed previously, the incorrect phase labels is a permutation of random variables in |V pnz i | and do not affect I(|V pnz i |; |V pnz k |). If we use the steady-state approach, the error rate increases significantly because the mutual information is computed for incorrect bus pairs. For the 123-bus system, the error rate is 11.7% when 10% buses have incorrect phase labels. D. Sensitivity Analysis 1) Sensitivity to Data Pattern: To understand our algorithm's sensitivity to load pattern, we validate the proposed algorithm on the "ADRES-Concept" Project load profile [19], [20]. This data set contains real and reactive power profiles of 30 houses in Upper-Austria. The data were sampled every second over 7 days in summer and 7 days in winter. We generate the voltage data using the 37-bus system. The load profiles are scaled to match the scale of power in the 37-bus system. The resulting three-phase system is unbalanced. Fig. 12 compares the error rates using summer and winter load profiles. When there is only one measurement, the proposed algorithm has 200% error rate due to poor estimation of mutual information. The error rate is above 100% because all estimated branches are incorrect and none of the correct branch is found. As more measurements become available, the error rate reduces significantly. Also, our algorithm has consistent performance in winter and summer. Compared with the results in [16], the proposed algorithm perfectly estimates the grid topology with shorten time because more information is observed at each time step.
We also validate our algorithm using data set from Pecan Street which contains the hourly load measurements of 345 houses with PV integration in Austin, Taxes. The measurements include both power consumption and renewable generation. In Fig. 13, our algorithm requires 16 hours' measurements to recover the topology of the 37-bus system, which is similar to the ADRES data set. This highlights the robustness of our algorithm.
2) Sensitivity to Data Accuracy: Our algorithm uses the smart meter data to estimate grid topology. In particles, the smart meter measurements are noisy. Thus, it is important to validate our algorithm under different levels of measurement noises. In the U.S., ANSI C12.20 standard (Class 0.5) requires  the smart meters to have an error less than ±0.5% [38], [39]. Table. V shows the error rates with different noise levels over 20 iterations in the 123-bus system. Compared with the estimation results using perfect measurements, the error rates grow up as the increase in noise levels. These newly introduced errors are around the feeders. For example, bus 251 and 451 are incorrectly connected with the presence of noise. In real systems, we usually know the feeder buses. Therefore, we can post-process the topology estimate and remove these unnecessary branches from topology estimate. After performing post-processing, we can reduce the error rate to 1%. In Table. V, the standard deviation of error rate is very small and therefore, our algorithm can provide reliable and consistent results with noisy measurements.

3) Sensitivity to Data Length:
To explore the effect of data set size, we validate the proposed algorithm by using measurements from 1 up to 360 days. Fig. 14 illustrates the error rates of the 123-bus system, with and without DER, over different lengths of data. We can observe that with 20 days' measurements (24 × 20 = 480 data points), our algorithm can achieve zero error. This result is better than the single-phase system presented in [16], which requires 30 days' observations. The reason is that at time n, our proposed algorithm uses measurements from three phases, which contain more information than the single-phase system.  Fig. 15 illustrates the performance of the proposed algorithm under different sampling frequencies using the ADRES data set. When the sampling period is 1 minute, we need about 6 hours' voltage profile to perfectly recover the system. The frequency of distribution grid reconfiguration is range from hours to weeks [40], [41]. Therefore, the proposed algorithm is suitable for the existing system and real-time operation. If the sampling period is 30 minutes, 35 data points (35 × 30 minutes = 17.5 hours) to recover the system topology. This estimation time is only half of the required time in [16]. VI. CONCLUSIONS In order to estimate the three-phase distribution grid topology, we propose a data-driven approach based on newly available smart meter data. Unlike existing approaches, our method does not require the system to be balanced. We formulate the topology estimation problem as a joint distribution (voltage phasors) approximation problem under the probabilistic graphical model framework. Then, we prove that the distribution grid topology estimation is equivalent to the graphical model estimation problem and propose a mutual information-based maximum weight spanning tree algorithm, which is optimal and efficient. Moreover, we extend our algorithm to the case where only voltage magnitude is available. Finally, we simulate the proposed algorithm on IEEE 37-and 123-bus systems and compare with the existing steady-state method. Results show that the proposed algorithm outperforms the steady-state method and has robust performances when measurement phase labels are incorrect. We also validate our algorithms under different penetration levels of DERs and conduct the sensitivity analysis. The numerical results are highly accurate and robust in various system configurations. We will use several cases illustrated in Fig. 16 to prove Lemma 1. We will start with the leaf node. In Fig. 16(a), for bus 4, given ∆V 2 = ∆v 2 , ∆V S(4) = ∆v S(4) , and ∆V 1 = ∆v 1 , we have the following equations: ∆I 4 = Y 42 ∆v 2 + Y 44 ∆V 4 , ∆I k = Y 1k ∆v 1 + Y kk ∆V k ∀k ∈ S(2).
Given ∆I 4 and ∆I k + Y 3k (Y 13 ) −1 (∆I 3 + k∈C(3) ∆I k ) are independent and ∆v 1 is a constant, ∆V 4 and ∆V k are conditionally independent for k ∈ C(3). When there are more child buses ∆V C(3) , we can apply the same induction method above to prove the conditional independence. Thus, the proof of Fig. 16(b) can be generalized to prove the conditional independence of a leaf bus and all other buses that are under the same grandparent bus. Next, we will prove the lemma for non-leaf buses. In Fig. 16(c), for bus 4, given ∆V 2 = ∆v 2 , ∆V S(4) = ∆v S(4) , and ∆V 1 = ∆v 1 , where l ∈ S(2) and m ∈ C(l). By combing (19) and (20), we have For every l in S(2), we can combine (21) and (22) and have the following equation: We can apply the strategy in Fig. 16(b) to (23) and (24) and prove that ∆V 4 and ∆V l are conditionally independent. Also, we can prove that ∆V 4 and ∆V m are conditionally independent for m ∈ C(l) by combining (22) and (24). The results in Fig. 16(c) can be generalized to all non-leaf buses. Using the results in Fig. 16, we can prove that Lemma 1 holds.

B. Proof of Theorem 3
Proof: Recall the definition [27], KL divergence is expressed as where P (∆V 1 |∆V 0 ) = P (∆V 1 ) due to the fact that ∆V 0 is a constant. By following the definition of conditional probability and adding P (∆V i ) into the denominator, we can simplify the equation above as D(P (∆V M + ) Q(∆V M + ; Θ)) The last equality is due to the definitions of entropy, i.e., H(∆V i ) = − P (∆V i ) log P (∆V i ), and mutual information, i.e., I (∆V i ; ∆V Θ i ) Thus, to minimize the KL-divergence between P (∆V M + ) and Q(∆V M + ; Θ), we can choose the M − 1 edges to maximize M i=1 I (P (∆V i ); P (∆V Θ i )). The entropy term M i=1 H(∆V i ) − H(∆V M + ) is irrelevant with the topology structure of distribution grid and we do not include it in the final optimization problem. Therefore, minimizing D(P (∆V M + ) Q(∆V M + ; Θ)) is equivalent to solving the following optimization problem: