Statistical Analysis of Binary Stars from the Gaia Catalog Data Release 2

We have developed a general statistical procedure for analysis of 2D and 3D finite patterns, which is applied to the data from recently released Gaia-ESA catalog DR2. The 2D analysis clearly confirms our former results on the presence of binaries in the former DR1 catalog. Our main objective is the statistical 3D analysis of DR2. For this, it is essential that the DR2 catalog includes parallaxes and data on the proper motion. The analysis allows us to determine for each pair of stars the probability that it is the binary star. This probability is represented by the function depending on the separation. Furthermore, a combined analysis of the separation with proper motion provides a clear picture of binaries with two components of the motion: parallel and orbital. The result of this analysis is an estimate of the average orbital period and mass of the binary system. The catalog we have created involves 80,560 binary candidates.


Introduction
In this paper we analyze the recent data from the new catalog DR2 Gaia Collaboration et al. (2018) obtained by the Gaia-ESA mission. If compared with the previous DR1 catalog Gaia Collaboration et al. (2016aCollaboration et al. ( , 2016b, the DR2 contains the cleaner data complemented with parallaxes and data on the 2D proper motion. Parallaxes allow us to determine the distance of the stars, so we can substantially enlarge our former DR1 analysis (Zavada & Píška 2018) and work with the 3D patterns of moving stars.
In the present study, we focus on the statistical analysis of the presence of binaries. This topic is related to the recent studies of various aspects of binaries with the use of the catalogs DR1 (Oelkers et al. 2017;Oh et al. 2017) and DR2 (Ziegler et al. 2018;Jiménez-Esteban et al. 2019). These authors, apart from their own results, present an up-to-date overview of important findings on binary stars. Other important papers exploring wide binaries can be cited from the era before Gaia, for example, Caballero (2009) and Close et al. (1990). However, our approach and objectives are rather different, so the results obtained are complementary.
Methodology for 2D analysis has been described in detail in our above-quoted paper. In Section 2 we repeat its essence and perform generalization for the 3D case. For 2D analysis in Section 3 we take the same region in the DR2 catalog as we used in the DR1, so we can compare results from both corresponding data sets.
Principal results are obtained from the 3D analysis of a sample of DR2 data and are presented in Section 4. This part deals with two issues: the analysis of 3D separations and the analysis of proper motion of pairs of sources. The combination of both insights provides essential information about the statistical set of binaries. Obtained results are discussed in Section 5. Here we define the probabilistic function β(Δ), which is important for discussion on the occurrence of binaries. Our present catalog of binary candidates is described in Section 6, where we also shortly discuss its content and overlap with the catalog JEC-Jiménez- Esteban et al. (2019).
A brief summary of the paper is presented in Section 7. The Appendix is devoted to the derivation of some relations important for our statistical approach. The most important are distributions of separations of random sources uniformly distributed inside circles or spheres of unit diameter. The significant role of these functions for our approach is explained in Section 2.

Methodology
The methods are designed for analysis of the distribution of stars inside circles or spheres covering the chosen region of the sky, as sketched in Figure 1. These 2D and 3D star patterns are called events. Input data for the generation of the event grids are supposed in the galactic reference frame. So, the position L of a source is defined by spherical coordinates L, l, and b (distance from the Sun, galactic longitude and latitude): In the center of circles or spheres we define a local orthonormal frame defined by the basis: cos cos , cos sin , sin , sin , cos , 0 , sin cos , sin sin , cos , Vector k r has radial direction, perpendicular vectors k b and k l lies in the transverse plain.

Definition of Events
The 2D event of the multiplicity M is a set of stars with angular positions n i inside a circle with the event center n 0 and a small angular radius ρ 2 : With the use of event local basis (2), the local coordinates are defined as We define the 2D event as the set: Since the DR2 catalog involves also data on parallaxes, we can similarly also generate the 3D events-patterns of the M sources with position L i inside the spheres with the center L 0 and radius ρ 3 : With the use of the star positions (1) and local basis (2) we define local coordinates where p i is parallax, and L i is distanceof the star. The 3D event is defined as the set:

2D Methods
The first method is based on the Fourier analysis of 2D events, where we have introduced characteristic functions Θ n (M) depending on the event multiplicity M. These functions are generated by a set of events and measure statistical deviations from uniform distribution of stars ( ( ) ) Q = M 1 n , for instance a tendency to clustering ( ( ) Details of the method are described in Zavada & Píška (2018), in this paper we will present only the result.
With the use of a second complementary method, we analyze distributions of angular separations of sources inside the 2D events (5). Distribution is generated from the set of events. We use either absolute separations The suitable unit of the parameters x i , y i , ρ 2 will be for our purpose 1 as (1″). Distribution of scaled separations generated by Monte Carlo (MC) for a uniform distribution of stars in the sky is shown in Figure 3. The exact shape of normalized MC distributions reads  (ˆ) where the functions EllipticK (EllipticE) are complete elliptic integrals of the first (second) kind. The proof is given in the Appendix. These distributions do not depend on the event multiplicity and radius, this is an advantage of the scaled separations. Obviously, we have alwaysx < < 0 1 . These exact functions replace their approximations resulting from the MC calculation applied in the previous paper.

3D Methods
Similarly, as in the 2D case, we shall work with absolute separations and/or with the scaled oneŝˆˆ( The suitable unit of the parameters X i , Y i , Z i , ρ 3 is for our purpose 1 pc. Distribution of scaled separations generated by MC from the uniform distribution of stars in the 3D region of the sky is shown in Figure 4. Exact shapes of these normalized MC distributions read as proved in the Appendix. Shapes of these distributions similarly to (12) and (13) do not depend on event multiplicity and radius. The analysis with the use of characteristic functions Θ n (M) could be in the 3D case done separately in the plains XY, YZ, and ZX. However, such analysis is not the aim of the present paper.

Aims
In Section 3 using the DR2 data set we shall obtain the characteristic functions Θ n (M), afterwards we check distributions (12) and (13). The distributions (17)-(19) will be used for the data analysis in Section 4.1. All these distributions are of key importance for the analysis of real data. They represent the templates, which can reveal a violation of uniformity in the star distributions. Binary (and multiple) star systems are an example of such a violation, which manifests as the peaks in the distributions of angular or space separations in the region of close sources. In general, the scale of expected structure violating uniformity should be less than the event radius ρ 2 or ρ 3 .

Analysis of 2D Events
Here we present the results obtained from regions of the DR2 catalog listed in Table 1. The regions are shown in Figure 5. The corresponding events are created with the same angular radius as in Zavada & Píška (2018), which allows us  the consistent comparison of results from the DR1 and DR2 catalogs. First, we checked the events covering the regions N and S. Their nonuniformity defined by the characteristic functions Θ n (M) is demonstrated in Figure 6. The clear result Θ n (M)>1 indicates the presence of clustering.
Corresponding distributions of angular separations are shown in Figure 7 together with curves (12) and (13). These results can be compared with those in Figures 7 (lower panels), 10, and 11 in the former paper. We observe: (i) The peaks at small angular separations in the DR2 corresponding to binaries are clearer and more pronounced than in the DR1 catalog. Panels (e) and (k) in Figure 7 demonstrate the double stars separated by  d 0.5 as ij are absent because such close pairs are not resolved in the DR2 data set as reported in Arenou et al. (2018). In both catalogs, we observe an excess of binaries in the region N and S forˆ d 0.06 ij or equivalently for d ij 8.6 as. For greater separations inside the event, we observe perfectly uniform distributions of stars. Note the data and curves are equally normalized forx < < 0 1 . That is why the strong peak in panels(g)-(i) is balanced by a small reduction of distribution outside the peak. Brighter stars (G15, panels (g)-(l)) show evidently stronger peaks than the sample without any cut on magnitude (panels (a)-(f)). A similar tendency was already observed in the catalog DR1.
(ii) A more pronounced presence of binaries is demonstrated also in Figure 6. The slopes of lines in DR2 are greater than in DR1-clustering is more obvious. For M8 the slopes are ≈4%(8%) for the DR1(DR2) data. In Figure 8 we have shown Note. Only sources in distance 1-5000 pc are taken into account. The analysis is done for events 2M15.  some results obtained in a more populated region C. Also here we can observe a clear peak at small angular separations of sources of the magnitude G15, which proves the presence of binaries. Panel (b) again demonstrates the absence of double stars separated by (ˆ) ( )  d d 0.5 as 0.014 ij ij due to insufficient resolution. Different scales ofd ij in Figures 7(l) and 8(c) are due to different radii ρ 2 of events from N and S and C regions. Similar plots could be presented for whole spectrum of magnitudes in region C; however, elevation above the red line due to binaries is much less than that in Figure 8(c). The reason can be that denser region C with all magnitudes generates higher background of the optical doubles and a consequently lower relative rate of binaries.

Binaries in 3D Events
We present the results obtained from the 3D region defined in Table 2. The parallax and angular components of the star proper motion are the parameters, which substantially enrich the recent Gaia data. We work with the 3D events (9).  (12) and (13) representing uniform simulation. Panel (f) is the ratio of data to simulation from panel (c). Panels (d) and (e) represent 3D plot of separations x ij , y ij in different scales (unit is 1 as). Panels (g)-(l): the same for sources G15.

Analysis of 3D Separations
The summary results of the analysis obtained from all magnitudes G are shown in Figure 9. Distributions of scaled separations in panels (a)-(c), (g), and (h) perfectly match the uniform distribution of sources, but with the exception of the first bin in (a), (b), and (h). Apparent excess of very close pairs in planes XY, YZ, and ZX is seen also in panels (d)-(f). However, the sharp peak can be observed only in the plane XY (panel (d) and its magnified version (j)). Smearing in the direction of Z (difference of radial positions, panels (c), (e), and (f)) is due to lower accuracy in measuring of parallaxes. The errors of local coordinates depend on the errors of separations and with the use of definitions (7) and (8) are calculated as and similarly That is why we prefer distributions of Δ ij to D ij for obtaining precise results. Maximum values of separations X, Y, and Z is 4 pc, which follows from the event radius ρ 3 =2 pc. The excess of close pairs is obvious most explicitly from the distribution of Δ ij in panel (k) that represents a magnified version of (h). The important ratio of the distribution P(Δ ij ) to the uniform The same distributions but for brighter sources G15 are shown in Figure 10. Similarly, as in the 2D case, the peaks are stronger for brighter sources and distributions outside the peaks confirm uniformity of the star distribution. Again due to equal normalization of data and red curves forx < < 0 1the strong peak in panels(g) and (h) is balanced by a small reduction of distribution beyond the peak. The excess of close pairs observed in both figures again convincingly indicates the presence of binaries.
For quantitative estimates, the important panel is (i), which displays the ratio data/simulation. This is more accurate than only displaying peaks with some undefined background. Panel (i) suggests that separations of binary systems in the analyzed region meet very approximately We observe only a tail of distribution corresponding to more separated binaries. Closer pairs are absent due to the limited angular resolution in DR2 data. This result is compatible with the older data reported in Close et al. (1990). In Section 5 further discussion is devoted to the probability of the binary separation above this limit. We have checked that sampling with events generated by spheres of different radius (ρ 3 =5 pc) does not change the approximate result (22).

Proper Motion of Binaries
The proper motion of the stars in DR2 is defined by two angular velocities cos , 23 * in directions of the R.A. and decl. in the ICRS. So the corresponding transverse 2D velocity U is given as where L is distance of the star calculated from the parallax (8).
For the pair of stars we can define:  Note. Only sources of positive parallax are included. ρ 3 is the radius of events, á ñ á ñ L M , are average distance and event multiplicity, and N e is the total number of events. The analysis is done for events 2M15.
where α ij is the angle between both transverse velocities. The corresponding errors read Note that relative error δv ij /v ij can be large, since v ij is small compared with U ij and the errors dv ij and dU ij are the same. In Figure 11(a) we show the distribution of the velocities U of the stars from the region defined in Table 2. Distribution of the corresponding pair angles α ij is presented in Figures 11(b)-(d) for different regions of Δ ij . In Figure 12(a) we have shown the correlation of pair transverse separations and angles a ij . We observe a very narrow peak in the region of small Δ ij and a ij . 1 The peak is connected with presence of binaries as follows. The transverse velocities of two gravitationally coupled stars are whereV is transverse velocity of their center of gravity andv i ,v j are transverse projections of instantaneous orbital velocities, they have always opposite direction. Dominance of very small α ij means that  so for binaries in our Δwindow (Δ min is given by resolution of two close sources and Δ max =0.1 pcby (22)) we have For comparison, we have generated an MC plot from uniform distributions of positions and velocity directions, which is shown in Figure 12(b). Obviously, uniform distribution contradicts Figure 12(a), which reflects the presence of binaries (peak at small α ij and D ij ). The corresponding α ij peak is also observed in Figures 11(b)-(d) on the background of the collective motion of stars (dominance of α ij <90°). The peak is suppressed for Δ ij 1 pc. Selection of comoving systems is a basis of the methodology applied in the catalog JEC. In Figure 13 we show correlations of the velocities V and v ij with Δ ij in the region of small separations, where the binaries are present. With the use of v ij and Δ ij we try to roughly estimate the orbital period of the binary star. To simplify the calculation, we assume the space binary orbits are circular and the star 3D separation is a (semimajor axis). There are the extreme cases: (A) M 1 ≈M 2 , then where w is the space orbital velocity and a equals the diameter of the orbit.
since the separation a equals to the orbit radius. At the same time, Kepler's law implies for orbital period T g :  This relation also allows us to estimate the period. If the plane of orbit is perpendicular to the line of sight (axis Z), like the orbit N in Figure 14 Then we get: The orbit reference frame in the figure is similar to the local 3D event frame defined by the basis (2) and coordinates (7). But its origin L 0 is defined by the actual position of the center of mass of the binary. How does one deal with the orbits whose plane is not perpendicular to the line of sight like the orbit K in the same figure? The orbits in the figure are defined as follows: cos , sin , 0 : cos cos , sin , cos sin : cos cos , sin , 0 where r=a/2 for the case (A), r=a for (B), and f is azimuthal angle in the plane XY. The orbit K inclined at an angle θ is observed only in its projection P. Corresponding observed separation between the stars is Δ ij : Random angles f, θgenerate distribution of ε 1 shown in Figure 14(b). The MC distribution demonstrates smearing of the real separation a due to random f and slope θ of the orbit. Similarly, the ratio a/w is distorted as Since velocity w is perpendicular to r, there is exchange  f f cos sin 2 2 in denominator. Corresponding distribution of ε 2 is shown in Figure 14 and represent a scale of distortion of real orbital periods, if replaced by relations (37). More accurate estimate of the periods in some region of Δ ij can be obtained by rescaling of these relations: We have estimated the average periods from the maximum in Figure 13. If we take the sources roughly in the region of half width of the maximum,

Discussion
The separation of a pair of stars and the similarity of their movements can serve as two signatures of the binaries. We can compare them: (i) Distributions of the 2D and 3D separations are studied in Sections 3 and 4.1. The procedure is simple, the distribution of separations P(Δ) (within the defined circles or spheres) is compared with the corresponding separations P bg (Δ) generated by uniformly distributed sources representing background where P 0 is given by (12) or (18) and N bg is the corresponding number of the background pairs in the data events. Its accurate calculation is described below, see Equations (47) Consider the distribution ( ) D P from Figures 9(k) and 10(k). The corresponding function ( ) b D is shown in Figure 15(a), where the red curve represent fit of the function: The result of the fit is shown in the first row (all pairs) of Table 3. The longer tail corresponding to the second term describes the small probability of bound pair at greater separations: Δ0.15 pc. This tail is not visible in panels (i) in Figures 9 and 10. Two exponential terms in ( ) b D may correspond to two different classes of binaries. The question is to what extent the excess of wide binaries consists of stable bound systems. Part of the excess may be an image of widening pairs that were less separated but weakly bound in the past. The accuracy of the method is based on three conditions: 1. precise separation measurement in a suitably selected statistical set of events that generates ( ) D P , 2. precise modeling of the background defining P bg (Δ), and 3. relatively high peak and low background giving probability ( ) b D close to 1 in the peak region.
(ii) In principle, a similar approach could be applied to comoving pairs. However, it is obvious that meeting the conditions above is more difficult for velocities or their differences. The precision of velocity measurement is lower than separation. The distribution of velocities is far from being simply uniform, we do not know the exact form of the velocity background. Figure 11(b) suggests a more complicated collective motion in the background and a relatively low peak of binaries. That is why we prefer the primary signature based on the spatial separation, where we work with exactly defined background and relatively high peaks, like panels (i) and (k) in Figures 9 and 10. On the other hand, one can expect that a combination of position and velocity data will suppress background and improve the selection of binaries in terms of the function ( ) b D . A selection based on such a combination is described below. In Section 4.2 we have shown that combination of the space separation with proper motion allows us to estimate the orbital period.
We denote by N D bin and N bg D the numbers of real and false (background) binaries in some domain D of separations Δ.
where N D 2 is the number of pairs in a given data set. We have where P bg is distribution defined by (18) and renormalized in such a way that where the integration is over the domain E safely outside the peak of binaries. So we calculate where P 0 is defined by (18) with the substitutionx r = D 2 . The same procedure can be applied for the domains ( ) ( ) a DD D . Since we can assume that Δ and α are not   If we choose ( ) ( ) a a = D E , then the second ratio is 1. In this way, we can calculate N bg D without the knowledge of ( ) a P . The selected domains are shown, together with the results β≈N bin / N 2 in Figure 16. For the calculation, we take the common interval ( ) D º á ñ E 2, 4 pc, where N bin E should be zero. In accordance with Figure 12 The probabilistic function ( ) b D corresponding to the united domainAB is shown in panels (b)-(d) in Figure 15 for different intervals of the magnitude G. Obviously, the function (46) fitted with the parameters listed in Table 3 can be rather approximate. We observe that function β is getting wider with decreasing G. This may suggest that brighter and therefore statistically more massive stars can form stable bound systems even at greater separations. Obviously, for Δ1 pc, the probability β is compatible with zero. This also corresponds to an absence of a binary peak in Figure 11(d), which relates to Δ>1 pc. Total numbers of binaries are listed in Table 4. The results suggest that in the Gaia DR2 data in the region defined by Table 2, the number of binaries can represent ≈2% of all stars in this region.

Catalog
In this section, we describe the catalog of binary candidates, which we have created from the events of multiplicity 2M15 defined by Table 2. For the first version of the catalog we accept only the candidates from domainA shown in the first panel in Figure 16. So, we do not accept all candidates, but only candidates with a high probability to be the true binary. The candidates meet the following conditions: (1) Projection of separation We accept the pairs, which satisfy ( ) D  0.15 pc. 52 cat In general, the projection of separation Δ depends on the reference frame. In the paper, we worked with the local reference frame defined by the event center, where projection Δ into the local plane XY is given by (15). In the catalog we do not use local frames. The cut (52) is applied to Δ cat , which is defined as the length of the arc where L α , n α are defined in Section 2.1. The separations Δ and Δ cat are not exactly equal, but we have checked that in our conditions their difference is small, d á Dñ » 0.007 pc. Then the sharp cut on D cat means only a slightly smeared cut of the distribution of Δ.
(2) Projection of collinearity The pairs must meet the condition Both conditions define the domainAin the first panel of Figure 16. The panel shows that the average probability of a binary star is β≈84%. If the stars are brighter, (second and third panel), then the average β is almost 100%.
(3) Radial separation In fact, the radial separation is not explicitly used in our algorithm for selection of binaries. The reason is a rather low precision of radial separations, as explained in the discussion of panels (c), (e), and (f) in Figures 9 and 10. The only constraint is given by the diameter of our events (4 pc). Separation selection is based solely on Δ ij . Additional cuts on inaccurate radial separation would eliminate many of the real binaries and invalidate the function β calculated for Δ ij . We obtained high β even without a cut on radial separation.
Furthermore, it is evident that spherical events fill the space only partially (≈52%). In this way, half the stars are lost for analysis. We also lose binaries between two adjacent events when each star falls into another. In order to recover these losses, we work with the modified coverage: (i) The event spheres are replaced by cubes of edge 4 pc with no gaps between them. In each cube, we search for the pairs meeting the conditions (52) and (54). (ii) The procedure is repeated with the same cubes centered in the corners of the former cubes and the search results are merged.
The catalog is represented by a matrix that is defined as follows. Each line represents one star and the following data are in the columns: 1-2: Group ID and Group size ( )  n 2 to match stars with the group they belong to.  Table 5. The number of candidates N 2 in this table correspond to N 2 in the domainsA in Figure 16 after increasing with the repeated covering. The comparison of the two catalogs shows the following: However, in the JEC catalog, apart from the cut G13 many other restrictions and selections are made. Definition of the binary is not the same in both catalogs. In our opinion, this is the reason for the difference between (I) and (II). (c) (I) and (II) have 108 common binary candidates. (d) 86(II) candidates are absent in (I) since the separation Δ exceeds 0.15 pc. These candidates do not contradict our general criteria, but the corresponding probability ( ) b D can be lower as shown in Figure 15. (e) 54(II) candidates are absent in (I) since their spatial separation exceeds 4 pc (event diameter). Such candidates may not contradict our criteria; however, ( ) b D due to a great background can be extremely low. (f) 18(II) candidates are absent in (I) since these candidates are located in a dense area generating the high multiplicity events, which exceed our currently set limit. (g) The last 35(II) candidates are absent in (I), mainly due to the fact that even after the second coverage some couples (II) remain separated in two neighboring event cubes.
The total number of the binary candidates of all Gaia magnitudes in the catalog (I) is 80,560, which corresponds to the expected real number of binaries 67,670. The full current catalog (I) in the csv form is available on the websitehttps:// www.fzu.cz/~piska/Catalogue/. We plan to further develop and optimize our catalog methodology.

Summary and Conclusion
We have proposed a general statistical method for analysis of finite 2D and 3D patterns. In the present study, the method has been applied to the analysis of binary star systems in different regions of the Gaia catalog DR2.
Results on 2D statistical analysis were compared with our former results obtained from the previous catalog DR1. The new results give in the distribution of angular separations more clear evidence of binaries. Independent signature follows from the characteristic functions ( ) Q M n , which clearly indicate a tendency to clustering. However, the most important results are obtained from the 3D analysis introduced in the present paper. We have analyzed about 5×10 5 of events inside the cube of edge 400 pc centered at the origin of the galactic reference frame. In distributions of pair separations, we observe the sharp peaks at small separations corresponding to binaries, which are more striking for brighter sources, G15.
The important result of the analysis is probabilistic function ( ) b D , which depends on the separation Δ of a pair of stars and indicates the probability that the pair constitutes a bound system. The function suggests that brighter, more massive binary stars have on average a greater separation. With increasing separation the function falls rapidly. We obtained the ratio binaries/singles ≈2%.
Furthermore, we had shown thata combined analysis of 3D separations with the proper motion of the pairs of sources gives a clear picture of the binaries with two components of the motion: parallel and orbital. The analysis allowed us to estimate the average orbital period and mass of the binary star system in the chosen statistical ensemble.
The highest probability of the binary is observed at smallest separations Δ and angles α between proper motions. From the corresponding domain a D´º á ñ´á ñ 0, 0.15 pc 0, 15 deg, we have created the catalog involving 80,560 binary candidates, which represents 67,670 of the true binaries. This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/ gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC,https://www.cosmos.esa.int/web/gaia/ dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular, the institutions participating in the Gaia Multilateral Agreement. The work was supported by the project LTT17018 of the MEYS (Czech Republic). Furthermore, we are grateful to J. Grygar for deep interest and many valuable comments and J. Palouš and O. Teryaev forvery useful discussions and inspiring comments.  ~< < P l dl P l dl P l ldl l , 0 1. 59 Note. N 2 is the number of binary candidates, N bin is the real expected number of binaries. The data in the second column are related to the full cube region ( ) 400 pc 3 ( Table 2).