Where Are the Water Worlds? Identifying Exo-water-worlds Using Models of Planet Formation and Atmospheric Evolution

Planet formation models suggest that the small exoplanets that migrate from beyond the snowline of the protoplanetary disk likely contain water-ice-rich cores (∼50% by mass), also known as water worlds. While the observed radius valley of the Kepler planets is well explained by the atmospheric dichotomy of the rocky planets, precise measurements of the mass and radius of the transiting planets hint at the existence of these water worlds. However, observations cannot confirm the core compositions of those planets, owing to the degeneracy between the density of a bare water-ice-rich planet and the bulk density of a rocky planet with a thin atmosphere. We combine different formation models from the Genesis library with atmospheric escape models, such as photoevaporation and impact stripping, to simulate planetary systems consistent with the observed radius valley. We then explore the possibility of water worlds being present in the currently observed sample by comparing them with simulated planets in the mass–radius–orbital period space. We find that the migration models suggest ≳10% and ≳20% of the bare planets, i.e., planets without primordial H/He atmospheres, to be water-ice-rich around G- and M-type host stars, respectively, consistent with the mass–radius distributions of the observed planets. However, most of the water worlds are predicted to be outside a period of 10 days. A unique identification of water worlds through radial velocity and transmission spectroscopy is likely to be more successful when targeting such planets with longer orbital periods.


Introduction
Kepler observations have shown that small exoplanets with radius between 1 R ⊕ and 4 R ⊕ are extremely common among planets with orbital periods shorter than 100 days (e.g., Lissauer et al. 2011b;Mulders et al. 2015bMulders et al. , 2018;;He et al. 2021).Mass and radius measurements of such planets show evidence of two populations of planets, rocky and gaseous planets (Lopez & Fortney 2014;Rogers 2015), from the break in the mass-radius relation (e.g., Wolfgang et al. 2016;Chen & Kipping 2017).Further studies with improved precision of stellar radii from follow-up surveys like the California-Kepler Survey and with the help of improved parallaxes from Gaia (Johnson et al. 2017;Van Eylen et al. 2018) have shown that the size distribution of the small exoplanets is bimodal, with a radius gap, also known as the "radius valley," at ∼1.8-2 R ⊕ (Fulton et al. 2017;Fulton & Petigura 2018;Ho & Van Eylen 2023).This suggests a clear bifurcation between the "super-Earth" population at the lower peak and the "sub-Neptune" population at the higher peak.
The two leading theories that have emerged to explain these two populations of exoplanets are atmospheric dichotomy and compositional dichotomy.The first theory suggests that sub-Neptunes are the planets with primordial H-/He-dominated atmospheres, and super-Earths are the planets that have lost their primordial atmospheres.Mass-loss mechanisms that can cause small planets to lose their primordial atmospheres entirely include photoevaporation loss due to high-energy stellar flux (e.g., Owen & Wu 2017;Owen & Campos Estrada 2020;Rogers & Owen 2021;Rogers et al. 2023), corepowered mass loss (e.g., Ginzburg et al. 2018;Gupta & Schlichting 2019), and giant impact loss (e.g., Inamdar & Schlichting 2016;Biersteker & Schlichting 2019;Chance et al. 2022), among others.The other theory suggests that the size bimodality is a manifestation of the two types of core compositions of the small planets.Super-Earths are the planets with Earth-like rocky core compositions and sub-Neptunes are the planets with water-ice-rich cores with larger radii, due to lower density (e.g., Mordasini et al. 2009;Raymond et al. 2018;Venturini et al. 2020;Zeng et al. 2021).These water-icerich planets, also known as "water worlds," are the planets that form exterior to the snowlines of the protoplanetary disks and migrate inward through type I migration to the locations we observe today (e.g., Izidoro et al. 2017;Raymond et al. 2018Raymond et al. , 2020)).Such planets form with a silicate-to-ice ratio of 1:1 (e.g., Lodders 2003;Lopez 2017;Zeng et al. 2019;Mousis et al. 2020;Aguichine et al. 2021) beyond the snowline.These are analogous to the water-rich minor planets and moons (e.g., Enceladus, Pluto, etc.) of the solar system and the water-rich cores of Uranus and Neptune (Mousis et al. 2018).However, the existence of such planets around other stars at short orbital distances remains elusive to date.Luque & Pallé (2022, hereafter LP22) presented a sample list of small (size < 4 R ⊕ ) transiting planets around M dwarfs with refined mass and size estimations that strongly hint at a distinct population of these water worlds in the mass-radius and density space.However, the observed mass-radius distributions can also be explained with a rocky population of planets formed in situ with varying atmospheric content (Rogers et al. 2023).Moreover, in situ models have been successful at explaining the broad distributions of orbital periods, eccentricities, and radii of the Kepler planets (e.g., Hansen & Murray 2013;Ogihara et al. 2015;MacDonald et al. 2020).So the presence of water worlds is not a foregone conclusion from current observations.However, in situ planets start with excess mass in the inner region of the disks and therefore need either pebble drift (e.g., Boley et al. 2014;Chatterjee & Tan 2014, 2015) or planet migration (e.g., Izidoro et al. 2017;Raymond et al. 2018Raymond et al. , 2020)), which would also bring in water worlds from beyond the snowline to the inner region.The latter processes also include planet-disk gravitational interactions that in situ models ignore (Izidoro et al. 2017).Moreover, migration models too can explain most features of the Kepler systems, including the radii and mutual inclinations (coplanarity) of the planets and the period ratio distribution of the systems in mean motion resonance (e.g., the Trappist-1 system).However, most (90%-95%) Kepler systems are found to be somewhat offset from the first-order mean motion resonance (Lissauer et al. 2011a;Fabrycky et al. 2014), which would require other physical processes, such as dynamical instabilities and collisions (Izidoro et al. 2017(Izidoro et al. , 2022)), among others, to break the resonance chains after the migration phase.
While the planets containing rocky cores have been shown to reproduce the Kepler size bimodality with the help of atmospheric escape models, the possible fate of the hypothetical water worlds after migrating inward has been somewhat contentious.Many of the early models of compositional dichotomy do not provide any explanation as to why the water worlds would not accrete H-/He-dominated atmospheres (Izidoro et al. 2017;Raymond et al. 2018).Izidoro et al. (2022) present their pebble accretion model of planet formation to show that giant impacts on the migrating planets by planetesimals can wipe away the primordial atmospheres of most of the planets, unveiling the two kinds of core compositions for a certain initial condition.In such a case, giant impacts are shown to dominate over other atmospheric mass-loss mechanisms, such as photoevaporation, responsible for the atmospheric dichotomy.Although both photoevaporation and giant impacts can account for the atmospheric erosion of super-Earths, measuring the D/H ratio of individual planet atmospheres presents a potential method for differentiating between the two escape methods.The fractional escape of the primordial H/He envelope in the case of photoevaporation is known to alter the atmospheric D/H ratio (Gu & Chen 2023).In contrast, impact-driven escape may result in either no fractionation or a distinct level of fractionation.On the other hand, Mordasini (2018) discusses the presence of water worlds having atmospheres by using the Bern model, but most such planets turn out to be Neptune-sized and hence are not useful in reproducing the Kepler size bimodality.Hence, the impact of the atmospheric evolution and escape of the water-rich sub-Neptunes with atmospheres on the overall demographics of small exoplanets remains a subject of interest.This is further motivated by the recent developments in the models of internal structures of water worlds with H/He atmospheres (Lopez 2017;Zeng et al. 2019).
In this paper, we provide an agnostic overview of the effect of atmospheric evolution and mass-loss processes, such as photoevaporation and giant impacts, on the planets simulated through two suites of formation models: migration and in situ.We use the planetesimal accretion models from the Genesis library developed by Mulders et al. (2020) for that purpose.We compare the outcomes of our models with both the size distribution from Kepler and the mass-radius distributions of the transiting planets that have been followed up by radial velocity (RV) observations (e.g., LP22).We further use these comparisons to benchmark the migration models from which we predict the occurrences of water worlds as a function of their mass, radius, and orbital period.In Section 2, we explain the general methodology: the combinations of formation models, host stars, and atmospheric mass-loss processes and how we implement them to reproduce the Kepler size bimodality.We define the possible water worlds from an observational perspective and discuss the occurrences of such water worlds from observations (as upper limits) and from models in Section 3. In Section 4, we elaborate on how we calculate the probabilities of finding water worlds from their kernel density estimations (KDEs) and use them to guide future searches for water worlds.We conclude with the key points in Section 5.

Methods
Here we study the different processes of the atmospheric evolution of the small planets in tandem with their formation theories.Instead of adopting some initial random distributions for the properties of the planetary cores in our study, we use the outcomes of the planetesimal accretion models available from the Genesis database (see Section 2.1).This combined study provides an avenue for analyzing the possible bulk compositions of the observed planets from their bulk density and orbital period distributions.This study also allows us to verify which of the contesting theories of atmospheric evolution of the small planets effectively reproducing the Kepler size bimodality are consistent with the formation theories.Moreover, the Genesis models include migrating planets in addition to planets forming in situ and also simulate the giant impact phases required to study the effect of impact loss.We produce different models for the final architectures of the simulated planets based on their formation process (migration or in situ), type of host star (G or M), atmospheric loss mechanisms (photoevaporation or impact), and bulk composition (rocky or water-ice-rich).

The Genesis Database of Models of Planet Formation
The Genesis database4 is a library of N-body simulations of formations of the terrestrial planets developed by Mulders et al. (2020).These models use the simulations of Hansen & Murray (2013) to model the Kepler super-Earths as a starting point.Based on the distribution pattern of the solids around the host stars, the annular range of distribution of the solids, and the ratio of the number of embryos to the number of planetesimals, Mulders et al. (2020) presented 15 models with different initial conditions, each having 50 runs of simulations.12 such models include planets growing in situ and the other three include migration.Each model shows a different distribution of core mass and orbital period (see Figure 3 of Mulders et al. 2020).We combine these models so that the final suites of models exhibit lognormal-like mass distributions ranging between ∼1M ⊕ and ∼25M ⊕ and period distributions closest to the Kepler period distribution ranging between ∼1 day and ∼100 days.We arrive at three suites of such models: 1.In situ: combining all 12 in situ models from Genesis. 2. Migration-A: combining the migration models Gen-M-s22 and Gen-M-s50 from Genesis. 3. Migration-B: combining the migration models Gen-M-s10 and Gen-M-s50 from Genesis.
All three migration models in Genesis, namely Gen-M-s10, Gen-M-s22, and Gen-M-s50, initiate with solids distributed between 0.05 and 10 au, albeit with different normalizations of the surface density.They migrate inward under the influence of the gas disk that stretches up to an inner edge of 0.05 au.
Depending on the snowline location (see Section 2.2.1), migrating planets that make up the population of small exoplanets in the inner region of the disk are likely to consist of both rocky and water-ice-rich cores, as further elaborated in Section 2.3.The gas disk disperses after 5 Myr from the start of the simulations, after which dynamic instabilities set in.This instability phase leads to the breaking of several of the resonant chains of planets that formed during migration, orbital crossing among planets, and giant impacts.Unlike the in situ suite of models, the migration simulations do not incorporate lowermass planetesimals, which could potentially impact the atmospheric erosion process, as suggested by Chen & Jacobson (2022).However, as our current study does not consider the influence of planetesimals on atmospheric erosion, their absence does not significantly affect the evolution of the migrated planets.The potential impact of planetesimals is further discussed in Section 2.6.
The N-body simulations in the Genesis database assume perfect accretion of the cores.This is corroborated by independent studies carried out by Chambers (2013) and Dugaro et al. (2020), where they conducted comparisons to assess the impact of hit-and-run collisions and collisional fragmentation relative to perfect mergers.Chambers (2013) argued that large bodies often reaccrete mantle fragments at a subsequent time, leaving their final composition largely unchanged.Conversely, for water worlds, Dugaro et al. (2020) found that the fragments' contribution to their final mass and water content is negligible.The distributions of the core masses for the three suites of models are shown in Figure 3.The observed period ratio distribution of small Kepler exoplanets (size <4 R ⊕ ) suggests that the majority of the resonant chains break after the gas disk dispersal (e.g., Izidoro et al. 2017Izidoro et al. , 2021Izidoro et al. , 2022)).To match our migration models with this feature, we choose 95% of the runs with broken resonant chains and 5% of the stable runs for each model, following Izidoro et al. (2021Izidoro et al. ( , 2022)).
This step only provides us with the mass and orbital period distributions of the planetary cores.To determine the bulk compositions of the cores, we need to define the location of the snowline, which in turn depends on the spectral type of the central host star.The choice of host stars is explained in the following subsection.

Host Stars
The final architectures of the planets depend on the central star.We consider two types of host stars for our models: G-type and M-type.We use the same distribution of planetary cores adopted from the Genesis database, albeit simulated for a Sunlike star, for both choices of host stars.This is motivated by the fact that the overall size distribution of Kepler planets (Gupta & Schlichting 2019;Ho & Van Eylen 2023) as well as the total mass budget of the super-Earths and sub-Neptunes (see the discussions in Mulders 2018) do not alter significantly with the change in the host star mass, although the disk masses are known to scale with the stellar mass (Pascucci et al. 2016).The parameters that change with the host stars are the location of the snowline of the disk, which dictates the water-ice content of the cores, the luminosity of the star, which dictates the equilibrium temperature (T eq ) of the planets, and the highenergy X-ray and ultraviolet (XUV) flux from the stars, which dictates the photoevaporation loss rate of the atmospheres of the planets.The fiducial values of the parameters for the two types of stars chosen in our models are shown in Table 1.For the G-type host star, we have chosen the same values as for the Sun, and for the M-type host star, we have chosen the median values of the parameters of the host stars in the target list of LP22.

Location of Snowline
We consider the location of the snowline at the time of planetesimal formation, as this is the time when the water/ice gets trapped in the planetesimals beyond the snowline.They do not further sublimate on migrating inward, unlike the pebbles (Mulders et al. 2015a), but can only lose water partially by late collisions after the gas disk dispersal or from 26 Al heating (Izidoro et al. 2017(Izidoro et al. , 2022;;Monteux et al. 2018;Raymond et al. 2018).The location of the snowline in the solar system at the time of planetesimal formation has been inferred from the transition between hydrous and anhydrous asteroids to be ∼2.5 au (Abe et al. 2000).However, this estimated location of the snowline could change significantly if we consider that the asteroids have been scattered into their current locations (Walsh et al. 2012;DeMeo & Carry 2014).Moreover, Mulders et al. (2015a) calculate the 1σ range of the locations of the snowlines for a solar-mass star to be between ∼2 and ∼5 au for a disk made up of small grains and between ∼1.1 and ∼2.4 au for a disk made up of large grains.Keeping this uncertainty in mind, we choose the fiducial location of the snowlines for the G-type stars at 2.2 au.Following the scaling relation of Mulders et al. (2015a) and keeping in mind the associated uncertainties, we select the fiducial location of the snowlines for M-type stars as 0.8 au.However, the effect of other values of snowline locations is described in Section 3.3.

The Cores and H/He Atmospheres of the Genesis Planets
The final states of the simulations of the Genesis planets provide us with the distributions of mass and semimajor axis of the planetary cores.We choose the cores with orbital period (P) <100 days and size within 1 R ⊕ and 4 R ⊕ , as we are interested in the super-Earth and sub-Neptune populations.We consider two kinds of bulk compositions of the cores: 1. Earth-like rocky composition with 32.5% Fe and 67.5% MgSiO 3 by mass, as the detailed works on the massradius relations of the observed terrestrial planets indicate the dominant presence of such cores (Carter et al. 2012;Dressing et al. 2015;Zeng et al. 2016, etc.).2. Earth-like rocky composition (50%-100% by mass) + a layer of H 2 O (0%-50% by mass).
To calculate the water/ice mass fraction (WMF) of the cores, we assume that, in the beginning of the simulations, the embryos/ planetesimals within the snowlines are fully rocky (100% Earthlike + 0% H 2 O) and beyond the snowlines they are water-ice-rich (50% Earth-like + 50% H 2 O; Raymond et al. 2018;Izidoro et al. 2021).Over the course of the simulations, the WMFs are evolved post-process by updating them after each collision according to their mass ratios and pre-collision WMFs.The evolutions of the WMFs for a migration model (Gen-M-s22) and an in situ model (Gen-O-s22) from the Genesis database (Mulders et al. 2020) are shown in Figure 1.We identify the planets with WMFs >0.3 as water-rich planets, which is primarily motivated from observations, as explained in Section 3.2.The relative abundances of the rocky planets (WMF <0.1), water-rich planets (WMF >0.3), and planets with intermediate water-rock compositions (0.1 < WMF <0.3) are shown in Figure 2. To assess the impact of our assumption of the WMF being a one-step function of the distance from the host star, we repeated our calculations with a two-step function.We did not find a significant change in the relative abundances of the rocky, water-rich, and intermediate planets.
Hence, we proceed with the results from our one-step approximation as mentioned above.
Figure 2 shows a significantly low abundance of planets with intermediate water-rock compositions from the migration models, which is consistent with our definition.Also, the fraction of water-ice-rich cores is much higher around M-type stars than G-type stars, due to the fact that the snowline is much closer in the case of M-type stars (see Table 1).Figures 1 and 2 reconfirm that only migration models can explain the efficient transport of high mass fractions of H 2 O to the planets within orbital periods of 100 days.
We consider that all planets accrete H/He atmospheres from the disk and retain them after the gas disk dispersal.We consider a log-uniform distribution for the initial atmospheric mass fraction (X i ) of the planets, following Rogers et al. (2023), as The upper limit is consistent with the estimations of the atmospheric mass fractions (X) of the observed sub-Neptunes from their mass and radius measurements (e.g., Lopez & Fortney 2014;Wolfgang & Lopez 2015).It also covers the range of atmospheric mass fractions possible after the "boil-off" stage (Misener & Schlichting 2021;Rogers et al. 2023).We consider any planet with X  10 −4 as a stripped super-Earth.

Converting Mass to Radius
The sizes of the planetary cores are calculated by using the model of Zeng et al. (2019), as The existing models for calculating the size of a planet having an atmosphere for a given value of X and equilibrium temperature (T eq ) are discussed by Rogers et al. (2023).For a planet with age >1 Gyr, we found that the models by Lopez & Fortney (2014) and from the publicly available code evapmass by Owen & Campos Estrada (2020) are in good agreement and both account for the contraction of the atmospheres of the planets due to cooling over time.As pointed out by Rogers et al. (2023), the model by Zeng et al. (2019) considers a temperature parameter, which is the temperature at a pressure of 100 bar and can be interpreted as the equilibrium temperature only when the planets are old enough (age >1 Gyr).Thus, the model by Zeng et al. (2019) may be used to calculate only the size of the young planets.As we focus on exoplanets with age >1 Gyr in all our calculations, we consider the model by Lopez & Fortney (2014) to compute the size of the atmospheres.Also, the size of an atmosphere does not depend on its metallicity, as Lopez & Fortney (2014) showed that any difference between the solar metallicity model and the enhanced opacity model gets erased after several Gyr.The atmospheric mass fraction of the planets and hence their sizes further change as they lose their atmospheres over time through different mechanisms, as explained in the next subsection.These loss mechanisms shape the final architectures of the planets, which can be compared with the observed planetary statistics.

Loss of Atmosphere
The terrestrial planets can lose their atmospheres due to Parker-type winds.We discuss here two of the leading theories of such mass-loss mechanisms: photoevaporation loss and impact loss.The effect of core-powered mass loss on the size distribution of small exoplanets across orbital periods is essentially similar to that of photoevaporation (e.g., Rogers et al. 2021) and hence is not included in this work.The photoevaporation loss mechanism has been widely studied (Owen & Wu 2017;Owen & Campos Estrada 2020) and is found to robustly reproduce the bimodal size distribution of the Kepler planets through simulation.Depending on the density and distance from the stars, the planets could entirely or partially lose their atmospheres, resulting in an overall atmospheric dichotomy.On the other hand, the impact theory suggests that the planets undergoing giant impacts (mass ratio 0.1) can completely lose their primordial atmospheres, regardless of their mass and distance from the host stars (Biersteker & Schlichting 2019;Izidoro et al. 2022), exposing their compositional diversity.Here, we assess if the Genesis planets undergo an adequate number of giant impacts to reflect their compositional dichotomy, i.e., rocky versus water-ice-rich composition, in their size distribution.We subject the three different models explained in Section 2.1 to these atmospheric mass-loss processes to assess which combinations of models and mass-loss mechanisms turn out to be consistent with the Kepler size distribution.

Photoevaporation Loss
We calculate the photoevaporation mass-loss rate and timescale by following the formalism of Owen & Wu (2017).The mass-loss rate depends on the high-energy luminosity (L HE ) of the host stars and an efficiency parameter η, denoting the efficiency of these high-energy photons for mass removal (Owen & Wu 2017).We calculate L HE as a function of the mass and age of the host stars by following Owen & Wu (2017), who assume a linear dependence of L HE on the stellar mass.Additionally, we adopt the same saturation timescale of 100 Myr for both types of stars, as proposed by Owen & Wu (2017).They argue that the time-integrated XUV flux is predominantly influenced by the initial 100 Myr, making it unlikely that a more extended saturation timescale for M dwarfs would significantly alter the outcomes.Any potential differences in this regard are expected to be accommodated by the range of values assumed for η.We numerically solve the mass-loss equations and take the sizes of the planets after an age of 5 Gyr as the final sizes of the planets, since they do not change significantly afterward.Following Owen & Wu (2017), Rogers & Owen (2021), etc., we calculate the photoevaporation efficiency, considering the energy-limited scenario, as a function of the escape velocity of the planets (v esc ) as η 0 and α are free parameters of our models and hence we run our simulations over a grid of values for η 0 and α.This approach helps us address the dependence of η on the planet mass.H/He take a longer time to escape from massive planets, allowing them to lose more energy through radiative cooling and slowing down the mass-loss process (Owen & Jackson 2012).While energy-limited escape applies to the weakly irradiated planets, some of the planets in our models could lose their atmospheres through recombination-limited escape (Murray-Clay et al. 2009; Owen & Alvarez 2016) or X-ray evaporation (Owen & Jackson 2012), which would further complicate the situation.However, Owen & Alvarez (2016) showed that most of the small planets with both mass and radius measurements fall into the energy-limited regime.Moreover, the energy-limited approach adopted in this paper (i.e., Equation (3)) has been shown to well reproduce the Kepler size bimodality (e.g., Owen & Wu 2017;Rogers & Owen 2021).For all the model suites with G-type host stars, we find that the size distribution of the simulated planets, especially the location of the radius valley, turns out to be consistent with that from Kepler when we adopt η 0 < 0.06 and 0 < α < 0.5.Since we do not have an observed size distribution for the planets around late M dwarfs to date to benchmark with, we use the same values of η 0 and α for the case of M-type host stars and still find a bimodal size distribution of the simulated planets, as speculated (Cloutier & Menou 2020).
Figure 4 shows that the Migration-A model is found to provide the best match with the Kepler size distribution.However, photoevaporation can produce the "evaporation valley" with all three suites of models and for both types of host stars.It is the fractional occurrence of water-ice-rich planets that distinguishes the three models.

Impact Loss
Giant impacts during planetary accretion can deliver significant energy to the planet, thereby heating the core and the envelope, which can cause the hydrodynamic escape of the H/He envelope, leading to a rapid mass loss.The effect of the shockwave generated by a giant impact can also eject a fraction of the envelope, but that effect is less significant compared to the thermal effect (Biersteker & Schlichting 2019) and hence we only consider the latter in this work.Following Biersteker & Schlichting (2019) and Izidoro et al. (2022), we assume that after the gas disk dispersal, the thermal effect from a giant impact on a planet (where the mass of the impacting body is 0.1 times the planet mass) can completely strip the primordial atmosphere.In the absence of the gas disk, this loss cannot be replenished.The in situ simulations are performed without any gas disk throughout, whereas in the case of the migration models, the gas disk is dispersed after 5 Myr from the start of the simulations.We assume that an impact by a less massive planetesimal does not change the atmospheric mass fraction of the planets.
Figure 5 shows that the impact stripping process can somewhat reproduce the size bimodality with the Migration-B models, whereas it absolutely fails to produce the radius valley with the Migration-A and in situ models.Moreover, in the case of the Migration-B models, it is not the dichotomy in the bulk composition but rather the dichotomy of atmospheres (the presence and absence of atmospheres) that causes the bimodality of the size distribution in our models, similar to the case of photoevaporation loss, as evident from Figure 5.

Discussion of Our Modeling Steps
Our study shows that the current suites of migration models of the Genesis database perform significantly better than the in situ models in reproducing the Kepler size bimodality.While photoevaporation is found to robustly explain the size bimodality with different distributions of core mass and location, the impact stripping process appears to do so only with certain distributions.
As we aim to combine multiple complex processes with the N-body simulations of the Genesis database, we adopt simplified models with certain assumptions at various steps.A few of these simplifications and assumptions are described here, which can be seen as a guide for future improvements.First, our models simulate zero to a few ultra-close-in super-Figure 3. Distributions of the core masses of the planets we adopt from the Genesis database segregated as rocky cores and water-ice-rich cores by considering a G-type host star.Note that in our migration models, there is no clear separation in mass between the rocky cores and the water-rich cores, which shows why compositional dichotomy is unlikely to work in our models.
Earths with orbital period 4 days, failing to explain the Neptune desert.We plan to address this in our future studies, with new N-body simulations with refined initial parameters.Second, we employ a simple model for the impact stripping mechanism by following Izidoro et al. (2022), where we assume that lower-mass impactors below the cutoff limit do not cause any erosion of atmosphere.While this is true for a single impact, as the plume-driven local mass loss due to an impacting planetesimal may not be significant, the cumulative effect of multiple small impacts can cause significant erosion of atmospheres from the embryos (Schlichting et al. 2015; Chen & Jacobson 2022).However, this process is unlikely to significantly impact the planetary statistics in our in situ suite of models.Giant impacts are already observed to completely erode the primordial atmospheres of the majority of planets in this suite, resulting in an overabundance of bare super-Earths that is inconsistent with the Kepler distribution (refer to Figure 5).Conversely, while giant impacts are found to be effective in explaining the radius valley for the Migration-B models, they appear to be inefficient in eroding the atmospheres of planets in the Migration-A models.Therefore, introducing planetesimal-driven escape into the Migration-A models could offer an explanation for the radius valley.However, migrating planets could capture most of the planetesimals without losing their atmospheres before the dispersal of the gas disk and by the time the gas disk disperses, the number of planetesimals may not be large enough to cause significant atmospheric erosion.Nonetheless, we plan to include the effects of planetesimals in the case of migration and a detailed model of plume-driven loss due to impacts by planetesimals in our follow-up studies.Finally, for the water worlds, we adopt the models of Zeng et al. (2019), who assume isothermal vapor/liquid/supercritical fluid envelopes.Using updated models of water worlds, for example, the adiabatic envelope models of Aguichine et al. (2021), can significantly alter the population statistics.Nonetheless, this initial approach to developing population synthesis models based on the Genesis formation models helps us gain essential insights.
The primary aim of studying the impact stripping process was to test the hypothesis of compositional dichotomy through the impact stripping of migrating planets, as proposed by Izidoro et al. (2022).However, as we have a different model setup, we find that the impact stripping of the planets from our migration models does not result in compositional dichotomy, but rather accounts for the size bimodality with atmospheric dichotomy, similar to the case of photoevaporation.This is primarily because the masses of the rocky and water-rich cores from the migration models do not show any bimodality (see Figure 3), unlike the planets of Izidoro et al. (2022), thereby lacking any inherent bimodality essential for compositional dichotomy to work.Additionally, we find that our migration models produce a significant number of bare water-ice-rich planets adding to the super-Earth population, in contrast to Izidoro et al. (2022).This happens because no small water-icerich planet goes through any giant impact in the models of Izidoro et al. (2022), whereas such planets are found to lose their atmospheres by both giant impacts and photoevaporation in our migration models.We use the occurrence of these bare water worlds as a diagnostic to our analysis, as explained in the following section.Since the in situ models do not favor the water-world hypothesis, we assess the possibility of the water worlds in the following sections with the help of our migration models.

The Water Worlds: From Models and Observations
As migration models suggest the possibility of water-ice-rich super-Earths and sub-Neptunes, we look for similar patterns from observations.We consider the planets with both mass and radius measurements for this.Such planets, although currently small in sample size, allow us to look into the mass-radiusorbital period distributions of the planets that can be compared with model outcomes.

The Sample List of LP22 and the Transiting Extrasolar Planets Catalogue
LP22 suggested three populations of planets-"rocky," "water-rich," and "gas-rich"-from the mass-radius relations and density distributions of the planets around M dwarfs that they studied.Although the density of a "water-rich" planet can also be explained by a rocky planet having a thin layer of atmosphere, it is the distinct clustering of "water-rich" planets in the mass-radius-density space that interests us, as we find similar planets from our migration models.Even after updating the sample list by including planets from the most recently updated Transiting Extrasolar Planets Catalogue5 (TEPCat; Southworth 2011) around M dwarfs, we find a similar pattern.Although a few planets have now appeared with intermediate densities (see the green points in Figure 6), the fraction of such "water-rich" planets is found to be almost unchanged, which motivates us to use this sample list to define the water worlds.

Revised Definition of Water Planets
We assume that the "water-rich" planets identified by LP22 are truly water planets and verify if their fractional occurrence is consistent with our model predictions.This would imply that these planets have significantly lost their primordial H/He atmospheres, and hence we call them bare water planets (BWPs).On the other hand, our migration models suggest that a fraction of the gas-rich planets could also be made up of water-ice-rich cores, which we call gas-rich water planets (GWPs).Following LP22, we define the BWPs as planets strictly falling on or around the 50% H 2 O line in the massradius diagram.Keeping in mind the uncertainty in the models of the mass-radius relations, we allow a small range of WMFs around 0.5 for the definition of BWPs.To accommodate the "water-rich" planets of LP22 within our definition, we set this range between the 40% H 2 O line and the 70% H 2 O line in the mass-radius diagram (see Figure 6).To ensure this, we introduce a new parameter  , which is somewhat equivalent to the mass-radius constant ( f ) in Equation (2), as where m p and r p are the observed mass and radius of the planets in Earth units.For a bare planet,  becomes exactly equal to f.Thus, for a bare Earth-like rocky planet, ( ) = =  f 0 1, and, according to our definition, BWPs are planets with  between f (0.4) = 1.1976 and f (0.7) = 1.3164.Following this, we identify the planets with F f (0.7) as gas-rich planets.Accordingly, we also identify the planets from our models as water planets (bare or gas-rich) if their water fractions are 0.4.This constraint is consistent with the models, as the water fractions of most of the planets from our models are either 0-0.1 or 0.3-0.5 (see Figure 2).This is also justified from the observational perspective, as the water planets with much lower water fractions would be difficult to detect in the initial attempts.The upper limit of 0.7 on water fraction is also consistent with our models, as none of the planets from our models have water fraction >0.5 and most of the gas-rich planets are found with ( ) >  f 0.7 .We use this definition to identify the bare rocky, bare water, and gas-rich planets from observations and from our models.For observed planets around M dwarfs, we use our updated list mentioned in Section 3.1, and in the case of G dwarfs, we use the planets from the latest version of TEPCat.We apply the same constraints on the precision of the mass and radius of the planets as LP22 in all cases, i.e., 8% on radius and 25% on mass, with an exception in the case of Kepler-138 d.We include Kepler-138 d in our sample list of planets around M dwarfs, although it has a mass uncertainty of ∼33%, as it is a strong water-world candidate, as suggested by previous studies (Piaulet et al. 2023).We then compare the fractional occurrence of the BWPs predicted by our models with that from observations.

Mass-Radius-Period Distributions and Occurrence of Water Planets
Both the migration and in situ models are found to explain the observed mass-radius distributions for both types of host stars, as is evident from Figure 7.This is also on par with the arguments of Rogers et al. (2023).Therefore, although current observations cannot be used to draw any inferences about water worlds, we leverage our migration models to prescribe a systematic set of diagnostics that can motivate future observations.
We calculate the maximum possible occurrence of BWPs as a fraction of the bare planets from current observations by using the definition described in Section 3.2, considering the error bars of the masses and radii of those planets.Note that these values only represent the upper limits of the occurrence of BWPs, as the possibility of them being rocky planets with thin atmospheres cannot be ruled out.Table 2 shows the percentage occurrence of BWPs among bare planets calculated from observations (upper limits) and from our migration models, for both G-and M-type host stars.We choose three of the combinations of migration models and mass-loss mechanisms mentioned in Section 2.5: Migration-A + photoevaporation, Migration-B + photoevaporation, and Migration-B + impact stripping.To predict the occurrence of water planets from our models, we utilize the  -period distributions of the simulated planets, which are described as follows.We first compute the KDEs from the  -period distributions of the simulated planets separately for the four types of compositions, viz., bare rocky, bare water, gas-rich rocky, and gas-rich water, as shown in Figure 8.The KDEs essentially represent the conditional probability density functions (PDFs) of finding a planet of these given compositions at a particular point on the  -period plane.Subsequently, we integrate the PDFs across the range of  values and orbital periods covered by the samples of observed planets and calculate the probability that a randomly selected bare planet from that plane is a water world.Evidently, the KDEs calculated from our models are found to be consistent with our compositional classification of the observed planets.The occurrences shown in Table 2 correspond to our fiducial values of the locations of the snowlines of the disks (see Table 1).
Figure 7.The mass-radius relations of the simulated planets from our migration and in situ models, both of which are found to explain the mass-radius relations of the observed planets.However, they predict completely different bulk compositions, which can be reflected in the atmospheric composition.
We also calculate the occurrences for other values of the locations of the snowline.Figure 9 shows which range of values of the locations of the snowline is consistent with observations.Clearly, the fractions of BWPs from the photoevaporation models are consistent with the snowline locations suggested by the disk models (Raymond et al. 2007;Mulders et al. 2015a).Conversely, the same produced by the impact models are found to be inconsistent with the upper limits obtained from observations, which otherwise suggests the snowline locations to be unusually away from the host stars (>3 au for a G-type host and 1.6 au for an M-type host).

Discussion of the Possibility of Water Worlds
Our study shows that the migration + photoevaporation models strongly favor the water-world hypothesis.The impact stripping mechanism is found to produce more water planets compared to photoevaporation, when subjected to the same distribution of planets.Overall, both the migration models can well explain the upper limits set on the occurrence of the BWPs derived from observations, with some adjustments to the models and the locations of the snowline.Moreover, Figure 8 shows that the KDEs can be used to estimate the likelihood of a gas-rich planet having a water-rich core, which otherwise cannot be inferred directly from observations.
Table 2 shows that our migration models tend to produce a higher fraction of BWPs around M dwarfs than around G dwarfs.On the contrary, observations suggest a more similar BWP occurrence for both types of host stars.This questions the validity of the water-world hypothesis.Alternatively, this discrepancy might arise because previous spectroscopic missions did not look where migration models suggest the highest concentration of water worlds.

Guidance for Future Searches for Water Worlds
While the predicted fraction of BWPs is consistent with current observed exoplanets, these water worlds are not equally distributed across the parameter space.We calculate the likelihood (2D PDF) of detecting a water-ice-rich planet over a rectangular grid of orbital periods and  values from our migration models, with the help of the same KDEs mentioned in Section 3.3.Figure 10 shows the corresponding map of the likelihood of occurrence of a water-rich planet (either bare or gas-rich) over the  -period plane.Observed planets are overplotted in the figure to show the probabilities of the bare and gas-rich planets identified from observations being water worlds.Our study shows a strong dependence of the likelihood of water planets on orbital periods.We show the mass-radius distributions of the simulated planets from the Migration-A + photoevaporation model around M-type host stars in Figure 11 for two different ranges of orbital periods: P < 10 days and P > 10 days.Clearly, the water worlds are more abundant beyond orbital periods of 10 days, implying that the snowline has effectively moved from ∼0.8 to ∼0.06 au (10 days).This is slightly longer than the range over which most planets have been followed up by spectroscopic observations to date.We find a similar pattern around the G-type hosts stars.
The period dependence of the occurrence of water worlds is calculated by computing the 1D PDF of the occurrence of BWPs and GWPs as a function of orbital period, which is shown in Figure 12.As is evident from the figure, the PDFs for both the BWPs and GWPs peak beyond orbital periods of 10 days.<23.3% ± 5.7% 25.5% ± 5.3% 34.9 ± 7.6 54.4% ± 0% 0% Note.The uncertainties in the upper limits obtained from observations appear from the uncertainties in the mass and radius of the planets.The uncertainties in the model values appear from the random distribution of X i and also from the range of values adopted for η in the case of photoevaporation.The snowline locations are chosen at 2.2 and 0.8 au for the G-type and M-type stars, respectively.The total likelihood (area under the curve) of finding a BWP or GWP is also high, beyond orbital periods of 10 days.The occurrence of BWPs drops at 50 days, due to the overabundance of gas-rich planets.Hence, future RV follow-up efforts could prioritize longer orbital periods (10-50 days) to increase their odds of finding BWPs.Also, since the absence of an atmosphere on such a planet could strongly indicate a water-rich core, this is the range of orbital periods where spectroscopic or phase-curve studies (Kempton et al. 2023) with JWST and future space-bound and ground-based missions would have a higher probability of identifying water worlds.
On the other hand, we find from our models that the likelihood of finding a GWP increases as we go farther away from the central star.Thus, JWST and future atmospheric Figure 9. Variation of the percentage occurrence of BWPs among bare planets (BWP/BP) calculated from our models by using their period- KDEs with the location of the snowline of the disk.The dark-and light-shaded regions denote the 1σ and 3σ uncertainties, respectively.The blue error bar denotes the 1σ uncertainty range of the BWP/BP occurrence computed from observations.The cyan vertical lines denote the possible range of snowline locations that can explain the upper limit of the occurrence of BWPs derived from observations.Impact loss is very efficient at stripping H/He atmospheres, thereby creating a lot of BWPs.As observations indicate a much lower occurrence (upper limit) of bare water worlds (see Table 2), this alternatively requires the snowline to be farther away.
survey missions should look beyond orbital periods of 10 days to verify the existence of water-rich sub-Neptunes.Atmospheric features that can be used as traits for their water-ice-rich bulk composition are the high mean molecular weight or high metallicity (Kempton et al. 2023) of the atmospheres and the high abundance of water vapor in the atmospheres, among others.Table 3 shows a list of shortlisted planets with known mass and radius for which the calculated probability of being a water world from our migration models is 50%.The table also shows the values of their transmission spectroscopic metric (TSM) for observation using the JWST/NIRISS instrument (Kempton et al. 2018).The uncertainties in the probabilities include the effect of the uncertainties in the model parameters (e.g., photoevaporation efficiency), but are dominated by the uncertainties in the measured mass and radius of the planets, especially for planets with intermediate  values (likely BWPs).While it is currently difficult to improve the precision of the planetary radii, owing to the limits posed by the precision of the stellar radii, future RV studies could focus on improving the precision in their mass measurements.

Discussion of the Water-world Candidates
Table 3 shows the list of potential water-world candidates that could be followed up by JWST to find atmospheric tracers of their volatile contents.This list contains some of the targets that are already suspected to be water worlds, on the basis of observations with JWST or the Hubble Space Telescope (HST).For example, K2-18 b, which is suggested to be a gasrich water world by our models, is suspected to be a hycean  (hydrogen + ocean) planet from recent observations with JWST (Madhusudhan et al. 2023).The observed spectra suggest the presence of a shallow H 2 atmosphere, which then requires an H 2 O layer (most likely, in an ocean form) that can explain the bulk density (Madhusudhan et al. 2023).Again, Piaulet et al. (2023) claim Kepler-138 d to be a bare water world (∼51% water by volume) from their interior modeling of the planet, which is in strong agreement with our model prediction.Their calculations were further supported by the flat optical/IR transmission spectrum that they obtained from HST, making it a prime target for JWST.
Conversely, our models suggest that TOI 1695 b, once thought to be a water world based on its mass and radius measurements (Cherubim et al. 2023), is less likely to be a water world, due to its relatively high density and short orbital period.However, that does not rule out the possibility of a low bulk content of H 2 O, as we have restricted our definition of water worlds to a lower limit of 30% of WMF.Again, TESS and HST observations of TOI-270 d (Mikal-Evans et al. 2023) tend to support an H 2 -rich atmosphere that shows strong absorption features of H 2 O.Although this is in disagreement with our models suggesting it to be a BWP, it could still be a GWP, which requires further follow-up with JWST for confirmation.

Conclusion
We combined different in situ and migration models from the Genesis database and subjected the simulated planets to atmospheric mass-loss mechanisms including photoevaporation and impact stripping.By comparing the model outcomes with the Kepler size distribution and to an observed sample of planets with precise masses and radii, we find that:  Notes.The probabilities of being a water world (ww prob) are calculated with the help of our Migration-A + photoevaporation models.The TSM values are calculated by following Kempton et al. (2018).a The uncertainties in ww prob predominantly stem from the uncertainties in the radius and mass of the planets.
(This table is available in its entirety in machine-readable form.) 1.Both in situ and migration models are consistent with the radius valley and the mass-radius relations of the observed planets, but the migration models predict a significant number of water-rich planets within orbital periods of 100 days, while the in situ models dictate that all of them are rocky.2. Impact stripping alone can explain the size bimodality only with the migrating planets from the Genesis database, for a certain distribution of core mass and position.In such a case, impact stripping would result in a very high number of BWPs, which would be an indicator of this process.3. Migration + photoevaporation models, on the other hand, predict an intermediate fraction of bare planets to be water-rich (∼10%-30% around G dwarfs and ∼20%-35% around M dwarfs), which is consistent with the maximum possible fraction of water worlds from the current observed sample of planets with precise mass and radius measurements (24.2% ± 6.1% around G dwarfs and 23.3% ± 5.7% around M dwarfs).4. However most bare water worlds are predicted from the migration models at orbital periods of 10-50 days where few RV follow-up efforts are concentrated.Similarly, the fraction of water worlds among the sub-Neptunes also increases significantly outside of ∼10 days.5. The Genesis Population Synthesis code, which can be used to develop population synthesis models based on the Genesis database of formation models and to reproduce the results of this paper, can be found at doi:10.57716/1b63-nw69 and https://github.com/arcunique/GPS.
All the models have been generated with the help of the existing formation models of the Genesis database, as our first step in developing the Genesis population synthesis models.Now that we have an end-to-end model of population synthesis, we plan to conduct more N-body simulations with a refined set of initial parameters based on the observed massradius period distribution in our follow-up work.Based on our present study, we propose that follow-up RV and spectroscopic surveys target planets at longer orbital periods to test the waterworld hypothesis, and provide a list of probable targets with high TSM and RV semi-amplitudes for follow-up.

Figure 1 .
Figure1.WMFs of the embryos of a migration suite and an in situ suite of simulations from the Genesis database evolving over the course of the simulations, considering a G-type host star.The migration models can explain the transport of water-rich cores to the observable region, i.e., within orbital periods of 100 days, while the in situ models cannot.

Figure 2 .
Figure 2. Distributions of the WMFs of the cores produced by all the migration models combined and the in situ models combined from the Genesis database.Note that the fraction of cores with intermediate WMFs is low; the majority are either predominantly rocky or have high WMF (>0.3).

Figure 4 .
Figure 4.The radius of the Genesis planets vs. their orbital period without any atmospheric loss (left) and with photoevaporation loss (middle), and the radius distributions of the planets of all periods after photoevaporation loss (right).The black dashed lines in the first two columns show the observed slope of the radius valley over the orbital period (Gupta & Schlichting 2019; Van Eylen et al. 2019; Izidoro et al. 2022).The observed radius distribution for the Sun-like (G-type) stars is taken from Fulton & Petigura (2018).

Figure 5 .
Figure5.The same as Figure4, but considering only impact loss.Evidently, the impact loss of H/He atmospheres is less efficient than photoevaporation loss in creating the radius valley in our models.

Figure 6 .
Figure 6.The mass-radius relations of the planets from LP22 and the most recently updated TEPCat (Southworth 2011).

Figure 8 .
Figure8.The KDEs of the occurrences of simulated planets in  -period space calculated separately for the bare rocky, bare water, gas-rich rocky, and gasrich water planets.The KDEs can be used to calculate the PDFs over the entire  -period space.The likely natures of the observed planets identified only from their  values are found to be consistent with the background KDEs.

Figure 10 .
Figure10.Color map of the probability of a planet with a given orbital period and  value being a water world, as predicted by our migration+photoevaporation models.The overplotted error bars denote the observed planets from our sample lists, and their colors denote their likely nature based on their  values only.Evidently, the planets around M dwarfs are more likely to be water planets than those around G dwarfs, especially at longer orbital periods.

Figure 11 .
Figure 11.Mass-radius distributions of the simulated planets from our migration + photoevaporation model around G-type host stars, showcasing that the water-rich planets are abundant beyond orbital periods of 10 days.The water-rich planets are enlarged for clarity.

Figure 12 .
Figure12.Fractional occurrence of water worlds as functions of the orbital period predicted by our migration + photoevaporation model for the G-type host star.While the BWPs and GWPs are likely to be concentrated beyond orbital periods of 10 days, it is more likely to find a water planet around an M dwarf than around a G dwarf.

Table 1
Fiducial Values of the Properties of the Host Stars and the Disks Chosen in Our Models

Table 2
The Maximum Possible Occurrences of BWPs as a Fraction of Bare Planets (BWP/BP) from Observations and Models

Table 3
List of Planets with High Mean Probability (ww prob) of Being a Water World (>40% for Bare and >60% for Gas-rich)