BASILISK II. Improved Constraints on the Galaxy-Halo Connection from Satellite Kinematics in SDSS

Basilisk is a novel Bayesian hierarchical method for inferring the galaxy-halo connection, including its scatter, using the kinematics of satellite galaxies extracted from a redshift survey. In this paper, we introduce crucial improvements, such as updated central and satellite selection, advanced modelling of impurities and interlopers, extending the kinematic modelling to fourth order by including the kurtosis of the line-of-sight velocity distribution, and utilizing satellite abundance as additional constraint. This drastically enhances Basilisk's performance, resulting in an unbiased recovery of the full conditional luminosity function (central and satellite) and with unprecedented precision. After validating Basilisk's performance using realistic mock data, we apply it to the SDSS-DR7 data. The resulting inferences on the galaxy-halo connection are consistent with, but significantly tighter than, previous constraints from galaxy group catalogues, galaxy clustering and galaxy-galaxy lensing. Using full projected phase-space information, Basilisk breaks the mass-anisotropy degeneracy, thus providing precise global constraint on the average orbital velocity anisotropy of satellite galaxies across a wide range of halo masses. Satellite orbits are found to be mildly radially anisotropic, in good agreement with the mean anisotropy for subhaloes in dark matter-only simulations. Thus, we establish Basilisk as a powerful tool that is not only more constraining than other methods on similar volumes of data, but crucially, is also insensitive to halo assembly bias which plagues the commonly used techniques like galaxy clustering and galaxy-galaxy lensing.


INTRODUCTION
According to the current cosmological paradigm, the vast majority of all galaxies form and reside in extended dark matter haloes (White & Rees 1978;Mo et al. 2010).Halo occupation modelling tries to use observational constraints on the population of galaxies in order to infer the statistical link between the galaxy properties (mainly their luminosity or stellar mass) and the properties of the dark matter haloes (mainly some measure of halo mass) in which they reside (see Wechsler & Tinker 2018, for a review).The resulting 'galaxy-halo connection' provides valuable insight regarding the formation and evolution of galaxies, and benchmarks to calibrate, compare and validate semi-analytic models (eg.Somerville et al. 2008;Stevens et al. 2016) and simulations (eg.Crain et al. 2015;Munshi et al. 2013;Stinson et al. 2013).In addition, since it describes the link between the light we see and the mass that ★ E-mail: kaustav.mitra@yale.edugoverns the dynamical evolution of the Universe, it is a powerful tool that allows astronomers to constrain cosmological parameters using the observed distribution of galaxies (e.g., Yang et al. 2004;Seljak et al. 2005;Tinker et al. 2005;Yoo et al. 2006;Cacciato et al. 2009).
Arguably, the most straightforward method to infer the galaxy halo connection, and one that has become extremely popular, is subhalo abundance matching (hereafter SHAM, Kravtsov et al. 2004;Vale & Ostriker 2006;Conroy et al. 2006;Reddick et al. 2013).It matches the ordered list of galaxies (typically ranked by stellar mass or luminosity) to that of subhaloes (typically ranked by their peak or infall mass) 1 .In this mostly non-parametric method, one usually allows for some amount of scatter (a free parameter) in the rank-order matching, to have realistic spread in the stellar mass -halo mass 1 Rather than abundance matching individual galaxies to subhaloes, one can also match the abundance of galaxy groups (identified using some group finder) to dark matter host haloes (e.g., Yang et al. 2005aYang et al. , 2007)).
relation (e.g., Behroozi et al. 2010).A key advantage of SHAM over other methods, discussed below, is that it only requires stellar mass (or luminosity) measurements of the galaxies.However, an important downside is that it relies crucially on the assumption that both the galaxy sample and the (sub)halo sample (typically taken from a -body simulation) are complete.Hence, SHAM cannot be applied to subsamples of galaxies (i.e., samples of emission line galaxies, or galaxies selected by colour).In addition, even if the galaxy sample is complete, the (sub)halo catalogues used, which are typically extracted from numerical simulations, suffer from incompleteness due to artificial disruption (van den Bosch et al. 2018;van den Bosch & Ogiya 2018) and failures of subhalo finders (Han et al. 2012;Diemer et al. 2024).This can significantly impact the galaxy-halo connection inferred via SHAM (Campbell et al. 2018).
These problems can be overcome using data that more directly constrains halo mass.The two most commonly used methods are galaxy clustering (e.g., Berlind & Weinberg 2002;Yang et al. 2003;van den Bosch et al. 2007;Zheng et al. 2007;Zehavi et al. 2011) and galaxy-galaxy lensing (e.g., Guzik & Seljak 2002;Mandelbaum et al. 2006Mandelbaum et al. , 2016;;Leauthaud et al. 2017).The former relies on the fact that more massive haloes are more strongly clustered (Mo & White 1996); hence, the clustering strength of a given population of galaxies informs the characteristic mass of the haloes in which they reside.Unfortunately, its reliability is hampered by the finding that halo clustering strength depends not only on mass but also on secondary halo properties (e.g., Gao et al. 2005;Wechsler et al. 2006;Dalal et al. 2008;Lacerna & Padilla 2011;Salcedo et al. 2018), something that is collectively referred to as halo assembly bias.Galaxy-galaxy lensing, which is a manifestation of weak gravitational lensing, uses the tangential shear distortions of distant background galaxies around foreground ones in order to constrain the halo masses of the latter (e.g., Brainerd et al. 1996;Hoekstra et al. 2001;Sheldon et al. 2004;Mandelbaum et al. 2006).Although, in principle, a fairly direct probe of halo mass, this method requires tedious shape measurements of faint background sources, which can be prone to effects like blending and intrinsic alignment.Typically the background sources lack spectroscopic redshifts, which can also cause systematic errors in the modelling of their measured shear distortions.In addition, on large scales the 2-halo term of the lensing shear is also impacted by the same assembly bias issues that plague clustering.
Another method that can be used to constrain the galaxy-halo connection, but which has hitherto been severely under-utilized, is satellite kinematics.It uses measurements of the line-of-sight velocities of satellite galaxies with respect to their corresponding central galaxy in order to constrain the gravitational potential, and hence the mass, of the host halo2 .With the exception of large galaxy groups and clusters, individual central galaxies typically only have a few spectroscopically detected satellites.Consequently, it is common to combine the satellite velocity measurements from a large stack of central galaxies in order to estimate an average satellite velocity dispersion, which in turn is used to infer an average host halo mass using either a virial mass estimator or a simple Jeans model (e.g., Zaritsky et al. 1993;Brainerd & Specian 2003;Prada et al. 2003;Norberg et al. 2008;Wojtak & Mamon 2013).
It has often been argued that satellite kinematics is not a reliable mass estimator for any combination of the following reasons: (a) satellite galaxies are not necessarily virialized tracers (Wang et al. 2017(Wang et al. , 2018;;Adhikari et al. 2019), (b) their orbits may well be anisotropic (Diemand et al. 2004a;Cuesta et al. 2008;Wojtak & Mamon 2013), resulting in a well-known mass-anisotropy degeneracy (Binney & Mamon 1982), and (c) the stacking that is used implies 'mass-mixing' (i.e., combining the kinematics of haloes of different masses), which muddles the interpretation of the data.In addition, the selection of centrals and satellites from a redshift survey is unavoidably impacted by impurities, incompleteness and interlopers, further complicating the analysis.Despite these concerns, a number of studies have progressively improved satellite kinematics and have shown that it can yield reliable, as well as precise, constraints on the galaxy-halo connections 3 .In particular, van den Bosch et al. (2004) demonstrated that by selecting centrals and satellites using iterative, adaptive selection criteria the impact of impurities and interlopers can be minimized.More et al. (2009b) has shown that by combining different weighting schemes one can accurately account for mass mixing, and even constrain the scatter in the stellar mass-halo mass relation (also see More et al. 2009aMore et al. , 2011)).This was significantly improved upon by Lange et al. (2019a,b) who demonstrated that kinematics of satellite galaxies from a large redshift survey such as the Sloan Digital Sky Survey (SDSS York et al. 2000) can yield constraints on the galaxy-halo connection that are complementary to, and competitive with, constraints from galaxy clustering and/or galaxy-galaxy lensing.Wojtak & Mamon (2013) were the first to analyse satellite kinematics while accounting for orbital anisotropy.Using a method first developed by Wojtak et al. (2008Wojtak et al. ( , 2009) ) they were able to simultaneously constrain halo mass, halo concentration and orbital anisotropy, albeit without accounting for mass mixing.
van den Bosch et al. (2019, hereafter paper I) developed Basilisk, a Bayesian hierarchical inference formalism that further improves on the ability of satellite kinematics to constrain the galaxy halo connection.Unlike previous methods, Basilisk does not resort to stacking the kinematics of satellite galaxies in bins of central luminosity, and does not make use of summary statistics, such as satellite velocity dispersion.Rather, it leaves the data in the raw form and computes the corresponding likelihood.Consequently, it can simultaneously solve for halo mass and orbital anisotropy of the satellite galaxies, while properly accounting for scatter in the galaxy-halo connection.In addition, Basilisk can be applied to flux-limited, rather than volume-limited samples, greatly enhancing the amount and dynamic range of the data.
Paper I also tested and validated Basilisk against mock data sets of varying complexity, and demonstrated that it yields unbiased constraints on the parameters specifying the galaxy-halo connections.However, in order to speed up the analyses, all those tests where performed using mock data samples that were only about 1/8 the size of the full SDSS sample analysed here.When we ran Basilisk on full-sized mocks instead, the model parameter uncertainties shrank considerably, as expected, revealing several significant discrepancies that turned out to be systematic.This necessitated a number of modifications to Basilisk, which we present in the first half of this paper.Most notably, we introduce significant improvements to the treatment of interlopers (i.e., galaxies that are selected as satellites but that do not reside in the same dark matter halo as the central), allowing for both a population of splash-back galaxies (Diemer & Kravtsov 2014;Adhikari et al. 2014;More et al. 2015;O'Neil et al. 2022) and a large scale infall population akin to the well-known Kaiser (1987) effect.In addition, we slightly modify the cylindrical selection criteria in order to improve the purity of our sample (i.e., reduce misclassification of satellites as centrals), we assure that the selection of secondaries around each individual primary is volume-limited, and we forward-model the contribution of impurities that arise from haloes in which the brightest galaxy is a satellite rather than the central.We also let go of the oversimplified assumption that the satellite velocity profile along any given line-of-sight is Gaussian, as was done in paper I. Rather, we now use the fourth-order Jeans equations to model the kurtosis of the line-of-sight velocity distribution (LOSVD).This enables more accurate modelling of the full phase-space distribution of satellite galaxies, and allows Basilisk to break the mass-anisotropy degeneracy.Finally, we also replace fitting binned statistics of centrals with zero (detected) satellites, as done in paper I, with a more general, Bayesian hierarchical modelling of the number of satellites around each central.Although this data on satellite abundances does not yield direct kinematic constraints on halo mass, it greatly helps to constrain the overall galaxy-halo connection.
The goal of this paper is threefold: (i) showcase the advancements in satellite kinematics methodology that we have introduced in Basilisk, and highlight its improved performance when tested against realistic SDSS-like mock data; (ii) apply Basilisk to SDSS DR7 data to simultaneously constrain the conditional luminosity functions of central and satellite galaxies, the satellite velocity anisotropy and satellite radial distribution, all with unprecedented precision, and compare those with previous constraints on halo occupation statistics; and (iii) establish Basilisk as a powerful method to infer the galaxy-halo connection which is free of halo assembly bias effects, and that is even more constraining than commonly used techniques like galaxy clustering and galaxy-galaxy lensing, when applied on data of similar volumes.

Selecting central-satellite pairs
The first step in analysing satellite kinematics is to select a sample of centrals and their associated satellites from a redshift survey.Unfortunately, this selection is never perfect; one undoubtedly ends up selecting some bright satellites as centrals (we refer to these as 'impurities') and not every galaxy selected as a satellite actually resides in the same host dark matter halo as the corresponding central (those that don't are referred to as 'interlopers').In what follows, we therefore use 'primaries' and 'secondaries' to refer to galaxies that are selected as centrals and satellites, respectively.
A galaxy at redshift  is considered a potential primary if it is the brightest galaxy in a conical volume of opening angle Θ pri ap ≡  pri ap / A () centered on the galaxy in question, and extending along the line-of-sight from  − (Δ) pri to  + (Δ) pri .Here  A () is the angular diameter distance at redshift , and (Δ) pri = (Δ pri max /) (1 + ).The parameters  pri ap and Δ pri max specify the primary selection cone.Following Lange et al. (2019a), we select the primaries in a rank-ordered fashion, starting with the most luminous galaxy in the survey.Any galaxy located inside the selection cone of a brighter galaxy is removed from the list of potential primaries.All galaxies fainter than the primary and located inside a similar cone, but defined by  sec ap and Δ sec max , centred on the primary are identified as its secondaries.Note that, although it is common to refer to these selection volumes as 'cylinders', a convention we also adopted in paper I, in actuality the selection volumes are frustums of cones.In order to rectify this confusing nomenclature, in this paper we refer to them as 'selection cones' (see Fig. 1).
The four parameters  pri ap , Δ pri max ,  sec ap , and Δ sec max control the completeness and purity of the sample of primaries and secondaries.Increasing  pri ap and/or Δ pri max , boosts the purity among primaries (i.e., it reduces the number of satellites erroneously identified as centrals), but reduces the overall completeness.Similarly, decreasing  sec ap and/or Δ sec max reduces the number of interlopers, but at the costs of a reduced number of satellites, which are the dynamical tracers of interest.Since brighter primaries typically reside in larger, more massive haloes, it is advantageous to scale the sizes of the selection cones with the luminosity of the primary (van den Bosch et al. 2004).In particular, we adopt  where  10 = /(10 10 ℎ −2 L ⊙ ), and  200 is allowed to take a maximum value of 4. The values of  pri ap and  sec ap correspond to roughly 1.65 and 0.4 times the halo virial radius, respectively, while the value for Δ sec max is large enough to include the vast majority of all satellites around primaries of the corresponding luminosity.Note that the numerical values in these selection criteria are tuned in order to optimize the selection of primaries and secondaries against impurities and interlopers.In particular, they are slightly different from the values we adopted in paper I.As detailed in Appendix A, this is done in order to reduce the fraction of impurities that are neither true centrals, nor the brightest satellites in their corresponding host haloes.These impurities are particularly difficult to account for in our forward-modelling approach and can cause a small but systematic overestimate of the scatter in the relation between halo mass and central luminosity.
The SDSS redshift catalogue, to which we apply Basilisk in this study, is a flux-limited survey.As emphasized in paper I, an important advantage of Basilisk over earlier studies of satellite kinematics, is that it is not limited to volume-limited subsamples, thereby greatly boosting the number of primaries and secondaries to be used in the analysis.However, in order to facilitate proper modelling of the number of secondaries (true satellites and interlopers) we need to assure that the selection of secondaries around each individual primary is volume limited.This is something that was not implemented in paper I, but which turned out to be important in order to avoid a systematic bias in the inferred faint-end slope of the satellite luminosity function.This effect was not significant in the smaller mock data samples used to test Basilisk in paper I, but could no longer be overlooked using data sets comparable in size to the SDSS data used here.
In this paper, we limit our analysis to primaries in the lumi- Redshift increases from left to right, and the depth of these cones are expressed as a velocity difference with respect to the primary galaxy on which the cones are centered.Note that all radii listed here are computed at the redshift of the primary.see the text for details.
nosity range 9.504 Here  0.1  is the absolute magnitude in the SDSS -band + corrected to  = 0.1.In addition, we only use data in the redshift range 0.02 ≤  ≤ 0.20.Note that the selection cone used to identify secondaries around a primary of luminosity  pri at redshift  pri extends from  front to  back , given by and Hence, as depicted in Fig. 2 we are only complete in the selection of secondaries with luminosities  s ≥  min ( back ).Here  min () is the minimum luminosity of galaxies at redshift  that make the apparent magnitude limit of our survey data (  = 17.6; see §6).In order to assure a complete, volume-limited selection of secondaries around each primary, secondaries fainter than  min ( back ) are discarded.In addition, in order to assure that the entire secondary selection cone around a given primary fits within the flux-limits of the SDSS data, we require that  pri >  min ( back ).Finally, the redshifts of the primaries are restricted to 0.034 ≤  pri ≤ 0.184, such that the  front and  back of the most luminous primaries fit within the 0.02 ≤  ≤ 0.20 limits of the entire sample.
To elucidate this further, Fig. 2 illustrates the bounds on the selection of primaries and secondaries.The solid, vertical lines at  = 0.02 and  = 0.20 mark the minimum and maximum redshifts of the entire sample, while the dashed, vertical lines at  = 0.034 and  = 0.184 mark the redshift limits allowed for primaries.Dashed, horizontal lines at log  = 11.104 and log  = 9.504 mark the luminosity cuts for primaries.The solid circles, labelled A to H, represent hypothetical primaries of three different luminosities  1 ,  2 and  3 (indicated by three different colours).The shaded rectangle associated with each primary indicates the allowed luminosity-redshift ranges of its corresponding secondaries.Since the redshift extent of the secondary selection cone scales with the luminosity of the primary, the shaded regions of fainter primaries have a smaller Δextent, as is evident from the figure.Note that these shaded regions extend down to where the apparent magnitude of the secondaries at the back-end of the selection cone is equal to the magnitude limit of the survey, which can be significantly lower than log  = 9.504, specifically for the primaries that are relatively nearby (e.g., primaries A and F).
Primary A is at the minimum allowed redshift for primaries,

𝑧
pri min = 0.034, such that the 'front' end of its secondary selection cone is equal to the minimum redshift of our survey data (i.e.,  front = 0.02).Similarly, primary C is located at the maximum allowed redshift for primaries,  pri max = 0.184, and has  back = 0.20.Primary H is also special in that it has the highest redshift possible given its luminosity.Had it been any farther away, the far end of its secondary selection cone would stick outside of the SDSS flux limit, resulting in incompleteness.The three dashed and dotted curved lines, labelled as  min ( back ( 1/2/3 )), show the lower luminosity limits for secondaries as a function of  pri , corresponding to each of the three different primary luminosities represented in the figure .For example, primaries of luminosity  2 (like primary D and E) can not have secondaries fainter than the middle dashed curve in green.This ensures that their secondaries are individually volume limited around each of those primaries.

Survey incompleteness
As any spectroscopic redshift survey, the SDSS data, from which our sample of primaries and secondaries derives, suffers from spectroscopic incompleteness due to fibre collisions and other failure modes (see Blanton et al. 2005, for details).Each galaxy in the survey is assigned a spectroscopic completeness, C spec , which indicates the fraction of spectroscopic targets in the angular region of the galaxy in question with a useful spectrum.In order to avoid primaries in regions with poor spectroscopic completeness, we remove all primaries with C spec < 0.8.
If a primary is close to the edge of the survey, such that its secondary selection cone sticks partially outside of the survey footprint, or if the secondary selection cone overlaps with a masked region, the number of secondaries may be underestimated.In order to account for this, we proceed as follows.For each primary we uniformly distribute ∼ 5 × 10 4 particles in its secondary selection cone.We then compute the fraction,  app , of those particles that are located inside the angular footprint of the SDSS, accounting for both survey edges and masked areas.In what follows we use  app, to denote the aperture completeness of galaxy .In order to avoid primaries with a poor aperture completeness, we remove all primaries with  app < 0.8.
As demonstrated in Lange et al. (2019a), it is important to correct satellite kinematics data for fibre-collision induced incompleteness.In the SDSS, spectroscopic fibres cannot be placed simultaneously on a single plate for objects separated by less than  fc ≡ 55 ′′ (Blanton et al. 2003).Although some galaxies are observed with multiple plates, yielding spectroscopic redshifts even for close pairs, roughly 65% of galaxies with a neighbour within 55 ′′ lack redshifts due to this fibre collision effect.In order to correct the data for the presence of fibre collisions, we follow Lange et al. (2019a) and start by assigning each fibre-collided galaxy the redshift of its nearest neighbour (see Blanton et al. 2005;Zehavi et al. 2005).Note that we only use these during the identification of primaries.Once the selection is complete, all fiber-collided primaries and secondaries are removed from the sample5 .In addition, each galaxy is assigned a spectroscopic weight,  spec , that is computed as follows.For each galaxy we first count the number of neighboring galaxies, , brighter than   = 17.6 within a projected separation less than 55 ′′ .Next, for all galaxies in the survey with  neighbours, we compute the fraction,  spec , of those neighbours that have been successfully assigned a redshift.Finally, all galaxies with  neighbours are then assigned a spectroscopic weight equal to  spec = 1/  spec .
In order to correct for aperture incompleteness and fibre collisions, Basilisk down-weights the expectation value for the number of secondaries around primary  (see equation [21] below), using the following correction factor: Here  spec,  is the spectroscopic weight for secondary  associated with primary .Since correcting for fibre collisions is extremely difficult on scales below the fibre-collision scale, we remove all secondaries with  p <  cut ( pri ) ≡  A ( pri )  fc .Hence, the secondary selection volumes used in the end are conical frustums with a central hole with an opening angle of 55 ′′ (see Fig. 1).As shown in paper I and Lange et al. (2019a), this combined approach of downweighting the model predictions for the number of secondaries and ignoring secondaries below the fibre-collision scale accurately accounts for incompleteness arising from fibre-collisions in the SDSS.

OBSERVABLES
Here we describe the various observables used by Basilisk in order to constrain the galaxy-halo connection.These include (i) accessible 2D phase-space parameters of primary-secondary pairs (line-of-sight velocity and projected separation), which contains the information regarding the kinematics of satellite galaxies, (ii) statistics regarding the number of secondaries per primary (including primaries with zero secondaries), which helps to constrain the halo occupation statistics, and (iii) the galaxy luminosity function.The following subsections discuss each of these observables in detail.

Satellite kinematics
For each primary-secondary pair in the sample we compute their projected separation and their line-of-sight velocity difference Here  pri and  sec are the observed redshifts of the primary and secondary, respectively,  is the speed of light, and  is the angular separation between the primary and secondary on the sky.As detailed in paper I, the main data vector used in Basilisk is given by where the union is over all  + primaries with at least one secondary.Here  sec, is the number of secondaries associated with primary , and it is made explicit that  pri, ,  pri, , and  sec, are only treated as conditionals for the data {Δ   ,  p,  |  = 1, ...,  sec, }.In other words, we consider  pri, ,  pri, and  sec, as 'given' and shall not use the distributions of these quantities as constraints on our satellite kinematics likelihood.Rather, Basilisk uses the number densities of all galaxies as additional constraints (see §3.3).The main reason for doing so is to make the method less sensitive to the detailed selection of primaries, which is difficult to model in detail.In particular, as discussed in paper I, this approach makes Basilisk fairly insensitive to details regarding the  200 () relation (equation [1]) used to define the selection cones.

Number of secondaries
The data vector D SK described above only contains primaries with at least one secondary.The complementary data vector D 0 = ({ pri, ,  pri, } |  = 1, 2, ...,  0 ) lists all  0 primaries with zero spectroscopically detected secondaries.Even though D 0 contains no kinematic data, it still provides additional constraints on the galaxyhalo connection, in particular regarding the occupation statistics of satellite galaxies.In paper I we utilized this information by computing the fraction,  0 =  0 /( 0 +  + ), of primaries, in a given bin of log  pri and  pri , that have zero secondaries.Here  0 is the number of 'lonely primaries' with zero detected secondaries, and  + is the number of primaries that have at least one secondary.As discussed in paper I, this  0 statistic provides valuable constraints on the galaxy-halo connection.However, upon closer examination we found that the binning used in this method causes small, but systematic errors in the inference.Using smaller bins was not able to solve this problem, which is why we ultimately opted for the following alternative, unbinned approach.
In line with Basilisk's philosophy to leave the data as much as possible in its raw form, rather than computing  0 on a (log  pri ,  pri )-grid, we use the following raw data vector as constraint on the model: Here the union is over (a random subset of) all  pri =  0 +  + primaries, independent of how many secondaries they have (i.e., including the primaries with zero secondaries).Since  0 ≫  + , computing the likelihood D NS for all  pri primaries is much more time-consuming than computing the likelihood for the satellite kinematics data vector (eq.[6]).Therefore, we only use a downsampled, random subset of  NS = O ( + ) primaries, where each primary has a probability equal to  + / pri to be included.In the case of the SDSS data set described in §6 this probability is 0.094.This downsampling assures that the computation of the likelihood for D NS has a CPU requirement that is comparable to that for D SK .We emphasize that our constraints are primarily driven by the satellite kinematics data.Hence, this down-sampling of the satellite abundance data has no significant impact on our constraining power of the central galaxy-halo connection or the orbital anisotropy of the satellite galaxies.It only slightly broadens the posterior constraints for some of the parameters characterizing galaxy-halo connection of satellite galaxies.

Galaxy Number Densities
The final observable that we use to constrain the galaxy-halo connection is the galaxy luminosity function, which provides important additional constraints on the CLF (e.g., Yang et al. 2003;van den Bosch et al. 2003;Cooray & Milosavljević 2005;Cooray 2006), and therefore helps to tighten the posterior in our inference problem.We use the number density of galaxies in 10 bins of 0.15 dex in luminosity, ranging from 10 9.5 to 10 11 ℎ −2 L ⊙ .These are computed using the corresponding, volume-limited subsamples, carefully accounting for the SDSS DR-7 footprint.In what follows, we refer to the data vector representing these 10 number densities as D LF .The covariance matrix of this data is computed using a jackknife estimator.In particular, we apply a recursive routine6 developed by Zhou et al. (2021), that takes into account the survey mask and window, and iteratively constructs N maximally compact, equal-area partitions of the survey footprint (see also Wang et al. 2022).We adopt N = 100 which is large enough to capture the covariance in the survey while also being small enough to assure that each subregion still hosts an adequate number of galaxies.7

METHODOLOGY
We analyse the data described above using the Bayesian, hierarchical satellite kinematics code Basilisk, which is described in detail in paper I.Here we briefly summarize its salient features and introduce a few modifications that improve Basilisk's performance.
Basilisk uses an affine invariant ensemble sampler (Goodman & Weare 2010) to constrain the posterior distribution, ( |D) ∝ L (D|) () . (8) 9.0 9.5 10.0 10.5 11.0 Illustration of the CLF used to characterize the galaxy-halo connection.Blue dashed and brown solid curves show the CLFs for central and satellite galaxies, respectively, in haloes of mass  = 10 13 ℎ −1 M ⊙ used to construct the Tier-3 mock data discussed in §5.The different parameters that characterize the exact shape of the CLF are listed inside curly brackets.
Here D = D SK + D NS + D LF is the total data vector,  is the vector that describes our model parameters, () is the prior probability distribution on the model parameters, and L (D|) is the likelihood of the data given the model.The latter consists of three parts: the likelihood L SK for the satellite kinematics data D SK , the likelihood L NS for the numbers of secondaries as described by the data vector D NS , and the likelihood L LF for the luminosity function data D LF .
In what follows we briefly describe the computation of each of these three different likelihood terms in turn.However, we first describe the model that we use to characterize the galaxy-halo connection.

Conditional luminosity function
The galaxy occupation statistics of dark matter haloes are modelled using the conditional luminosity function (CLF), Φ(|, ) d, which specifies the average number of galaxies with luminosities in the range [ − d/2,  + d/2] that reside in a halo of mass  at redshift  (Yang et al. 2003;van den Bosch et al. 2003).In particular, we write that Here and throughout the rest of the paper, subscripts 'c' and 's' refer to central and satellite, respectively, and we assume that the CLF is redshift independent, at least over the redshift range considered here (0.02 ≤  ≤ 0.20).
The CLF of centrals is parametrized using a log-normal distribution (see blue, dashed curve in Fig. 3), The mass dependence of the median luminosity, Lc , is parametrized by a broken power-law: which is characterized by three free parameters; a normalization,  0 , a characteristic halo mass,  1 , and two power-law slopes,  1 and  2 .Motivated by the fact that several studies suggest that the scatter,  c , increases with decreasing halo mass (e.g., Sawala et al. 2017;Matthee et al. 2017;Pillepich et al. 2018;Wechsler & Tinker 2018;Lange et al. 2019b), we allow for a mass-dependent scatter using Hence, the scatter is characterized by two free parameters, a normalization,  13 , that specifies the intrinsic scatter in log  c in haloes of mass  = 10 13 ℎ −1 M ⊙ , and a power-law slope  P .Note that this is slightly different from the parametrization adopted in paper I.
For the satellite CLF we adopt a modified Schechter function (see red curve in Fig. 3): Thus, the luminosity function of satellites in haloes of a given mass follows a power-law with slope  s and with an exponential cut-off above a critical luminosity,  * s ().Throughout we adopt which is motivated by the results from galaxy group catalogues (see Yang et al. 2009, and paper I).As in Lange et al. (2019b), we assume a universal value for the faint-end slope of the satellite CLF,  s , independent of halo mass.Finally, the normalization where  12 = /(10 12 ℎ −1 M ⊙ ).Note that this characterization of the CLF is very similar to that adopted in a number of previous studies (Cacciato et al. 2009(Cacciato et al. , 2013;;More et al. 2009a;van den Bosch et al. 2013;Lange et al. 2019a,b).All CLF parameters, along with parameters that characterize the satellite velocity anisotropy, and nuisance parameters used for interloper modelling, are listed in Table 1.It also includes the best-fitting values and 1 confidence intervals for all the parameters, obtained by fitting the SDSS-DR7 data.

Spatial distribution of satellites
Throughout we assume that the radial distribution of satellite galaxies is given by a spherically symmetric, generalized NFW (gNFW) profile Here R and  are free parameters and  s is the scale radius of the dark matter halo, which is related to the halo virial radius via the concentration parameter  vir =  vir / s .This gNFW profile has sufficient flexibility to adequately describe a wide range of radial profiles, from satellites being unbiased tracers of their dark matter halo ( = R = 1), to cored profiles that resemble the radial profile of surviving subhaloes in numerical simulations ( = 0, R ∼ 2).This also brackets the range of observational constraints on the radial distribution of satellite galaxies in groups and clusters (e.g., Carlberg et al. 1997;van der Marel et al. 2000;Lin et al. 2004;Yang et al. 2005b;Chen 2008;More et al. 2009a;Guo et al. 2012;Cacciato et al. 2013;Watson et al. 2010Watson et al. , 2012)).

Satellite Kinematics
The data vector for the satellite kinematics is given by eq. ( 6) and contains the projected phase-space coordinates Δ and  p of all secondaries (satellite galaxies plus interlopers) associated with the  + primaries (centrals plus impurities).We make the reasonable assumption that the data for different primaries is independent.Additionally, for a primary with more than one secondary, we assume that the phase-space distribution of the secondaries are not correlated with each other.The latter may not be entirely justified, given that satellites are often accreted in groups, which can bias halo mass estimates (Old et al. 2018).We emphasize, though, that the majority (71% in the case of the SDSS data discussed in §6) of primaries that contribute to the satellite kinematics data only have a single secondary.In addition, tests based on realistic simulation-based mocks (see §5) indicate that any potential correlations between satellites (subhaloes) that occupy the same host halo can safely be ignored (i.e., do not cause a significant systematic error in our inference).Hence, we have that Here, (Δ,  p | pri ,  pri ,  s ) is the probability that a secondary galaxy in a halo at redshift  pri , with a primary of luminosity,  pri , and with a total of  s detected secondaries has projected phasespace parameters (Δ,  p ).For true satellites, the probability is computed assuming that satellite galaxies are a virialized, steadystate tracer of the gravitational potential well in which they orbit (see §4.2.3).Throughout, we assume dark matter haloes to be spherical and to have NFW (Navarro et al. 1997) density profiles characterized by the concentration-mass relation of Diemer & Kravtsov (2015) with zero scatter.Hence, host haloes are completely specified by their virial mass, , alone8 , which implies that we can factor the likelihood as This equation describes a marginalization over halo mass, which serves as a latent variable for each individual primary, accentuating the hierarchical nature of our inference procedure.Note that the 'prior' for halo mass is informed by  pri ,  pri , and  s according to the model .Using Bayes theorem, we have In what follows we discuss each of the conditional probability functions required to compute L SK in turn.

The probability 𝑃(𝑁
The number of secondaries,  s , associated with a particular primary consists of both satellites (galaxies that belong to the same dark matter host halo as the primary), and interlopers (those that do not).Throughout we assume that the number of interlopers and the number of satellite galaxies are independent, and that both obey Poisson statistics.As shown in paper I, this implies that where is the expectation value for the number of secondaries, corrected for fibre collision and aperture incompleteness using the correction factor of equation ( 3), and with  int (, ) and  sat (, , ) as the expectation values for the numbers of interlopers and satellites, respectively.
The expectation value for the number of satellites brighter than the magnitude limit  min ( back ), in a halo of mass  at redshift  pri , that fall within the aperture used to select secondaries around a primary of luminosity  pri , is given by Note that  min is a function of  back which in turn is a function of  pri and  pri (see §2.1).Here Φ s (|) is the satellite component of the CLF given by equation ( 13) and  ap is the aperture fraction, defined as the probability for true satellites to fall within the secondary selection cylinder specified by  sec ap and Δ sec max .Given that Δ sec max is much larger than the extent of the halo in redshift space, we have that Here  vir =  vir (,  pri ) is the virial radius of the halo in question, and  max =  sec ap ( pri ) and  min =  cut ( pri ) are the outer and inner radii of the conical volume used to select the secondaries.The function nsat ( |, ) is the average, radial profile of satellites around haloes of mass  at redshift , normalized such that and More specific expressions for  sat (, , ) and  ap (, , ) are provided in paper I.
For the interlopers, one naively expects the abundance to be proportional to the number density of galaxies within the relevant range of luminosities and the volume of the secondary selection cone.However, being biased tracers of the mass distribution, galaxies are highly clustered which typically will boost the number density of galaxies in the vicinity of a bright primary.Moreover, this clustering strength is known to depend on halo mass, galaxy luminosity and redshift (e.g., Mo & White 1996;Zehavi et al. 2011), and to be affected by peculiar velocities, in particular due to large-scale infall (Kaiser 1987).We bypass the intricate complexities involved with modeling this clustering on small scales by modeling the expectation value for the number of interlopers as the product of an effective 'bias',  eff , and the expectation value for the number of galaxies with  min ( back ) <  <  pri in a randomly located conical selection volume,  cone ( pri ,  pri ): where each term on the right-hand side is a function of { pri ,  pri }. Here is the average number density of galaxies at redshift  pri with luminosity in the range [ min ,  pri ], with (, ) the halo mass function at redshift , computed using the fitting function of Tinker et al. (2008), and with  () the Hubble parameter9 .The effective bias is modelled as where  0 ,  1 , and  2 are three free nuisance parameters that fully specify our interloper bias model, and that are constrained simultaneously with all other physical parameters.This model has proved to be sufficiently flexible to accurately model the full complexity of interloper abundance in realistic simulation-based mock data (see Section 5.2).

The probability 𝑃(𝑀, 𝐿 pri , 𝑧 pri )
The function (,  pri ,  pri ) describes the probability distribution function of primaries as a function of host halo mass, luminosity and redshift, and can be written as where (, ) is the halo mass function (Tinker et al. 2008) and C(, , ) is a 'completeness', to be defined below.As in paper I, if we assume that all primaries are true centrals, then we have that ( pri |,  pri ) = Φ c ( pri |).However, in reality some primaries are misidentified satellites, and such impurities need to be accounted for.In paper I we argued that the impact of these impurities is sufficiently small that it can be ignored.Although this was indeed the case for the small mock data sets used there, the impact of impurities can no longer be ignored when using data sets similar in size to the SDSS data analysed here.In fact, detailed tests showed that they can systematically bias the inferred scatter in the relation between halo mass and central luminosity, and we therefore devised the following scheme in order to account for impurities.
The vast majority of all impurities in realistic SDSS-like mocks (such as the Tier-3 mock described in §5) are those satellite galaxies which happen to be the brightest galaxy in their halo (even brighter than their central).In what follows, we refer to these as Type-I impurities.Since primaries are by definition the brightest galaxies in their selection cones, such brightest-halo-galaxy (hereafter BHG) satellites typically end up being selected as primaries, rather than their corresponding central.In rare cases, a primary is neither a true central, nor a BHG satellite.We refer to these as Type-II impurities, which arise, for example, if the true central is the BHG but is absent from the SDSS survey data, either because of fiber collisions or because it falls outside the window of the SDSS footprint.As detailed below, Type I impurities can be accounted for in our new theoretical modelling.However, Type-II impurities are virtually impossible to model accurately.Using detailed mock data sets, we therefore tuned our selection criteria in order to minimize the contribution of Type-II impurities.In particular, we found that we were able to significantly reduce their frequency by slightly enlarging the volume of the primary selection cone as described in §2.1;In particular, the new criteria reduce the fraction of Type-II impurities from ≳ 1% when using the old selection criteria used in paper I, to ∼ 0.5% with our new selection criteria.More importantly, in mock data, the new selection criteria predominantly eliminate the presence of Type-II impurities that are extreme outliers of the average relation between halo mass and primary luminosity, and which are the main culprits for causing mild systematic errors in the inferred galaxy-halo connection (specifically in the scatter,  c ). Detailed tests with mock data, presented in §5 below, show that our new primary selection criteria sufficiently suppress the impact of Type-II impurities that it allows for unbiased estimates of the galaxy-halo connection (at least for a survey the size of SDSS).
Therefore, in what follows, we ignore Type-II impurities and assume that primaries are either true centrals or BHG satellites (i.e., Type-I impurities).Hence, we have that Here ( bs < |, ) is the probability that the brightest satellite in a halo of mass  at redshift  has a luminosity less than , which is given by Here Λ(|, ) is the expectation value for the number of satellites brighter than  in a halo of that mass and redshift, which in turn is given by Differentiating ( bs < |, ) with respect to luminosity yields: The two other terms that appear in equation ( 31) are ( c =  pri |, ), which is simply equal to the central CLF, Φ c ( pri |), and its cumulative distribution, which is given by The expression for ( pri |,  pri ) given by equation ( 31), when substituted in equation ( 30), accurately forward models the impact of the vast majority of impurities.
The final ingredient we need is an expression for the completeness C(, , ), which is defined as the fraction of haloes of mass  at redshift  =  pri with a central or brightest satellite of luminosity  pri that falls within the survey volume of the SDSS, and that is selected as a primary by our selection criteria.In general we have that C(, , ) = C( |, )C 0 (, ).As is evident from equation ( 19), the modelling in Basilisk is independent of C 0 , which drops out (see also paper I).In other words, we only need to account for the halo mass dependence of the completeness.As shown in Appendix B, this mass-dependence is already accounted for by our forward-modelling of the Type-I impurities.Hence, we set C(, , ) = 1 throughout.

4.2.3
The probability (Δ,  p |,  pri ,  pri ) In order to model the line-of-sight kinematics of the secondaries we proceed as follows.Since secondaries consist of both true satellites and interlopers, which have distinct phase-space distribution, we write with the interloper fraction defined as where  int and  tot have been individually defined in §4.2.1.We first describe how we compute  sat (Δ,  p |, , ) (in §4.2.4) before detailing our treatment of interlopers ( §4.2.5).

The phase-space distribution of satellites:
In computing the joint 2D probability  sat (Δ,  p |, , ), we assume that the baryonic matter of the central galaxy has a negligible impact on the kinematics of its satellite galaxies10 , and we model the satellites as tracers in a pure dark matter halo which is fully characterized by its halo mass and concentration.Throughout, we use the median concentration-halo mass relation of Diemer & Kravtsov (2015), and we emphasize that our modelling is fairly insensitive to the exact choice of the concentration-mass relation within reasonable bounds of its theoretical uncertainty.We also assume the central galaxy to be located at rest at the centre of the halo.As shown in paper I, relaxing this assumption by allowing for nonzero velocity bias for centrals has negligible impact on Basilisk's inference.
Under these assumptions we have that with Here  ap is defined in equation ( 23), and is the projected, normalized number density distribution of satellite galaxies.
In paper I, we made the simplified assumption that the line-ofsight velocity distribution, (Δ | p , , ), is a Gaussian, which is completely characterized by the line-of-sight velocity dispersion  los ( p |, , ).However, there is no a priori reason why the LOSVD should be Gaussian.In fact, the detailed shape of the LOSVD contains valuable information regarding the velocity anisotropy (e.g., Dejonghe 1987;Gerhard 1991;Wojtak & Mamon 2013), which we aim to constrain using Basilisk.In this work we therefore improve upon paper I by extending our modelling of the kinematics to fourth-order and by describing (Δ | p , , ) as a generalised Gaussian with a projected velocity dispersion,  los ( p |, , ), and a line-of-sight kurtosis,  los ( p |, , ).The projected, line-of-sight velocity dispersion is related to the intrinsic, radial velocity dispersion,  2  ( |, ), according to where  2  ( |, ) follows from the second order Jeans equation for a spherically symmetric NFW halo (see equation ( 50) in paper I, and Binney & Mamon 1982, for more details).Here, the local anisotropy parameter relates the tangential ( t ) and radial ( r ) velocity dispersions.
For our fiducial model we assume that  is independent of both radius and halo mass, and we constrain this 'average' velocity anisotropy using the satellite kinematics data.In § 6.4 we discuss the implications of adopting more flexible models in which the anisotropy parameter is allowed to depend on halo mass.Note that the upper-integration limit of equations ( 40) and ( 41) is set to  sp (, ) =  sp  vir (, ), instead of  vir (, ), to account for a population of splash-back galaxies (see §4.2.5).
The projected, fourth moment of the LOSVD at projected separation  p is given by where  is ( |) in general, and  4  ( |, ) follows from the fourth-order spherical Jeans equation ( Lokas 2002;Lokas & Mamon 2003), which for radius-independent anisotropy is given by: Here  () is the enclosed mass of the spherical NFW halo inside radius .Given the fourth-order line-of-sight velocity moment, we can compute the projected kurtosis as Finally, in order to account for non-zero redshift errors in the data, the line-of-sight velocity dispersion is modified according to  los →

√︃
2 los + 2 2 err , with  err = 15 km s −1 the typical SDSS redshift error (Guo et al. 2015).Having computed both the velocity dispersion and kurtosis, we model the detailed shape of the LOSVD, (Δ | p , , ), using a symmetric (all odd moments are equal to zero), generalized form of the normal distribution, known as the Langdon (1980) distribution11 : Here the parameters   and  are related to the variance,  2 , and the kurtosis, , according to The reason for using this particular distribution function is purely one of convenience;  L (Δ) has a nice analytic closed form, is simple to compute, has all the features required of a probability distribution (normalized and positive-definite), and includes the Gaussian as a special case ( = 2).
In Basilisk, we use equation ( 47) to compute   and  from  2 los ( p |, ) and  los ( p |, ), after which we compute which is properly normalized such that its integral from −Δ sec max to Δ sec max is unity.

The phase-space distribution of interlopers
In paper I we assumed that interlopers have a constant projected number density and a uniform distribution in line-of-sight velocities, so that Here  max =  sec ap () and  min =  cut () are the outer and inner radii of the conical volume used to select secondaries around primaries of luminosity  at redshift  and Δ sec max is the corresponding line-ofsight depth (see §2.1).
However, as discussed in detail in paper I, a subset of the interlopers are either infalling or splash-back galaxies and have kinematics that are very similar to the true satellites.Assuming that the velocity distribution of interlopers is uniform ignores this 'kinematically coupled interloper population', which causes Basilisk to overestimate the number of satellite galaxies.Although the resulting offsets were modest for the smaller mock samples studied in paper I, they cause a significant, systematic bias (predominantly in the satellite CLF parameters  0 ,  1 and  2 ) when using larger samples.This motivated us to develop a more sophisticated treatment for the phase-space distribution of interlopers.
Based on a detailed assessment of interlopers in our mock data sets (see paper I and §5 for details), we now model the interlopers as consisting of three fairly distinct populations: (i) a population of 'splash-back galaxies' associated with the host halo of the primary, and extending out to a distance  sp from the primary, (ii) a roughly uniform background population of 'true' interlopers that are uncorrelated with the primary, and (iii) an 'infalling' population of interlopers, located outside of the splash-back radius.This infall motion, on large linear scales, is responsible for redshift space distortions in clustering data known as the Kaiser effect (Kaiser 1987).
We assume that the phase-space distribution of splash-back galaxies can be modelled similar to that of the satellites; i.e., they follow the same  sat ( |, ), extrapolated to beyond the halo's virial radius, and their kinematics obey the same Jeans equations.The only difference is that they are located between the host halo's virial radius,  vir , and a splash-back radius  sp ≡  sp  vir .In order to account for a population of splash-back galaxies we simply change the upper-integration limit of equations ( 23), ( 40), ( 41), ( 43) and ( 44) from  vir (, ) to  sp (, ).Throughout we adopt  sp = 2, which is motivated by estimates of the splash-back radius in simulations (Diemer 2017;Mansfield et al. 2017).In addition, detailed tests with mock data sets (see §5) show that Basilisk yields unbiased estimates of the velocity anisotropy parameter  for  sp ≳ 1.5.We find that not accounting for splash-back galaxies (i.e., setting  sp = 1) results in a weak bias of , without significantly affecting any of the other parameters.On the other hand, setting a much larger value for splash-back radius, like  sp = 3, yields posteriors that are indistinguishable from those for  sp = 2. Hence, our choice of  sp = 2 is reasonable and our results are robust against modest changes in the adopted value of  sp .
We assume that both the uncorrelated 'background' interlopers (bg) as well as the 'infalling' interlopers (inf) have a uniform angular distribution on the sky, such that their phase-space distribution can be written as where the term in square brackets is normalized such that its integral from −Δ sec max to +Δ sec max is unity.Since the secondary selection volume is conical in shape, the backside has a larger volume than the front (see Fig. 1).Note that all the secondaries of any given primary have luminosities above a fixed threshold.Therefore, due to the conical selection volume, we expect the velocity distribution of the uncorrelated background interlopers,  bg (Δ), to increase with Δ.In particular,  bg (Δ) is proportional to the comoving volume of the corresponding velocity slice of the secondary selection cone, and we therefore adopt Here  ′ =  pri + (1 +  pri )Δ/ and  () is the comoving distance out to redshift .
We have experimented with modelling the line-of-sight velocity distribution of the infalling population of interlopers,  inf (Δ), using the linear (Kaiser 1987) model, but this did not yield sufficiently accurate results.We therefore opted for a semi-empirical approach.Tests with mock data (see §5) show that  inf (Δ) is accurately fit by a Gaussian,  inf (Δ) =  inf e − 1 2 (Δ 2 / 2 inf ) , with  inf and  inf free parameters that vary from primary to primary.Rather than trying to devise an analytical model for these parameters, we use the following data-driven approach.Around each primary we select a set of 'tertiary' galaxies in a conical volume similar to that used for the secondaries, but at larger projected distances from the primary.More specifically, the tertiary selection cone is specified by an inner projected radius  int min = 0.6  200 ℎ −1 Mpc and an outer projected radius  int max = 0.9  200 ℎ −1 Mpc, and by the same redshift depth as the secondary selection cone (shown by the outermost hollow annular cone in Fig. 1).Tests with mock data (see Fig. 4) indicate that (i) the line-of-sight velocity distribution of these tertiaries is virtually indistinguishable from that of the infalling interlopers among the secondaries, (ii) the results are insensitive to the exact radii of the tertiary selection cone 12 , and (iii) less than one percent 12 A weak dependence of  inf on projected radius is apparent in Fig. 4 The thick green and black lines are the LOSVDs of true satellites and of interlopers, respectively.Note that the vertical scale is logarithmic to highlight the high-velocity wings of the interloper distribution.The thin coloured lines are the LOSVDs of tertiary mock galaxies in annular rings around primaries with radial bins ranging from {  p,min ,  p,max } = {0.45,0.60} to {0.90, 1.05} ×  200 ℎ −1 Mpc, as indicated.All histograms are individually normalized to highlight their relative shapes.As is evident, the LOSVDs of these tertiaries closely resemble those of the interlopers.This empirical fact indicates that one can model the LOSVD of interlopers as that of 'tertiary' galaxies in any of these annular rings, and it is the data-driven method we introduce in Basilisk (see the text for details).
of the tertiaries are actual satellites of the primary.The latter indicates that for most primaries  int min lies well outside the virial radius of the host halo of the primary, as required.We assume that  inf and  inf are each quadratic functions of log( pri ) and  pri (including the cross-term), and determine the corresponding 6 × 2 = 12 coefficients by simultaneously fitting the velocity distribution of all tertiaries around all primaries.These coefficients are then used in Basilisk to model the line-of-sight velocity distribution of the infalling population of interlopers.
Note that this rather elaborate model for the phase-space distribution of interlopers has zero degrees of freedom.The only degrees of freedom for the interloper-modelling is with regard to their number density, which is modelled via the effective bias described by equation (29).

Modelling the number of secondaries
As mentioned in §3.2, the data vector D NS that describes the number of secondaries for a random subset of  NS primaries, including those with zero secondaries, contains valuable information regarding the occupation statistics of satellite galaxies, and hence, the CLF.
Similar in spirit to how we compute the likelihood for the satellite kinematics by marginalizing over halo mass (cf. equation [18]), has also been noted by Mamon et al. (2010).We have experimented with implementing such a  p -dependence and extrapolating this to the radial interval of the secondary selection cone, but found that this had a negligible impact on the inference.

Modelling the galaxy luminosity function
The final observational constraint that we use to constrain the halo occupation statistics is the comoving number density of SDSS galaxies,  gal ( 1 ,  2 ) in ten 0.15 dex bins in luminosity, [ 1 ,  2 ] covering the range 9.5 ≤ log /(ℎ −2 L ⊙ ) ≤ 11.0 (see §3.3).We include this data in our inference problem by defining the corresponding log-likelihood Here n obs is the data vector and n() is the corresponding model prediction, computed from the CLF and the halo mass function using where  SDSS = 0.1 is a characteristic redshift for the SDSS data used, 13 and  is the precision matrix, which is the inverse of the covariance matrix, with Hartlap et al. (2007) correction (see §3.3).

Numerical implementation
The fiducial model used by Basilisk is characterized by a total of 16 free parameters: 6 parameters describing Φ c (|) (namely log  1 , log  0 ,  1 ,  2 ,  13 and  P ), 4 parameters describing Φ s (|) (namely  s ,  0 ,  1 , and  2 ), 1 parameters () to quantify the average velocity anisotropy of satellite galaxies, 2 parameters ( and R) that describe the radial number density profiles of satellite galaxies (see equation [16]), and 3 nuisance parameters ( 0 ,  1 and  2 ) that specify the abundance of interlopers (see equation [29]).We assume broad uniform priors on all parameters, except for .The value of the anisotropy parameter  formally ranges from −∞, for maximal azimuthal anisotropy, to +1, for maximal radial anisotropy, which is difficult to probe with our MCMC sampler.Hence, in order to assure roughly equal amounts of parameter space for radially and azimuthally anisotropic models, we sample B ≡ − log(1 − ), rather than .In particular, we adopt uniform priors over the range −1 ≤ B ≤ 1, which corresponds to −9 ≤  ≤ 0.9.Probing the posterior ( |D) over the fiducial 16-dimensional parameter space requires close to a million likelihood evaluations, each of which involves thousands of numerical integrations (see paper I for details).In order to make this problem feasible we perform the Bayesian inference under the assumption of a fixed normalized, radial number density distribution of satellite galaxies, i.e., fixed values for  and R.This has the advantage that  ap (, , ) and  int (Δ,  p |, ) are all independent of the model, , while  sat (Δ,  p |, , ) only depends on a single anisotropy parameter (see §4.2.4).We compute  sat (Δ,  p |, , ) for each centralsatellite pair for 10 values of B between −1 and 1 (or  between −9 and 0.9), and we interpolate it for intermediate values.Combined with the fact that we perform all integrations over halo mass using Gaussian quadrature with fixed abscissas (see paper I) implies that we only need to compute all these quantities once for each primary and/or secondary, which we then use to find the posterior ( |D) in the 14-dimensional parameter space at fixed (, R).As a consequence, for a single evaluation of the full likelihood The MCMC sampler used to probe our 14 dimensional parameter space is the affine invariant stretch-move algorithm of Goodman & Weare (2010).Throughout we use 1,000 walkers and the proposal density advocated by Goodman & Weare (2010).This results in typical acceptance fractions between 0.3 and 0.4, and the MCMC chain is typically converged after about 500 steps (i.e., 5 × 10 5 likelihood evaluations).We have experimented at length with other initial guesses, and find the results to be extremely robust, and to always fully converge well under 1 million likelihood evaluations.
Finally, throughout we adopt flat priors on all parameters, with very broad prior bounds that do not affect our inference.

Tier-3 mock data
We validate the performance of Basilisk using the Tier-3 mock data introduced in paper I.This mock sample is constructed using the  = 0 halo catalogue of the high-resolution SMDPL simulation (Klypin et al. 2016), which uses 3840 3 particles to trace structure formation in a cubic volume of (400ℎ −1 Mpc) 3 , adopting cosmological parameters consistent with Planck Collaboration XVI (2014).Each host halo in the catalogue with a mass  vir ≥ 3 × 10 10 ℎ −1 M ⊙ is populated with mock galaxies with luminosities  ≥ 10 8.5 ℎ −2 L ⊙ according to a particular fiducial CLF model.Each central galaxy is given the position and velocity of its halo core, defined as the region that encloses the innermost 10% of the halo virial mass.Satellite galaxies are assigned the phase-space coordinates of the subhaloes with the highest peak halo masses.If the number of satellites, drawn from the input CLF, exceeds the number of resolved subhaloes in a specific host halo, we randomly assign the excess satellites the halo-centric positions and velocities of subhaloes hosted by other haloes of similar mass.Note that no assumption of quasi-equilibrium dynamics has been made in the mock making procedure.Therefore, our Tier-3 mock satellites obey the Jeans equations only as much as the live subhaloes do in the SMDPL simulation.Results are shown for only those 10 parameters that characterize the CLF.Panels along the diagonal show marginalized 1D posteriors while off-diagonal panels show 2D posteriors.In the latter case, contours demarcate the 68 and 95 % containment of the posterior, while the blue dashed lines indicate the true input values used to create the mock data set.Brown contours correspond to our fiducial model that assumes a mass-independent velocity anisotropy, ; and the green contours show the constraints for a model in which  is allowed to depend on halo mass (discussed in §5.3).The posteriors for all parameters are in good agreement with the input values, and virtually independent of whether the velocity anisotropy is mass dependent or not.For all parameters we adopted very wide and flat prior with bounds that have no impact on the posteriors.
Once all haloes have been populated with mock galaxies, we construct a mock SDSS survey as follows.First, we place a virtual observer at a random position within the simulation volume.We use this virtual observer to convert the (, , ) coordinates of each galaxy into a cosmological redshift,  cosm , and sky coordinates (using a random orientation).If necessary, the simulation box is repeated with random sets of right angled rotations until the entire cosmological volume out to  cosm = 0.20 is covered.Next, we overlay the SDSS DR7 footprint on the simulated sky, and only keep galaxies with   ≤ 17.6 that lie within the SDSS DR7 survey window.Redshift-space distortions are simulated by adding (1+ cosm ) los / to the redshift of each galaxy, with  los the galaxy's peculiar velocity along the line-of-sight.Spectroscopic redshift errors in the SDSS are simulated by adding a random Δ from a Gaussian with scatter  err = 15 km s −1 (Guo et al. 2015).Finally, we simulate the effect of fibre collisions induced spectroscopic incompleteness following the method of Lange et al. (2019a).Once the mock spectroscopic survey is completed, we select primaries and secondaries using the selection cones described in §2.1, and assign spectroscopic weights to all secondaries using the method described in §2.2.Similar to what we do for the real data, we remove primaries with an aperture completeness  app < 0.8 and exclude secondaries that are located within 55 ′′ from their primary.Finally, we use the mock data to compute the comoving abundances of galaxies in the ten luminosity bins described in §3.3 using the same method as used for the real SDSS data (i.e., by taking into account the SDSS DR7 footprint with its mask and window functions as well incompleteness caused by fiber collision).

Inference from the Tier-3 mock
The Tier-3 mock data described above is used to test the performance of Basilisk.As described in §4.5, we first determine the best-fit values of  and R, which characterize the radial number density profile of satellite galaxies, properly marginalized over all other model parameters.In the Tier-3 mock, satellites are placed in subhaloes which are known to have a radial profile that differs starkly from that of the dark matter.The true radial number density distribution closely follows the generalised NFW shape (equation 16) with (, R) ≈ (0.0, 2.6).Hence, the radial profile of satellites, in our Tier-3 mock, is cored and has a scale-radius that is significantly larger than that of their dark matter host haloes.This feature of the DM-only simulation is consistent with many previous studies (e.g., Ghigna et al. 1998;Diemand et al. 2004b;Springel et al. 2008;Jiang & van den Bosch 2017).The best-fit values obtained by Basilisk, when applied on the Tier-3 mock data is (, R) = (0.0, 2.5), in almost perfect agreement with the profile inferred directly from the -body simulation.Thus, in agreement with what we reported in paper I, Basilisk can accurately recover the radial profile of satellite galaxies.Next, we keep (, R) fixed at the best-fit values, (0.0, 2.5), and run Basilisk to infer the posterior distribution of the remaining 14 parameters.The brown lines in Fig. 5 show the CLF constraints thus obtained for our fiducial model14 .Note that the posteriors of all CLF parameters are in excellent agreement with the input values, shown as blue dashed lines.For comparison, the posteriors indicated in green correspond to a model that allows for mass-dependence of the orbital anisotropy, and will be discussed in §5.3.Panel (a) of Fig. 6 demonstrates that Basilisk accurately fits the galaxy luminosity function.The blue circles show the mean number density of galaxies in logarithmic bins of luminosity of width 0.15 dex (roughly the width of the blue vertical bars), while the brown band indicates the 95% confidence interval as inferred from Basilisk's posterior.b) shows the relationship between halo mass and median central galaxy luminosity.Note that the true input model (blue squares) is perfectly recovered by Basilisk (brown band, indicating the 95 % confidence interval from the posterior).The pink dots indicate the true halo masses and luminosities of all primaries (both true centrals and impurities) in the Tier-3 satellite kinematics sample.The magenta triangles show their median values of log  pri in bins of log .Note that, due to selection bias, these do not agree with the true input model (blue squares).A more luminous primary has a larger secondary selection volume associated with it, as well as a larger luminosity range between  pri and  min ( back ).Therefore, at any given redshift, brighter primaries are more likely to have at least one secondary than less luminous primaries in haloes of the same mass, hence the brighter primaries get preferentially selected in the D SK data.This causes the median luminosity of primaries in the satellite kinematics sample to be biased high with respect to that of true centrals.This effect is especially pronounced at the low halo mass end, where the expectation value for the number of secondaries is lowest.Despite this strong and unavoidable selection bias, Basilisk perfectly recovers the true input relation between central luminosity and halo mass, and not the biased relation!This indicates that Basilisk, in its forward modelling, accurately accounts for selection bias and other systematics such as the presence of impurities.
Panels (c) and (d) of Fig. 6 show, respectively, the logarithmic scatter in central galaxy luminosity at fixed halo mass,  c , and the normalization of the satellite CLF,  * s .Once again, blue dots show the true input values, while the brown shaded regions mark the 95% confidence interval of the posterior distribution inferred with Basilisk.Finally, the brown-shaded histograms in panels (e) and (f) show, respectively, the posterior distributions for the faint-end slope of the satellite CLF,  s , and that of the orbital anisotropy parameter, .The blue, vertical line in panel (e) indicates the true input value of  s , while the blue-shaded region in panel (f) show the range of mean velocity anisotropy of subhaloes in different halo masses in the SMDPL simulation used to construct the Tier-3 mock (discussed further in §5.3).As is evident, in each case the posterior constraints are in excellent agreement with the input values, which indicates that Basilisk can put tight and accurate constraints on the more intricate aspects of the galaxy-halo connection, beyond the mere relation between halo mass and central luminosity.
Fig. 7 shows the interloper fraction,  int , as a function of primary galaxy luminosity in six redshift bins (indicated by different colours).Solid circles indicate the true interloper fractions in the Tier-3 mock sample with the error bars computed assuming Poisson statistics.The coloured bands show the corresponding posterior predictions as inferred by Basilisk, which are in good agreement with the true interloper fractions.Hence, Basilisk correctly distinguishes satellites from interlopers, at least in a statistical sense, and accurately recovers their relative prevalence as a function of luminosity and redshift of the primary.
Finally, Figs. 8 and 9 compare the posterior predictions for the HOD and CLF, respectively, to their true input used to construct the Tier-3 mock data (solid dots).In particular, Fig. 8, shows the average number of central (circles) and satellite (triangles) galaxies per halo, above four different luminosity thresholds ( th ), as a function of host halo mass.Interestingly, the true satellite mean occupation in our mock data deviates significantly from a simple power-law (which is a common assumption in the literature), and Basilisk accurately recovers that complexity in its shape across all luminosities and halo masses.each panel.Note that in all cases the halo occupation statistics are recovered with exquisite precision and accuracy.

Velocity Anisotropy in the Tier-3 Mock
Unlike previous studies of satellite kinematics, which used satellite velocity dispersions in bins of primary luminosity, a unique aspect of Basilisk is that it models the full probability distribution (Δ,  p |, , ).By modelling the full 2D phase-space information, rather than just the second moment of velocity, Basilisk has the potential to break the mass-anisotropy degeneracy that hampers dynamical models which only rely on measurements of the satellite velocity dispersions.Hence, Basilisk is able to simultaneously constrain the halo masses of the primaries, as well as the velocity anisotropy of its secondaries.
The brown shaded contours in Fig. 10 show the 68 and 95 percent confidence intervals on the orbital anisotropy parameter  as inferred by Basilisk from the Tier-3 mock data.Note that the constraints ( = 0.11 ± 0.05) are remarkably tight and indicate a mild, radial anisotropy.Recall that the Tier-3 mock was constructed by placing satellite galaxies inside subhaloes in the SMDPL simulation.Hence, these satellite galaxies have the same orbital anisotropy as those subhaloes.The solid, black line indicates the average anisotropy parameter of subhaloes in the SMDPL simulation as a function of host halo mass.It, too, indicates a mild radial anisotropy in reasonable agreement with the constraints obtained with Basilisk.However, this comparison is not entirely fair.After all, our secondary selection criteria typically only selects secondaries with projected separations  sec ap < ∼ 0.4 vir (see §2.1).Therefore, it is more meaningful to compare our posterior constraints with the orbital anisotropy of subhaloes located in the inner regions of their host haloes.The blue, solid line in Fig. 10 shows the orbital anisotropy of subhaloes with a 3D halo-centric radii less than 0.4 vir .These are in better agreement with Basilisk's posterior constraints, especially given that the satellite kinematics data is dominated by haloes above 10 12 ℎ −1 M ⊙ .Hence, we conclude that Basilisk, by using the full line-of-sight velocity distributions of the secondaries, is indeed able to break the mass-anisotropy degeneracy and obtain simultaneous, reliable constraints on both halo mass and orbital anisotropy.
Thus far we have only considered models in which the orbital anisotropy is a 'universal' constant, independent of halo mass or halo-centric radius.However, a detailed analysis of the orbital anisotropy of dark matter subhaloes in the SMDPL simulation (and hence our Tier-3 mock satellites) reveals a rather complicated dependence on both halo mass and halo-centric radius (see Fig. C1 in paper I).Indeed, the blue and black curves in Fig. 10 indicate some dependence on halo mass, albeit weak.We therefore also analysed the Tier-3 mock data using a more flexible model with a mass-dependent orbital anisotropy parameter, given by () = 1 − 10 − B (  ) , with Here both B 12 and B 14 are free parameters for which we adopt uniform priors ranging from −1 to +1.Hence, by replacing our fiducial model, in which  is independent of halo mass, with this mass-dependent model we add one extra free parameter to the mix.Remarkably, we find that this extra degree of freedom has no discernible impact on the constraints of any of the other param- eter.This is evident from Fig. 5, where the green contours show the posterior constraints from the mass-dependent  model, which are indistinguishable from those of our fiducial model (brown contours).The green shaded contours in Fig. 10 show the 68 and 95 percent confidence intervals on () of the corresponding model.It reveals a weak hint for the orbital anisotropy to become more radially anisotropic in lower mass haloes, in agreement with the weak trend for the SMDPL subhaloes with  < 0.4 vir .However, the uncertainties at the low mass end are rather large, and Basilisk's inference is consistent with a constant, mass independent  at the 2 level.

The Data
Having demonstrated the Basilisk can accurately infer the galaxyhalo connection from kinematic data of primaries and secondaries that can be extracted from a galaxy redshift survey, we now apply Basilisk to data from the SDSS.In particular, we use the New York University Value-Added Galaxy Catalog (VAGC; Blanton et al. 2005), which derives from the Seventh Data Release of the SDSS (SDSS DR7; Abazajian et al. 2009).More specifically, we use the VAGC bright0 sample 15 , which includes ∼ 570, 000 galaxies with a limiting Petrosian magnitude of   < 17.6.We use this data to identify primaries and secondaries using the selection criteria outlined in §2.1.As already mentioned there, we limit our analysis to galaxies in the redshift range 0.02 ≤  ≤ 0.20.In order to assure that the secondary selection cones fit entirely within this redshift range, the redshifts of primaries are restricted to 0.034 ≤  ≤ 0.184 (see Fig. 2).Primaries are also limited to have luminosities in the 15 http://sdss.physics.nyu.edu/lss/dr72/bright/0/range 9.504 ≤ log[ pri /(ℎ −2 L ⊙ )] ≤ 11.104.We end up with a total of ∼ 165, 000 primaries16 , of which  + = 18, 373 primaries have at least one secondary.The total number of secondaries, and thus the total number of primary-secondary pairs for which kinematic data is available, is 30, 431.
The upper left-hand panel of Fig. 11 shows the luminosity distributions of all primaries (blue), primaries with at least one secondary (orange), and secondaries (green).Note that primaries with at least one secondary are brighter, on average, than those with zero secondaries.This simply follows from the fact that, on average, brighter centrals reside in more massive haloes, which host more satellites.Also, given a fixed halo mass the brighter primary is assigned a larger secondary selection volume, making it more likely to have a secondary.This latter effect, though, is subdominant.Note also that there are no satellites with  < 10 9 ℎ −2 L ⊙ .As discussed in §2.1, this is a consequence of the apparent magnitude limit of the SDSS survey combined with the fact that we only allow for primaries brighter than 10 9.504 ℎ −2 L ⊙ .The upper right-hand panel shows the probability,  0 , that a primary of luminosity  pri contains zero secondaries.It is simply defined as the fraction of primaries, in a given luminosity and redshift bin, with zero secondaries, i.e.,  0 =  0 /( 0 +  + ), where  0 and  + are the number of primaries in our sample with zero and at least one detected secondary, respectively.These probabilities have been computed using a 8 × 6 uniformly-spaced grid in (log  pri ,  pri ) covering the ranges [9.504, 11.104] and [0.034, 0.184], respectively.Different colours correspond to different redshift bins, as indicated.We emphasize that this binned data are not used in our inference; it is merely used here to illustrate how  0 scales with luminosity and redshift.Errorbars are computed assuming Poisson statistics, and are smaller than the data points in most cases.Note that  0 increases with decreasing luminosity and increasing redshift, as expected from the Malmquist bias resulting from the apparent magnitude limit of the spectroscopic SDSS data.The lower left-hand panel shows the multiplicity function, i.e., the number of primaries each with  sec secondaries.Note that most primaries have zero secondaries, and that there are very few primaries with more than 20 secondaries.Finally, the lower right-hand panel shows the distribution of the aperture completeness  app for all primaries.Note that the vast majority of primaries have  app = 1, i.e., their entire secondary selection cone completely falls within the SDSS survey footprint.As mentioned in §2.2, primaries with  app < 0.8 have been removed from the sample.
Figure 12 plots the line-of-sight velocity difference Δ as a function of the luminosity of the primary galaxies (left-hand panel) and as a function of the projected distance between primarysecondary pairs (right-hand panel).Data points are colour-coded according to the redshift of the primary, as indicated.This constitutes the satellite kinematics data in SDSS, that we attempt to forward-model with Basilisk in order to constrain the galaxy halo connection.The deficit of data points at small  p reflects that we have removed secondaries with a projected separation less than  fc = 55 ′′ because of fiber collision issues (see §2.2).Similarly, the absence of data points for low  pri and large |Δ | reflects the luminosity dependence of the secondary selection criteria (see §2.1).Evidently, that the velocity dispersion of secondaries is a strong function of primary luminosity, consistent with the expectation that more luminous centrals reside in more massive dark matter haloes.The low-density, high velocity wings of (Δ) at any given  pri  reflects the contribution of foreground and background interlopers, i.e., galaxies selected as secondaries that do not reside in the dark matter halo of the primary.

Results
We start our analysis of the SDSS data presented above by constraining the satellite radial number density profile (equation [16]).Using a 15 × 15 grid in (, log R)-space, we obtain (, R) = (0.94, 1.7) (see Appendix C for details), in good agreement with the results of Lange et al. (2019b) and Wojtak & Mamon (2013).Note that this central slope, , is significantly steeper than what we inferred for the Tier-3 mock data ( = 0.0, see §5.2), in which the satellites were positioned on subhaloes in numerical simulations.This indicates that the radial distribution of real satellite galaxies is more centrally concentrated than that of subhaloes in DM-only numerical simulations.This discrepancy is most likely is due to a combination of artificial disruption in simulations (Peñarrubia et al. 2010;van  ) and failures of the subhalo finders being used (e.g., Knebe et al. 2011;Han et al. 2012;Diemer et al. 2024).
Next, keeping R and  fixed at 1.7 and 0.94, respectively, we run Basilisk to constrain the posterior distribution of the remaining 14 parameters that characterize the CLFs of central and satellite galaxies, the interlopers, and the average orbital anisotropy of satellite galaxies.Once again, we adopt very broad non-informative priors for all parameters.Table 1 lists the best-fit parameters plus their 68% confidence intervals thus obtained.We emphasize that, as shown in paper I and Appendix C, all these results are extremely robust to modest changes in R and .
Before showing the key results on the CLF, we first demonstrate that the best-fit model of Basilisk is an excellent fit to the data.In order to illustrate this, we bin the data in 2D-bins of luminosity and redshift of the primaries.We emphasize that no such binning was used in the analysis; it is merely used here for the purpose of visualization of the data and its corresponding prediction from Basilisk.The various panels in Fig. 13 show the LOSVDs of primary-secondary pairs for bins in log  pri (different rows) and  pri (different columns).We only show panels for which the luminosity lower bound of the {log  pri ,  pri } bin falls above the flux limit at the redshift upper bound of that bin.Blue dots and shaded histograms show the stacked data, while the red shaded bands show to 95% confidence intervals obtained using the inferred posterior distributions.In order to quantify the level of agreement between the data and the model, we proceed as follows.Let  s ( pri ,  pri ) be the actual number of secondaries in the SDSS data for each of the various {log  pri ,  pri } bins.Using the best-fit model's predicted (Δ) for each {log  pri ,  pri } bin we draw  s ( pri ,  pri ) values of Δ, and compute the likelihood of this fake data representing the best-fit model.We repeat this 10 4 times.The red-shaded histogram in the inset-panel in the lower-right corner of Fig. 13 shows the resulting distribution of likelihoods, which we then compare to the analogous likelihood for the actual SDSS data (blue, vertical dashed line).The fact that the latter is perfectly consistent with the distribution of likelihoods (red histogram) indicates that the LOSVDs obtained from Basilisk are in excellent agreement with the data, fitting not only the roughly Gaussian LOSVDs centered on Δ = 0, but also the extended wings due to the interlopers.
Fig. 14 uses the same (log  pri ,  pri ) binning and panels17 as Fig. 13, but this time we plot the distributions of the numbers of secondaries per primary (red dots with Poisson errorbars).More specifically, the -axis indicates the number of secondaries,  sec , and the -axis indicates log(1 +  pri ), where  pri is the number of primaries that each have  sec secondaries.In most cases the distributions clearly peak at  sec = 0, as most primaries in SDSS DR7 do not have a spectroscopically detected secondary.The only exceptions are a few high-log  pri bins.Once again, in each panel the red shaded bands indicate 95% confidence intervals obtained using the posterior distributions inferred with Basilisk.The insetpanel in the lower-right compares the likelihood of the SDSS data -1000 0 1000 Here  pair is total number of primary-secondary pairs in the SDSS dataset.Note that the model inference is an excellent match to the data.
given the best-fit model, to the distribution of expected likelihoods computed by drawing 10 4 random realisations of the best-fit multiplicity function.This time the likelihood of the SDSS data falls at the edge of the expected range, indicating that the fit to the data is not optimal.Indeed, upon closer inspection one can notice that the best-fit model overpredicts the multiplicity of primaries with  sec ∼ 3 − 6, especially for some intermediate  pri and  pri bins.
As we demonstrate in a forthcoming paper, this small discrepancy arises from certain limitations in the satellite CLF model, and can be resolved by adopting a slightly more flexible halo-occupation modelling without significantly affecting any of the main relations presented here.Fig. 15 shows several key halo mass dependencies that characterize the galaxy-halo connection inferred by Basilisk from the SDSS data.In each panel, the brown shaded bands show the inferred 95 % confidence intervals, while the coloured symbols show bestfit constraints from previous SDSS-based studies 18 .In particular, we compare our inference to the results from an analysis of galaxy group catalogues by Yang et al. (2008), to results based on a simultaneous analysis of galaxy clustering and galaxy-galaxy lensing by Cacciato et al. (2013), and to results of the most recent analysis of satellite kinematics by Lange et al. (2019b).Note that the latter did not use the Bayesian hierarchical methodology of Basilisk, but rather was based on the standard summary statistics of host- 18 Where needed we have converted these results to our definition of halo mass.
weighted and satellite-weighted velocity dispersions as a function of binned primary luminosity.
Panel (a) plots the median central luminosity,  c , as a function of halo mass.As is evident, the constraints obtained with Basilisk are in excellent agreement with previous results, though we emphasize that our constraints are significantly tighter.
Panel (b) plots the posterior constraints on  c (), characterizing the scatter in log  c at fixed halo mass.Most studies in the past have assumed  c to be independent of halo mass, and inferred values that lie roughly in the range 0.15 − 0.2 dex (e.g.,More et al. 2009a;Cacciato et al. 2013;Shankar et al. 2014).Basilisk, on the other hand, allows for a mass dependence as characterized by equation ( 12).Yet, despite this extra degree of freedom, our inference is statistically consistent with a constant  c = 0.17 dex over the entire range of halo masses probed.Note that this is different from Lange et al. (2019b) who, using a similar mass-dependent characterization of the scatter in the log  c -log  relation, inferred that  c increases with decreasing halo mass, as depicted by the blue circles in Panel (b).As discussed in more detail in §6.3, the reason for this discrepancy can be attributed to the fact that all previous analyses, including Lange et al. (2019b), invariably assumed the brightest halo galaxy to be the central.
Panel (c) shows the posterior constraints on the the faint-end slope,  s , of the satellite CLF.Throughout we have assumed a global, mass-independent  s similar to Cacciato et al. (2013), Lange et al. (2019b) and most other previous work.Our inference that  s = −0.87 ± 0.06 is in excellent agreement with Lange et al. (2019b), and is largely consistent with the constraints obtained by Yang et al. (2008) and Cacciato et al. (2013)   though, that Yang et al. (2008) inferred that  s becomes significantly steeper at the massive end, reaching values as low as −1.5 for groups with an inferred halo mass  > ∼ 3 × 10 14 ℎ −1 M ⊙ .In future work we plan to allow for a mass-dependent  s , to see if satellite kinematics reveal a similar mass dependence as that inferred from the galaxy group catalogue of Yang et al. (2005a), and to study how this extra degree of freedom impacts the other parameters that characterize the galaxy-halo connection.
Finally, panel (d) of Fig. 15 shows the constraints on the normalization,  * s , of the CLF of satellite galaxies, as a function of host halo mass.The constraints obtained by Basilisk, as depicted by the brown bands, are in fair agreement with previous constraints, especially if the uncertainties on the latter are taken into account (note that the coloured symbols only indicate the best-fit values).At the low mass end, the results of Lange et al. (2019b), which are also based on satellite kinematics, seem to suggest significantly larger values of  * s (i.e., more satellites per halo).This is likely due to the fact that Lange et al. (2019b) have assumed that interlopers have a uniform distribution of line-of-sight velocities, (as did many other previous studies, such as McKay et al. 2002;Prada et al. 2003;van den Bosch et al. 2004;Conroy et al. 2007;Norberg et al. 2008;More et al. 2009aMore et al. , 2011)).As discussed in §4.2.3, this oversimplified assumption implies that some of the splash-back galaxies and infalling interlopers, which have a LOSVD that resembles that of true satellites, will be incorrectly 'counted' as satellite galaxies.Because of this, and since we have demonstrated that Basilisk can accurately recover the interloper fraction as well as the CLF normalization,  * s (), we reckon that our results for the abundance of satellite galaxies are likely to be more accurate.In all fairness, we emphasize that most previous studies of satellite kinematics were not aiming to accurately recover satellite abundances; rather they mainly focused on constraining halo masses as a function of primary galaxy luminosity.As discussed in detail in van den Bosch et al. (2004), this aspect of the analysis is not significantly impaired by the oversimplified assumption that the LOSVD of interlopers is uniform.

Scatter in central galaxy luminosity
Different empirical estimates of the galaxy-halo connection, at present, broadly agree on the relation between central galaxy luminosity (or stellar mass) and halo mass.However, the constraints on the scatter in this relation, especially as a function of halo mass, has yet to attain similar convergence, and therefore is considered a key parameter at the forefront of empirical modelling (see e.g., Wechsler & Tinker 2018) and is highly informative for testing physical models (see e.g., Porras-Valverde et al. 2023).
Basilisk's inference of the logarithmic scatter in central luminosity,  c ( vir ), shows no significant halo mass dependence despite having the freedom in the model.Fig. 16 compares our constraints on  c (grey band) with estimates from previous studies (all error-bars and uncertainties in this plot are 68% confidence intervals).Crucially, our constraints disagree with Lange et al. (2019b), who also used satellite kinematics extracted from the SDSS DR-7 data.They split their sample into red and blue centrals, and the In each panel, the brown shaded regions show the 95 per cent confidence intervals inferred by Basilisk.The coloured symbols show constraints from previous studies using galaxy group catalogues (Yang et al. 2008), galaxy clustering (Cacciato et al. 2013) and satellite kinematics (Lange et al. 2019b).From top to bottom, the panels plot the median luminosity of centrals ( Lc ), the log-normal scatter in central luminosity ( c ), the faint-end slope of the satellite CLF ( s ), and the normalization of the satellite CLF ( * s ), each as a function of halo mass.see the text for a detailed discussion.
scatter in the two sub-populations are shown by the shaded regions of corresponding colour.At the high halo mass end, their red-fraction of centrals approaches unity, and thus the red-shaded region should be a good approximation of the overall scatter.As is evident, it reveals weak but significant mass-dependence with d c /d log  ≈ −0.04, which, at first sight appears inconsistent with our results.However, Lange et al. (2019b), as all other previous studies, have simply assumed that their primary, defined as the brightest galaxy in the selection cone, is always the central galaxy.
Hence, their  c has to be interpreted as the scatter in the brightest halo galaxy, log  BHG , rather than that in log  c .Basilisk, on the other hand, accounts for type-I impurities, which are BHG-satellites that are misclassified as primaries.In particular, we have demonstrated that by directly forward mod- Basilisk (grey band), shows no mass dependence.This is in tension with the scatter for red and blue centrals, as inferred by a recent analysis of satellite kinematics in the SDSS by Lange et al. (2019b), shown by the red and blue shaded regions.This tension can be resolved by noting that Lange et al. (2019b) assumed that all their primaries are centrals.Consequently, their constraint on  c is really a constraint on the scatter in the brightest halo galaxies (BHGs).The black dashed line shows the predicted scatter in log  BHG for our best-fit model, computed using Basilisk.Note that this is in excellent agreement with the results of Lange et al. (2019b) for the red centrals, which are the dominant population of centrals in massive haloes.For comparison, the yellow hatched region shows the constraints on  c (assumed to have no mass dependence) obtained by Cacciato et al. (2013) based on an analysis of galaxy clustering plus galaxy-galaxy lensing, while the red and blue plus signs show the constraints for red and blue centrals, respectively, inferred by Yang et al. (2008) using a galaxy group catalogue.All uncertainty bands and error-bars shown correspond to 68% confidence intervals.see the text for a more detailed discussion.elling these BHG-satellites, Basilisk's inferred  c is an unbiased estimate of the intrinsic luminosity scatter of true centrals.The probability that the BHG is a satellite, rather than a central, increases strongly with halo mass (Skibba et al. 2011;Lange et al. 2018).Therefore, the inferred scatter at the high mass end, from studies that did not account for type-I impurities, may have been biased.We can directly test this with Basilisk, which makes it straightforward to compute the expected scatter in log  BHG and compare it to that in log  c .The black dashed curve in Fig. 16 shows the predicted scatter in BHG luminosity,  BHG , as a function of host halo mass for our best-fit CLF model.Note that  BHG drops significantly below  c at the high-mass end.This is because it is mostly the fainter centrals that are 'replaced' by a brighter satellite, which causes the distribution of BHG luminosities to be narrower than that of the true centrals.For  > ∼ 10 13.5 ℎ −1 M ⊙ , the massdependence of the inferred  BHG is in excellent agreement with Lange et al. (2019b) (recall that the vast majority of all centrals in massive haloes are red, and the comparison should thus be with the red-shaded region).Above 10 13 ℎ −1 M ⊙ , the black dashed line also shows improved agreement with previous results from an analysis of galaxy clustering and galaxy-galaxy lensing by Cacciato et al. (2013), who assumed that  c is mass-independent, and that from an analysis of a SDSS galaxy group catalogue by (Yang et al. 2008).
As is evident from Fig. 16, the various results disagree strongly at the low-mass end ( < ∼ 10 13 ℎ −1 M ⊙ ).This, however, has to be interpreted with caution, as none of the constraints are particularly reliable there.For example, the results of Yang et al. (2008) can not be trusted at the low-mass end, since their halo masses are estimated from the total group luminosity.Since this is dominated by the central luminosity in low mass haloes, their inferred scatter at the low mass end is guaranteed to be an underestimate (see e.g., Campbell et al. 2015).In the case of Lange et al. (2019b), most of the constraining power comes from haloes with  ≥ 10 13 ℎ −1 M ⊙ .As they assume a simple linear dependence of  c on log , the constraints at the low-mass end mainly reflect extrapolation of the assumed linear relation.Our results are also affected this way, but less so, since we have used a flux-limited sample, rather than a more restricted volume-limited sample as in Lange et al. (2019b).This allows for a better sampling of fainter centrals, that reside in lower-mass haloes.

Orbital Anisotropy of Satellite Galaxies in SDSS
The brown contours in Fig. 17 show the 68 and 95 percentile constraints on the orbital anisotropy parameter, , for satellite galaxies in the SDSS data as inferred from our fiducial model.We infer a significant radial anisotropy with  = 0.29 +0.05 −0.04 .These global constraints on the average orbital anisotropy of satellite galaxies, across a large range of halo masses, are perfectly consistent with, but significantly tighter than, the results of Wojtak & Mamon (2013) who also analysed the kinematics of satellite galaxies in SDSS data to infer  = 0.26 ± 0.09.We emphasize that, unlike Basilisk, the analysis of Wojtak & Mamon did not account for mass mixing, and was based on a much smaller sample of primary-secondary pairs than used here.
Interestingly, our constraints on the orbital anisotropy are also consistent with the typical orbital anisotropy of subhaloes in numerical simulations of structure formation in a ΛCDM cosmology.In fact, the green contours show the constraints we obtain using a model in which the orbital anisotropy is allowed to depend on halo mass as given by equation [56].We find a weak indication that the orbits of satellite galaxies become more radially anisotropic towards lower halo mass.Most importantly, these results for the SDSS data (Fig. 17) are consistent with those for the Tier-3 mock data (Fig. 10), in which the satellite orbits reflect those of subhaloes in the ΛCDM-based SMDPL.The fact that the orbital anisotropy of satellite galaxies in the SDSS appears to be consistent with that of subhaloes in -body simulations can be heralded as yet another success for the ΛCDM model.
Although the weak mass-dependence of the orbital anisotropy inferred here is intriguing, especially in light of the agreement with the Tier-3 results, we emphasize that these results have to be interpreted with caution.The reason is that we have excluded data on projected separations < 55 ′′ because of fibre collision issues.As a consequence, the range in radii probed, in terms of the halo virial radius, in low mass haloes is different than that probed in more massive haloes.Hence, any potential radial dependence of the orbital anisotropy of satellite galaxies can, in principle, masquerade as a mass dependence in our analysis.We intend to address this 'degeneracy' in a forthcoming paper (Mitra et al., in prep) in which we consider models in which the orbital anisotropy is allowed to depend on halo-centric radius, as well as halo mass.In particular, we will consider Osipkov-Merritt model (Osipkov 1979;Merritt 1985), as well as more realistic simulation-inspired models such as those used by Mamon & Lokas (2005).

SUMMARY AND CONCLUSION
In paper I we presented Basilisk, a novel, Bayesian hierarchical method for analysing the kinematics of satellite galaxies.Based on the spherically symmetric Jeans equations it models the kinematics of large ensembles of satellite galaxies associated with central galaxies that span a wide range in halo mass and luminosity.The halo masses of the individual centrals act as latent variables in a hierarchical Bayesian framework that uses the data to constrain the detailed galaxy-halo connection as characterized by the CLF.Unlike traditional methods for analysing satellite kinematics, Basilisk does not make use of any summary statistic, such as velocity dispersions of satellite galaxies in central galaxy luminosity bins.Rather, it leaves the data in its raw form, which has the advantage that all data are used optimally while avoiding systematics that arise from binning.In addition, whereas traditional methods typically require volume-limited samples, Basilisk can be applied to flux limited samples, thereby greatly enhancing the quantity and dynamic range of the data.And finally, Basilisk is the only available method that simultaneously solves for halo mass and orbital anisotropy of the satellite galaxies, while properly accounting for 'mass-mixing'.
In this paper we have presented a number of important improvements to Basilisk, required for an unbiased recovery of all parameters when using large samples of data comparable to what can be achieved with existing SDSS catalogues.In particular, • We introduced an improved selection of primaries and secondaries that assures that the secondaries associated with each individual primary are volume-limited, even-though the overall sample is still flux-limited.This facilitates a more accurate modelling of the abundance and velocity distribution of the secondaries.
• We forward model the contribution of impurities among the primaries, where impurities are predominantly those satellites that are brighter than their corresponding centrals.
• We slightly modified the selection criteria of primaries to minimize the effect of other kinds of impurities that are extremely difficult to forward-model.
• We extended the satellite kinematics model to higher-order, by using the fourth-order Jeans equation to compute the kurtosis of the LOSVD.Incorporating this, in the modelling of the full 2D phase-space distribution of satellites, allows Basilisk to break the mass-anisotropy degeneracy, and to put tight constraints on the global, average velocity anisotropy of satellite galaxies.
• We drastically improved the modelling of interlopers by (i) accounting for the fact that the selection volume of secondaries is conical rather than cylindrical, (ii) accounting for splash-back galaxies, and (iii) using a data-driven method to model the line-ofsight velocity distribution of the interlopers.
• Instead of discarding primaries with zero secondaries, Basilisk utilizes their information to further constrain the galaxyhalo connection.Congruous with the satellite kinematics methodology, we introduced a similar Bayesian hierarchical framework to model the abundance of secondary galaxies around each primary, which improves significantly on the stacking-based approach used in paper I.This allows Basilisk to put unprecedented constraints on the satellite CLF.
Using realistic mock data of similar quality and volume as the SDSS DR7, we have demonstrated that, with this improved methodology, Basilisk can break the mass-anisotropy degeneracy, and simultaneously constrain the host masses and average orbital velocity anisotropy of satellite galaxies.In particular, Basilisk achieves an unbiased recovery of all 10 CLF parameters that characterize the galaxy-halo connection covering almost four orders of magnitude in halo mass (from ∼ 10 11 to 10 15 ℎ −1 M ⊙ ), and with unprecedented accuracy.In addition, it simultaneously recovers the orbital anisotropy parameter, , the luminosity and redshift dependence of the interloper fraction, and the radial number density profile of satellite galaxies.It is worth emphasizing that the recovery is unbiased despite the fact that the selection of primaries and secondaries is (unavoidably) plagued by biases, incompleteness, and impurities.
We applied Basilisk to a sample of 18, 373 primaries and 30, 431 secondaries extracted from the SDSS DR-7 data, yielding some of the tightest constraints on the galaxy-halo connection to date (Table 1).The model accurately reproduces both the abundance and line-of-sight velocity distributions of secondaries (Figs. 13 and Fig. 14), and is in good agreement with previous constraints on the galaxy-halo connection derived from galaxy group catalogues, galaxy clustering, galaxy-galaxy lensing and previous analyses of satellite kinematics.
Assuming that the orbital anisotropy of satellite galaxies is independent of halo mass and halo-centric radius, our analysis of SDSS data reveals a significant radial anisotropy of  = 0.29 +0.05 −0.04 , in excellent agreement with, but significantly tighter than, previous results (Wojtak & Mamon 2013).We also find a weak indication that  is slightly larger in lower mass haloes, in good agreement with the orbital anisotropy of subhaloes in dark-matter only simulations of structure formation in a ΛCDM cosmology (e.g., Diemand et al. 2004a;Cuesta et al. 2008;Sawala et al. 2017;van den Bosch et al. 2019).Since satellite are believed to reside in subhaloes, this may be considered another success of the standard model for structure formation.
We find that the radial number density profile of satellite galaxies,  sat ( |), is tightly constrained and well characterized by a generalized-NFW profile (equation [16]) with a central cusp-slope  = 0.94 (compared to  = 1 for a pure NFW profile), and a characteristic scale radius that is roughly two times larger than what is expected for the dark matter.Within the uncertainties, this is consistent with several previous studies (e.g., Yang et al. 2005b;Chen 2008;More et al. 2009a;Guo et al. 2012;Lange et al. 2019b).Consistent with paper I, we find our results to be extremely robust to modest changes in  sat ( |); the only parameter that displays some dependence is the anisotropy parameter,  (see Appendix C).This is to be expected given that both  and  sat ( |) appear in the Jeans equation used to model the kinematics of the satellite galaxies.
Interestingly, we find no evidence for a significant halo mass dependence of the scatter in central luminosity, and at any given halo mass we find the luminosity scatter to be around  c = 0.17 dex.This is inconsistent with the latest analysis of satellite kinematics by Lange et al. (2019b), also based on SDSS DR-7 data, who inferred that the scatter decreases with increasing halo mass, albeit only weakly with d c /d log  ∼ −0.04.As discussed in §6.3, this discrepancy arises primarily from the fact that Lange et al. (2019b) and all previous studies simply assumed the brightest galaxy in the halo to be the central.We, however, take into account the existence of brightest halo galaxy satellites, and forward-model the probability of misidentifying them as primaries.By doing so, we demonstrate that Basilisk's inference of  c is an unbiased recovery of the intrinsic luminosity scatter of true centrals.From our best-fit model we can predict what the scatter in brightest halo galaxy luminosity should be, as a function of halo mass, and that is in good agreement with the scatter inferred by Lange et al. (2019b) and other previous analyses.
For completeness, we point out that several studies that used stellar mass, rather than -band luminosity, to characterize the galaxy-halo connection also inferred that the scatter in stellar mass of central galaxies decreases with increasing halo mass (Moster et al. 2010;Zu & Mandelbaum 2015;Behroozi et al. 2019).However, most of these inferences were only significant below the ∼ 2 level.Hence, observationally it remains unclear whether or not the scatter in the galaxy-halo connection has a significant mass dependence.Taking our results at face value, it seems that scatter is fairly mass-independent, at least for log  > ∼ 10 12 ℎ −1 M ⊙ , and that previous indications for a significant mass dependence at the massive end are likely a result of confounding true centrals with brightest halo galaxies.
On the theory side, the situation is even more higgledypiggledy, with a clear lack of consensus (e.g., see Fig. 2 in Porras-Valverde et al. 2023).In general, semi-analytical models (e.g., Somerville et al. 2012;Lu et al. 2014;Henriques et al. 2015;Croton et al. 2016) predict a weak mass dependence with a small, negative value for d c /d log , but the magnitude of the overall scatter is typically much larger than what is inferred observationally (Wechsler & Tinker 2018).Hydrodynamical simulations of galaxy formation, typically predict a significantly lower scatter, at least for haloes with  > ∼ 10 12 ℎ −1 M ⊙ , in much better agreement with observations.Typically, though, they predict that the scatter rapidly increases for  < ∼ 10 12 ℎ −1 M ⊙ (e.g., Matthee et al. 2017;Pillepich et al. 2018).Finally, empirical models such as the UniverseMachine (Behroozi et al. 2019) and EMERGE (Moster et al. 2018) seem to predict  c () relations that fall roughly in between the predictions from semianalytical models and hydrodynamical simulations.
To conclude, we have demonstrated that satellite kinematics extracted from galaxy redshift surveys contain a wealth of information regarding the statistical relation between galaxies and their associated dark matter haloes.The Bayesian hierarchical framework Basilisk, developed here and in paper I, is able to analyse such data in an unbiased way, yielding accurate constraints on the galaxy-halo connection over a wide range of halo mass, and with unprecedented precision.Hence, satellite kinematics is complementary to other techniques that are used to constrain the galaxy-halo connection, in particular galaxy clustering and galaxy-galaxy lensing.Importantly, by only probing the smallest, most non-linear scales (i.e., the 1-halo term) it is insensitive to halo assembly bias, which ham-pers an unambiguous interpretation of the 2-halo term in clustering and galaxy-galaxy lensing.Hence, it is to be expected that by combining all these methods, degeneracies can be broken which opens up new avenues to test our cosmological paradigm and our models for galaxy formation.To this end, we plan to use, and where necessary further develop, Basilisk in future work.In particular, among others, we intend to explore additional degrees of freedom in the characterization of the galaxy-halo connection (for example, mass dependence in the faint-end slope of the satellite CLF and in the ratio of the characteristic luminosities of the centrals and satellites in haloes of any given mass), the impact of baryonic effects on the halo potential (which may introduce systematic errors in the inference from satellite kinematics), and the impact of scatter in the halo concentration-mass relation (and the expected correlation with the abundance of subhaloes/satellite galaxies).In addition, we are excited about the prospects of using Basilisk to probe the galaxy-halo connection as a function of secondary galaxy properties, such as galaxy colour and/or size, and to put constraints on cosmological parameters by combining satellite kinematics with other observables.1).Here  200 is an estimate for the satellite velocity dispersion in units of 200 km s −1 , which scales with the luminosity of the primary as log  200 =  0 +  1 log  10 +  2 (log  10 ) 2 , where  10 =  pri /(10 10 ℎ −2 L ⊙ ).
In paper I we adopted exactly the same parameters as Lange et al. (2019b):  h = 0.5,  s = 0.15,  h = 1000,  s = 4000/ 200 and ( 0 ,  1 ,  2 ) = (−0.04,0.38, 0.29).As discussed in paper I, this results in impurity fractions of ∼ 5 percent.These impurities cause a slightly bias estimate of the scatter in the galaxy-halo connection.In the case of the small mock data samples used in paper I, the effect was not significant.However, when using an 8-times larger, full-size SDSS sample the systematic bias in scatter becomes > 3 significant.
In §4.2.2 we classified impurities as either Type-I (BHG satellites) or Type-II (neither a central nor a BHG satellite).The green dots in the left-hand panel of Fig. A1 show the luminosities as function of host halo mass of the primaries selected from the Tier-3 mock data using the selection criteria used in paper I. Blue and orange contours mark the 4 and 7 c ranges around the median relation between halo mass and central luminosity used to construct the mock data.Red and black circled dots mark impurities of Type-I and II, respectively.As is evident, Type-I impurities have luminosities that are comparable to those of true centrals at the same host halo mass.That is because Type-I impurities are BHG satellites which are brighter than their corresponding central galaxy (hence they must have a luminosity in the typical range of Φ c (|) for a true central to be fainter).Being the brightest one in the corresponding halo, a Type-I impurity is impossible to avoid in the selection procedure.However, as we have demonstrated in §4.2.2, we actually forward model the contribution of Type-I impurities.
Type-II impurities, though, are a much bigger concern.As is evident from Fig. A1, these can have luminosities that are much lower than that of a typical central at the corresponding halo mass (by as much as 7).Since lower luminosities are indicative of a lower halo mass, a too-large contribution of Type-II impurities can give rise to significant, systematic errors in the inference.Since the kinematic information from secondaries associate with Type-II impurities still reflect a high velocity dispersion consistent with the actual halo mass, the main effect of Type-II impurities is to cause a systematic overestimate in the scatter of central luminosities at a fixed halo mass (i.e., an overestimate of  c ).Since we are not aware of a reliable method to forward model the impact of Type-II impurities, it is prudent that we minimize their incidence by tuning our selection criteria accordingly.
After extensive testing with different mock data sets similar to the Tier-3 mock discussed in the main text, we finally settled on the following set of parameters:  h = 0.6,  s = 0.15,  h =  s = 1000 and { 0 ,  1 ,  2 } = {0.04,0.48, 0.05}.With these new selection criteria we are able to reduce the fraction of Type-II impurities from ∼ 1.1% to ∼ 0.5%.The impact of this reduction can be seen by comparing the two panels of Fig. A1.Note that in addition to dropping the fraction of Type-II impurities to sub-percent levels, the new selection criteria preferentially removes the most dramatic outliers and also drastically reduces the contribution of Type-II impurities at the high mass end, where the old selection criteria caused the fraction of Type-II impurities to be very high.
With these new and improved selection criteria we find that Type-II impurities no longer cause a significant overestimate of  c .Although the new selection criteria reduces the number of primaries in the satellite kinematics sample by almost 40 percent, we find that this does not significantly compromise the precision with which Basilisk can infer the galaxy-halo connection.The reason is that the main reduction of primaries occurs at the low-luminosity end, where most of the secondaries are interlopers that do little to constrain the halo occupation model.For each luminosity bin, the completeness is roughly independent of halo mass at the low-mass end, but with a steep, almost exponential, decline at the high mass end.The yellow shaded region roughly indicates, for each luminosity bin, the 5 to 95 percentile range of halo masses.Note that the exponential decline of C (  | , ) at large masses only affects the top ∼ 5 percent of centrals of a given luminosity.For the vast majority of centrals the mass dependence of C (  | , ) is therefore negligible (which was the assumption we made in paper I).The coloured, solid lines show the theoretically predicted C (  | , ) computed under the assumption that the steep decline at the high-mass end is entirely due to centrals being fainter than their corresponding brightest satellites.As is evident, this is an excellent fit to the data, indicating that we can actually take the full mass-dependence of C (  | , ) into account by forward-modelling the halo occupation of brightest halo galaxies, instead of centrals.see the text for details.
parameter  (bottom-right panel), which is to be expected from the fact that both  sat ( |) and  appear in the expression for the line-of-sight velocity dispersion given by equation ( 41).
Note also that the constraints on  and R are inconsistent with satellite galaxies following the same radial profile as the dark matter (i.e.,  = R = 1, indicated by the red square in the top-left panel) at > 5 significance.If we had assumed that satellite galaxies are an unbiased tracer of their host halo mass distribution, which is not uncommon in the literature when modelling the galaxy-halo connection, we would have obtained the posteriors indicated by the red histograms.Interestingly, most CLF parameters would still be consistent with the values inferred using our fiducial, best-fit model with {, R} = {0.94,1.7}.The main exceptions, though, is the orbital anisotropy parameter, which would be biased high (i.e., we would markedly overestimate the radial velocity anisotropy).A few other parameters like  2 ,  s , and  p , are also somewhat biased in red histograms.Thus, we would incorrectly infer a steeper Lc () relation, a shallower faint-end slope of satellite CLF, and a slight decreasing trend in the central luminosity scatter with host halo mass, if we wrongly assumed the satellites to follow the dark matter radial distribution.In conclusion, Basilisk yields tight constraints on the radial number density profile of satellite galaxies, and whatever degeneracy remains between the central density slope and concentration has no significant impact on any inferred parameter.
This paper has been typeset from a T E X/L A T E X file prepared by the author.

Figure 1 .
Figure 1.Schematic showing the conical cylinders used for the selection of primaries and secondaries, and for modelling the interloper velocity profile.Redshift increases from left to right, and the depth of these cones are expressed as a velocity difference with respect to the primary galaxy on which the cones are centered.Note that all radii listed here are computed at the redshift of the primary.see the text for details.

LFigure 2 .
Figure 2. Illustration showing the luminosity and redshift cuts that play a role in our selection of primaries and secondaries.Coloured dots indicate primaries while the corresponding shaded rectangles indicate the volumelimited luminosity and redshift ranges of their secondaries.see the text for a detailed explanation.

Figure 4 .
Figure 4. Normalized line-of-sight velocity distribution (LOSVD) of galaxies around Tier-3 primaries in a narrow redshift and luminosity bin ( pri = 0.065 ± 0.01 and log  pri = 10.35 ± 0.1).The thick green and black lines are the LOSVDs of true satellites and of interlopers, respectively.Note that the vertical scale is logarithmic to highlight the high-velocity wings of the interloper distribution.The thin coloured lines are the LOSVDs of tertiary mock galaxies in annular rings around primaries with radial bins ranging from {  p,min ,  p,max } = {0.45,0.60} to {0.90, 1.05} ×  200 ℎ −1 Mpc, as indicated.All histograms are individually normalized to highlight their relative shapes.As is evident, the LOSVDs of these tertiaries closely resemble those of the interlopers.This empirical fact indicates that one can model the LOSVD of interlopers as that of 'tertiary' galaxies in any of these annular rings, and it is the data-driven method we introduce in Basilisk (see the text for details).

Figure 5 .
Figure 5. Marginalized posteriors obtained by Basilisk for the Tier-3 mock (assuming the best-fit radial profile for the satellites, with  = 0 and R = 2.5).Results are shown for only those 10 parameters that characterize the CLF.Panels along the diagonal show marginalized 1D posteriors while off-diagonal panels show 2D posteriors.In the latter case, contours demarcate the 68 and 95 % containment of the posterior, while the blue dashed lines indicate the true input values used to create the mock data set.Brown contours correspond to our fiducial model that assumes a mass-independent velocity anisotropy, ; and the green contours show the constraints for a model in which  is allowed to depend on halo mass (discussed in §5.3).The posteriors for all parameters are in good agreement with the input values, and virtually independent of whether the velocity anisotropy is mass dependent or not.For all parameters we adopted very wide and flat prior with bounds that have no impact on the posteriors.

Figure 6 .
Figure 6.Posterior constraints on the galaxy-halo connection in the Tier-3 mock inferred with Basilisk.In each panel blue dots, lines or shaded regions indicate the true values in the Tier-3 mock, while the brown shaded regions or histograms indicate the 95 percentile posterior ranges inferred with Basilisk (panels a-d), or the full posterior distributions (panels e and f).Panel (a): Number density of galaxies in 0.15 dex bins of galaxy luminosity.Panel (b): Luminosity of central galaxies as a function of halo mass.For comparison, the pink points show the true log  vir versus log  pri of all selected primaries, including impurities, while the magenta triangles show their corresponding median log  pri in bins of halo mass.Note that, due to selection effects, this median deviates from the true underlying relation (blue squares).Panel (c): The scatter,  c in the luminosity of central galaxies as a function of halo mass.Panel (d): The normalization,  * s (  ), of the satellite CLF as a function of halo mass.Panel (e): The faint-end slope,  s , of the satellite CLF.Panel (f): The (average) orbital anisotropy, , of satellite galaxies.

Figure 7 .Figure 8 .
Figure 7.The points with error-bars show the true interloper fractions in our Tier-3 mock sample as a function of primary luminosity for 6 different redshift bins, as indicated.The shaded bands of corresponding colour show the 68 and 95 percent confidence intervals of the posterior inferred with Basilisk.The good agreement indicates that Basilisk can accurately distinguish, in a statistical sense, between true satellites and interlopers.

Fig
Fig. 6 panel (b)  shows the relationship between halo mass and median central galaxy luminosity.Note that the true input model (blue squares) is perfectly recovered by Basilisk (brown band, indicating the 95 % confidence interval from the posterior).The pink dots indicate the true halo masses and luminosities of all primaries (both true centrals and impurities) in the Tier-3 satellite kinematics sample.The magenta triangles show their median values of log  pri in bins of log .Note that, due to selection bias, these do not agree with the true input model (blue squares).A more luminous primary has a larger secondary selection volume associated with it, as well as a larger luminosity range between  pri and  min ( back ).Therefore, at any given redshift, brighter primaries are more likely to have at least one secondary than less luminous primaries in haloes of the same mass, hence the brighter primaries get preferentially selected in the D SK data.This causes the median luminosity of primaries in the satellite kinematics sample to be biased high with respect to that of true centrals.This effect is especially pronounced at the low halo mass end, where the expectation value for the number of secondaries is lowest.Despite this strong and unavoidable selection bias, Basilisk perfectly recovers the true input relation between central luminosity and halo mass, and not the biased relation!This indicates that Basilisk, in its forward modelling, accurately accounts for selection bias and other systematics such as the presence of impurities.Panels (c) and (d) of Fig.6show, respectively, the logarithmic scatter in central galaxy luminosity at fixed halo mass,  c , and the normalization of the satellite CLF,  * s .Once again, blue dots show the true input values, while the brown shaded regions mark the 95% confidence interval of the posterior distribution inferred with Basilisk.Finally, the brown-shaded histograms in panels (e) and (f) show, respectively, the posterior distributions for the faint-end slope of the satellite CLF,  s , and that of the orbital anisotropy parameter, .The blue, vertical line in panel (e) indicates the true input value of  s , while the blue-shaded region in panel (f) show the range of mean velocity anisotropy of subhaloes in different halo masses in the SMDPL simulation used to construct the Tier-3 mock (discussed further in §5.3).As is evident, in each case the posterior constraints are in excellent agreement with the input values, which indicates that Basilisk can put tight and accurate constraints on the more intricate aspects of the galaxy-halo connection, beyond the mere relation between halo mass and central luminosity.Fig.7shows the interloper fraction,  int , as a function of primary galaxy luminosity in six redshift bins (indicated by different colours).Solid circles indicate the true interloper fractions in the Tier-3 mock sample with the error bars computed assuming Poisson statistics.The coloured bands show the corresponding posterior predictions as inferred by Basilisk, which are in good agreement with the true interloper fractions.Hence, Basilisk correctly distinguishes satellites from interlopers, at least in a statistical sense, and accurately recovers their relative prevalence as a function of luminosity and redshift of the primary.Finally, Figs. 8 and 9 compare the posterior predictions for the HOD and CLF, respectively, to their true input used to construct the Tier-3 mock data (solid dots).In particular, Fig.8, shows the average number of central (circles) and satellite (triangles) galaxies per halo, above four different luminosity thresholds ( th ), as a function of host halo mass.Interestingly, the true satellite mean occupation in our mock data deviates significantly from a simple power-law (which is a common assumption in the literature), and Basilisk accurately recovers that complexity in its shape across all luminosities and halo masses.Fig.9plots the central (purple) and satellite (orange) CLFs for 8 different halo masses, as indicated in the top-right corner of

Figure 9 .
Fig. 6 panel (b)  shows the relationship between halo mass and median central galaxy luminosity.Note that the true input model (blue squares) is perfectly recovered by Basilisk (brown band, indicating the 95 % confidence interval from the posterior).The pink dots indicate the true halo masses and luminosities of all primaries (both true centrals and impurities) in the Tier-3 satellite kinematics sample.The magenta triangles show their median values of log  pri in bins of log .Note that, due to selection bias, these do not agree with the true input model (blue squares).A more luminous primary has a larger secondary selection volume associated with it, as well as a larger luminosity range between  pri and  min ( back ).Therefore, at any given redshift, brighter primaries are more likely to have at least one secondary than less luminous primaries in haloes of the same mass, hence the brighter primaries get preferentially selected in the D SK data.This causes the median luminosity of primaries in the satellite kinematics sample to be biased high with respect to that of true centrals.This effect is especially pronounced at the low halo mass end, where the expectation value for the number of secondaries is lowest.Despite this strong and unavoidable selection bias, Basilisk perfectly recovers the true input relation between central luminosity and halo mass, and not the biased relation!This indicates that Basilisk, in its forward modelling, accurately accounts for selection bias and other systematics such as the presence of impurities.Panels (c) and (d) of Fig.6show, respectively, the logarithmic scatter in central galaxy luminosity at fixed halo mass,  c , and the normalization of the satellite CLF,  * s .Once again, blue dots show the true input values, while the brown shaded regions mark the 95% confidence interval of the posterior distribution inferred with Basilisk.Finally, the brown-shaded histograms in panels (e) and (f) show, respectively, the posterior distributions for the faint-end slope of the satellite CLF,  s , and that of the orbital anisotropy parameter, .The blue, vertical line in panel (e) indicates the true input value of  s , while the blue-shaded region in panel (f) show the range of mean velocity anisotropy of subhaloes in different halo masses in the SMDPL simulation used to construct the Tier-3 mock (discussed further in §5.3).As is evident, in each case the posterior constraints are in excellent agreement with the input values, which indicates that Basilisk can put tight and accurate constraints on the more intricate aspects of the galaxy-halo connection, beyond the mere relation between halo mass and central luminosity.Fig.7shows the interloper fraction,  int , as a function of primary galaxy luminosity in six redshift bins (indicated by different colours).Solid circles indicate the true interloper fractions in the Tier-3 mock sample with the error bars computed assuming Poisson statistics.The coloured bands show the corresponding posterior predictions as inferred by Basilisk, which are in good agreement with the true interloper fractions.Hence, Basilisk correctly distinguishes satellites from interlopers, at least in a statistical sense, and accurately recovers their relative prevalence as a function of luminosity and redshift of the primary.Finally, Figs. 8 and 9 compare the posterior predictions for the HOD and CLF, respectively, to their true input used to construct the Tier-3 mock data (solid dots).In particular, Fig.8, shows the average number of central (circles) and satellite (triangles) galaxies per halo, above four different luminosity thresholds ( th ), as a function of host halo mass.Interestingly, the true satellite mean occupation in our mock data deviates significantly from a simple power-law (which is a common assumption in the literature), and Basilisk accurately recovers that complexity in its shape across all luminosities and halo masses.Fig.9plots the central (purple) and satellite (orange) CLFs for 8 different halo masses, as indicated in the top-right corner of

Figure 10 .
Figure10.Satellite velocity anisotropy in the Tier-3 mock.Brown shaded contours indicate the 68 and 95 per cent confidence intervals on the orbital anisotropy parameter  as a function of halo mass inferred from our fiducial model in which  is assumed to be global constant (independent of halo mass).The green shaded bands show the confidence intervals obtained when allowing for halo mass dependence as described in the text (see equation[56]).The black and blue lines show the host halo mass dependence of the true, mean velocity anisotropy of all subhaloes in the SMDPL simulation, and of those with a halo-centric radius  < 0.4 vir , respectively.

Figure 11 .
Figure 11.Statistics of our sample of primaries and secondaries selected from the SDSS-DR7.Top-left: The numbers of all primaries (blue), primaries with at least one secondary (orange), and secondaries (green) as a function of their luminosities.Top-right: Probability,  0 , that a primary of luminosity  pri has zero secondaries.Results are shown for 6 redshift bins, as indicated.Bottom-left: Multiplicity function, indicating the number of primaries (as log[  pri + 1]) with  sec secondaries in our sample.Bottom-right: the distribution of aperture completeness,  app , for all primaries.

Figure 12 .
Figure 12.The velocity difference, Δ for all the 30,431 primary-secondary pairs in our SDSS sample as function of the luminosity of the primary,  c , (left-hand panel) and as a function of the projected separation between primary and secondary,  p (right-hand panel).Colours indicate the redshift of the primary, as indicated in the colour-bar on the right.

Figure 13 .
Figure13.The LOSVDs of secondaries around primary galaxies stacked in bins of luminosity (increases vertically) and redshift (increases horizontally), as indicated.Only bins that fall entirely above the SDSS flux limit are shown.The SDSS data is shown as the blue shaded histograms, while the red bands indicate the 95% confidence intervals obtained using the posterior distributions inferred with Basilisk.The red histogram in the inset-panel in the lower-right corner shows the likelihood distribution of 10 4 ×  pair random realizations of  (Δ ) data for the inferred best-fit model, while the blue, dashed line shows the corresponding value for the actual SDSS data (see the text for details).Here  pair is total number of primary-secondary pairs in the SDSS dataset.Note that the model inference is an excellent match to the data.

Figure 14 .
Figure14.Multiplicity functions for the numbers of secondaries in the same bins of luminosity (increasing vertically) and redshift (increasing horizontally) as used in Fig.13.Only bins lying entirely above the flux limit of the survey, and with at least 5 primaries in the down-sampled 'L NS data', are shown.The blue points show the SDSS data, with Poisson error-bars, while the red bands indicate the 95% confidence intervals of the posterior prediction from Basilisk.The red histogram in the inset-panel in the lower-right corner shows the likelihood distribution of 10 4 random realizations of these multiplicity functions for the inferred best-fit model, while the blue, dashed line shows the corresponding value for the actual SDSS data.Although the fit to the data looks very reasonable at first sight, this more detailed comparison of likelihoods shows that the best-fit model is not a perfect fit to the data.As discussed in the text, we attribute this to limited freedom in the satellite CLF model used to describe the galaxy-halo connection.

Figure 15 .
Figure15.Key halo mass dependencies that characterize the galaxy-halo connection in our model, as constrained by Basilisk using SDSS data.In each panel, the brown shaded regions show the 95 per cent confidence intervals inferred by Basilisk.The coloured symbols show constraints from previous studies using galaxy group catalogues(Yang et al. 2008), galaxy clustering(Cacciato et al. 2013) and satellite kinematics(Lange et al. 2019b).From top to bottom, the panels plot the median luminosity of centrals ( Lc ), the log-normal scatter in central luminosity ( c ), the faint-end slope of the satellite CLF ( s ), and the normalization of the satellite CLF ( * s ), each as a function of halo mass.see the text for a detailed discussion.

Figure 16 .
Figure16.The scatter in central galaxy luminosity, as inferred by Basilisk (grey band), shows no mass dependence.This is in tension with the scatter for red and blue centrals, as inferred by a recent analysis of satellite kinematics in the SDSS byLange et al. (2019b), shown by the red and blue shaded regions.This tension can be resolved by noting thatLange et al. (2019b) assumed that all their primaries are centrals.Consequently, their constraint on  c is really a constraint on the scatter in the brightest halo galaxies (BHGs).The black dashed line shows the predicted scatter in log  BHG for our best-fit model, computed using Basilisk.Note that this is in excellent agreement with the results ofLange et al. (2019b) for the red centrals, which are the dominant population of centrals in massive haloes.For comparison, the yellow hatched region shows the constraints on  c (assumed to have no mass dependence) obtained byCacciato et al. (2013) based on an analysis of galaxy clustering plus galaxy-galaxy lensing, while the red and blue plus signs show the constraints for red and blue centrals, respectively, inferred byYang et al. (2008) using a galaxy group catalogue.All uncertainty bands and error-bars shown correspond to 68% confidence intervals.see the text for a more detailed discussion.

Figure 17 .
Figure 17.Same as Fig. 10 but for the SDSS data.Brown and green contours indicate the posterior constraints on the velocity anisotropy of satellite galaxies as inferred from our fiducial (constant-) model, and the massdependent  (  ) model, respectively.

Figure A1 .
Figure A1.Luminosity and halo mass of primaries in our Tier-3 mock satellite kinematics sample, identified using old (left) and new (right) selection criteria.We introduce the new, more restrictive, selection criteria to reduce the number of type-II impurities which are marked with black circles.Note that the new selection criteria especially eliminate the most problematic impurities which are offset from the input mass-luminosity relation by more than 4 c (blue shaded region) and up to as much as 7 c (yellow shaded region).It also distinctively reduces impurities at the high-mass end, which is crucial for obtaining unbiased constraints.

Figure B1 .
Figure B1.Halo completeness, C (  | , ), defined as the fraction of haloes of mass , hosting centrals of luminosity  at redshift , whose centrals are selected as primaries by our selection criteria.Points with Poisson error-bars indicate halo completeness as a function of halo mass in our Tier-3 mock sample.Results are shown for different bins of central galaxy luminosity (different colours, as indicated).For each luminosity bin, the completeness is roughly independent of halo mass at the low-mass end, but with a steep, almost exponential, decline at the high mass end.The yellow shaded region roughly indicates, for each luminosity bin, the 5 to 95 percentile range of halo masses.Note that the exponential decline of C (  | , ) at large masses only affects the top ∼ 5 percent of centrals of a given luminosity.For the vast majority of centrals the mass dependence of C (  | , ) is therefore negligible (which was the assumption we made in paper I).The coloured, solid lines show the theoretically predicted C (  | , ) computed under the assumption that the steep decline at the high-mass end is entirely due to centrals being fainter than their corresponding brightest satellites.As is evident, this is an excellent fit to the data, indicating that we can actually take the full mass-dependence of C (  | , ) into account by forward-modelling the halo occupation of brightest halo galaxies, instead of centrals.see the text for details.
for the full SDSS data consisting of 18,373 primaries with at least one secondary and a total of 30,431 secondaries (see §6), it only takes of the order of 200 milliseconds using a single run-of-the-mill CPU.This is sufficiently fast, that it allows one to run different Monte-Carlo Markov Chains for different assumptions regarding nsat ( |, ), and to find the best-fit radial profile, marginalized over all other model parameters.First, we combine the posteriors from separate MCMC runs on 15 × 15 grid in (, log R)-space, each time marginalizing over the other 14 parameters, to constrain nsat ( |, ) (see Appendix C).Having determined the values of  and R that maximize L tot , we then run a MCMC sampler to infer the full posterior ( |D) keeping  and R fixed at these best-fit values.

Table 1 .
Galaxy-halo connection parameters with brief descriptions (see §4.1.1 for details), along with Basilisk's inference for SDSS DR7 galaxies.Best-fit values and 1 confidence intervals of all parameters quoted in the table, excluding the last two, are for the mass-independent velocity anisotropy model.
given their uncertainties.Note,