The IACOB project IX. Building a modern empirical database of Galactic O9-B9 supergiants: sample selection, description, and completeness

Blue supergiants (BSGs) are important objects to study the intermediate phases of massive star evolution, helping to constrain evolutionary models. However, the lack of a holistic study of a statistically significant and unbiased sample of these objects makes several long-standing questions about their nature to remain unsolved. The present and other upcoming papers of the IACOB series are focused in studying - from a pure empirical point of view - a sample of 500 Galactic O9 - B9 stars with luminosity classes I and II (plus 250 late O- and early B-type stars with luminosity classes III, IV and V) and covering distances up to 4 kpc from the Sun. We compile an initial set of 11000 high-resolution spectra of 1600 Galactic late O- and B-type stars. We use a new novel spectroscopic strategy based on a simple fitting of the Hbeta line to select stars in a specific region of the spectroscopic HR diagram. We evaluate the completeness of our sample using the Alma Luminous Star catalog (ALS III) and Gaia-DR3 data. We show the benefits of the proposed strategy for identifying BSGs descending from stellar objects born as O-type stars, in the context of single star evolution. The resulting sample reaches a high level of completeness with respect to the ALS III catalog, gathering the 80% for all-sky targets brighter than Bmag<9 located within 2 kpc. However, we identify the need for new observations in specific regions of the Southern hemisphere. In conclusion, we have explored a very fast and robust method to select BSGs, providing a valuable tool for large spectroscopic surveys like WEAVE-SCIP or 4MIDABLE-LR, and highlighting the risk of using spectral classifications from the literature. Upcoming works will make use of this large and homogeneous spectroscopic sample to study specific properties of these stars in detail. We initially provide first results about their rotational properties.


Introduction
Massive stars are very important objects in astrophysics because they play a crucial role in the Universe. From their feedback to their circumstellar and interstellar medium (Garcia-Segura et al. 1996;Krause et al. 2013), stellar clusters (Rogers & Pittard 2013) and larger structures such as giant molecular clouds (Matzner 2002) with a global importance in the chemo-dynamical evolution of galaxies (Matteucci 2008), to their use as extragalactic tools (Kudritzki & Puls 2000;Kudritzki et al. , 2008, and their main role as progenitors of gravitational wave emitters (Belczynski et al. 2016;Marchant et al. 2016).
Their different evolutionary phases are of particular interest for studying the different physical processes that are involved. While the progress of understanding each of these phases has been notorious over the past decades, massive stars still gather many questions that have not yet been answered (Langer 2012).  are only available at the CDS via anonymous ftp to cdsarc.cds.unistra.fr (130.79.128.5) or via https:// cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/674/A212 In particular, the so-called blue supergiants (BSGs) have always been among the less constrained objects from an evolutionary perspective (e.g., Fitzpatrick & Garmany 1990;Lennon et al. 1992Lennon et al. , 1993, and many others).
BSGs can have two very different fates according to single evolutionary models, which essentially depend on their masses (e.g., Brott et al. 2011;Ekström et al. 2012). BSGs can either transition the post-main sequence (MS) and likely become luminous blue variables (LBVs; Humphreys & Davidson 1994;Smith et al. 2011;Weis & Bomans 2020) and ultimately Wolf-Rayet (WR) objects (Abbott & Conti 1987;Humphreys 1991;Nugis & Lamers 2000), or they go through it to become red supergiants (RSgs; Levesque et al. 2005), and if they undergo one or more of the so-called blue loops (Stothers & Chin 1975;Ekström et al. 2012), they return into a bluer and hotter state. In both cases, their lives will end as supernovae (Woosley et al. 2002;Smartt 2009) or faded supernovae (Adams et al. 2017).
One of the still unresolved but most important questions arises from the overdensity of BSGs in the region of the Hertzsprung-Russell diagram. Evolutionary models predict a gap of stars in this region because the stars are expected to evolve rapidly toward the RSg phase after leaving the MS. This overdensity was first shown in Fitzpatrick & Garmany (1990) using photometric data, and it has been observed in more recent spectroscopic studies of large samples of massive stars, such as Castro et al. (2014). Several scenarios have been proposed to explain this overdensity. One of them is the overlapping of BSGs that approach and return from the RSg phase through the above-mentioned assumption of the blue loops. Additionally, this region can also be populated by stars that have undergone binary interaction, resulting in additional hydrogen in their cores and therefore allowing them to extend their MS (Sana et al. 2012;de Mink et al. 2013). Last, the unexpected overdensity might be the result of an incorrect fine-tuning of some parameters in the models themselves, such as overshooting or inflation (Castro et al. 2014;Martinet et al. 2021).
Theoretical models have highlighted the implications that rotation may have in the evolution and properties of massive stars Langer 2012) because rotation can alter the surface composition of massive stars (Heger & Langer 2000;Hunter et al. 2009) as well as the mass-loss rate ) and the final fate . As a continuation of the study performed by Holgado et al. (2020Holgado et al. ( , 2022 on Galactic O-type stars, the key to constraining the most accurate evolutionary models if gathering empirical information from the largest sample of BSGs. In this matter, another important question arises from the observational evidence of a drop in the number of fastrotating stars at approximately 22 000 K (Vink et al. 2010). This drop has been proposed to be caused by enhanced mass loss (stronger angular momentum loss) in the vicinity of the theoretical bi-stability jump (Vink et al. 2000). However, as shown in Fig. 1 of Castro et al. (2014), this drop might be located at the same position as a new empirical terminal age main sequence for stars with masses above 20 M , which may indicate that the drop might also be caused by stars that leave the MS as their rotational velocities also decrease.
To try to unlock this situation, this paper and the upcoming set of papers are focused on the study of the physical and evolutionary nature of Galactic BSGs from a purely empirical point of view, and more specifically, the papers focus on Galactic O9-B9-type stars that evolve from stars that were born with masses in the range ( 20−80 M ) (i.e., those born as O-type stars). For this endeavor, we benefit from a significantly large and homogeneous sample of these stars that were observed with high-resolution spectrographs. In this first paper, we develop a method that is based on the Hβ line profile for selecting the stars independently of previous spectral classifications. We describe the selected sample in detail and identify misclassified stars from the literature, together with double-line spectroscopic binaries. We use the Gaia DR3 (Gaia Collaboration 2016, 2023; Babusiaux et al. 2023) and the Alma Luminous Star catalog (ALS III;Pantaleoni González et al. 2021, and in prep.) to evaluate the completeness of our sample in the solar neighborhood. This is of particular importance to avoid adding potential observational biases when the evolutionary models are constrained. These biases can subsequently introduce further biases in the population synthesis of massive stars (e.g., Voss et al. 2009). Last, we made use of the IACOB-BROAD (Simón-Díaz & Herrero 2014) tool to obtain the v sin i distribution as a preliminary approach to constrain the MS in this range of high masses. Upcoming papers in this series will cover the stellar parameters for the stars we selected here together with abundances derived from quantitative spectroscopic analysis. These papers will also include the analysis of the stellar variability and the full identifi-cation of binaries using multi-epoch data, as well as the examination of their pulsational properties and other related topics that aim to address other unanswered questions.
The paper is organized as follows: Sect. 2 summarizes the spectroscopic data of the input sample of stars and also the data retrieved from the Gaia mission. Section 3 describes the initial motivation and methods we used to build the sample of stars, to carry out the visual inspection of the selected spectra, and to perform the line-broadening analyses. Section 4 determines whether the method can be used to build the resulting sample, describes the sample in terms of spectral classifications, photometric magnitudes, distances, and evaluates the completeness of the sample with respect to the ALS III catalog. The sample is also described in terms of binaries and the different Hβ and Hα profiles because they are affected by stellar winds. First results from the v sin i distribution are also discussed. Last, Sect. 5 includes the summary, conclusions, and the future work to be carried out as a continuation of this paper.

Ground-based spectroscopy
The starting point of our study was to gather high-resolution optical spectroscopy for the largest possible sample of Galactic O9-B9 stars with luminosity classes (LCs) I-II and O9-B3 stars with LCs III-IV-V that could be reached with any of the facilities indicated below using reasonable exposure times. To achieve this, we built an initial list of stars fulfilling these criteria based on the recommended spectral classifications quoted in the Simbad astronomical database (Wenger et al. 2000). This initial list was then complemented with a few other sources that are known to be B-type stars, but are incorrectly quoted as A-type stars in Simbad, plus a small sample of stars for which spectra were available in the IACOB spectroscopic database (see below), but for which either no information of the corresponding LC is quoted in Simbad, or that are simply labeled OB or O (see Sect. 4.2, Tables H.1 and D.2).
We then focused on collecting spectra for as many stars as possible from the above-mentioned list using three main instruments: the FIbre-fed Echelle Spectrograph (FIES; Telting et al. 2014) mounted at the 2.56 m Nordic Optical Telescope (NOT) located at the Observatorio del Roque de los Muchachos in La Palma, Canary Islands, Spain; the High Efficiency and Resolution Mercator Echelle Spectrograph (HERMES; Raskin et al. 2011) at the 1.2 m Mercator semirobotic telescope, also located at the Observatorio del Roque de los Muchachos; and the Fiber-fed Extended Range Optical Spectrograph (FEROS; Kaufer et al. 1997) at the 2.2 m MPG/ESO telescope located at the ESO La Silla Observatory, Chile. All these instruments provide high-resolution spectra with a resolving power ranging from 25 000 to 85 000 and a common wavelength coverage in the 3800−9000 Å range. However, some of the FIES spectra that were taken before the installation of the current CCD (30/9/2016) only cover up to 7200 Å. In addition, FEROS spectra have a wider coverage, from 3500 Å to 9200 Å.
We initially benefited from these FIES and HERMES spectra of northern Galactic OB stars (with a declination −20 • ) compiled by the IACOB project until 2020 (see , for the latest review of the so-called IACOB spectroscopic database). Some of them were obtained between 2019 and 2020 with the NOT during technical and nordic service nights within the scope of the NOT studentship program of the main author. Pursuing the objectives described above, several additional observing campaigns with NOT and Mercator were planned and executed in the framework of the IACOB project between 2020 and 2023. In addition, we searched the ESOarchive to retrieve all public available spectra of the southern stars that are observable from La Silla (with a declination 30 • ) that matched the above-mentioned criteria.
At the time of submission of this paper, we had gathered a total of ≈11 000 spectra of ≈1600 stars, ≈610 of which were observed within the scope of this paper. About 670 of all of them correspond to O9-B9 type stars with LCs I and II, and ≈710 correspond to O9-B3 type with LCs III-IV-V. From the ESOarchive, we downloaded the spectra of ≈330 stars. The list of programs from which these stars were taken is included in the acknowledgments.
In all cases, we directly considered the reduced spectra provided by the pipelines installed at the telescopes. In addition, we applied our own routines to produce a normalized version of each individual spectrum, and used pyIACOB 1 to correct all the spectra for heliocentric velocity, cosmic rays, and some systematic cosmetic defects that are mainly present in the FEROS spectra.
For the purposes of this paper, we selected the best available spectrum for each star with a signal-to-noise ratio (S/N) 30 in the 4000−5000 Å range. We also took any cosmetic or potential issues into account (e.g., normalization issues) that might lead to unreliable results in this or later works. The median of the S/N in the 4000−5000 Å spectral range of the above-mentioned best spectra is ≈170. It reaches up to 300 or more in several bright stars. In addition, we considered all available multi-epoch spectra to search for spectroscopic binaries.

Gaia data
We complemented the spectroscopic data with data from Gaia DR3 (Gaia Collaboration 2016, 2023Babusiaux et al. 2023). In general terms, the Gaia DR3 release provides reliable astrometric and photometric data for most of the Galactic OB stars near the Sun, with the exception of some sources that are too bright (G mag 6), for which either the data are unavailable or the astrometric solution is less reliable due to uncalibrated CCD saturation (Lindegren et al. 2018), which also affects the photometric data. For the latter case, we used Hipparcos (Perryman et al. 1997) astrometric data if available, to have at least a rough estimation. In addition to this, the inclusion of the Radial Velocity Spectrometer (RVS) provides the line broadening parameters (Frémat et al. 2023), which we used to compare with our results out of curiosity. We obtained a poor agreement between the two. This comparison is included in Appendix G.
In particular (see Sect. 3.1), we downloaded available information for the stars in the sample of the astrometric parallaxes ( ) and proper motions (µ α cos δ, µ δ ), of the line broadening measurements, of G, G RP , and G BP band photometry, and finally, of the associated renormalized unit weight error (RUWE) values. The latter are an estimate for the reliability of the astrometric solution (Lindegren et al. 2021a). We did not filter our sample by this value, but we note here that given the relatively close distance (<2500 pc) of most of the targets in the IACOB sample, most of the stars that are not too bright have a RUWE < 1.4. The stars with RUWE values up to 8 can still be used when it is considered that the errors in parallax are larger (for more details, see Maíz Maíz Apellániz 2022). 1 The pyIACOB package is described in Appendix A.
In order to obtain the distances to the stars in the sample, we directly downloaded the corrected distances from Bailer-Jones et al. (2021), who used Gaia EDR3 (Gaia Collaboration 2021) together with a direction-dependent prior and other additional information, such as the extinction map of the Galaxy for the determination of the distances. Alternatively and as a first-order approximation and comparison, we also used the inverse of the parallax (1/ ) by correcting it for zeropoint offset by using the procedure described in Lindegren et al. (2021b) 2 . However, this is a poor estimate of the distance and is only valid for zeropoint-corrected and positive parallaxes with σ / 0.1 (see Bailer-Jones et al. 2021, for more details). Last, we also used preliminary results from Pantaleoni González et al. (in prep.), who use a different prior from Bailer-Jones et al. (2021) that is more optimized for deriving distances to OB stars (see Pantaleoni González et al. 2021, Sect. 3.3, for more details). A comparison between these distances from different methods is included in Appendix B.

Motivation and selection criteria
The left panel of Fig. 1 shows the distribution of a subsample of 246 O-and B-type stars that were investigated by Simón-Díaz et al. (2017) in the so-called spectroscopic Hertzsprung-Russel diagram (sHRD; first used by Langer & Kudritzki 2014). In this figure, we separate stars according to their LC using different colors, and indicate the rough boundary between the location of the O-and B-type stars in the sHRD with the dashed line (see Holgado et al. 2018). In addition, we include a set of non-rotating evolutionary tracks computed with the MESA code 3 (see Dotter 2016;Choi et al. 2016;Paxton et al. 2011Paxton et al. , 2013Paxton et al. , 2015, for references purposes).
As mentioned in Sect. 1, we are interested in continuing the efforts initiated in previous papers of the IACOB series (referring to the O star domain; see Holgado et al. 2020Holgado et al. , 2022, and populate with the largest possible sample, the region of the sHRD that corresponds to the B-type stars that are expected to be the descendants of the O-type stars. To this aim, one possibility would be to select the stars with spectral types in the range B0 to B9 from the initial sample described in Sect. 2.1 that are classified as supergiants (LC I) or bright giants (LC II). However, this approach requires caution because, as we show in Sect. 4.1, the spectral classifications provided by default by Simbad for the case of B-type stars include a non-negligible number of cases in which the LCs are incorrect (or not even provided). In addition, as illustrated in the left panel of Fig. 1, part of the region of interest is also populated by early-B giants (and maybe also subgiants and dwarfs). Therefore, if we follow this selection strategy, we might exclude many stars of interest for our study. To overcome this problem, we decided to follow a more robust (but still simple) strategy that is based on the use of the quantity FW3414(Hβ) as a proxy of the parameter log L (both described in the next section). 3.1.2. Using the width of Hβ as a proxy of log(L/L ) Considering the location of the O-type stars and the behavior of the evolutionary tracks in the sHRD shown in the left panel of Fig. 1, it might easily be assumed that at first glance, we are mainly interested in stars with log(L/L ) 3.5 dex, where the L parameter is defined as T 4 eff /g (Langer & Kudritzki 2014). Ideally speaking, the best approach for selecting our working sample of evolved descendants of O-type stars would be to have estimates of the effective temperature (T eff ) and the surface gravity (log g) for each star in the list of potential targets of interest described in Sect. 2.1. However, although viable, this strategy would be very time-consuming because it would require the full quantitative spectroscopic analysis process for each target before the selection.
As an alternative, we decided to explore to which extent a direct measurement of the profile shape of one of the hydrogen Balmer lines, which are known to be affected by the surface gravity of the star through the effect of the Stark broadening, can be used to perform this selection in a much faster way.
In particular, we chose to work with the Hβ line, which is less affected by blends with other metal lines (as is, e.g., the case of the Hγ line) and is not as heavily affected by the presence of a stellar wind as the Hα line. As illustrated in Fig. 2 (see also the explanation below), we also identified that the quantity FW3414(Hβ), which is defined as the difference between the width of the Hβ line measured at three-quarters and one-quarter of the line depth, is better suited as a proxy of log(L/L ) than the full width at half maximum (FWHM) of the line.
To test the proposed method, we first used the stellar atmosphere code FASTWIND (Santolaya-Rey et al. 1997;Puls et al. 2005) to compute a grid of synthetic spectra corresponding to models covering a range in T eff = 16 000−28 000 K and log(L/L ) = 2.5-4.1 dex. The other parameters, such as the helium abundance (N He ), the microturbulence (ξ), and the windstrength parameter (log Q) 4 , were kept fixed in the computation of the grid to N He = 0.1, ξ = 10 km s −1 , log Q = −14.0. Puls et al. 1996Puls et al. , 2005. By convolving these spectra with different broadening and instrumental profiles, we were able to reproduce (to first order) the effect that different projected rotational velocities (v sin i) and spectral resolutions (R) have on the line profiles. In particular, we considered values of R between 2500 and 85 000, and eight values of v sin i, ranging from 10 to 400 km s −1 . Some illustrative examples of the synthesized profiles are shown in the top panel of Fig. 2.
We then used the corresponding pyIACOB module to measure the full width of the line at three-quarters, one-half, and onequarter of its depth for each simulated Hβ profile. For the same values of T eff = 24 000 K and R = 25 000, the middle and bottom panels of Fig. 2 show that for a given v sin i value, the lower the quantity log(L/L ) (i.e., the higher the surface gravity), the larger the quantities FW3414(Hβ) and FWHM. However, in a more general situation, the use of FW3414(Hβ) results in a more adequate proxy of log(L/L ) than the FWHM because it mitigates the effect that v sin i has on the shape of the line, providing a better indirect measurement of the gravity.
To better illustrate this situation, the top panel of Fig On the one hand, the aforementioned mitigation of the v sin i effect is clearly visible when the resulting differences obtained in FW3414(Hβ) ( 2 Å) and FWHM ( 6 Å) are compared when the profiles labeled A and B are considered. On the other hand, and more importantly, the FWHM produces a much less efficient filtering than the quantity FW3414(Hβ) when samples of stars with a broad range of v sin i are considered. For example, to ensure that a star with a set of parameters similar to those considered in profile B survives the filtering process, we need to assume a filtering value of FWHM ≈ 10 Å. Other stars with log(L/L ) down to ≈2.5 dex would also be selected in this case, however, which would contaminate the sample with many stars that are below the initially considered limit in log(L/L ) of ≈3.5 dex (see the left panel of Fig. 1). In contrast, since profile B has a FW3414(Hβ) ≈ 6.5 Å, considering this as the filtering value would only select stars down to log(L/L ) ≈ 3.5 dex, which corresponds to a star with similar parameters as those considered for profile C.
We note that the value of T eff = 24 000 K was not randomly chosen, but rather approximates the temperature of the majority of the B-type stars that we analyzed in this study (see Sect. 4.2.1). Similarly, the value of R = 25 000 was selected because the spectra used in this study have the same or a higher resolution. We have evaluated the effect of increasing T eff up to 30 000 K (approximate temperature for a B0 I-II star), which shifts our results by 0.5 Å toward higher values of the quantity FW3414(Hβ). Similarly, we investigated the effect of considering different resolving powers. We found that within the resolutions of our spectroscopic data set (see Sect. 2.1) the shift in FW3414(Hβ) is ≈0.2 Å toward lower values. On the other hand, we found an average shift of ≈0.2 Å compared to R = 5000 toward higher values, that is, ≈0.5 Å for R = 2500. In all cases, we obtained very similar slopes as those shown in the middle panel of Fig. 2, which demonstrates that the method can safely distinguish supergiants from low-luminosity counterparts in upcoming lower-resolution spectroscopic surveys such as WEAVE-SCIP (Dalton et al. 2020;Jin et al. 2023) or 4MIDABLE-LR (Chiappini et al. 2019).
As an additional validation of our proposed strategy, we measured the FW3414(Hβ) for all the stars included in the sHRD presented in the left panel of Fig. 1. We present the corresponding log(L/L ) vs. FW3414(Hβ) diagram in the right panel of Fig. 1, using the LC as the color code as in the left panel. Despite the very wide T eff span (more than 30 000 K) and different v sin i and resolutions, we are able to reproduce to a very good level what is shown in the middle panel of Fig. 2. These results allow us to provide an approximate value of log(L/L ) above which all stars very likely are descendants of the O-stars. In particular, if we wish to include all B-type stars with LCs I and II (BSGs), a value of log(L/L ) ≈ 3.5 dex can be used, and this would correspond to a FW3414(Hβ) of 7−8 Å.

Visual inspection of the spectra
The method described in Sect. 3.1.2 assumes a pure absorption profile for the Hβ line, but this is not always the case. Therefore, we decided to also perform a visual inspection of the spectra in order to identify cases in which the measurement of FW3414(Hβ) might be less reliable.
While inspecting the spectra, we did not only concentrate on the Hβ profiles, but also evaluated the global morphology of the Hα line. As a result, we were able to provide a rough classification of the different types of Hα and Hβ profiles that we found in the full sample. In addition, we benefited from the complete set of multi-epoch spectra that were available to perform a visual identification of the double-line or higher-order spectroscopic binaries (hereafter SB2+). Figure 3 shows some examples of the different types of Hα profiles we found in the spectra of the investigated sample of stars. These extend from pure absorption to strong emission (e.g., HD 54025 and HD 199478, respectively), cases in which the red wing is refilled (e.g., HD 15690), a P Cygni-type profile is detected (e.g., HD 206165 and HD 13256), or cases in which the profile appears contaminated by a more or less strong doublepeak emission (e.g., HD 1544 and HD 23478). After a careful inspection of all the Hα profiles in our spectroscopic data set, we decided to establish a morphological classification that is based on five main different features (with some subtypes) as described below.

Identification of different Hα and Hβ profiles
-Absorption (Ab): This classification applies when the line appears in pure absorption. -Core filled (CF): This classification applies when the core of the line is filled with emission, but this emission does not pass the continuum level of the normalized flux. We set sublabels with CF, CF+, and CF++ to indicate when the central emission reaches approximately one-third, two-thirds, or up to the continuum level from the expected depth (see the top right panel of Fig. 3 for the last case).
-Double subpeak (DsP): This classification applies when both wings of the line are filled or are in emission above the normalized flux. This mainly accounts for fast-rotating stars with confined winds (see Petrenz & Puls 1996, Fig. 8 for more information). The labels DsP, DsP+, and DsP++ correspond to situations in which the filling of the lobes reaches one-third, twothirds, or above the continuum of the line depth. The latter generally includes Be stars, in which both sides of the line are in strong emission due to the presence of a disk. Two cases for DsP+ and DsP++ (for a confirmed Be star) are included in the left and right second row of the panels in Fig. 3, respectively.
-Red filling (RF): This classification applies when only the red wing of the line is filled. The labels RF and RF+ separate situations in which the filling reaches one-half, or up to the continuum of the line depth, respectively. An example of an RF+ profile is shown in the left panel of the third row in Fig. 3.
-P-Cygni shape (PCy): This classification applies when the emission is in the red wing of the line profile. In the temperature domain of the stars in the sample, the shape is generally produced by stellar winds with specific characteristics (see, e.g., Castor & Lamers 1979). We use the sublabels PCy, PCy+, and PCy++ to indicate the observed the emission up to ≈1.25, between this and ≈1.5, and any value above ≈1.5 (traditional P-Cygni profiles) from the continuum, respectively. One example of the PCy and PCy++ shapes is included in the right panel of the third row and in the bottom left panel of Fig. 3, respectively.
-Pure emission (Em): This classification applies when there is emission above the normalized flux. Similarly to the P Cygni shape, we set the labels Em, Em+, and Em++ to indicate when the emission reaches up to ≈1.25, between ≈1.25 and ≈1.5, and any value above ≈1.5 from the continuum, respectively. The latter case is found in many Be stars. One example of an Em+ profile is included in the bottom right panel of Fig. 3.
This classification scheme was also applied to all Hβ profiles. Notably, there are many situations in which the labels assigned to both profiles are not necessarily the same (see Sect. 4.2.5). For example, in the case of classical Be stars with circumstellar disks, the Hβ profile may be labeled DsP+ or DsP++, while the Hα profile has Em+ or Em++. A similar situation may occur in the case of stars with strong winds, where the emission in Hα is expected to be more significant than in Hβ.

Identification of spectroscopic binaries
We also carried out a visual identification of SB2+ systems in the sample (see Sect. 3.2). To do this, we inspected the temporal behavior of several diagnostic lines (mainly He i λ5875.62 Å, but also the Si iii λλ4552.62,4567.84,4574.76 Å triplet, or other lines, e.g., Mg ii λ4481, C ii λ4267, or Si ii λ6371) using all the available multi-epoch spectra. Some examples of identified SB2+ systems can be found in Fig. C.1. We also refer to a forthcoming paper (Simón-Díaz et al., in prep.) that provides the detailed identification of the single-line spectroscopic binaries (SB1) using spectroscopic and photometric data.

Line-broadening analysis
Although the full quantitative spectroscopic analysis of the stars in the sample (see Sect. 4.2) will be presented in forthcoming papers, here we followed the guidelines in Simón-Díaz & Herrero (2014) and Simón-Díaz et al. (2017) to obtain estimates of the projected rotational velocities of all stars in our working sample. In brief, we used the IACOB-BROAD tool to perform a line-broadening analysis of a set of adequate diagnostic lines. In particular, for stars with v sin i 200 km s −1 , we used a reduced set of unblended and strong absorption lines optimized for two ranges of spectral types. For stars earlier than B4, we used Si iii λ4567.85 Å, except when the line was weak, in which case, we used Si iii λ4552.622 Å. For B4 and later spectral types, we preferably used Si ii λ6371.37 Å, but depending on the spectrum, we also used Si ii λ6347.11 Å. However, the latter is blended with Mg ii lines, and therefore, we only used it in a very few occasions. For stars with v sin i 200 km s −1 , most of these lines are blended by other surrounding lines or became too diluted to provide reliable results. In this case, we decided to use He i λλ4387.93,5015.68 Å lines, which are less affected by wind than other He lines and are well isolated.

Results and discussion
4.1. Sample selection Figure 4 shows a histogram with the results of the FW3414(Hβ) measurements for the complete initial sample of stars described in Sect 2.1. We point out here that the fitting of the Hβ failed in ≈100 stars. About ≈75% of these correspond to Be stars, ≈15% correspond to SB2+ stars, and ≈10% correspond to hypergiants. The histogram separates different groups according to the LC quoted by default in Simbad and includes two additional groups that comprise on the one hand, the stars for which Simbad does not provide any LC (pink), and, on the other hand, the stars that we identified as SB2+ (gray; see Sect. 3.2.2).
The measured FW3414(Hβ) for the LC I stars peaks at ≈4 Å, and, except for a few cases, this LC group extends to FW3414(Hβ) ≈ 7.5 Å. Instead, stars with LC II, III, and IV are present in a wider range, with FW3414(Hβ) 4 Å, while stars with LC V are mostly concentrated at FW3414(Hβ) 6.5 Å.
Figure 5 helps us to better understand the distribution of the various luminosity class groups in this histogram. Similarly to Fig. 1, the left panel of this figure shows an sHRD, while the corresponding log(L/L )-FW3414(Hβ) diagram is shown in the right panel. This time, both diagrams are populated with a much larger sample, resulting from the preliminary quantitative spectroscopic analysis of a subsample of the stars we used to build Fig. 4, and comprising ≈500 O9 -B6 type stars. Figure 1 includes 9 SB2+ systems in which the effect of the secondary in the spectrum was such that it allowed for determination of some rough spectroscopic parameters. In addition, stars for which Simbad does not provide a luminosity class are represented as pink circles. The sHRD includes an additional sample (gray circles) of 280 likely single and SB1 stars investigated by Holgado et al. (2020).
First of all, by comparing the middle panel of Fig. 2 and the right panel of Fig. 5, we can confirm the good agreement of the model predictions and empirical measurements. Interestingly, fast-rotating stars tend to be located to the right of the rest of stars for each luminosity class, as predicted. Last, we note that the location of the stars that are identified as SB2+ in the log(L/L )-FW3414(Hβ) diagram is not particularly different from the rest of the stars with similar LC. This therefore indicates that the contribution of the two stellar components does not strongly affect the FW3414(Hβ) estimate, at least when one of the two components dominates the spectrum.
The location of the stars in the left panel of Fig. 5 shows that most of the stars with LC I have log(L/L ) ≈ 3.8-4.3 dex. The right panel of the same figure shows that this range in log(L/L ) implies a range in FW3414(Hβ) ≈ 1.8-6 Å, which explains the distribution of LC I stars in Fig. 4. For lower values of log(L/L ), the situation is more mixed. For example, the stars in the log(L/L ) ≈ 3.5-3.8 dex range have LCs V-IV at T eff > 30 000 K, but also II-I at T eff < 20 000 K. Essentially, all stars except for those with LC I lie between log(L/L ) ≈ 2.8 and 4.2 dex, which also explains their wider distribution in Fig. 4.
We also conclude based on the two panels of Fig. 5 that some of the classifications taken from Simbad may not be correct. For instance, they were especially suspicious for stars with LC I and some II with log(L/L ) < 3.5 dex that are embedded in the sHRD, where stars with LCs III-IV-V lie. To confirm this situ- ation, we reviewed the spectral classifications of many of these stars using our own available spectra or alternatively, previous classifications that in our judgment come from reliable references. The outcome of this exercise is summarized in Table D.1. As expected, most of them are early B-type stars that should be classified as giants, subgiants, or dwarfs. We also found that in many cases, we can attribute misclassifications to the fact that they were made using spectra at low resolution, as is the case of many classifications from Houk & Swift (1999) at R 2500.
This result again highlights the risk of using spectral classifications (especially if they come from heterogeneous sources) to select specific groups of stars and makes our proposed strategy much more robust and reliable. Moreover, this strategy has also allowed us to recover a non-negligible number of adequate candidates of interest among the stars quoted as OB or O stars in Simbad, or for which luminosity classes are not provided in Simbad that we would otherwise have missed. Revised spectral classification for these stars, following the guidelines indicated above, are provided in Tables D.1 and D.2.
As mentioned in Sect. 3.1.2, a good threshold value to select the evolved descendants of O-type stars is at log(L/L ) = 3.5 dex. Based on the empirical correlation between log(L/L ) and FW3414(Hβ) shown in the right panel of Fig. 5, we decided to use a filtering value of FW3414(Hβ) = 7.5 Å to select our sample (see below), which comprises a total of 728 stars.
By choosing this value, our selection criteria first became very effective in the selection of stars with LCs I -II, especially now that we have confirmed that almost all stars with log(L/L ) < 3.5 dex quoted in Simbad as LC I and II are LCs III, IV or V objects. In addition, as already pointed out in Sect. 3.1.1, our method also selects a significant number of stars with LCs III-IV-V (mostly early B giants and subgiants), which are of interest for the purposes of our study, which aims to perform an empirical characterization of the B-type stars that evolve from the MS O-type stars.
Due to the scatter of the log(L/L ) -FW3414(Hβ) correlation, we selected some stars with log(L/L ) in the range 3.1-3.5 dex (see right panel of Fig. 5), however. This inherent limitation of the proposed method does now allow us to exclude them without first determining their T eff and log(L/L ), or at least using additional information such as photometric data to filter these stars (see Sect. 4.2.2).
To conclude this section about the selection process, we now define our final (working) sample as the stars that are selected with the above-mentioned strategy, plus some additional hypergiant stars identified from the initial sample based on their characteristic P-Cygni profiles in the Hβ line (Lennon et al. 1992), leading to a final number of 733 stars. Although we were unable to obtain a reliable FW3414(Hβ) measurement for all the hypergiants, they are, in principle, also natural descendants from the O-stars. All of them (12) are listed in Appendix F. All the other stars without FW3414(Hβ) are considered for the completeness of the sample (see Sect. 4.2.3), as also in the forthcoming paper dedicated to spectroscopic binaries in the case of the SB2+ systems, but they are all excluded in the following sections. Table H.1 gathers the relevant information for the 733 stars resulting from the selection process described in Sect. 4.1. It includes an identifier of the star (ID), its B magnitude, and spectral classification (following Simbad); the name of the fits-file in the format of the IACOB spectroscopic database corresponding to the best spectrum (see Sect. 2.1) and its characteristic S/N in the 4000−5000 Å region; the measured value of FW3414(Hβ), whether the star has been identified as a spectroscopic binary (see Sect. 4.2.4); the morphological classification of the Hα and Hβ lines (see Sect. 4.2.5) and, last, the estimated v sin i (see Sect. 3.3). Although in Table H.1 we quote the spectral classifications provided by default by Simbad, we already pointed out that some of these classifications are inaccurate or even incorrect in some cases. The SpT and LC of stars whose spectral classifications we reviewed (see Sect. 4.1) are listed in parentheses. For the rest of this section and subsections, we adopted all the new classifications included in Appendix D to better interpret the results from the selection method.

Sample description
In addition, for all the above-mentioned stars, Tables H.2 and H.3 gather most of the relevant photometric and astrometric information from Gaia or Hipparcos, respectively. The two tables include the same identifier of the star as in Table H.1, the Galactic coordinates (l, b), the parallax ( ) and proper motions (µ α cos δ, µ δ ), and the distance, which in the first case was retrieved from Bailer-Jones et al. (2021; Distance B − J), and in the second case from the inverse of the parallax as a rough estimate (Distance). Table H.2 additionally includes the Gaia G, G BP , and G RP band magnitudes and the RUWE value, while Table H.3 includes the B − V magnitude. In some cases when the available data had very large errors, negative parallaxes, or any other issue, we did not include the distance. These cases usually match with bright targets. Figure 6 shows a histogram summarizing the spectral classifications of the stars in the final sample. As expected from the selection criteria (see Sect. 4.1), the majority of the sample consists of supergiants (364) and bright giants (119) compared to stars of other classes. In particular, it shows that while stars between O9 and B1 cover all luminosity classes, from B2 and especially B3 onward, the relative number of giants and especially subgiants and dwarfs decreases noticeably. From B4 and toward later-type stars, practically all our sources are classified as supergiants. Interestingly, despite the large number of stars included in our investigated sample, there is a clear lack of stars that are classified as B4 supergiants and bright giants.  Figure 7 shows a stacked histogram of the adopted distances for all the stars up to 4000 pc. Fifty percent of the stars have distances derived from Hipparcos. They were excluded because their values of σ / ≥ 0.1 (see also Table H.3). This histogram is complemented by Fig. 8, which shows the distribution of the stars in a polar plot using Galactic coordinates, and centered at the position of the Sun.

Astrometry and photometry
Below 700−800 pc, the total number of stars is relatively low ( 50). These stars are considered to form the Gould Belt system (Poppel 1997), but recent studies suggested that this structure is in fact not real (see, e.g., Bouy & Alves 2015;Zari et al. 2018). Then, the number of stars suddenly increases with distance, as is expected from the volume increase, but it also increases due to the inclusion of several large clusters and association of massive stars such as Cep OB2 (l ≈ 100 • , d ≈ 900 pc; Contreras et al. 2002) The number of stars at 1000−2400 pc remains more or less constant up to ≈2500 pc, when it begins to decrease significantly. The number of stars is expected to increase significantly with increased volume after 1000 pc, but we do not observe this. While one possible and simple explanation might be observational biases (discussed in Sect. 4.2.3), this can also be caused by the lack of structures with young formed regions. This has been observed from distances above 3000 pc, where we also observe a drop. However, at these distances, the stars begin to be too faint to be observed using the facilities that were used to build this sample (see below). Figure 8 shows that in general, there are no regions with average higher errors than others, but only some dispersed stars in the plane and likely with incorrect distances. In addition to the above-mentioned associations and galactic structures, the location of many others extend up to 3 kpc, such as the Per OB1 asso-   Shull et al. 2021). This result tells us about the fact that our sample is concentrated in relative close groups in the Galaxy following star-forming episodes, similarly to what was found by Pantaleoni González et al. (2021, Fig. 5). Additionally, there are large empty areas with a very low density of stars (many of which are likely runaways from the main clusters). While some of the empty areas can be attributed to gaps between the spiral arms in which no star formation occurs, others can be attributed to very high extinction (e.g., toward the direction of l ≈ 45 • ), the latter especially in the direction of the Galactic center (see Lallement et al. 2018Lallement et al. , 2019.  Last, Fig. 9 shows the stars in the sample in a histogram of the apparent B mag magnitude. The stars with distances taken from Hipparcos are all located within the first 100 pc in Fig. 7. The fact that most of these stars are among the brightest ones is not casual, as Gaia is known to have saturation issues for the brightest targets (see Lindegren et al. 2018), thus missing the astrometric and photometric data. This histogram also shows that the number of stars exponentially increases up to B mag ≈ 9.5 and then decreases rapidly. This is expected because fainter stars become less suitable for an adequate use of the considered observing facilities (see Sect. 2.1), in order to reach a balance between the exposure times and S/N. Figure 10 provides a different and complementary view of the sample, this time in a color-magnitude diagram (CMD) constructed using data from Gaia DR3. Similarly to the case of Fig. 5, a gradient in the position of the stars with LC from I-II down to III-IV-V is visible. More interestingly, compared to the expected location of the investigated sample of stars in the CMD (if they were not affected by extinction), namely, between the depicted 9 M and 40 M evolutionary tracks, almost no stars populate this region. This indicates that an important fraction of stars in our sample is expected to be affected by extinction higher than 1 mag (and up to 3−4 mags in some cases). This diagram is inefficient for a classification into O-and B-type stars beyond providing a rough estimate of the luminosity class: the range of variation in G BP − G RP of the evolutionary tracks is negligible compared to the scatter of the empirical sample due to the extinction. More specifically, we accounted for ≈50 O9 -B1 LC I stars with G BP − G RP > 0.80, that is, with a very high reddening compared to the reddening of the same type that is present at G BP − G RP ≈ 0.0.
The top extinction line in Fig. 10 was chosen to locate the approximate position of the stars that evolve from the O-type and contain all the BSGs because it coincides with the approximate separation between O-and B-type stars. Although this approach might also be used together with FW3414(Hβ) to filter undesired stars, we decided not to use it because of the large uncertainties in the distance of some of the stars in the sample (especially for the bright targets). While some stars are incorrectly classified with LC I (Sect. 4.1) some others are likely below this top extinction line due to underestimated distances. The latter is compatible with the scenario presented in Pantaleoni González et al. (2021), who used a different prior that produced longer distances compared to those used in this work from Bailer-Jones et al. (2021). Nevertheless, most of the stars in the sample are located above the top extinction line.
We recall here that our selection method based on the values of FW3414(Hβ) also selects stars with 3.5 log(L/L ) 3.1 dex. Despite the possibility of underestimated distances, most of these stars are located between the two reddening lines, which indicates that by using the correct distances, this method effectively selects stars.

Completeness and observational biases
This section is dedicated to providing an overview of the completeness and potential observational biases that affect our sample. As shown in Fig. 9, this sample was mostly built to be a magnitude-limited sample. In this regard, it is important to remark that any use of this sample to empirically constrain evolutionary models, for instance, must take into account that for a given limiting magnitude, there will be more mid to late B-type stars than early B-or O-type because the first are intrinsically brighter in this band than the latter group, and are therefore observable at larger distances (ignoring extinction). However, as shown in Sect. 4.2.2, many stars within the sample have very different reddening, which dims their brightness up to several magnitudes. As a consequence, the extinction must be taken into account to unbias the sample from potentially fainter stars that are in fact at closer distances. To overcome these biases, one possibility is to investigate to which extent we can build a volume-limited sample (see below).
In order to assess the completeness of our sample within a specific volume or distance, we cross-matched our database with the ALS III catalog (Pantaleoni González et al., in prep.), which not only includes all the stars in our sample, but also provides an increased number of stars compared to earlier releases by including fainter objects (down to B mag ≈ 16) and covering a larger distance (up to 10 000 pc). However, despite the authors' claim of high completeness within the first 5000 pc, the fraction of stars with B mag > 16 within this distance is not negligible due to extinction along the line of sight. For instance, if we consider stars up to 2000 pc and use the absolute V-band magnitudes from Lesh (1968), the intrinsic faintest stars within our sample could have a maximum extinction of A v = 8−9 in order to have B mag ≤ 16 (see further notes in Appendix E). While some star-forming regions within this distance exhibit average extinction values close to this limit (e.g., Cyg-OB2, A v = 5−7; Massey & Thompson 1991), highly obscured massive stars have been found to have extinctions exceeding this limit (see, e.g., Callingham et al. 2020), thus indicating that a complete sample cannot be assumed. Nevertheless, we consider the ALS III to be a very good reference to find missing stars within the observing magnitude limitations. Figure 11 presents a similar CMD diagram as we showed in Fig. 10, but this time, including all stars quoted in the ALS III catalog that fulfill the following criteria: (1) they are classified as O9 -B9 in Simbad, (2) they are located above the extinction line corresponding to a ZAMS star of 9 M , (3) they have a B mag < 11, and (4) their estimated distance, derived as described in Sect. 2.2 for the case of our working sample, is below 4000 pc. We are mostly interested in the stars located above log(L/L ) = 3.5 dex, which correspond to the sample located above the top reddening line (green stars). We therefore Observed O9 -B9 stars are in green, while missing stars from the ALS III are colored in two gray colors: in dark gray, missing stars with B mag < 9, and in light gray, stars with 9 < B mag < 11. For each group, hatched areas indicate those stars with G BP − G RP > 0.5. only discuss the completeness and potential observational biases that affect this region. Before we discuss this, we remark that even considering the full sample of O9 -B9 stars in ALS III, the lack of stars in the region of the diagram in between the nonreddened 15−40 M evolutionary tracks still persists.
To visually evaluate the completeness, Fig. 12 shows a stacked histogram of the selected sample of stars from ALS III that are located above the top reddening line of Fig. 11, indicating the stars for which we have spectra, and separating the missing stars into those with B mag < 9, and those with 9 < B mag < 11. In addition, we highlight the stars with a G BP − G RP > 0.5, that is, we indicate the targets that are more strongly affected by extinction.
This figure is complemented by the information provided in Table 1, were we present a summary of the total number of stars and relative completeness of stars in the above-mentioned region of the Gaia-CMD for four different distance ranges, and we separate stars below or above the B mag = 9 threshold. The corresponding statistics are given for the complete sample as well as for the stars that can be observed from the northern hemisphere (i.e., with δ > −20 • using either the NOT or Mercator telescopes).
Up to 2000 pc, the sample is complete up to 77% of the total stars with B mag < 9. This number is reduced to 58% for the stars that are located farther away. It is important to point out that the IACOB project observes only with the NOT and Mercator telescopes from the northern hemisphere (see Sect. 2.1). In the case of the southern hemisphere, we are currently limited to FEROS objects retrieved from the ESO public archive. In fact, only 20% of the spectra come from FEROS. To state this in numbers, if we only consider the stars with declination above −20 • , we currently miss 30% from those up to 2000 pc (93 stars) with B mag < 11, and only 13% (44 stars) with B mag < 9 up to 4000 pc. These missing (northern) stars are currently in observing queues for upcoming IACOB observing campaigns. Regarding the missing sources with declination below −20 • up to 4000 pc (162 stars with B mag < 9, and 321 with 9 < B mag < 11), we are currently  Fig. 11). In particular, the first two columns indicate the total number of stars with B mag < 9, and the corresponding percentage of observed stars with respect to the total. The last two columns indicate the same, but for stars with 9 < B mag < 11.
in the process of improving the situation through a recently approved proposal. In particular, the awarded time is ≈250 h distributed in two semesters. with FEROS. Last, we evaluated the contamination of classical Be stars in the completeness because these stars are not of interest here. We concluded that while they only represent ≈11% of all the stars in Fig. 12, they are homogeneously distributed in Fig. 11 above the top reddening line (but close to it), and therefore, they do not affect the results in Table 1. Figure 13 provides a complementary view of the available and missing stars, this time showing their location in the Galaxy and following the same color code as in previous figures, but separating the stars by their G BP −G RP value in two groups. Globally, for the region between l = 0 • and 230 • , the completeness is very high, while the missing southern stars are located between 230 • and 0 • . The top panel for stars with G BP − G RP < 0.5 shows the less reddened stars. The reason for separating stars with −20 • in Table 1 is also more evident: The vast majority of the missing stars are concentrated in the Carina region of the southern hemisphere (l ≈ 270 • −315 • , d ≈ 2500 pc). Additionally, most of the missing stars in this panel have B mag < 9 (dark gray circles), while almost no stars with 9 < B mag < 11 (light gray circles) are located within 2000 pc. The bottom panel instead shows the more reddened stars, most of the missing stars of which have B mag > 9. They are located not only in the Carina region, but also in some other specific regions such as the Sagittarius region (l ≈ 340 • −25 • , d ≈ 1000−2500 pc), the Cygnus region (l ≈ 80 • , d ≈ 2000 pc), and the Cassiopeia region (l ≈ 115 • −135 • , d ≈ 2500−3500 pc). The fact that many stars are observed in some of these regions shows the very different existing degrees of extinction (see, e.g., the Cygnus region). Interestingly, most of the missing stars in the northern hemisphere are spread throughout a wide space in the Cassiopeia region.
Regarding the observational biases within the spectral classifications, we carried out homogeneous observations in the northern hemisphere trying to ensure completeness within the B0-B3 I-III and B3 -B9 I-II type stars up to B mag = 9. In fact, the percentages given in Table 1 increase by 5% to 10% when only Fig. 13. Two polar plots in Galactic coordinates of all the stars above the top reddening line of Fig. 11, separating in the top and bottom panels those with G BP − G RP ≶ 0.5. In both panels, light gray or gray circles indicate missing stars separating those with 9 < B mag < 11 or B mag < 9, respectively, and with green circles the stars for which spectroscopic data are available. the stars of LCs I and II are considered. Of the stars with LCs III-IV-V, ≈80% are already observed, and we expect this percentage to reach ≈95% in the next upcoming campaigns, taking into account that in this case, a non-negligible number will be Be stars, which not of interest here.
To summarize, we can confidently say that we have achieved a homogeneous and unbiased sample with a high degree of completeness with respect to the ALS III catalog, except for stars with δ −20 deg, and this also shows the success of the IACOB project within the past ten years of observing in the northern hemisphere.

Identification of double-line spectroscopic binaries
As explained in Sect. 3.2.2, we also visually inspected all the multi-epoch spectra in our serach for potential SB2+ systems. Figure 4 shows that the fraction of SB2 systems is smaller toward higher FW3414(Hβ) values, and there are almost none such systems below 4.5 Å. Within the stars that fit the selection criteria (see Fig. 6), we account for 56 SB2+ systems, and it is unclear for additional 7 systems whether they are SB2 systems or if the features in the diagnostic lines are caused by long profile variability. These systems are indicated as LPV/SB2? in the SB column of Table H.1 and will require additional spectroscopic observations and possibly photometric analyses to determine whether they are binary systems or if the features observed in the spectra correspond to variability, such as pulsations. Fife of the 56 SB2+ systems have a third component (formally SB3 systems). Twenty-one systems correspond to B-type stars and the rest are O9-type stars. We identified 8 new systems. The new identifications are marked with an asterisk in the SB column. Some of the cases (HD 7252, HD 12150, and HD 46484) were already proposed as multiple systems based on the astrometric anomalies (Kervella et al. 2019), or they were photometrically observed as eclipsing binaries (HD 142634, HD 153140, and HD 170159). Figure 6 also shows that compared to the number of binary systems with an O9-type primary star, the number of systems with a B-type primary within the sample is much lower. Although several studies have estimated multiplicity for O-type stars (e.g., Barbá et al. 2010;Sana et al. 2013), not as many have explored the B-type domain, and most of them only covered the dwarf star regime, with many classical Be stars within their samples (e.g., Evans et al. 2006;Bodensteiner et al. 2021;Banyard et al. 2022). The number of studies dedicated to cover the B-type supergiants is even smaller (Dunstall et al. 2015;McEvoy et al. 2015). We have found that the fraction of O9-type binaries in the sample is 23% and there are 3.6% B0 -B9 stars in the sample (i.e., mostly including I-II type stars), which is about six times less. This observed decrease is expected from the natural evolution of these objects because some of them may either have become much more luminous than the companion (by evolution or mass transfer ;Langer 2012;de Mink et al. 2013), merged into a single more massive object (e.g., Podsiadlowski et al. 1992;Vanbeveren et al. 2013), or were even separated after supernova (e.g., de Mink et al. 2011de Mink et al. , 2014, and only the less evolved companion remained. Figure 14 shows four histograms indicating the number of stars as a function of SpT and LC for which the various types of Hβ and Hα profiles presented in Sect. 3.2.1 are identified. We note that because the stars labeled CF, DsP, PCy, and RF still have a strong absorption shape, they were joined with the stars labeled Ab (Absorption) for the purposes of this figure. We also removed all the SB2+ systems here.

Morphology of the Hβ and Hα profiles
As illustrated in the bottom two panels of Fig. 14, the majority of the stars have Hβ profiles in absorption, with the exception of a group of stars labeled RF+ (red wing filled) and PCy/PCy+ (P-Cygni shape), both evidencing the presence of winds (Lucy & Solomon 1970;Castor et al. 1975). This group is concentrated within SpT O9 -B2 and LC I (and subtypes). Most of those labeled PCy+ correspond to hypergiants (Ia+), as shown in the bottom right panel (see also Appendix F).
The two panels at the top show a more mixed situation for the Hα line. This is because the effect of the stellar winds on the Hα profile is stronger than on the other hydrogen lines of the Balmer series. Interestingly, the relative fraction of stars with Hα in absorption with respect to the total for each bin decreases toward later spectral types (it is ≈50% on average). A similar decrease occurs in the luminosity classes from dwarf stars toward the supergiants. In particular, most of the stars with LCs Ia+, Ia, Iab are labeled RF+ and PCy/PCy+, which is also the case of the B8-B9 type stars. The extended presence of P Cygni shapes is expected because it is well known that B-type supergiants have stronger stellar winds than other less luminous stars (Kudritzki et al. 1999;Markova & Puls 2008). Stars with Hα in emission (7% of the total) are present in all spectral types, but are mostly concentrated in LC I. In fact, nearly 90% of the stars with LCs IV and V have absorption profiles in Hα. This indicates that stars affected by winds are located at higher luminosities. Last, profiles with sub-double-peak emission make up 10% of the total, and those in which the core of the lines is filled make up only 2%.

First hints about the rotational properties of the sample
The top panel of Fig. 15 shows a histogram of the projected rotational velocities of our working sample (excluding the stars labeled SB2+ and some problematic cases), highlighting in different colors the different LCs of the stars, and including seven stars with v sin i > 340 km s −1 in the last bin. This distribution has a mean and median values of 80 km s −1 and 51 km s −1 , respectively (64 km s −1 and 47 km s −1 for stars with LCs I and II), and a standard deviation of 74 km s −1 (50 km s −1 ), with ≈85% of the sample having values below 150 km s −1 , which is followed by a tail of fast-rotating stars. In addition, the bottom panel of the same figure shows the corresponding cumulative distribution. There, stars with brighter luminosity classes reach higher fractions earlier than those that are less luminous. In particular, for the supergiant stars they reach 80% within the first 70 km s −1 alone. The rest increase more or less gradually.
Stars with LCs I and II, which are the main drivers of our study, are shown in the main plot of Fig. 16. For each SpT in the O9 -B9 range, the median of the v sin i distribution and upper and lower limits corresponding to percentiles 90%, 75%, 25%, and 10% are plotted. The main characteristic of the figure is that the mean values for both LCs gradually decrease down to 30 km s −1 for the B3-type stars and remain in that value up to the B9-type stars. This behavior also holds at the lower limits of the error bars (slowly rotating stars). The second main characteristic is that the upper limits of the percentiles (mainly corresponding to the faster-rotating stars) also follow the same trend up to the B3-type, where their number suddenly drops. In more detail, the panel inside includes histograms of the v sin i distribution for the B1-type stars separating LCs I from II, illustrating the abovementioned characteristic. There, the main distribution is centered at ≈50 km s −1 , followed by a relatively smooth tail of these fastrotating objects, with v sin i extending up to 300 km s −1 .
To compare our results with some other results of v sin i from the literature, we mainly considered studies that also used large samples of O-stars and BSGs. For instance, Howarth et al. (1997) obtained a median value of 91 km s −1 for 373 stars, with an average difference between O-and B-type of 25 km s −1 , that is, higher for O types. Compared to our results, the slightly higher values obtained there can be explained when the effect Results of the number of stars against their spectral type (two leftmost panels) and luminosity class (two rightmost panels), represented in histograms. Each bin stacks the number of stars with different morphologies found for the Hβ and Hα profiles following the classification in Sect. 3.2.1 and grouped as in the legend. Stars labeled within the "Absorption" label include profiles CF, DsP, PCy, and RF. The sources with LC I without subtype in the literature are grouped into a separate bin. The sources classified as SB2+ are removed from the bins. of macroturbulence broadening is taken into account, which can shift v sin i by ≈25 km s −1 (Simón-Díaz & Herrero 2014). This is shown in Fraser et al. (2010), who found a ≈30 km s −1 offset with the stars in common with Howarth et al. (1997) from their sample of 57 Galactic BSGs. These authors obtained mean values of ≈60 km s −1 with a standard deviation of 50 km s −1 , which is very similar to our results (see also Hunter et al. 2008;Simón-Díaz & Herrero 2014, for more examples including BSGs).
Similarly to Fig. 16, Fraser et al. (2010) and Simón-Díaz & Herrero (2014) included distributions of v sin i with respect to the SpT for stars with LCs I and II, although the latter only reached B2-type stars. In both cases, their results are very similar to ours. On the one hand, in all the cases v sin i values decrease toward mid-B-type stars. However, our significantly larger sample (comprising seven to eight times more stars) clearly extends the presence of fast-rotating stars with v sin i ≥ 100 km s −1 up to the B3-type stars. On the other hand, the slow decrease given by the mean values might be explained by the natural evolution of the stars, decreasing their v sin i as they increase their radii. However, this decrease does not continue after the B4-type stars, suggesting that either they loose enough angular momentum during their evolution (e.g., via stellar winds; Vink et al. 2010) or that our methodology does not allow us to measure the actual v sin i below a certain threshold limit (see Simón-Díaz et al. 2017).
We also compared our results for the v sin i with those from Holgado et al. (2022) for the O-type stars (see the gray dots in Fig. 5) to show the differences with the BSGs as the first are the progenitors of the second. A simple comparison with Fig. 15 indicates a similar distribution that includes a main group of stars at lower v sin i values, followed by a tail of fastrotating stars. Moreover, the stars with LC I are concentrated at v sin i 150 km s −1 , similarly to our sample of stars of the same LC. Moreover, the cumulative distributions for this LC also look very similar. Last, the results from Holgado et al. (2022) for stars located between the 20−32 M evolutionary tracks (i.e., those evolving into the early B-type III-IV-V stars) also show higher v sin i values than those directly above, which is the same situation as we showed in Fig. 15, even though our sample is biased by the relative number of stars with LCs III-IV-V with respect to I and II.
These results clearly indicate that the v sin i values do not significantly decrease during the transition from O-type stars toward the BSGs (and the additional B-type III-IV-V stars), thus favoring evolutionary models in which the braking effect is weaker and/or the angular momentum transfer from the cores to the outer layers is very efficient and counteracts the effect (see Brott et al. 2011).

Summary, conclusions, and future work
We have built a homogeneous spectroscopic sample of 733 O9-B9 stars, the majority (483) of which have luminosity classes I and II, while the rest (250) are late O-and early B-type stars with classes III-IV-V. We benefited from over two decades of observations using the same three high-resolution spectrographs of high stability, resolution, and precision.
To create the sample, we explored the possibility to use one simple measurement on the Hβ line to select stars above a certain log(L/L ) in the sHRD, filtering out other undesired stars from the initial sample of stars with mixed luminosities. With the exception of some stars that are strongly affected by disks or stellar winds, for which the Hβ line could not be measured (e.g., hypergiants, classical Be stars), this new method has been proved to be of much utility using either low-or high-resolution spectra. Furthermore, althoug we limited the applicability of this method to O9 -B9 stars, its usage can also be extended to also include the A-type supergiants, or it can include all the O-type stars.
We have also performed a morphological classification of Hβ and Hα lines, identified 56 SB2+, and studied the distribution of v sin i among the stars in the sample, and in particular, those with LC I-II, finding that the presence of fast-rotating stars extends up to B3-type stars.
Compared to other works using large samples of stars observed at high-resolution, our sample of BSGs is ≈3.5 times larger (see Simón-Díaz et al. 2017). Compared to other studies in which spectroscopic parameters for Galactic BSGs are also derived (e.g., Crowther et al. 2006;Markova & Puls 2008;Weßmayer et al. 2022), our sample comprises between ≈8−25 times more stars.
With the only limitation of stars that are too faint to be observed with the facilities used, we have evaluated the completeness of the sample (Sect. 4.2.3) and found that for stars below B mag ≤ 9, we have ≈90% completeness of those in the northern hemisphere (δ > −20) up to 4000 pc, or ≈80% within the first 2000 pc in both hemispheres. We are carrying out large observing campaigns for the missing stars in both hemispheres to be complete up to B mag = 9.
Some main conclusions can be extracted from this work. First, the method based on the FW3414(Hβ) will not only (but certainly) help to filter new observations, but will also be very useful in next generation spectroscopic surveys such as WEAVE-SCIP (Dalton et al. 2020;Jin et al. 2023) or 4MIDABLE-LR (Chiappini et al. 2019) because these surveys will increase the available spectra of observed BSGs in the Milky Way by some orders of magnitude, and the BSGs that will be observed have been never classified before. Second, by using this method, we have also identified the risk of using spectral classifications from the literature. In particular, we found that many stars were classified as supergiants in the regions of the sHRD in which subgiants and dwarfs are located. Last and very importantly, the sample of ≈750 O9-B9 stars introduced in this work is large enough to set (with enough confidence) new empirical constraints to be used as anchor points in our understanding of the nature of massive stars, and more specifically, in the regime of the Galactic BSGs. In particular, the spectroscopic parameters for these stars derived from the quantitative spectroscopic analysis will contribute to improving the state-of-the-art evolutionary models (e.g., those from Brott et al. 2011;Ekström et al. 2012) by constraining some input parameters of their convective cores (e.g., the core overshooting; Schootemeijer et al. 2019;Martinet et al. 2021), and also by helping us to constrain the location end of the main sequence (Vink et al. 2010;Castro et al. 2014). The analysis of the multi-epoch spectroscopic data and the identification of the SB1 and SB2 systems together with additional empirical information from Gaia (e.g., the proper motions) can also help to constrain the fraction of binaries that can interact or even become mergers at some point in their evolution (Sana et al. 2012;de Mink et al. 2014). Moreover, spectroscopic variability in combination with the pulsational properties derived from photometric light curves can also be used to continue the work by Burssens et al. (2020) of studying stellar interiors (Saio et al. 2013;Bowman 2020;Aerts 2021). The inclusion of derived abundances can also help us to disentangle the different populations of stars that approach or return from the RSg phase (Georgy et al. 2021).
In this first paper, we have given a glimpse of the potential of the sample by obtaining first results of their rotational properties, and in particular, the v sin i. Interestingly, we found fast-rotating stars of LCs I and II up to B3-type. Considering the results from Crowther et al. (2006), Weßmayer et al. (2022), B3-type stars can have temperatures down to ≈16 000 K for LC I. This means that our results extend the presence of fast-rotating stars by some thousand kelvin below what has been observed previously (e.g., Fraser et al. 2010;Vink et al. 2010), probably as a consequence of the much increased statistics. Although further analyses using derived T eff and log g will provide additional information, as pointed out in Vink et al. (2010), the presence of fast-rotating stars could have a very important role in locating the terminal age main sequence, which is an open question regarding the BSGs (see also Castro et al. 2014). Nevertheless, as also pointed out in Vink et al. (2010), additional mechanisms (e.g., enhanced mass losses; Vink et al. 2010) might also slow the rotation of these stars down. We recall that some of them might be the result of binary interaction ). also applies to stars with a very low error over parallax. In this case, stars with larger errors also have larger discrepancies, but we attribute this to the particular cases mentioned above that affect the reliability of the initial data. As a conclusion, the prior used in Pantaleoni González, et al. (in prep.) gives larger distances as a function of the distance itself, populating the sample of stars up to ∼4 000 pm rather than ∼3 000 pc as in Bailer-Jones et al. (2021). If this is correct, it would imply that on the one hand, we might miss fewer stars than previously shown in Fig. 12 between 1 000 -3 000 pc, which therefore increases our completeness. On the other hand, it would also imply that we may miss additional stars that would move up in the green area in Fig. 11, which therefore would likely increase our number of missing stars.

Appendix E: Further notes about the completeness
In Sect. 4.2.3 we provided some information about the usability of the ALS III as a reference catalog in which to check the completeness of our sample of stars up to a given distance. Figure E.1 shows the apparent magnitudes of a set of stars covering the space of SpT and LC in our sample against different distances and for two values of total extinction, A v = 0 and A v = 8. There, we indicate three threshold magnitudes at m v = 9, 11, and 16. The first and second constrain the magnitudes that are hardly (but not impossibly) accessible from the observing facilities we used to build this sample, while the latter corresponds to the magnitude from which point the ALS III catalog may begin to miss start. At m v = 16, in the case of A v = 0 (highly ideal), we would be able to observe all stars up to even 4 000 pc with an apparent magnitude m v < 9. However, as pointed out in Sect. 4.2.3, the extinction in the Milky Way can be severe in certain sight lines (A v > 4). When we focus on distances below 2 000 pc, no stars would be missing even at A v = 8. However, we refer to Fig. 5, which shows that the coldest LC II-III stars above log(L/L ) = 3.5 dex lie below the 20 M track, which indicates that these stars likely do not evolve from the O-type stars. When we take this into account, we could extend the maximum value of A v up to A v = 10 -13 (for LC I), where still we would not expect to miss many BSGs, except for for those that are extremely reddened.

Appendix G: Comparison of the line-broadening
We compared our results for the v sin i and those obtained with the Radial Velocity Spectrometer (R ≈ 11 500) on board Gaia to account for the line-broadening (Vbroad). As pointed out in Frémat et al. (2023), the parameter Vbroad does not account for other mechanisms affecting the line profiles, neither for the macroturbulence (e.g., Aerts et al. 2009;Simón-Díaz et al. 2010), which is assumed to be 0 km s −1 , nor for the microturbulence (e.g., Cantiello et al. 2009;Grassitelli et al. 2015) as in our case. Moreover, the hottest stars for which radial velocities are derived in the DR3 reach only 14 500 K (Blomme et al. 2023), and therefore, we did not expected to obtain accurate measurements of Vbroad but only a rough estimate. Figure G.1 compares the two measurements for the sample of O9 -B9 stars with a FW3414(Hβ) < 7.5 Å, excluding those with a poor determination of v sin i (see Sect. 4.3) and the SB2+. We note that only ∼15% of the stars had values of Vbroad. The comparison shows that on the one hand, only very few stars had values for Vbroad above v sin i 70 km s −1 compared to the number of stars with v sin i above that value (see Fig. 15), but for stars with v sin i 120 km s −1 , the agreement is lower than 10 km s −1 .
On the other hand, the O9 -B4 stars with v sin i 70 km s −1 have systematically higher values of v sin i with differences 120 km s −1 in several cases, thus indicating that in this domain, the values for Vbroad are not reliable. The only case in which the results from Vbroad are similar to ours correspond to the late-B stars with v sin i 40 km s −1 .