A typology of consonant-inventory gaps

This article provides a new, precise algorithmic definition of the notion “phonological-inventory gap”. On the basis of this definition, I propose a method for identifying gaps, provide descriptive data on several types of consonant-inventory gaps in the world’s languages, and investigate the relationships between gaps and inventory size, processes of sound change, and phonological segment borrowing.


Introduction
When discussing processes of change in segment inventories, phonologists often invoke the notion of inventory gaps, which describes a situation in which certain segments are expected to be present in an inventory but are not there. 1 This notion can be interpreted in two basic ways.
On the one hand, there is a set of consonants that are found in most languages of the world, and it is notable when those are absent. This understanding is reflected in Maddieson (2013a), which deals with the absence of cross-linguistically frequent sounds in phonological inventories. For example, voiced pharyngeal fricatives are cross-linguistically rare, and most linguists would not consider their absence to be remarkable. However, a language missing a voiced bilabial nasal /m/ is unusual enough to be noted in descriptions of languages like Wichita (wich1260; Garvin 1950).
On the other hand, an inventory is often considered to contain a gap when it contains several segments whose specifications in terms of voice-onset time (VOT), place of articulation, and manner of articulation can be recombined to construct a further segment, and this constructed segment is lacking. Thus, it would not be particularly surprising to find a language without a voiceless labio-dental fricative /f/. 2 However, if this language also has bilabial stops /p b/ and a voiced labio-dental fricative /v/, the absence of /f/ becomes unusual: labio-dental fricatives usually co-occur with bilabial stops, and given that the language has both voiceless and voiced bilabial stops /p b/ and the voiced labio-dental fricative /v/, one would expect it to have the voiceless labio-dental fricative /f/ as well. Similarly, it cannot be said that /b/ constitutes a gap in an inventory without voiced stops, but it is a gap in a language with /p t k d ɡ/. 3 The hypothesis that languages tend to lack gaps of the second type gave rise to Hockett's (1958: 109) "Principle of the Neatness of Pattern" in phonemic analysis, as well as later theories such as the Principle of Feature Economy (Clements 2003) and Geometric-Constraints theory (Dunbar and Dupoux 2016). In a diachronic setting, this hypothesis was used to explain tendencies in sound change processes (Martinet 1952) and phonological borrowing (Maddieson 1985).
Despite such widespread use of the notion of inventory gaps, no comprehensive statistical data on the distribution of gaps have been published. The aims of this paper are (i) to provide a new algorithmic definition of this notion, (ii) to present typological findings, and (iii) to highlight possible connections between inventory gaps, sound change processes, and consonant borrowing.
The paper is organised as follows. Section 2 surveys related work on the subject. Section 3 provides definitions of several classes of inventory gaps and an algorithm for extracting them from datasets; it also describes the dataset used for this study. Section 4 describes within-manner gaps, i.e. gaps in stop, fricative, and affricate inventories established based on VOT and PLACE distinctions. Section 5 describes between-manner gaps, i.e. gaps in stop, fricative, and affricate inventories established based on MANNER and PLACE distinctions. Section 6 briefly investigates the connection between the number of gaps in an inventory and its size. Section 7 gives an overview of the distribution of gaps of different types across macro-areas. Section 8 discusses the connections between inventory gaps and sound change, while Section 9 discusses the role of "gap filling" in segment borrowing. Section 10 concludes the paper and discusses possible avenues for future research.
2 VOT sensu stricto is only defined for stops because it is measured from the moment of the opening of the primary oral closure, which does not occur with fricatives. However, there is no real difference between how stops and fricatives are described in practice: both classes of sounds can be voiceless, partially or fully voiced, or aspirated. In this paper, I do not refer to exact VOT values and take this category as corresponding to voicing.
3 There is also a notion of distributional gap describing, e.g., the inability of English /ŋ/ to appear word-initially (Iverson and Salmons 2005). This use of the term "gap" does not concern us here.
The problem of how to explain gaps and asymmetries in segmental inventories has attracted researchers for a long time. An important treatment of this subject was given by Gamkrelidze (1973, 1975), who proposed a markedness theory of phonological gaps: he observed that voiced stops become progressively more marked the further back they are pronounced (with /ɡ/ being a frequent gap), while voiceless stops display the reverse tendency (with /p/ being a frequent gap). A similar tendency was observed by Sherman (1975). Based on the analysis of voiceless and voiced stops in a sample of 571 languages, he found that /p/ was lacking 34 times, /t/ and /k/ zero times, and /b d ɡ/ 2, 21, and 40 times respectively. This directional interpretation, although given more weight by laboratory-phonology research (Ohala 1983), was later questioned by Maddieson (2013b), who noted that the absence of /p/ in particular seems to be an areal phenomenon characteristic of a group of languages concentrated around the Sahara desert.
Another line of work concentrated on why languages are expected to lack gaps and whether this is indeed the case. Thus, Lindblom and Maddieson (1988, 70) noted that "small paradigms tend to exhibit "unmarked" phonetics whereas large systems have "marked" phonetics", and Clements (2003, 287), in his influential paper on the feature-economy principle, hypothesised that "languages tend to maximise the ratio of sounds over features". This thesis was upheld as a statistical universal in several later publications (Coupé et al. 2009; Dunbar and Dupoux 2016; Marsico et al. 2004), but it also came under criticism because it fails to adequately predict actual inventory structures (Nikolaev and Grossman 2020). From the point of view of the analysis of gaps per se, these approaches give only an aggregate view of the prevalence of gaps in inventories, and some of the phenomena that depress economy/symmetry indices (e.g., the fact that a language has a single lateral segment) do not correspond to gaps as they are customarily thought of.
An attempt to decide which of the two theories, the markedness approach or the feature-economy approach, better explains the gaps found in the inventories in the PHOIBLE dataset (Moran and McCloy 2019) has been recently made by Wang, who defines a gap as "the absence of an [α voice] stop/fricative in a certain place of articulation when a [−α voice] counterpart exists in the inventory" (Wang 2019: 195). 4 Wang fitted several classification models that, given two versions of an inventory (one with the gap and one where the gapped consonant replaced the actually present "foil" consonant, e.g., with /k/ replaced with the missing /ɡ/), decided which was the real one. Markedness-based models performed better than feature-economy-based ones. This outcome is not surprising given that the markedness theory was explicitly formulated in order to explain frequent gaps, while the feature-economy-based theories regard gaps as deviations from the predicted inventory shapes, which should be explained based on articulatory and auditory constraints (Clements 2003: 288-289). The latter constraints, however, were not modelled explicitly by Wang, nor was there a systematic survey of identified gaps.

Defining and identifying gaps
This section provides new formalised definitions of two kinds of inventory gaps and proposes algorithms for their identification in datasets (Section 3.1). The results reported in the following sections were obtained using an IPA-feature-parser script and a fully parsed subset of PHOIBLE. They are described in Sections 3.2 and 3.3 respectively.

Within-manner and between-manner gaps
Enumeration of feature combinations made "available" by the make-up of an inventory but not realised in practice is a computationally intensive combinatorial task because additional articulations can be nearly arbitrarily combined. In order to make the process more manageable, it is necessary to restrict the combinations of interest by selecting features that can vary and their possible values. In this study, I begin with the simplest possible feature combinations that can produce typologically interesting gaps. I consider place of articulation (PLACE), manner of articulation (MANNER), and voice-onset time (VOT). Furthermore, I leave place of articulation unrestricted but only consider voiceless and voiced stops, fricatives, and affricates.
In order to define and identify gaps I introduce the notion of gap-generating triplets: A GAP-GENERATING TRIPLET (GGT) is a triplet of consonants, such that two pairs of consonants from it are featural minimal pairs with respect to two different features and the members of the third pair differ in the values of both these features.
For example, /b d z/ is a gap-generating triplet (GGT) because /b d/ differ only in their place of articulation, /d z/ differ only in their manner of articulation, and /b z/ differ in both place and manner of articulation. Similarly, /p t d/ is another GGT, because /p/ and /t/ differ only in their place of articulation, /t/ and /d/ only in terms of VOT, and /p/ and /d/ in PLACE and VOT.
In order to construct or identify GGTs for this study, it is necessary to fix the value of one of the three selected features. When one fixes the manner of articulation, one obtains within-manner GGTs; when one fixes the VOT, one obtains between-manner GGTs. For example, /b d z/ is a between-manner GGT on voiced stops and fricatives, and /p t d/ is a within-manner GGT on stops. All other values of all other features must agree for the GGT to be well formed. Thus, /bʲ dʲ zʲ/, /bʷ dʷ zʷ/, and /bʲː dʲː zʲː/ are GGTs, but /bʲː dʲ zʲː/ and /b dʷ zʷ/ are not, because not all segments agree in length in the first, or in labialisation in the second.
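The definition of a GGT can be illustrated with a minimal Python sketch. It is not the script used in the study: the `Seg` class, the feature labels, and the `extra` field (standing in for length, secondary articulations, and the like) are assumptions introduced here for illustration only.

```python
from itertools import combinations
from typing import FrozenSet, NamedTuple, Optional

CORE = ("place", "manner", "vot")  # the three features allowed to vary


class Seg(NamedTuple):
    """Hypothetical segment record; `extra` holds all remaining features."""
    place: str
    manner: str
    vot: str
    extra: FrozenSet[str] = frozenset()  # e.g. {"labialised"}, {"long"}


def diff(a: Seg, b: Seg) -> Optional[FrozenSet[str]]:
    """Core features on which a and b differ; None if other features disagree."""
    if a.extra != b.extra:
        return None
    return frozenset(f for f in CORE if getattr(a, f) != getattr(b, f))


def ggt_kind(a: Seg, b: Seg, c: Seg) -> Optional[str]:
    """Return 'within-manner' or 'between-manner', or None if not such a GGT."""
    diffs = [diff(x, y) for x, y in combinations((a, b, c), 2)]
    if any(d is None for d in diffs):
        return None
    singles = [d for d in diffs if len(d) == 1]  # featural minimal pairs
    doubles = [d for d in diffs if len(d) == 2]  # the "diagonal" pair
    if len(singles) != 2 or len(doubles) != 1 or singles[0] == singles[1]:
        return None
    varying = singles[0] | singles[1]
    if doubles[0] != varying:
        return None
    fixed = (set(CORE) - varying).pop()
    # Only triplets with MANNER or VOT fixed are used in the study.
    return {"manner": "within-manner", "vot": "between-manner"}.get(fixed)
```

Under these assumptions, /b d z/ comes out as a between-manner GGT and /p t d/ as a within-manner one, while /b dʷ zʷ/ is rejected because the segments disagree in labialisation.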
GGTs are the main tool for identifying gaps: any such triplet implies the potential existence of the fourth consonant that will "close the square" by making all consonants in the resulting quadruple participate in two featural minimal pairs each. 5 In order to check if a GGT is indeed associated with a gap, I test all other segments in the inventory, and if there is no segment that can be associated with a GGT ("fill the GGT"), a gap has been found. The same gap can of course be associated with several GGTs, so it is necessary to take care to avoid overcounting.
It is possible to manually associate every possible triplet with the corresponding gap-filling consonant, but this is very tedious even with the low number of feature values included in the analysis. Instead, I adopted a three-stage procedure:
1. I enumerated all GGTs for all languages in the sample not filled by segments from the same inventory.
2. For all GGTs, I checked if they can be associated with a segment from any other language in the sample. A small minority of GGTs demanded relatively exotic segments missing from all inventories, such as /ʃˤː/. These GGTs were filled by constructing segments by hand. As these gaps are very rare, they eventually did not figure in the analysis. 6
3. Lists of GGTs for all languages were replaced with sets of segments filling these GGTs.
This algorithm corresponds to the following definition of a gap: A consonant inventory has a GAP if there is a segment that can be associated with at least one GGT in this inventory but is not present in it.
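A minimal Python sketch of this definition is given below. Unlike the three-stage procedure described above, it constructs the filling segment directly from the feature values of the triplet, which is a simplification adopted here. The `Seg` class, the feature labels, and the function names are hypothetical, and the sketch does not enforce the restriction to voiceless and voiced stops, fricatives, and affricates.

```python
from itertools import combinations, permutations
from typing import FrozenSet, Iterable, NamedTuple, Optional, Set

CORE = ("place", "manner", "vot")


class Seg(NamedTuple):
    """Hypothetical segment record; `extra` holds all remaining features."""
    place: str
    manner: str
    vot: str
    extra: FrozenSet[str] = frozenset()


def filler(trio: Iterable[Seg], fixed: str) -> Optional[Seg]:
    """Segment that would "close the square" of a GGT, or None if no GGT.

    `fixed` is "manner" for within-manner and "vot" for between-manner GGTs.
    """
    trio = tuple(trio)
    same_extra = len({s.extra for s in trio}) == 1
    same_fixed = len({getattr(s, fixed) for s in trio}) == 1
    if not (same_extra and same_fixed):
        return None
    f, g = (x for x in CORE if x != fixed)
    for corner, p, q in permutations(trio):
        # p differs from the corner only in f, q differs only in g
        if (getattr(p, f) != getattr(corner, f)
                and getattr(p, g) == getattr(corner, g)
                and getattr(q, f) == getattr(corner, f)
                and getattr(q, g) != getattr(corner, g)):
            return corner._replace(**{f: getattr(p, f), g: getattr(q, g)})
    return None


def find_gaps(inventory: Set[Seg], fixed: str) -> Set[Seg]:
    """Gaps collected in a set, so one gap implied by several GGTs counts once."""
    gaps = set()
    for trio in combinations(inventory, 3):
        candidate = filler(trio, fixed)
        if candidate is not None and candidate not in inventory:
            gaps.add(candidate)
    return gaps
```

Applied to a stop inventory /p t k d ɡ/ with `fixed="manner"`, this sketch returns a single gap, /b/, although two triplets (/p t d/ and /p k ɡ/) imply it; applied to /p t k/, it returns no gaps.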
According to this definition, for example, the GGT /p t d/ produces a gap if the segment /b/ is not found in the same inventory. I also distinguish between two types of between-manner gaps, direct and inverse gaps. Given the tendency for stops to be found at more places of articulation than fricatives in any given language, and for fricatives to be found at more places of articulation than affricates, 7 one would expect to find an asymmetry between stops lacking corresponding fricatives and fricatives lacking corresponding stops, and a similar asymmetry for fricatives and affricates. I call the gaps where the missing segment belongs to a less typologically diverse class direct gaps and the gaps where it belongs to a more diverse class inverse gaps.
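The distinction between direct and inverse gaps can be sketched as a simple ranking of manner classes. The ranking and the function name below are assumptions made for illustration; only the stop/fricative and fricative/affricate pairings discussed in the text are accepted.

```python
# Assumed ranking from the most to the least place-diverse class,
# following the tendency described in the text.
DIVERSITY_RANK = {"stop": 0, "fricative": 1, "affricate": 2}

# Manner pairings discussed in the text; other pairings are rejected.
PAIRS = {
    frozenset({"stop", "fricative"}),
    frozenset({"fricative", "affricate"}),
}


def gap_direction(missing_manner: str, present_manner: str) -> str:
    """Classify a between-manner gap as 'direct' or 'inverse'.

    missing_manner: manner of the absent segment.
    present_manner: manner of the segment found at the same place.
    """
    if frozenset({missing_manner, present_manner}) not in PAIRS:
        raise ValueError("pairing not covered by this sketch")
    less_diverse = DIVERSITY_RANK[missing_manner] > DIVERSITY_RANK[present_manner]
    return "direct" if less_diverse else "inverse"
```

For example, a missing fricative next to an existing stop is classified as a direct gap, while a missing stop next to an existing fricative is classified as an inverse gap.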

Feature parsing
In order to identify GGTs and check if they can be filled, it is necessary to have feature specifications of all segments in an inventory. The largest available segmental dataset, PHOIBLE (Moran and McCloy 2019), provides feature specifications for phonemes. However, the feature system used there, based on the one proposed by Hayes (2008), is not well suited for capturing GGTs. For example, it does not have PLACE and MANNER features but instead provides binary features, such as CORONAL and DELAYED RELEASE. For the present purposes, it is more convenient to use the feature system underlying the tables illustrating the International Phonetic Alphabet and most often utilised in phonological descriptions of inventories, where PLACE, MANNER, and VOT are nominal features with values such as labiodental, stop, or voiceless. However, in principle, other feature sets could be used.
For this study, I used the IPA Parser script that powers the feature-search function of the EURPhon database (Nikolaev 2018) and was published together with it. 8 It parses segments in the IPA notation into a structured format, 9 which makes it easy to extract GGTs. The script can parse all consonants in PHOIBLE except for clicks and several segments written in non-standard notation. These segments were excluded from the analysis.

The dataset
The latest version of PHOIBLE (Moran and McCloy 2019) was used for the analysis. PHOIBLE often includes several descriptions (doculects) of the same language. Each doculect has its own unique inventory ID. For each language (identified by its glottocode 10), the highest inventory ID was taken. This is an arbitrary choice, but it is assumed that higher IDs correspond to more recent descriptions, which, in my experience, tend to be more phonetically accurate. Only consonant inventories where all segments can be parsed with IPA Parser were selected. The resulting sample consists of 1,694 languages (out of 2,186 different languages). A breakdown in terms of macro-areas and language families is given in Tables 1 and 2 respectively.
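The selection procedure can be sketched in Python as follows. The row layout, the function name, and the `can_parse` callback (standing in for the IPA Parser) are hypothetical. The sketch also assumes that the doculect with the highest ID is chosen first and the parsability filter is applied afterwards; the text does not state the order of these two steps.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row layout: (glottocode, inventory ID, consonant segment)
Row = Tuple[str, int, str]


def select_inventories(
    rows: Iterable[Row],
    can_parse: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Keep one fully parsable consonant inventory per glottocode."""
    by_doculect: Dict[Tuple[str, int], List[str]] = {}
    for glottocode, inv_id, segment in rows:
        by_doculect.setdefault((glottocode, inv_id), []).append(segment)

    # Highest inventory ID per language (assumed to be the most recent).
    latest: Dict[str, int] = {}
    for glottocode, inv_id in by_doculect:
        latest[glottocode] = max(inv_id, latest.get(glottocode, inv_id))

    # Drop inventories containing any segment the parser cannot handle.
    return {
        g: by_doculect[(g, i)]
        for g, i in latest.items()
        if all(can_parse(s) for s in by_doculect[(g, i)])
    }
```

With two doculects for one language, only the one with the higher ID is kept; a language whose selected inventory contains, e.g., a click is dropped from the sample altogether.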
As might be expected, a "total" sample like the one provided by PHOIBLE is dominated by several large language families. In particular, Atlantic-Congo and Indo-European are well-represented, while Austronesian languages are underrepresented, and many phyla are represented by a single language. When discussing different types of gaps below, I address whether there are noticeable connections between gap types and phyla and/or macro-areas.
Another pitfall inevitable in a study of this kind is the reliance on aggregated and normalised data. Many important distinctions and commonalities between segments across inventories are almost surely hidden or distorted due to notation conventions employed by different scholars (e.g., aspirated segments are often recorded as plain voiceless ones if they are contrasted with voiced stops, and stops with a near-zero VOT can be treated as voiced, devoiced, or voiceless, etc.). A somewhat optimistic assumption underlying this analysis, as with all large-scale work on phonological typology, is that the noise in the data will increase the variance of the results but will not bias them.
Within-manner gaps
I begin with within-manner gaps, which are operationalised as cases when there are GGTs unfilled with respect to VOT and PLACE, with MANNER fixed to stop, fricative, or affricate. In other words, a within-manner gap is a situation in which a stop, fricative, or affricate is missing from an inventory that has the same stop/fricative/affricate but with different voicing and a pair of voiced and voiceless stops/fricatives/affricates at some other place of articulation.

Stops
Statistics for within-manner stop gaps occurring 10 or more times in the dataset are given in Table 3. Note that the first and third positions in the ranking are occupied by the velar and uvular voiced stops. This is unsurprising, as it has been hypothesised that the oral cavities associated with their articulation are too small to sustain prolonged vocal-fold vibration (cf. a discussion in Ohala 1983, 195-196). Moreover, it seems that gaps are overall much more likely to be found among the voiced part of the stop inventory.
It may be pointed out that the mildly frequent gapping of /d/ and /b/ contradicts the feature-economy/symmetry theories (Clements 2003; Dunbar and Dupoux 2016): there are no evident articulatory or auditory reasons for these segments to be absent, which depresses the feature-utilisation ratio in the respective inventories. 11 The fact that /d/ is gapped more often than /b/ contradicts the markedness-cline interpretation of Gamkrelidze (1975).
The voiceless bilabial stop /p/ is also known to be a frequent gap (Maddieson 2013a; see also Ohala [1983: 195] and the discussion in Section 8).A strong areal tendency is evident: /p/ is quite often lacking in Atlantic-Congo (N=45) and Afro-Asiatic (N=28), which accounts for its being a gap in 12.4% of African languages, while this gap is comparatively very rare in Eurasia (1.2%) and South America (2.8%).Gapped /p/ is also rather frequent in Papunesia (10.8%), but the sample size there is smaller.Out of 102 North American and 335 Australian languages in the sample, none have /p/ as a gap.
Returning to Table 3, it is notable that /ɟ/ and /c/ are nearly tied. The main reason for gapping here seems to be the tendency of these stops to become affricates. 12 Other notable stop gaps include murmured voiced stops, the retroflex stop, labialised voiceless stops, and voiceless nasal stops. 13

Fricatives
Statistics for within-manner fricative gaps occurring 10 or more times in the dataset are given in Table 4. 14 The top spot is occupied by /ɦ/, which is a rather rare segment (59 occurrences in the sample), perhaps due to its easy confusability with /h/. It is mostly found in Africa and Southeast Asia. 15 The whole top five consists of voiced fricatives, which shows that VOT skewedness in this class is even stronger than with stops.
The most frequent gap in the domain of voiceless fricatives is /x/, which is only slightly less frequent than /ɣ/. The situation here seems to be the mirror image of that of palatal stops described above. A diachronic interpretation suggests itself: while palatal stops tend to become affricates, velar stops tend to lenite to fricatives (Kümmel 2007).
13 The inclusion of murmured stops in this table follows from the traditional interpretation of murmur as aspiration. As there are no breathy-voiced voiceless consonants and true voiced aspirates are extremely rare, they will not be able to form within-manner GGTs on the alternative interpretation. Nasality is considered a privative feature, and nasal and oral stops do not participate in the same GGTs in this study.
14 There is a strong argument in favour of treating /ʕ/ as an approximant (Laufer 1996), in which case it should not belong in this table. I include it in the analysis following the traditional IPA practice of placing it into the fricative row. It is also highly variable between and across languages, possible realisations including an aryepiglotto-epiglottal stop, a trill, a fricative, and a tap. Another caveat is that the <x> glyph is often used in descriptions for the voiceless uvular fricative /χ/.
15 Cf. the map in PHOIBLE: https://phoible.org/parameters/010B4A2C6472D9909A0AE8C30-FA45593\#2/16.4/158.3.

Affricates
Within-manner affricate gaps are infrequent: only /dz/ (N=55) is gapped more than 10 times, /ʈʂ/ 5 times (if one includes cases where it corresponds to /ɖɽ/), and /dʒ/ and /bv/ 4 times each. The case of /ɖɽ/∼/ʈʂ/ is special, and it is otherwise evident that it is nearly always voiced affricates that are missing. /dz/ is the second most frequent voiced affricate in the sample (N=211), and the corresponding gap is created when the inventory has only the much more frequent /dʒ/ (N=539) together with some other affricate pair. The fact that the absolute gap frequency for /dz/ is comparable to the one for /d/ (72) indicates that languages are much more likely to have a gapped coronal affricate inventory than a gapped coronal stop inventory. Affricate inventories tend to be small compared to stop inventories (the median number of affricates in an inventory in the sample is two while the median number of stops is 12), so ceteris paribus the probability of affricate gapping should also be much lower.

Between-manner gaps
Between-manner gaps, which are operationalised as situations in which an inventory lacks a voiced or voiceless stop/fricative/affricate at some place of articulation where it has a segment of another manner with the same voicing, while it has a full pair at some other place of articulation, stem from two different sources. On the one hand, there is a tendency (see fn. 7) for fricatives to be found at fewer places of articulation than stops; there is a similar disparity between affricates and fricatives. These tendencies are likely to be reflected in direct gaps in fricatives and affricates (i.e. when fricatives corresponding to stops and affricates corresponding to fricatives are missing).
On the other hand, there is a mismatch between the number of places of articulation available for stops and fricatives: labio-dental, interdental, and pharyngeal stops are not found in inventories and some postalveolar stops are extremely rare. 16 As a consequence, some inverse between-manner gaps for stops (i.e. stops missing corresponding fricatives) are trivial. However, inverse gaps for stops that usually do have corresponding fricatives are no less intriguing than within-manner stop gaps, because in "well-behaved inventories" striving to maximise the ratio of distinctive features over segments, one would not expect to see them at all. E.g., an inventory with /s/ lacking /t∼t̪/ or an inventory with /x/ lacking /k/ is highly unexpected.
A special case, at least in terms of frequency, is presented by /ʔ/, which, unlike the corresponding fricative, does not participate in a VOT/voicing opposition and therefore cannot be a within-manner gap.It is an extremely frequent inverse between-manner gap, found in 541 languages: it is triggered whenever a language has /h/ but not the glottal stop and some other consonant with a corresponding stop.However, it must be taken into account that in many cases, e.g. in some varieties of German, /ʔ/ is present in a language as a "sub-phonemic" segment ensuring that there are no zero-onset syllables.

Direct gaps
Some of the fricatives can only be within-manner gaps because they do not have a corresponding stop. Such fricative gaps appearing five or more times in the sample are shown in Table 5.
Conversely, there are fricatives that are found only as between-manner gaps (Table 6), but these are rather rare.
Statistics for common (20+ occurrences) between-manner gaps that are also found as within-manner gaps are given in Table 7. An important result is that /x/, /f/, and /ɣ/ are found much more frequently as between-manner gaps than as within-manner gaps. Turning to less frequent gaps, a still significant disparity is found for /ʐ/ (70 vs. 26) and /χ/ (69 vs. 17). This can be contrasted with /s/, a mid-frequency between-manner gap, which is found as a within-manner gap only five times. It is notable also that /z/ is more often found as a within-manner gap (156) than as a between-manner gap (65).
Overall, within-manner gaps underline the huge disparity in the frequency of gaps in voiceless vs. voiced fricatives.Between-manner gaps, on the other hand, indicate that there are disadvantaged places of articulation: languages statistically tend to acquire voiced versions of coronal fricatives before they acquire voiceless labiodental or velar ones.

Inverse gaps
In contravention of the feature-economy principle, which demands that there are no unused possible combinations of VOT and PLACE features, inverse gaps occur with most stop types, with only special groups of stops found only as within-manner gaps (Table 8).
At the same time, one can see that inverse between-manner gaps that are not also within-manner gaps in the same languages are never really frequent (Table 9). Moreover, they fall into two distinct groups: /p/ and, to a smaller extent, /ɢ/ are often found both as within-manner and between-manner gaps in the same languages, while other stops are exclusively or predominantly found either as within-manner or between-manner gaps. This points to the existence of rather profound differences in how subinventories are structured in different languages. As an illustration, let's consider the case of the voiceless retroflex stop /ʈ/. It is a rather common between-manner gap in inventories with the retroflex fricative /ʂ/ but without retroflex stops. Such inventories are characteristic of, e.g., Uralic languages and Sino-Tibetan languages of the Circum-Tibetan area (cf. the Uralic language Erzya, erzy1239, whose stop + fricative inventory consists of /b p t t̪ d̪ k ɡ v f s̪ z̪ ʂ ʐ x/, and the Sino-Tibetan language Rgyalthang Tibetan 17 with /p pʰ b t tʰ d k kʰ g ʔ s z ʂ ʐ ɕ ʑ h/, data from EURPhon [Nikolaev 2018]). At the same time, it is a mildly common within-manner gap in Africa, where languages often have a single retroflex segment, /ɖ/. This distribution of gaps shows that the "retroflex component" of consonant inventories has at least three distinct variants: (1) a group of one or two fricatives; (2) a single stop; (3) a full-blown subsystem with stops, fricatives, and potentially affricates. 18

Direct gaps
Affricates can potentially be formed at all pre-pharyngeal places of articulation utilised by fricatives, but in terms of frequency, languages favour coronal affricates.

Table : Major inverse between-manner stop gaps also found as within-manner gaps.
Columns: Phoneme, Within-manner, Between-manner, Both

17 Glottolog does not list this variety but has a code for the dialect group, Khams Tibetan, kham1282, to which it belongs.
18 See Nikolaev and Grossman (2018) for more details about the latter type.

Inverse gaps
Inverse between-manner fricative gaps found five or more times are shown in Table 11. The presence of both voiceless and voiced retroflex fricatives in the top three indicates that the connection between retroflex fricatives and affricates is not particularly strong. Overall, however, these gaps are clearly marginal.

Gaps and inventory sizes
One question that arises when looking at data on gaps is whether there is a limiting ratio of gaps to the number of segments in an inventory that languages converge to.
In other words, it is natural to assume from basic probability and combinatorics that large inventories will on average have more gaps than smaller ones, but how skewed can large inventories become? Interestingly, it seems that there is a limit to the skewedness and that it is in fact small inventories that are the most skewed in relative terms. Figure 1A shows the dependence of the number of within-manner stop and fricative gaps (the contribution of affricate gaps being marginal) on the inventory size in absolute counts, and Figure 1B shows the relationship between the inventory size and the ratio of the gap count to the inventory size. While the absolute number of gaps grows in a linear fashion with the inventory size (the maximum value of 13 is attained in Tashlhiyt Berber, tach1250), the ratio of the number of gaps to the inventory size quickly stabilises to ≈ 0.08 (IQR = 0.07). In other words, large inventories are not more skewed than small ones, and the processes of inventory growth that are detrimental to feature economy (Nikolaev and Grossman 2020) are at least partly balanced out by different "gap-filling" processes.
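The ratio discussed here can be computed with a few lines of Python. This is a minimal sketch with hypothetical function names; it summarises the ratios by their median and interquartile range, which is an assumption, since the text does not state which central value the figure of ≈ 0.08 represents.

```python
from statistics import median, quantiles
from typing import Iterable, List, Tuple


def gap_ratios(counts: Iterable[Tuple[int, int]]) -> List[float]:
    """counts: (number of gaps, inventory size) pairs, one per language."""
    return [gaps / size for gaps, size in counts if size > 0]


def summarise(ratios: List[float]) -> Tuple[float, float]:
    """Return (median, interquartile range) of the gap-to-size ratios."""
    q1, _, q3 = quantiles(ratios, n=4)  # quartile cut points
    return median(ratios), q3 - q1
```

If the absolute gap count grows linearly with inventory size, as in Figure 1A, the ratios returned by `gap_ratios` stay roughly constant, which is the pattern shown in Figure 1B.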

Inventory gaps and macro-areas
In this section, I provide statistics regarding the most frequent within-manner stop and fricative gaps in different macro-areas.As will be shown below in Section 9, these data are relevant for an understanding of segment borrowing processes.

Fricatives
The five most common within-manner fricative gaps for each of the macro-areas (except Australia, which, again, has few fricatives and hence few within-manner fricative gaps) are given in Table 13. An even higher degree of uniformity can be seen than with stop gaps, with /ɦ ʒ z v x ɣ f/ being the most frequent gaps across macro-areas. The prevalence of this set of fricative gaps constitutes a strong statistical universal in the sense advocated by Dryer (1989).

The relationship between inventory gaps and processes of sound change is rather complex. On the one hand, gaps may arise due to sound changes (the loss of /p/ in proto-Celtic being a classic case [McCone 1996]). On the other hand, it has been often argued that processes of sound change can be shaped by structural asymmetries in inventories, thereby performing a "gap-filling" function (Boersma 1998; Salmons and Honeybone 2014). Testing the latter hypothesis is outside the scope of this paper as it entails looking for gaps not in contemporary but in historical and reconstructed inventories and checking whether they have since been filled.
Testing if there are connections between present-day gaps and sound changes is more straightforward, as one only needs access to a sample of sound change processes.
Considerable progress has been made in this area through the work by Kümmel (2007) and the participants of the UniDia project. 19 I compiled a combined dataset of 13,095 sound change processes based on the data from these sources, augmented with additional data on Sino-Tibetan languages. The sound changes in this dataset are of three types: (1) deletion of single segments; (2) emergence of single segments due to epenthesis; (3) single-segment to single-segment change (such as k > tʃ / _{i,j}). The sample is heavily skewed towards African and Eurasian languages with some data from South America, so the results should be interpreted with a grain of salt.
Using these data, one can test if a particular segment is more likely to be a source of a sound change (and thus disappear from a particular type of context) or a reflex of a sound change; again, this gives only a rough approximation of the processes that create gaps, since in most cases sound changes affect only some of the contexts where a segment is found.I computed reflex/source odds for all segments encountered more than 10 times in the data.
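The reflex/source odds can be sketched in Python as follows. The representation of sound changes as (source, reflex) pairs, with `None` marking epenthesis or deletion, and the function name are assumptions made for illustration; segments never attested as a source receive infinite odds.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical encoding: (source, reflex); None as source marks epenthesis,
# None as reflex marks deletion.
Change = Tuple[Optional[str], Optional[str]]


def reflex_source_odds(
    changes: Iterable[Change], min_total: int = 10
) -> Dict[str, float]:
    """Odds of being a reflex vs. a source for sufficiently frequent segments."""
    changes = list(changes)
    as_source = Counter(s for s, _ in changes if s is not None)
    as_reflex = Counter(r for _, r in changes if r is not None)
    odds: Dict[str, float] = {}
    for seg in set(as_source) | set(as_reflex):
        if as_source[seg] + as_reflex[seg] <= min_total:
            continue  # keep only segments encountered more than 10 times
        if as_source[seg] == 0:
            odds[seg] = float("inf")  # attested only as a reflex
        else:
            odds[seg] = as_reflex[seg] / as_source[seg]
    return odds
```

Values below 1 indicate a segment that is more often the input of a change than its output.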
The results are very different for stops and fricatives. Consonants corresponding to major within-manner stop gaps are indeed more likely to undergo a change than to be a reflex of a change: the odds for /ɡ p d b ɟ c k/ 20 are all smaller than 0.5 (there are no data on /ɢ/). As for /ʈ/, it is only encountered as a reflex of sound changes, and it seems likely that African languages having only /ɖ/ did not lose /ʈ/ due to rare sound changes but never acquired it in the first place.

… to co-opt new articulatory regions (e.g., labio-dental fricatives or affricates of any kind).

Conclusion
The main aim of this paper has been to draw attention to the fact that it is possible, and quite promising, to systematically study inventory gaps in the world's languages. My approach, in addition to providing new descriptive findings, has shown that theories of consonant-inventory structure advanced in the literature demonstrate only partial agreement with the data. Thus, Gamkrelidze's markedness cline (1973; 1975), while correctly pointing to /p/, /ɡ/, and /ɢ/ as the most frequent gaps found at the opposite sides of the front-back axis, fails to predict that /d/ will be a more frequent gap than /b/. It can also be noted that /p/ is not a frequent gap in Eurasia and the Americas, which makes Gamkrelidze's claim universally valid only for voiced stops. Clements's feature-economy Principle (2003) fails to predict a noticeable number of inverse between-manner gaps such as /p/, /t/, and /q/ (i.e. cases when there are fricatives with place of articulation and voicing corresponding to those of missing stops), showing that even articulatorily and auditorily advantageous combinations of feature values can be underutilised. Moreover, the statistical analysis of the degree of skewedness of inventories presented in Section 6 indicates that there is a degree of asymmetry that inventories are ready to tolerate and that this degree is higher for smaller inventories. The latter finding is in line with the conclusions of Nikolaev and Grossman (2020), who noted that smaller inventories, mostly consisting of the members of what they call the Basic Consonant Inventory and the First Extension, are highly likely to violate the feature-economy Principle.

The results presented above are based on a formal algorithmic definition of an inventory gap, which makes it possible to automatically extract these gaps from existing segmental databases. However, this operationalisation of the notion of a gap is not the only one possible, tied as it is to the conservative IPA feature set, and there is more theoretical work to be done in this area. More data are also needed on phonological inventories (especially from the Pacific region and North America), as well as on historical and reliably reconstructed sound change processes, in order to test more rigorously hypotheses about the role that inventory gaps play in the history of phonological systems. Finally, a whole class of segments, clicks, is not covered by this method because the extant IPA parsers do not handle it and due to a lack of primary data. This is another important avenue for future work.

Data-availability statement
The dataset and the code can be downloaded from a GitHub repository: https://github.com/lingtypgapsubmission/consonant-gaps

Figure 1: (A) Gap counts by inventory size. (B) Ratios of gap counts to inventory sizes by inventory size. Grey areas indicate 95% confidence intervals of a loess (local polynomial regression) approximation.

Table  :
Macro-area breakdown of the sample.

Table  :
Family breakdown of the sample.

Table  :
Within-manner stop gaps.

Table  :
Within-manner fricative gaps without stops at the same place of articulation.

Table  :
Between-manner fricative gaps also found as within-manner gaps.

Table  :
Within-manner stop gaps not found as inverse between-manner gaps.
Table 10, which includes direct between-manner affricate gaps, demonstrates how unlikely languages with /f/ and /x/ are to have /pf/ and /kx/ (or /bv/ and /gɣ/ if they have any voiced affricates). The presence of the rather frequent affricate /ts/ in the top five is explained by the fact that nearly exactly half of languages with voiceless affricates (563 out of 1,122) have only one, and more than half (398) of the latter have /t̠ʃ/. The distribution is even more skewed in the domain of voiced affricates: out of 471 languages with a single voiced affricate, 338 have /d̠ʒ/.

Table  :
Most commonly borrowed consonants in different macro-areas.