Characterization of the firm–firm public procurement co-bidding network from the State of Ceará (Brazil) municipalities

Fraud in public funding can have deleterious consequences for societies’ economic, social, and political well-being. Fraudulent activity associated with public procurement contracts accounts for losses of billions of euros every year. Thus, it is of utmost relevance to explore analytical frameworks that can help public authorities identify agents that are more susceptible to irregular activities. Here, we use standard network science methods to study the co-bidding relationships between firms that participate in public tenders issued by the 184 municipalities of the State of Ceará (Brazil) between 2015 and 2019. We identify 22 groups/communities of firms with similar patterns of procurement activity, defined by their geographic and activity scopes. The profiling of the communities allows us to highlight organizations that are more susceptible to market manipulation and irregular activities. Our work reinforces the potential application of network analysis in policy to unfold the complex nature of relationships between market agents in a scenario of scarce data.

in the public procurement process (Fazekas et al. 2018). The increasing availability of open data concerning public administration activities (Curado et al. 2020) has recently renewed the scientific community's efforts to uncover hidden connections between participating agents and how their relationships can link to fraudulent activities (Herrera 2019; Kertész and Wachs 2021).
One of the most challenging aspects of identifying corruption in public procurement contracts is the lack of labeled data. Indeed, it is largely impossible to know which instances stem from corruption. However, a fundamental principle in public procurement is that of transparency in bidding (Adjei-Bamfo et al. 2019; Angulo Garzaro 2018; Nowrousian 2019; Spagnolo 2012), and efficiency can be obtained through independent and open competition between firms. For that reason, past works have approached this problem from an unsupervised learning perspective: they extract information about the relationships between the involved parties and flag groups of agents whose patterns might be linked to a high risk of corrupt activities. Indeed, firms can gain leverage to manipulate the tender process by establishing the right relationships among themselves and coordinating their activity (Hanák and Serrat 2018).
An open question remains: can communities of firms obtained from co-bidding patterns allow us to highlight groups that are more susceptible to collusion and market manipulation? The use of network analysis for the study of corruption is not new (Lauchs et al. 2011; Chang 2018; Grassi et al. 2019). In the context of public procurement, past studies can be divided into two main groups: (1) works that explore bipartite relationships between public bodies and firms (Fazekas and Tóth 2016); and (2) studies that explore firm-firm co-bidding relationships in public tenders (Toth et al. 2014; Reeves-Latour and Morselli 2017; Morselli and Ouellet 2018; Wachs et al. 2020). Both approaches have their merits, and each is suitable for identifying different mechanisms underlying the manipulation of the procurement process. For instance, bipartite relationships are suitable for identifying fraud stemming from bribes and influence ties, while firm-firm relationships are more suited to identifying cartels and collusion. Despite this, the use of network analysis to study the relationship between firms in procurement bids is a relatively new venture (Reeves-Latour and Morselli 2017). More evidence is required to understand the universality of the existing patterns and mechanisms across cultural and socio-economic contexts.
Here, we use methods from network science and complexity science to map and characterize the co-bidding network (Piccolo et al. 2018; Ramalho et al. 2020; Reeves-Latour and Morselli 2017) between firms that participated in public tenders issued by the 184 municipalities of the State of Ceará (Brazil). Specifically, we characterize the relationships between competing firms and identify the major communities of firms that often compete for tenders with a similar scope. Moreover, we argue that some of these communities have characteristics that place them at a higher risk of market manipulation and of the irregular activities often associated with corruption.

Data
We used data from the audit court authority of the State of Ceará, the Tribunal de Contas do Estado do Ceará (Brazil), covering public tenders issued by the 184 municipalities of the State of Ceará between 2015 and 2019. Each observation records a firm's bid to a tender, whether the bid was one of the winning bids, and the municipality that issued the tender. Hence, the data is naturally represented as a bipartite network (Fierăscu 2017), which connects firms to tenders (see Fig. 1a). The data set contains 196,608 observations that account for the bids of 45,502 firms to 84,835 tenders.
Information about the firms and tenders is anonymized, and bidding values are not available. Moreover, the data set does not contain information about which contracts or firms have been investigated for irregularities in the past.

Network inference
Since we are interested in studying the relationships between firms, we focus on the Firm-Firm projection. We estimated the projection from the co-bidding patterns of firms (Piccolo et al. 2018) using the Jaccard similarity coefficient (Veech 2013; Mainali et al. 2017; Chung et al. 2019). Figure 1a illustrates the data structure and the steps conducted to infer the Firm-Firm network from the original Tender-Firm bipartite structure. To infer the Firm-Firm co-bidding network, we started by discarding all firms that did not bid at least once during each year under analysis. By doing so, we were able to extract the core of active firms while removing firms with sporadic activity. Figure 1b, c compare the original ($P_{ALL}$) with the filtered data set ($P_{Sample}$). In particular, they show that filtering removes excess participants from tenders while not affecting the distribution of the number of bids made by each firm. In what follows, we refer to firms present in the firm-firm co-bidding network as Established firms. The final working data set includes 1906 firms, which account for 72,078 bids to 39,523 tenders.
Fig. 1 Panel a, graphical representation of the process employed to infer the Firm-Firm co-bidding network. Panel b, comparison between the frequency of bidders per tender in the original data set (gray) and in the working data set (red) after filters have been applied. Panel c, comparison between the frequency of bids per firm in the original data set (gray) and in the working data set (red) after filters have been applied. In panels (b) and (c), the dashed lines represent the OLS regression lines; the domain of each line indicates the domain used for fitting the curve.
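The filtering step described above can be sketched as follows. This is a minimal illustration, assuming each bid is available as a (firm, tender, year) tuple; the record layout and the function name `established_firms` are hypothetical and not taken from the original data set.

```python
from collections import defaultdict


def established_firms(bids, years):
    """Return the firms that placed at least one bid in every year of `years`.

    `bids` is an iterable of (firm_id, tender_id, year) tuples. This layout
    is an assumption made for illustration only.
    """
    active_years = defaultdict(set)
    for firm, _tender, year in bids:
        active_years[firm].add(year)
    required = set(years)
    # Keep a firm only if the years it was active cover all required years.
    return {firm for firm, seen in active_years.items() if required <= seen}


# Toy example: firm B is active in 2015 only, so it is filtered out.
bids = [
    ("A", "t1", 2015), ("A", "t2", 2016),
    ("B", "t1", 2015),
    ("C", "t3", 2015), ("C", "t4", 2016), ("C", "t5", 2016),
]
core = established_firms(bids, years=[2015, 2016])
working = [bid for bid in bids if bid[0] in core]
print(sorted(core), len(working))
```

Note that removing a firm also removes its bids from the tenders in which it participated, which is why the number of bidders per tender changes after filtering.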
Next, we compute the centered Jaccard/Tanimoto coefficient (Chung et al. 2019) between each pair of firms. The centered Jaccard coefficient measures the similarity between the bidding of two firms, accounting for the occurrence probabilities of each firm. Formally, it can be computed as:
$$J^c_{ij} = \frac{\sum_t b_{it} b_{jt}}{\sum_t \left( b_{it} + b_{jt} - b_{it} b_{jt} \right)} - \frac{p_i p_j}{p_i + p_j - p_i p_j} \qquad (1)$$
where $b_{it}$ is one if firm $i$ made a bid to tender $t$ and zero otherwise, and $p_i$ is the fraction of tenders in which firm $i$ participated ($p_i = \frac{1}{T} \sum_t b_{it}$, with $T$ the total number of tenders). The second term in Eq. (1) provides the value expected when the bids from both firms are independent and identically distributed through a Bernoulli process (Chung et al. 2019). Hence, the centered Jaccard coefficient allows us to distinguish between positive and negative associations between firms, accounting for their individual level of activity.
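As an illustration, the centered coefficient for one pair of firms can be sketched as below. The sketch assumes that the expected term takes the independence-based form $p_i p_j / (p_i + p_j - p_i p_j)$; both this form and the function name `centered_jaccard` are assumptions made here for illustration rather than a transcription of the authors' code.

```python
def centered_jaccard(tenders_i, tenders_j, n_tenders):
    """Observed Jaccard similarity minus its expectation under independence.

    `tenders_i` and `tenders_j` are the sets of tenders each firm bid on,
    and `n_tenders` is the total number of tenders. The expected term is
    an assumed form (see the accompanying text).
    """
    a, b = set(tenders_i), set(tenders_j)
    union = len(a | b)
    if union == 0:
        return 0.0  # neither firm placed a bid: no evidence either way
    observed = len(a & b) / union
    p_i, p_j = len(a) / n_tenders, len(b) / n_tenders
    expected = (p_i * p_j) / (p_i + p_j - p_i * p_j)
    return observed - expected


# Two firms sharing 2 of their 4 tenders each, out of 10 tenders overall.
jc_overlap = centered_jaccard({1, 2, 3, 4}, {3, 4, 5, 6}, n_tenders=10)
# Two firms that never meet: the centered coefficient becomes negative.
jc_disjoint = centered_jaccard({1, 2}, {3, 4}, n_tenders=10)
print(jc_overlap, jc_disjoint)
```

A positive value indicates that the two firms co-bid more often than their individual activity levels would suggest, while a negative value indicates that they meet less often than expected.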
Finally, we estimate the significance of the observed $J^c_{ij}$ to test the hypothesis that $J^c_{ij} > 0$. To that end, we bootstrapped a null distribution ($\hat{J}_{ij}$) of the centered Jaccard coefficient for each pair of firms by generating an ensemble of 1000 randomizations of the initial bipartite network. The randomization kept the number of bids per firm and per year constant while preserving the number of firms bidding to each tender. Then, we estimate the one-tailed p-value associated with $J^c_{ij}$ as the upper-tail probability of obtaining a value equal to or greater than $J^c_{ij}$ from the cumulative frequency of the null distribution $\hat{J}_{ij}$ (Gotelli 2000). Links with a p-value greater than 0.05 were discarded.
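A simplified sketch of this significance test is given below. It randomizes the bipartite network through pairwise edge swaps, which preserve the number of bids per firm and the number of bidders per tender, and it computes the upper-tail p-value from a list of null values. It is an assumption-laden simplification: the swap scheme is one common choice and not necessarily the procedure used in the paper, and the per-year constraint is ignored. The function names are hypothetical.

```python
import random
from collections import Counter


def randomize_bipartite(edges, n_swaps, rng):
    """Shuffle (firm, tender) edges by pairwise swaps that preserve degrees.

    Simplification: the per-year constraint mentioned in the text is NOT
    enforced here; swaps are allowed across all tenders.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (f1, t1), (f2, t2) = edges[i], edges[j]
        new1, new2 = (f1, t2), (f2, t1)
        # Skip swaps that change nothing or would create a duplicate bid.
        if f1 == f2 or t1 == t2 or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def upper_tail_p_value(observed, null_values):
    """Fraction of null values that are equal to or greater than `observed`."""
    hits = sum(1 for value in null_values if value >= observed)
    return hits / len(null_values)


edges = [("A", "t1"), ("A", "t2"), ("B", "t1"), ("B", "t3"),
         ("C", "t2"), ("C", "t3"), ("D", "t4")]
shuffled = randomize_bipartite(edges, n_swaps=200, rng=random.Random(42))
firm_degree_kept = (Counter(f for f, _ in edges) == Counter(f for f, _ in shuffled))
tender_degree_kept = (Counter(t for _, t in edges) == Counter(t for _, t in shuffled))
p_value = upper_tail_p_value(0.3, [0.1, 0.2, 0.3, 0.4])
print(firm_degree_kept, tender_degree_kept, p_value)
```

In a full pipeline, each randomized network would be used to recompute the centered coefficient for every pair of firms, and a link would be kept only when its p-value does not exceed 0.05.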

Results and discussion
The resulting firm-firm co-bidding network contains 1529 nodes and 12,892 edges. Relationships are treated as undirected and unweighted. The network exhibits an average degree of 16.86, a clustering coefficient of 0.52 (Newman et al. 2020), and 56 connected components. Figure 2a shows that the degree distribution decays exponentially with the degree, which suggests that the underlying mechanics of co-bidding can be approximated by a random attachment process (Albert and Barabási 2002). Alternative distributions (e.g., power-law) were tested but showed a worse likelihood given the data. However, the average clustering coefficient shows an inverse power-law relationship with the degree (Fig. 2b), suggesting the existence of some level of hierarchy in the structure of the network. The largest connected component contains 1141 nodes and 10,630 edges, and has a clustering coefficient of 0.43. Figure 3 presents the giant component of the firm-firm co-bidding network. Using the Louvain algorithm (Blondel et al. 2008), we identified 22 communities with a modularity of 0.66. We refer to these communities as $C_1, C_2, \ldots, C_{22}$, indexed in descending order of size. For readability, we have colored the eight largest communities in Fig. 3. The high modularity of the network is unsurprising and can be explained by the fact that the network primarily represents competing firms that are specialized in supplying different services (works, goods, services, etc.) in different regions. Hence, the most prominent communities divide the network into two major groups of firms that operate mainly in the northern (Red, Blue, and Green) and southern (Purple and Yellow) regions of the State of Ceará, and further by whether they bid on contracts that supply Food services (Blue and Purple) or construction works (Red and Yellow). Interestingly, the remaining communities operate at a state-wide level (Pink and light Blue), and there is one particular community (Violet) that operates exclusively in the mesoregion of Jaguaribe and only supplies Food services. In some cases, firms form densely connected sub-graphs (e.g., $C_{14}$). These structures can be a first indicator to flag groups of firms that present a high risk of collusion and procurement manipulation. As such, we next explore additional metrics that classify each community of firms through their activity, in order to further narrow down which groups of firms might deserve a deeper investigation by audit officials.
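The summary statistics reported for the co-bidding network can be reproduced with a few lines of code. The sketch below computes the average degree, the connected components, and the modularity of a given partition for an undirected, unweighted edge list. It does not implement the Louvain algorithm itself; the partition is assumed to be supplied by a community detection routine. All function names are hypothetical.

```python
from collections import defaultdict


def average_degree(edges):
    """Average degree of an undirected graph given as a list of node pairs."""
    nodes = {node for edge in edges for node in edge}
    return 2 * len(edges) / len(nodes)


def connected_components(edges):
    """Connected components found with an iterative depth-first search."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def modularity(edges, communities):
    """Newman modularity: sum over communities of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    label = {node: k for k, members in enumerate(communities) for node in members}
    internal, degree_sum = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[k] / m - (degree_sum[k] / (2 * m)) ** 2 for k in degree_sum)


# Toy network: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
toy_partition = [{"a", "b", "c"}, {"d", "e", "f"}]
q = modularity(toy_edges, toy_partition)
print(average_degree(toy_edges), len(connected_components(toy_edges)), q)
```

As a consistency check on the reported figures, 12,892 edges over 1529 nodes give an average degree of 2 × 12,892 / 1529 ≈ 16.86.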

Activities diversity
We started by looking at the regional diversity of the firms' activities (i.e., the regions in which they bid on tenders to supply services) and the diversity of the types of contracts for which they bid. A single firm with low diversity in both regional reach and contract type may simply be a firm that is narrow in both scope and domain. However, a group of connected firms (i.e., a community of firms) that shares a low diversity in both dimensions can point to a more troublesome scenario. In particular, it can indicate that the conditions exist for firms to coordinate and cooperate to control a specific market and regional context, and such groups should be investigated with further discernment.
To that end, we estimate Simpson's diversity index for each community (in some fields, Simpson's index is also known as the Herfindahl index). Simpson's diversity index ($\lambda$) measures the probability that two randomly sampled elements from a set share a given characteristic. In that sense, $\lambda = 0$ is associated with the highest diversity possible, and $\lambda = 1$ with the lowest diversity. Formally, we estimate Simpson's index for each community as
$$\lambda^{C_i}_{cat} = \sum_{\gamma} \left( p^{C_i}_{\gamma} \right)^2 \qquad (2)$$
where $p^{C_i}_{\gamma}$ corresponds to the fraction of bids made in a procurement contract type $\gamma$: {Consumables Health, Services, Construction, Events, Food, Fuel, …} or mesoregion $\gamma$: {Metropolitana, Norte, Sul, Noroeste, …} by the firms in community $C_i$, $\forall i \in \{1, 2, \ldots, 21, 22\}$. The quantity $p^{C_i}_{\gamma}$ is normalized per community, so that $\sum_{\gamma} p^{C_i}_{\gamma} = 1.0$. We estimated $\lambda^{C_i}_{cat}$ independently for each community ($C_i$) and for each of the two categorizations: the region that issued the tender and the tender contract type (e.g., services, food, tenancy, construction, etc.). We chose Simpson's index over other alternatives (e.g., entropy) because of its straightforward interpretation in our context: the probability that two bids made by firms within the same community share the same characteristic (e.g., region or contract type). Figure 4a illustrates the empirical distributions ($p^{C_i}_{\gamma}$) of procurement activity for the ten most prominent communities. We show the results for both the regional distribution of activities and the distribution by contract type. Blue colors denote a low relative frequency of bids, while red identifies a high frequency. These indicators allowed us to infer the degree of specialization and agglomeration of a community. In particular, we found that the activities of Community 8 ($C_8$) are agglomerated in a single region (Jaguaribe) and that its firms specialize in one type of contract (Food). The same conclusion can also be inferred from the high levels of $\lambda^{C_8}_{cat}$, which mean that Community 8 has a low diversity of activity distribution. Figure 4b compares all 22 communities in terms of the two diversity indicators defined above. We find a clustering of communities in the bottom left quadrant (a low level of agglomeration and specialization), which we associate with healthy markets composed of firms that, on average, have a diversified portfolio of activities and regional distribution. In contrast, in the top right quadrant, we found communities that relied on procurement contracts of a single type and agglomerated in a small number of regions.
The combination of these two diversity indicators, at the community level, provides a powerful feature to identify groups of firms that can dominate a niche market or, in the worst case, develop undesirable leverage, as a group, in negotiating procurement contracts, thereby lowering the efficiency that public procurement aims to achieve in the tendering process. However, it is important to stress that these metrics are only indicative of potential problems, and thus the true nature of the activities of the firms in each community should be carefully investigated by the corresponding local authorities.
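The diversity indicator used in this section is simple to compute. The following sketch evaluates Simpson's index as the sum of squared relative frequencies of the labels (region or contract type) attached to the bids of one community. The function name and the toy labels are illustrative assumptions.

```python
from collections import Counter


def simpson_index(labels):
    """Probability that two bids drawn with replacement share the same label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((count / total) ** 2 for count in counts.values())


# A community whose bids all target one contract type has the lowest diversity.
specialized = simpson_index(["Food", "Food", "Food", "Food"])
# A community spreading its bids over several types scores lower.
diversified = simpson_index(["Food", "Food", "Construction", "Services"])
print(specialized, diversified)
```

Computing the index once with mesoregion labels and once with contract-type labels gives the two coordinates used to compare communities.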

Bidding coordination
To further investigate the risk of market manipulation by firms, we next looked at the propensity of each community to participate in "single bidder" contracts, another pattern often associated with corruption and loss of efficiency. What is the susceptibility of each community to such a practice? To answer this question, we started by investigating the average number of times, per community, that a firm is the single bidder of a tender. Figure 5a shows the results for all 22 communities in the largest connected component of the Firm-Firm network. Traditionally, a high level of single bids can indicate firms acting with some level of informal advantage in the tendering process, or a lack of competition in a specific market. At the community level, such an indicator can point to unusual activity from a group of firms.
Hence, low levels of single bidding indicate a risk of coordination (e.g., firms participating coherently in the same contracts), while high levels can signal the prevalence of less competitive markets or of informal advantage in the tendering process. Overall, of the ten largest communities, only Community 8 exhibits low levels of single bidders, a pattern that extends to Communities 14 and 21 as well. In contrast, Community 12 strongly deviates from the baseline, with an average value of single bidding that is roughly four times that of a typical firm.
Fig. 4 Characterization of the ten largest communities by the diversity of bids by region and type. Panel a shows the distribution of bids within each community by mesoregion (left) and contract/tender type (right). For each community we compute Simpson's diversity index ($\lambda_{reg}$ and $\lambda_{serv}$). The full and official names of the mesoregions are: Jaguaribe; Noroeste Cearense (Noroeste); Metropolitana de Fortaleza (Metropolitana); Sul Cearense (Sul); Norte Cearense (Norte); Centro-Sul Cearense (Centro-Sul); and Sertões Cearenses (Sertão). We use simplified references to these names for visualization purposes. Panel b compares communities by their diversity of contracts in terms of regional span and type. Note that in panel a we only show results for the ten largest communities, which are representative of the results. Communities not identified by a color code in Fig. 3 are shown in gray in panel b; in particular, community $C_{14}$ corresponds to the gray clique easily identifiable in the bottom left of the network in Fig. 3.
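The two community-level indicators used in this section can be sketched as follows: the average number of single bids per firm, and the average number of bidders per tender divided by the community size. The sketch assumes bids are (firm, tender) pairs from the full data set and that bidders are averaged over the distinct tenders in which at least one firm of the community took part. Both choices, and the function name, are assumptions made for illustration.

```python
from collections import defaultdict


def community_bidding_profile(bids, community_of):
    """Return {community: (single bids per firm, normalized bidders per tender)}.

    `bids` holds (firm, tender) pairs; `community_of` maps a firm to its
    community. Firms without a community still count as bidders.
    """
    bidders = defaultdict(set)
    for firm, tender in bids:
        bidders[tender].add(firm)

    size = defaultdict(int)
    for community in community_of.values():
        size[community] += 1

    single_bids = defaultdict(int)
    tender_sizes = defaultdict(list)
    for firms in bidders.values():
        # Record the tender size once for every community present in it.
        for community in {community_of[f] for f in firms if f in community_of}:
            tender_sizes[community].append(len(firms))
        if len(firms) == 1:
            (only_firm,) = firms
            if only_firm in community_of:
                single_bids[community_of[only_firm]] += 1

    profile = {}
    for community, n_firms in size.items():
        sizes = tender_sizes[community]
        mean_bidders = sum(sizes) / len(sizes) if sizes else 0.0
        profile[community] = (single_bids[community] / n_firms,
                              mean_bidders / n_firms)
    return profile


toy_communities = {"A": "C1", "B": "C1", "D": "C2"}
toy_bids = [("A", "t1"), ("B", "t1"), ("D", "t1"), ("A", "t2"), ("D", "t3")]
profile = community_bidding_profile(toy_bids, toy_communities)
print(profile)
```

A normalized value close to one means that the tenders a community takes part in attract, on average, about as many bidders as the community has firms.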
In addition, we looked at the average number of bidders per tender in order to assess the potential existence of coherent behavior, that is, coordination between the firms in a community. To that end, we estimated the average number of bidders per tender for each community, which we normalized by the size of the community (i.e., the number of firms in a community). Interestingly, Fig. 5b shows that in Community 8, firms tend to participate in tenders with a number of bidders that matches almost exactly the community's size. In contrast, in some cases (Communities 14 and 19), firms tend to bid on tenders whose number of bidders is several times larger than the size of their communities. It is worth noting that this analysis is biased by the size of the communities: the expectation would be a smooth relationship, with the largest community achieving the smallest value and, in the limiting case, a community with a single firm achieving the maximum. However, it is clear that in some cases (Communities 8, 14, and 19) there are apparent deviations.
Fig. 5 In panel a, we compared the values of each company with the average of the entire population of firms (horizontal red line). Panel b shows the average number of bidders in tenders in which firms within a community typically participate. We normalized the value obtained for each community by the number of firms in that community. The horizontal red line shows the threshold that marks the size of the community.

Conclusions
In this manuscript, we explored the potential of mining a large data set of public tenders, collected from firms' activity in competing for procurement contracts issued by the municipalities of the State of Ceará (Brazil). By matching firms with similar bidding patterns, we have inferred a firm-firm network whose largest connected component comprises 1141 nodes and 10,630 edges. We showed that we were able to identify communities of firms with similar bidding patterns. The network exhibits a highly modular structure partitioned into 22 communities. These communities cluster firms that have a similar scope in procurement activities, both in the nature of the contracts they bid for and in the regional reach of their activities. Moreover, we looked at two diversity indicators (regional diversity and diversity in the nature of procurement contracts) as a sign of the potential of certain communities to develop leverage over the procurement process, in other words, to affect the expected efficiency of the market. Finally, we looked at the sizes of the tenders, first by looking at the abundance of single bidders in communities, and second by looking at the average number of bidders in each tender. Overall, we identified a particular community (Community 8) that combines several undesirable properties. Community 8 involves a group of firms that offers Food services in the region of Jaguaribe. They have an unusually low number of single bids; the average number of participating firms per tender matches the number of firms in the community; and they exhibit a high specialization and agglomeration in their activities.
Nevertheless, while such an odd combination of characteristics can be useful to narrow down the activities of audit officials, it does not allow us to conclude much about the true nature of the activities of the firms involved in Community 8.
Finally, it is essential to highlight some shortcomings of our analysis and directions for future work. The lack of pre-labeled data on past corruption cases significantly limits our ability to establish any causal link between irregular procurement behavior and the network structure, its motifs, or the location of firms in the network. In that sense, our results are merely exploratory and show the potential of combining network science methods with descriptive statistics to highlight relevant groups of firms according to their activity pattern in a data-scarce environment. If a larger temporal window becomes available, future work should look at the evolution of the network, in order to capture the evolution and segregation of communities of interest as well as their trajectory in terms of the diversity of their activities.