Practices of public procurement and the risk of corrupt behavior before and after the government transition in México

Corruption has a significant impact on economic growth, democracy, and inequality. It has sever consequences at the human level. Public procurement, where public resources are used to purchase goods or services from the private sector, are particularly susceptible to corrupt practices. However, government turnover may bring significant changes in the way public contracting is done, and thus, in the levels and types of corruption involved in public procurement. In this respect, México lived a historical government transition in 2018, with the new government promising a crackdown on corruption. In this work, we analyze data from more than 1.5 million contracts corresponding from 2013 to 2020, to study to what extent this change of government affected the characteristics of public contracting, and we try to determine whether these changes affect how corruption takes place. To do this, we propose a statistical framework to compare the characteristics of the contracting practices within each administration, separating the contracts in different classes depending on whether or not they were made with companies that have now been identified as being involved in corrupt practices. We find that while the amount of resources spent with companies that turned out to be corrupt has decreased substantially, many of the patterns followed to contract these companies were maintained, and some of those in which changes did occur, are suggestive of a larger risk of corruption.


Introduction
Because of its impact on economic growth [1,2], democracy [3], and inequality [4], one of the biggest challenges that a government has to deal with is corruption. Transparency International defines corruption as the abuse of public power for private benefit [5], i.e. it assumes that corruption involves the participation of public officials. Considering this definition, a niche in which corruption can arise naturally is in public procurement; where public and private sectors interact through contracting, mostly to purchase goods or services. Corruption at the level of public-private contracting has high costs in many areas. For example, if buyers favor some suppliers over others through corrupt decisions, bribes or patronage (clientelism), then market competition is affected [6,7]. This lack of competition leads to a misallocation of resources, affecting areas such as budget composition [1], military and technological spending [4], social care [7], and may even change the market structure and dynamics [8,9]. However, owing to the complexity of the contracts, the large sums of money involved, the number of participants, as well as the inherent complicity of public officials, this kind of corruption is difficult to identify, track, and prevent [10][11][12].
A government turnover, i.e. a change in the individuals and/or parties in power, due to elections or otherwise, may bring significant changes in the way public contracting is carried out, and thus, in the types of corrupt behavior that may occur in the context of public procurement [13]. Broms et al. propose that frequent elections with uncertain outcomes may compel corrupt elites to pursue predatory strategies; however, if there is a well-established party system, regular electoral uncertainty may motivate corrupt elites to exercise restraint [14]. In this context, Fazekas et al. found that fair electoral contest and heterogeneous power-sharing may have the potential to mitigate corrupt market distortions, even in systematically corrupt places [9]. On the other hand, while it seems to be true that government turnovers can diminish corruption in public procurement [9,14], there is also evidence that a change in government can maintain corrupt behaviors, only changing the favored suppliers [15]. An example of this was found by the Mexican Institute for Competitiveness (IMCO by its Spanish acronym, [16]), which analyzed public procurement data from México, finding that in the change of government that took place between 2012 and 2013, the favored suppliers also changed, but not the amount of resources and contracts given to the new suppliers. This phenomena was called "the compadres' change" [17].
For years México has occupied low levels in the score of corruption perception index given by Transparency International [18,19]. Mexican citizens consider that corruption is one of the biggest problems in the country, only behind violence and insecurity [19]. For years, government and corporate structures have been created, maintained, and adapted to obtain private benefits from public resources, which has carried huge consequences to the economic growth and human well-being in the country [20].
In 2018, México lived the largest electoral contest of its history, with more than 56.6 million voters, representing 62.62% of citizen participation [21]. In this election, a "leftist" candidate won the presidency for the first time. He was the most voted winner in Mexico's history, obtaining 30.1 million votes (representing 53.1% of the voters), and his party won 388 congress positions of the 554 available (70% of the seats) [21]. This event not only marked a government turnover, but an ideological and structural change in the government's goals and methods [20,22,23]. In this paper, we study how this government change affected the characteristics of public contracting, and the extent to which these changes affect the forms in which corruption takes place. To do so, we analyzed data from more than 1.5 million contracts between many agencies that conform the Mexican government and private suppliers, corresponding to the period 2013-2020, which includes the first two years of the new government (2019-2020) [24].
Many states have taken advantage of the new technology era to record their administrative activities. This data, which can be used to analyze the projects undertaken by an administration, their success and shortfalls, mainly describes the state's procurement practices [25]. This large amount of administrative data collected by governments allows studying corruption from a new data driven perspective [26]. A common approach to quantify corruption is by building risk factors from contract data [8,9,15,[27][28][29][30][31][32]. For example, single-bidder contracts have been shown to be effective in identifying and predicting corruption risk [8,[32][33][34] in various contexts; including studies on the relationship between corruption and political incumbency [33], and the effect of campaign contributions on corruption [34]. Other approaches include the application of network theory to measure the impact of single-bidder contracts on local corruption [8]. Network theory was also used to identify corruption risk distribution among countries. For those studies, the actors (buyers and suppliers) were represented by network nodes; these nodes were linked if they entered into the same public procurement contract, and the weight of the link was the fraction of single-bidder contracts between each pair [32].
Here we propose a somewhat different approach, taking advantage of specific public data that lists suppliers that were investigated and have been identified as having incurred in corrupt practices. Since 2013 México's government collects a list of companies that have been caught providing invoices for simulated operations (or EFOS for its spanish acronym "Empresas que Facturan Operaciones Simuladas") [35]. These companies sell fake receipts to buyers who use them to avoid taxes or to cover acts of embezzlement, among other things. There is also, since 2013, a list of all the suppliers that have been caught doing one or several types of corrupt activities when participating in a public contract. For example, presenting fake documentation in order to win a contract, overcharging their services, breaching contract, or diverting resources. These companies are labeled as "sanctioned suppliers and contractors" (or PCS by their Spanish acronym "Proveedores y Contratistas It should be noted that to appear in these lists, these contractors were subject to a legal investigation, and some of the contractors are appealing the decision. Thus, these lists may change slightly as time goes by as a result of companies winning their appeals, which removes them from the lists, as well as due to conclusions of long lasting investigations, which may add new companies. Nevertheless, from the available records, we identified the contracts in which companies, which have been suspect of corrupt activities, participated. We use this data to achieve the following goals: i) Our first goal is to develop a statistical framework to analyze whether there were changes in public contracting practices as a consequence of the government turnover in México, both in those contracts suspect of corruption (those in which EFOS or PCS companies participated) and those in which companies free of official corruption charges participated. To achieve this goal, each contract has to be described by a set of quantitative variables based on the available data.
ii) The second goal is to use the data to build risk factors following the framework proposed by Fazekas' group, and to test whether these indicators successfully describe and identify those contracts in which companies suspect of corrupt activities participated. We then use these indicators to determine whether the change in the procurement practices due to the government turnover brought a lower or higher risk of corruption, where risk of corruption, in this work, is to be understood as the fraction of potentially corrupt contracts.
The results in this paper are based solely on the study of the data, trying to avoid any political bias, and without use of any prior knowledge about the contracts' participants.

Data
The list of public contracts from 2013 to 2020 was taken from the electronic Mexican system of public governmental information on public procurement CompraNet [24]. This system is operated by an administrative unit designed by the Mexican federal agency for budget and public debt (SHCP for "Secretaría de Hacienda y Crédito Público"). The registry of contracts in this electronic system is mandatory for all those that operate with resources from the federal budget. 1 The contracts on this list have a specific set of variables or entries that describe them. The particular entries we use in this work are shown in Table 1.
The original data lists consisted in 1.6 M of contracts. These records were curated to standardize the information available in each of them. We homogenized all the string variables to avoid issues with special characters or spacing, 2 we also deleted all the entries in which an important variable was omitted (for example, the buyer name, the amount spent, the type of procedure, etc.). The contracts with this kind of problems were approximately 60 K (representing ∼4% of the data). Thus, from the 1.6 M contracts available in the original data lists, we consider 1.54 M. As mentioned above, from these 1.54 M of records, we took only the variables shown in Table 1. The original set includes other variables that are not of interest for this work, such as the name of the buyers' legal representative, or the suppliers' webpage. Hereinafter, we will refer to this curated list of 1.54 M of contracts as the source list.
The list of companies identified as EFOS is available on the site of the Mexican tax agency (SAT for "Secretaría de Administración Tributaria") [35], 3 and the list of PCS contractors is available on the official Mexican open data site [36]. The only variable of interest in these lists is the supplier's name, which we use to identify the contracts in which these companies participated.
All data sets mentioned in this section, and other tools to reproduce the results shown here are available on [37].

Contract classes
Our first step was to identify those public procurement contracts won by companies labeled as having been involved in corrupt activities. Since these companies have gone through a process to be labeled as either an EFOS or a PCS, we assume that they may be suspect of having incurred in corrupt behavior in all the contracts they participated in, independently of whether the contracts occurred before or after the company was labeled. Thus, all the contracts in which the supplier is an EFOS or a PCS, are classified as possibly corrupt and we assign the corresponding label EFOS or PCS to them.
Following the principle of presumption of innocence, we label as NC (for Non-Corrupt) all the other contracts in the source list in which the supplier is free of official corruption

Spending
N/A Amount of money spent by the buyer (in USD PPP) c a As mentioned previously, the state-owned companies are exempt to report their activities as suppliers to other government agencies [38,39]. b The Mexican Economic Secretariat proposed this size classification based on the amount of resources produced by the company and its number of employees [40]. c Since most of the Spending is reported in Mexican currency (MXN), we converted the amounts to USD PPP using the equivalences given in [41]. Of course we are aware that it is very likely that corrupt contracts went undetected and end up in our NC class, however we expect that they will have little statistical weight in this class that represents the vast majority of the contracts.

Contract description variables and risk factors
Once we have identified the class to which each contract belongs, we use the information given in the source list to build two variables that are descriptive of the buyers: the maximum number of contracts assigned by the buyer to the same supplier, and the maximum total amount spent by the buyer with the same supplier in each year. We call these variables T.Cont.Max and T.Spending.Max. These variables give us an idea of the budget managed by each buyer and its activity in the public procurement market ( Table 2 -Type  ii)). Unfortunately, the available data in the source list is not detailed enough to evaluate exactly the risk factors developed in [27,28,42], thus, we propose approximate versions of the factors that give similar information about the features of the relationship between the buyers and suppliers, and we refer to these variables as Type iii) (See Table 2). Below we explain in detail the meaning of the risk factors we use: 1. RAD: As we mentioned above, single-bidder contracts have been shown to be effective identifiers of corruption risk, particularly because these direct processes hinder the possibility of competition, and the selection of winners may be influenced by illicit agreements.  [39]. Given these considerations, we consider as a risk factor the "Fraction of single-bidder contracts", proposed in [32] and defined as: where T.AD is the total number of single-bidder contracts assigned directly by a buyer to a supplier in a year, and T.Cont, the total number of contracts between the specific buyer and supplier that year. If RAD ≥ 0.5, that is, if more than half of the contracts between a supplier and a buyer are single-bidder, we consider that the risk of corruption is high [32]. Strictly speaking, the variable is descriptive of the relation between each buyer and supplier each year. However, in what follows, we assign to each contract the value of RAD corresponding to the relation between the buyer and supplier celebrating the contract. 2. Fav: Clientelism, or favoritism, is also a red flag for corruption. This practice refers to the situation in which the buyer favors a particular supplier by giving it an atypically high volume of contracts and money. While it is hard to detect other than in very obvious cases, the IMCO proposed a quantitative way to measure favoritism as follows [42]: with, T.Cont and T.Spending the number of total contracts and total spending made by the buyer with the supplier in a year, and T.Cont.Max and T.Spending.Max the variables that are descriptive of the buyer as explained above. This factor would give a score of 1 to the supplier that is the most favored by the buyer in both the number of contracts and money spent. The rest of the suppliers are then scored with respect to this most favored possible supplier. If Fav ≥ 0.9, then we consider that that relationship buyer-supplier has a large risk of corruption [42]. As above, we assign to each contract the value of the Fav corresponding to the relation between the buyer and suppler celebrating the contract. 3. CPW and SPW: The IMCO also identifies a particular form of corruption in which a buyer and a company manipulate the conditions of a contract procedure, dividing an expensive contract into multiple smaller ones that are assigned to the company in a very short time interval (sometimes the same day). These smaller contracts are easier to assign directly to a single-bidder, and are less likely to be scrutinized [42].
To identify this behavior, we consider two risk factors: "Contracts per Active Week" and "Spending per Active Week" defined as: where T.Cont and T.Spending are the same variables explained before, and ActiveWeeks is the number of weeks in which a supplier was assigned contracts by a buyer in each year. Then, if a large contract was fractioned into many small ones, we expect that CPW would be large. We consider that there is a large risk of corruption if CPW ≥ 5 (i.e. that the company received more than one contract per day) and SPW ≥ 10K USD PPP. 4 As above, we assign to each contract the value of the CPW and SPW corresponding to the relation between the buyer and suppler celebrating the contract. Therefore, considering the variables available in the source list, and the variables related to buyers' features and risk factors, there are three types of items in the final data set: i) Items that describe the features of the contracts. For example, the procedure by which the supplier won the contract (single-bidder, public contest, etc.), the amount allocated to the contract, or the week in which the contract began (Table 1). ii) Items that describe the features of the buyers. These include the maximum amount spent by a buyer with a supplier, or the maximum number of contracts carried out by a buyer with a single supplier. iii) Items that give information about the relationship between the buyers and suppliers. These items are expected to work as risk factors for corruption. Examples of these are the fraction of single-bidder contracts awarded by a buyer to supplier, or the favoritism of a buyer for a supplier. In what follows, the metrics computed for variables of type i) and iii) will be the fraction of contracts that satisfy certain property (e.g. the fraction of contracts in which a small company participated, or the fraction of contracts between buyers and suppliers characterized by a given value of RAD), and the metrics related to variables of type ii) will be related to the fraction of buyers with certain features (e.g. the fraction of buyers with a maximum spending larger than 5K USD PPP). Table S1 in the Additional file 1 shows the most common descriptive statistics of all the variables for each contract class.

Statistical analysis
One of the central goals of this work is to determine whether the government turnover brought along a methodological change in public procurement. To achieve this goal we first need to analyze whether, within the same government period, there exist significant differences between the three contract classes we defined. Specifically, we aim to verify that each contract class presents statistical differences in their characteristic variables (i.e. variables of types i) and ii)), when compared with the other classes. This will provide further justification for separating contracts into the three classes, and will help associate each class with certain characteristic description variables. After that, we compare the statistical profile of each class between the different governments, which will tell us whether there were significant changes in public procurement practices within the classes with the change of government. And finally, we test the risk factors proposed above, and determine if the government transition resulted in a higher or lower risk of corruption in public procurement.

Binomial Test and Kolmogorov-Smirnov Test
Since each contract has two kinds of descriptive variables, dummy variables and nondummy variables, we use different tests to compare each kind of variable between classes and periods. To measure differences between classes we take the following procedure: for the dummy variables set we use the Binomial Test (B-Test) [44,45], and for the nondummy variables set we use the Kolmogorov-Smirnov Test (KS-Test) [46]. 5 The specific steps to do this are the following: 1. We separate the data corresponding to the two different government periods. The 1st period covers from 2013 to 2018, the 2nd period from 2019 to 2020. 2. For each contract class we extract the data for each variable for all the contracts belonging to the class in each period. 3. We compare classes by pairs in the same period performing the two-sample B-Test or the two-sample KS-Test for each variable. 6 4. We consider that there are significant statistical differences between contract classes in those variables for which: a. The B-Test results in a p-value p v ≤ 0.05 and where the difference between fractions of the dummy variables is ≥ 0.1. b. The KS-Test results in a statistic D ≥ 0.1 and a p-value p v ≤ 0.05.
With these values we ensure that at least 10% of the contracts of one class present, in these variables, a different behavior from the contracts of the other class. The sample sizes for each variable type in each class and period, are in Table S2 in the Additional file 1.
The results of these comparisons are shown in sub-Sect. 3.2.

Measuring differences between government periods
To measure the differences between government periods, we take a slightly different path than that taken above since there is a natural variability in the distribution functions within a period, and we seek to detect differences beyond this variability. Thus we perform the comparison as follows: 1. We separate the data of each government period by year. 2. For each contract class we compute: a. For each dummy variable: the fraction of contracts in which the variable is present, over all the contracts belonging to the class in every year. b. For each non-dummy variable: the cumulative distribution function (CDF) of the variable 7 over all the contracts belonging to the class in each year. 3. Then, for non-dummy variables, we compute the confidence interval (CI) at 99% 8 for each distribution, using the data from each year of the 1st period. For dummy variables we use a boxplot of the data from 2013-2018 for comparison. 9 4. Now, using the data from the contracts under the new government, those dummy variables whose fractions are outside of the minimum and maximum ranges of the boxplot for the corresponding data from the first period, and those non-dummy variables with at least 25% of their cumulative distribution curve lying outside of the CI of the corresponding data of the first period, are considered to have significant 5 We choose these particular tests because a balance between samples sizes is not required to obtain reliable comparison results. 6 To compute the B-Test we follow the standard procedure to compute the z -score and the p -value given in [44,45,47] and for the KS-Test we use the R function ks.test from the dgof package [48,49]. 7 To make this computation we use the ecdf R function from the stats package [48]. 8 We considered this value for the confidence interval because of the small amount of data we have to compute it. For the computation of the CI we use the CI R function from the Rmisc package [48,50] 9 To generate the boxplot we use the R function boxplot from the stats package [48]. Table 3 Public data per year in the period 2013-2020. Approved federal budget (FB -extracted from [51-58]), total spending made on public procurement (TS -extracted from the source list), total number of contracts made (TC-extracted from the source list), and the ratio TS/FB. FB and TS are reported in USD PPP using the equivalences given in [41] Year 5.02 · 10 11 5.55 · 10 11 5.64 · 10 11 5.64 · 10 11 5.48 · 10 11 5.75 · 10 11 6.28 · 10 11 6.51 · 10 11 Total Spending (TS) 6.43 · 10 10 8.97 · 10 10 7.86 · 10 10 6.78 · 10 10 8.68 · 10 10 5.52 · 10 10 3.03 · 10 10 4.08 · 10 10 statistical differences beyond the natural variability of the distributions. Conversely, those dummy variables whose fractions are inside the corresponding boxplot's ranges, or the non-dummy variables for which at least 75% of their cumulative distribution curves are within the corresponding CI, will be considered statistically equivalent, meaning that the behavior corresponding to this variable is similar between periods. This method provides a quantitative tool to identify the differences (and similarities) of the different contract classes between periods.
The results of these comparisons are shown in sub-Sect. 3.3.

Context for comparison
To make a fair comparison between government periods and between different contract classes, we present how many resources were spent in each class, as well as how many contracts belonged to each class per year. Table 3   This decrease may be a consequence of the politics of "republican austerity" imposed by the new government [59].
Considering these data, we compared the number of contracts and the amount of money spent on each contract class for every year (Table 4 -Rows 2 and 3, and Fig. 1-Center). It is noticeable that the number of contracts in the EFOS class in 2019 and 2020, and the amount of money spent on them, fell by an order of magnitude. In contrast, the number of contracts and the total spending in contracts in the PCS class remained roughly in the same scale over several years, and only suffered a significant decrease in 2020. For contracts in the NC class, the numbers are similar between periods.
We normalized the absolute numbers of contracts in each class by the corresponding TC and TS to compare the percentage of the Total Contracts (%TC) and of the Total Spending  (Table 4 -Rows 4 and 5, and Fig. 1 -Bottom). We see that the NC class represents approximately 90% of the Total Spending and Total Contracts for all years. Also 184,767 Spending 1.06 · 10 8 5.75 · 10 9 6.20 · 10 10 9.63 · 10 7 6.17 · 10 9 8.05 · 10 10 3.48 · 10 7 6.11 · 10 9 4.90 · 10 10  for the contracts in the PCS class there were no major changes between years in %TC and %TS up to 2020, where %TC decayed roughly by half to 1.02% and %TS fell to 1.78%. However, for the contracts in the EFOS class, there was a large decay in the fraction of contracts assigned, and a corresponding decrease in resources spent when the government changed. Then, the first important change in public procurement due the government transition, was a marked decrease in public spending and in the number of contracts made. This decrease was especially noticeable in the EFOS class.

Comparison between contract classes
To verify whether the three different contract classes have intrinsic, statistically significant, procedural differences between them, we computed a B-Test (for the set of dummy variables) and a KS-Test (for the non-dummy variables) comparing by pairs each variable that describes the contracts. This comparison was made between all the contract classes in the same government period.
First we compared the EFOS class vs. the PCS class. For the 1st period, Table S3 and Fig.  S1 in the Additional file 1 show that there were 17 variables for which the statistical tests showed significant differences. These variables were of type i), i.e. variables that describe the features of the contracts. For example, we found that most of the contracts in the PCS class were made with companies classified as "large supplier" (labeled S.NOM), while the contracts in the EFOS class were mostly done with micro, small, or medium companies. 10 Figure 2 shows a subset of those variables for which contracts carried out with companies labeled as EFOS and PCS were significantly different in the 1st government period For the variable called Spending (Left), we found that the contracts in the EFOS class (red circles solid line) tend to be more expensive than the contracts in PCS class (blue triangles dashed line). We can also see in Fig. 2 -Right that the PCS companies obtain contracts of short duration (less than 3 weeks) more frequently than EFOS companies. However, in both classes there are a few contracts of large duration (more than one year). Finally, Fig. 2 -Center shows that the 1st period government made contracts with EFOS mainly during the 2nd half of the year (BeginningWeek), while companies labeled as PCS were contracted mostly during the 1st half of the year. For the 2nd government period, the comparison EFOS vs. PCS showed 14 variables with significant differences, all of them of type i) as well (Table S4 and Fig. S2 in the Additional file 1). Interestingly, 11 of these variables were also present in the set of variables that displayed differences in the 1st government period. Variables such as fraction of contracts with large suppliers (S.NOM) and Spending appear with similar behavior as in the 1st period. Also, even when the PCS contracts tend to be shorter than EFOS contracts, some of the PCS contracts reached duration times of up to two years, versus one year for the longest EFOS contracts (Fig. S2 in the Additional file 1). The remaining variables with significant differences are related to procedure type, procedure character, or contract type. For example, almost all the PCS contracts were for acquisitions (CT.ADQ), while less than half of the EFOS contracts were of this type (Table S4 in the Additional file 1). Also, the distribution of the variable for the single-bidder procedure (PT.AD) shows that this procedure is more frequent in the PCS class than in the EFOS class (Table S4 in the Additional file 1). These results highlight the fact that each of the corrupt contract classes has a different statistical profile, which in turn justifies keeping the corrupt classes separated. Further, given that the PCS contracts far outnumber the EFOS contracts every year (see Table 4), if we join both classes in a single "corrupt" class, the statistics of this "joint" class would mostly reflect the statistics of the PCS class.
Regarding the comparison between EFOS and NC contracts, 10 variables presented significant differences for the 1st period, these variables were of types i) and ii) (Table S5 and Fig. S3 in the Additional file 1). It is noticeable that the two variables that showed the most prominent differences were those that characterize buyers' features (i.e. variables of type ii)); these were the maximum number of contracts awarded by a buyer to a supplier (labeled T.Cont.Max), and the maximum amount of money spent by a buyer in contracts with a supplier (T.Spending.Max). These variables show that the EFOS class had a higher proportion of buyers with a proportionally stronger market activity (i.e. buyers characterized by having awarded a maximum number of contracts to a supplier T.Cont.Max ≥ 5) and larger budget (T.Spending.Max ≥ 1.9M USD PPP) than the buyers in the NC class (Fig. S3 in the Additional file 1). For the 2nd period the statistical tests gave 16 variables (again of the two first types) with significant differences (Table S6 and Fig. S4 in the Additional file 1). Here, also the variables T.Cont.Max and T.Spending.Max, were those that showed the most prominent differences. The behavior of these variables was similar to that observed in the 1st government period, giving the same differences between EFOS and NC classes for the buyers' features.
The PCS vs. NC contract classes' comparison for the 1st government period (Table S7 and Fig. S5 in the Additional file 1) showed significant differences in 10 variables, again distributed among the variable types i) and ii). For example, the variable for national procedure character (labeled PC.N) shows that the NC contract's fraction made under national regulations, i.e. the contracting protocols that followed Mexican laws, was much higher than in the PCS class. The variable for large suppliers (S.NOM) shows that this kind of company was proportionally contracted with higher frequency in the PCS class than in the NC class. The variables for buyers' features (T.Cont.Max and T.Spending.Max) presented differences similar to those discussed above: the PCS class had a higher proportion of buyers with a proportionally stronger market activity and larger budget than the buyers in the NC class. For the 2nd government period, the tests showed 13 variables with significant differences, again within the variable types i) and ii) (Table S8 and Fig. S6 in the Additional file 1). For example, the variable for large suppliers (S.NOM) has a similar behavior as in the 1st government period: large suppliers were more frequently contracted in PCS class than in the NC class. Finally, Fig. S6 in the Additional file 1 shows that the PCS contracts tend to last more than the contracts in the NC class.
Thus, we see that our three contract classes do present significant differences when compared among each other in the same government period, and that these differences are similar between periods. These results provide support to the separation of the contracts in these three classes, and also to the hypothesis that the corruption that occurs through EFOS and PCS have different procedural patterns. At the same time, these two contract classes have differences with the NC contracts. The next step is to identify whether each class presents differences between government periods.

Comparison between government periods
As mentioned before, to identify differences and similarities in public procurement between government periods, we compared the variables that define the contracts either by computing the confidence interval (CI) generated by the years of the 1st period (2013-2018) and verifying whether or not both curves corresponding to each year of the 2nd period (2019-2020) lie within the CI, or, for the dummy variables, checking if the results for the second period fall inside the box plot of the data of the first period. Since one of our goals is to try to determine whether the government transition that occurred in México at the end of 2018 brought a change in the procurement practices, we compared the data for each period in the three different contract classes. Our analysis of the data sets produced the following results.  Figure 3 shows the most significant differences between both periods for the three contract classes: EFOS, PCS, and NC. First, we observe that in the 1st period, ∼ 20% (on average) of the suppliers classified as EFOS were micro-companies ( Fig. 3 -Top Left). In contrast, in the 2nd period, this fraction grew to 50% in 2019, and to 100% in 2020. This signals an important change in the way interactions with EFOS were carried out between both periods. Another important difference in this class was the contract duration ( Fig. 3  -Bottom Left). In the 1st period, only 20% (on average) of the contracts with companies labeled as EFOS lasted more than ten weeks, while for the 2nd period, 40% of these contracts had a duration of more than ten weeks in 2019, and in 2020 this fraction grew to almost 90%. Even when the government of 2nd period invested less money on EFOS, and granted them fewer contracts (Fig. 1), the majority of these contracts were of long duration. Fig. S7 in the Additional file 1 shows the remaining variables in which there were significant differences between periods. For example, we notice that the new government tends to contract EFOS at the beginning of the year (BeginningWeek), while the previous government did so mostly during the second half of the year.

Main differences between both periods
For the PCS class we observe that in the 1st government period, the PCS contracts were made mostly (57%) with small and medium companies (Fig. 3-Top Center). In contrast, in the 2nd government period, the majority of these contracts passed to large suppliers. On the other hand, with the new government, PCS contracting decreased in the middle of the year; i.e. the 2nd period government had a slightly lower tendency to contract PCS between weeks 20 and 40 than the 1st period government (Fig. 3 -Bottom Center). Fig.  S8 in the Additional file 1 shows the remaining variables for which contracts in the PCS class presented significant differences between periods. For example, in the 1st period, 1% of contracts in the PCS class were leases (labeled CT.AR), this fraction grew to almost 1.5% in 2019, and to 2% in 2020.
Next, for the NC class, we notice that the 2nd period had an increase of 15% in the number of contracts made under the rules and regulations of the North American Free Trade Agreement (NAFTA) (Fig. 3 -Top Right). On the other hand, the two variables associated with buyer features had differences between periods. Figure 3 -Bottom Right shows that almost 80% of the buyers in the 1st period had a maximum spending (T.Spending.Max) less than 2K USD PPP, while in the 2nd period this percentage fell to almost 60%. Fig. S9 in the Additional file 1 shows that the maximum of contacts made by a buyer with the same supplier in the 2nd period remains relatively close to the CI generated by the 1st period. Fig. S9 in Additional file 1 shows the remaining variables for which NC contracts presented significant differences between periods. For example, the variable for single-bidder procedure type (PT.AD) shows that there was also a slight increase in the percentage of contracts won through this procedure from 75% to 80% from one period to another.

Main similarities between both periods
Having identified the main differences between the contract classes in both government periods, it may also be helpful to identify the similarities, i.e. identify those variables that did not experience significant changes due to government transition. This will give an idea of which aspects of the practices related to EFOS and PCS contracts may have remained. Figure 4 shows the main similarities between governments for the three contract classes. We notice, for example, that for the EFOS class, the percentage of contracts assigned to single-bidders (PT.AD) did not experience significant changes between periods, staying within the range of 50% to 60% (Fig. 4 Top Left). There are also similarities in the percentage of contracts for leases (CT.AR, Fig. S10 in the Additional file 1).
For the PCS class, we found similarities in only one variable. For the Spending variable we found that 80% of the contracts were for less than 1M USD PPP in both periods, but in both periods there were a few contracts that reached 4M USD PPP (Fig. 4 -Center).
For the NC class, we found similarities in the procedure character, in which 7-11% of the contracts were made under international rules for both periods (Fig. 4 -Right). In Fig.  S12 in the Additional file 1 we can observe that there are also similarities in the contract type. We find that almost 35% of the contracts were for services in the 1st period, and in the 2nd period this fraction remains at 38% in 2019, and 32% in 2020. It is also noticeable that the NC contracts for both periods had the same distribution for the duration variable (labeled EBWeeks), where, while most of them lasted less than 40 weeks, there were some that reached more than 200 weeks of duration.
These results show that corruption related with EFOS and PCS suffered important changes in specific features, such as the kind of contracted supplier, the duration of the contracts, and the time of the year in which the corrupt companies won the contracts; and there were other features that did not change between governments, such as the fraction of contracts in the EFOS class won by single-bidder, or the distribution of resources spent on contracts of the PCS class. Also, while the NC class is in principle not legally related to corruption, the differences and similarities of the features that characterize this class of contracts from one period to the other, provide an idea of how the practices of public procurement changed during the government transition.

Testing the risk factors
To reach our second goal, we tested the risk factors proposed in Sect. 2.1.2 and presented in Table 2 (labeled Type iii)). To do this we performed two experiments: 1. First we measured the accuracy of each risk factor as a descriptor of the contracts corresponding to the corrupt classes (EFOS and PCS). This was done by computing the fractions of contracts within each corrupt class for which the value of the variables proposed as risk factors exceed the corruption thresholds. If either fraction is larger than 0.5, then the risk factor may be considered as a useful descriptor of the contracts in the corresponding corrupt class. These fractions can be interpreted as the conditional probabilities that a contract exceeds the corruption threshold of the variables proposed as risk factors, given that it belongs to the corresponding corrupt class. 2. Second, we measured the accuracy of each risk factor to identify those contracts in which EFOS or PCS participated. To do this we compute the fractions of contracts that belong to either the EFOS or to the PCS class respectively, within the contracts for which the values of the variables proposed as risk factors exceed the corruption threshold. If either fraction is larger than the probability to find a contract of the corresponding corrupt class at random from the complete set of contracts, we consider that the risk factor is helpful identifying contracts in the corresponding class. Again, these fractions are the conditional probabilities that a contract belongs to either the EFOS or the PCS class, given that it exceeds the corresponding corruption threshold of the variables proposed as risk factors. These experiments are done without separating the contracts between both government periods, i.e. we considered all data from 2013-2020 to test each risk factor. Once we determine the risk factors that most accurately describe and identify the corrupt contract . The dashed grey lines indicate the threshold for each risk factor and the value of the CDF in which it was crossed. This value in the CDF indicates the probability to find a contract with the risk factor above the marked threshold given that the contract belongs to the corresponding class classes, we determine whether the government turnover brought along a higher or lower risk of corruption using the same framework proposed in Sect. 2.2.2 for non-dummy variables. Figure 5 shows the results of the first experiment, namely, the probabilities that a contract exceeds the corruption threshold (i.e. that the value of the variable RAD, which we have assigned as a descriptor of each contract, exceeds 0.5; and favoritism Fav, exceeds 0.9) given that it belongs to each corrupt class respectively. We observe that the risk factor RAD has a nearly 60% accuracy describing contracts in the EFOS class and a nearly 80% accuracy describing contracts in the PCS class. On the other hand, less than 1% of the contracts in both classes presented a favoritism greater than 0.9. Fig. S13 in the Additional file 1 shows the accuracy for the remaining risk factors. It is noticeable that the presumably corrupt behavior which combines more than 5 contracts and 10K USD PPP spending per active week, which should describe the practice of dividing expensive contracts into smaller ones, showed an accuracy of less than 2% for contracts in the EFOS class and less than 30% for the PCS class. Fig. S14 of Additional file 1 shows the results of this 1st experiment from the perspective of a recall curve 11 over a larger range of thresholds. Figure 6 shows the results of experiment 2, these are the fraction of EFOS (solid red line) and PCS (solid blue line) contracts within all those contracts that exceed a certain threshold for a range of possible threshold values for the RAD and Fav variables (Left and Right, respectively). The dashed lines represent the probability to randomly find in the source list an EFOS (red) or PCS (blue) contract. We can observe that the RAD risk factor is not a good identifier for neither of the corrupt classes, since none of the solid lines are above their respective dashed line. The opposite happened for the Fav risk factor. Here we found that the probability to find an EFOS or PCS contract within those with a favoritism larger than the threshold (solid lines) is indeed higher than the probability to randomly find a corrupt contract in the source list (dashed lines). Fig. S13 in the Additional file 1 shows that CPW and SPW variables are good identifying PCS class but not EFOS class. As above, Fig. S14 in the Additional file 1 shows the results of the 2nd experiment from the perspective of a precision curve 12 over a larger range of different thresholds.
Thus, even though Fav is not a good descriptor of corrupt behavior since only 1% of the EFOS and PCS contracts have a favoritism larger than 0.9; it is a good identifier since the fraction of both classes of corrupt contracts, among the contracts with a value of Fav greater than 0.9, is significantly higher than their concentration in the complete sample. In contrast, the RAD variable is good as a descriptor of EFOS and PCS contracts, since more than 50% of these contracts occur in buyer-supplier relationships for which RAD ≥ 0.5. However, RAD is not a good identifier since the probability to find an EFOS or PCS contract within all of those that exceed a given threshold for RAD is never larger than choosing at random from the complete sample. Surprisingly, the fraction of contracts in the EFOS class among the contracts with RAD ≥ 0.5 is actually lower than their concentration in the complete source list. Now, to determine whether the government turnover brought a change of procurement practices associated to corruption, we need to analyse which of the risk factors is correlated to an increase in the fraction of potentially corrupt contracts. To do this, we compute linear models between the total fraction of contracts between a buyer and a supplier for which RAD ≥ 0.5 per year as independent variable, and the fraction of contracts by single-bidder in each of the corrupt classes per year as the dependent variables. A similar procedure was done with the contracts with Fav ≥ 0.9. 13 To take into account the heterogeneity between the number of contracts in each class per year, we weighted each Corruption risk between periods. CDF for the RAD variable. The orange line corresponds to the mean of the 1st government period, the green area represents the CI at 99% generated by the data of the 1st period, and two purple colored curves correspond to the first two years of the second government period observation by the number of contracts assigned to each class per year. Figure 7 shows the results of these analyses for RAD risk factor. We observe that the increase in the fraction of contracts characterized by a RAD ≥ 0.5 is linearly correlated with an increase of the fraction of potentially corrupt contracts in both classes, with a R 2 = 0.72 for contracts in the EFOS class and R 2 = 0.82 for those in the PCS class. Both R 2 are statistically significant with a p v 0.01. For the Fav risk factor the analyses show that there is no significant correlation between the fraction of contracts with Fav≥0.9 and the fraction of potentially corrupt contracts (Fig. S15 in the Additional file 1). To confirm that RAD variable is the best (among our variables) to predict corruption risk related to EFOS and PCS, we compute several multivariate regression models considering, one by one, variables such as procedure character PC, contract type CT, supplier's size S, and the remaining risk factors. This experiment shows that the impact of RAD in the prediction of corruption risk is always significantly larger than the other variables (Tables S9 and S10 in the Additional file 1). Thus, we conclude that RAD is a good indicator of corruption risk, which is in line with previous results found in related works [32].
Then, considering the variable RAD as the best we have to describe corruption related to EFOS and PCS, and given its correlation with the risk of corruption, we can use the framework used in Sect. 2.2.2 to test how the risk of corruption changed with the government turnover. Figure 8 shows the CDF of the RAD variable, separated by periods. We can observe that the RAD variable in the 2nd period (purple colored lines) lies outside of the CI of the distributions of the old government for RAD values from 0.5 to 0.75, indicating that though less numerous, due to the large amount of contracts assigned to single bidders, the practices in the 2nd period present a higher risk of corruption than in the first period.

Discussion
México lived a historical government turnover in 2018 when, for the first time, a leftist candidate won the presidency, being the most voted candidate in México's democratic history. The new government appeared to represent a complete change of regime, credibly promising to break with the widespread corrupt practices of previous governments. In this work we presented a statistical framework to identify to what extent public procurement practices underwent significant changes due to the government transition, focusing on those contracts related to companies labeled as EFOS (companies that provide invoices for simulated operations) or as PCS (sanctioned suppliers and contractors) due to having incurred in some other kind of corrupt behavior. To do this, we analyzed data from more than 1.5 million government contracts corresponding from 2013 to 2020. We classified each contract in one of three different classes: EFOS or PCS if the contract had been carried out with a company identified as incurring in corrupt practices by the Mexican tax agency (SAT), or Non-Corrupt (NC) if it had not been identified as such. Once classified, we characterized each contract class by a set of variables that give information about the contract itself (i.e. the contract type, the supplier company's size, the amount spent, etc.) or the buyers' features (for example the maximum number of contracts given by the buyer to a single company in a year). We also proposed risk factors constructed considering the framework developed in [29,32]. We tested which of these risk factors had the best accuracy to describe and identify the companies labeled as EFOS or PCS, and then compared between government periods to determine whether the risk of corruption decreased, increased or remained the same after the government transition.
Before discussing the detailed comparison between the two governments, a first important difference between them concerns the number of presumably corrupt contracts per year carried out during each government, as well as the total amount of resources spent in these contracts. In the first year of the new government, the number of EFOS contracts fell by a factor of four from the last year of the previous government (or a factor of eight with respect to the average of the six years of the previous government), and another factor of four, to a total of only 13 cases, in the second year. A decrease in the number of EFOS contracts represented a reduction by a factor of three with respect to the previous year (or a factor of approximately 11 with respect to the average of the previous government) in the amount of resources spent in this kind of contracts. Also, and representing many more resources, the number of contracts carried out with companies identified as PCS during the first year of the new government was comparable to those of the previous government, however, the amount spent on these contracts during the first year of the new government fell by a factor of two with respect to the last year of the previous government. In the second year, the number of contracts fell by a factor larger than 2, and the resources spent fell by another factor of 4. Thus, the data showed that there has been a significant reduction in both the total number as well as the relative fraction of corrupt contracts, and the corresponding resources spent on such contracts. Whether this reduction was due to an effective crackdown on corruption, a consequence of the austerity program undertaken by the new government, or some other reason, is a question the data cannot answer.
However, our purpose in this work was to try to go beyond the analysis of the total amounts of resources spent in contracts that were suspect of being corrupt. We attempted to establish whether the practices and warning flags regarding suspect contracts actually changed from one government period to the other. To do so, using the variables characterizing each contract, we began by verifying that each contract class presented statistically significant differences between them in the same government period. Thus, we were able to assign a rough statistical profile to each class of contracts in each government period. For example, we found that contracts in the EFOS class were characteristically not carried out with large companies, and that this class had a large proportion of contracts for public work in comparison with the PCS and NC classes. On the other hand, the PCS class profile was distinguished by having very few contracts with micro, small, and medium size companies, and that the main activity in this class was for acquisitions. Finally, both corrupt classes were identified by having a higher proportion of buyers with a proportionally stronger market activity and larger budget than the buyers in the NC class. These characteristic statistical profiles were maintained between periods. Having verified that the various classes -defined by whether or not the supplier had participated in corrupt activities, and if so, what kind of activities-also had significant statistical differences between them, the next step was to analyze whether, as a consequence of the government transition, the practices in public procurement suffered changes within each class.
Regarding contracts with EFOS, we identified that the new government tended to favor micro-suppliers more than the older government. Companies this size obtained only 20% of EFOS contracts (on average) in the 1st government period. This percentage grew to 50% in 2019 and to almost 100% in 2020. The reduction in the fraction of large EFOS suppliers may be due to government action targeting these companies as they represent a large drain of resources. Also, EFOS contracts' duration suffered a significant change: the proportion of EFOS contracts longer than ten weeks grew from 20% to 40% in 2019, and to almost 90% in 2020. On the other hand, there were also similarities between periods for this class of contracts. For example, the percentage of the contracts assigned to singlebidders remained between 55%-65% in both periods.
In regards to contracts in the PCS class, these showed differences in the kind of suppliers that were favored, exhibiting a relative increase (from 40% to 65%) in the fraction of large companies that won contracts between both government periods. Moreover, the fraction of single-bidders in this class also increased from one government to another. On the other hand, the variable Spending showed that the way resources were distributed in PCS contracts remained roughly the same between periods.
The NC class presented differences mainly in the contract procedures and in the features of the buyers involved. In the 2nd government period, there was an increase of ∼ 20% of contracts made under NAFTA procedures compared to the 1st period. We also found that the kind of buyers that participated in the contracts in this class, tend to spend more money with a single company in the 2nd period than in the 1st period.
Our framework to analyze a set of contracts through certain characteristic variables allowed us to identify specific changes and similarities between contract classes and between government periods. But this approach did not tell us whether the risk of corruption increased or decreased after the government transition. To do this we considered specific indicators, namely RAD (fraction of single-bidder contracts), Fav (favoritism), CPW (contracts per active week) and SPW (spending per active week), based on previously proposed risk factors of corruption [27,28,42]. These can be used as red flags for those contracts for which these variables exceed a certain threshold. For example, a buyer-supplier relation (and all contracts in it) should be red flagged as risky if, say, RAD ≥ 0.5 [32], or if Fav ≥ 0.9 [42]. Our study showed that for the EFOS and PCS classes, the risk factor Fav performed well identifying corrupt contracts among those that exceeded the threshold of 0.9. Actually, it was twice as probable to find EFOS contracts in the subset of contracts with Fav≥0.9, than in the whole source list, and almost five times more probable to find PCS contracts. However, it had little accuracy as a descriptor since Fav exceeded the threshold 0.9 in less than 1% of companies labeled as having participated in corrupt activities. The variables CPW and SPW were not useful risk factors, since only a few EFOS and PCS contracts had more than five contracts per active week and expenditures for more than 10K USD PPP per active week, thus we were unable to identify the corruption scheme in which a buyer assigned many small contracts in a short period of time to the same supplier [42]. This may also suggest an important limitation of the effectiveness of these variables to predict corruption with the available data. Actually, having access to more specific information about the contracts and their participants, would be helpful to re-compute these risk factors and to re-test their effectiveness to describe and identify corruption.
On the other hand, the risk factor RAD was accurate in describing (on average) more than 50% and 80% of the companies labeled as EFOS and PCS, respectively. However, this result also implies that many corrupt contracts were won in open contest. This suggests that in our corrupt classes there may be two kinds of illegal schemes; one in which public officials colluded with the supplier and frequently assigned them contracts as singlebidders (i.e. RAD ≥ 0.5), and another scheme in which the contracted companies committed a fraudulent action without the government being involved. This second scheme is likely to occur in EFOS and PCS contracts that are mostly assigned through open contest (so RAD < 0.5). Here we focus in schemes of the first kind, as corruption implies government complicity [5]. We also found that the RAD variable was not efficient in identifying corrupt contracts among those with RAD≥0.5. Actually, the probability to find a PCS contract in this subset is the same as the probability to find it choosing randomly in the complete source list, and the probability to find an EFOS contract among those with RAD ≥ 0.5 is significantly less than for the random search in the complete source list. This result is explained by the fact that the single-bidder contracts are very common in the NC class (Fig. 5 Right), as common as in the PCS class and even more common than in the EFOS class. Thus, even when the single-bidder contracts should be only exceptional [38,39,43], they have become an extremely common procedure in México's public procurement practices. Further, in line with previous findings [32], we found that the RAD variable is positively correlated with an increase of the risk of corruption. Given this correlation, the picture that emerges is that even though the absolute number of corrupt contracts has been reduced, the risk of corruption related to EFOS and PCS contracts has not