Trends in crowd accidents based on an analysis of press reports

Crowd accidents – defined as situations where mass gatherings of people lead to deaths or injuries – have become a frequent occurrence on a global scale. Given the recurring nature of these accidents, it is essential that their characteristics are analyzed. To this end, an important first step is documenting these accidents. Here, a database of crowd accidents is developed for the period 1900–2019 through a comprehensive investigation of press and media reports. The analyses focus mainly on temporal trends in accident frequency and in the injuries and fatalities of each accident, as well as on their geographical distribution and classification based on the purpose of gathering. Results show that the frequency of crowd accidents has been unambiguously on the rise over the last 120 years. Also, there was no indication that larger crowd sizes increase the risk of injury or death per person. In fact, the opposite was the case, although a causal relation between crowd size and risk of injury/death is impossible to establish. Over time, the share of sport events in crowd accidents has declined, and religious gatherings have instead become more prominent in the statistics. An interesting observation is the association of accident rates with the income level of the countries where they happen, with low-and-middle-income countries being more represented in the records. India and (to a lesser extent) West Africa, in particular, appear to be hot spots for crowd accidents. Finally, it is argued that the exponential increase in crowd accidents over the last century was only partially real, with technology also playing a role in making information more accessible for recent accidents. After the internet (and SNS) became widespread, the trend for reported crowd accidents no longer shows an exponential increase, although it is difficult to conclude whether their frequency is stable or not.
The insights obtained from this study can pave the way for developing diagnostic knowledge and raising awareness about the ubiquity of crowd accidents.


Introduction
People have organized themselves in groups since the beginning of civilization, and even earlier primitive hunters used to gather in groups (Page and French, 2020). However, the willingness and need of people to stay close to each other has taken on a different spatial and temporal dimension in recent centuries. The population is rapidly moving from rural to urban areas. Moreover, even within urban areas, people tend to move toward city centers. Although the pace has started to slow down in some countries, the worldwide trend is still toward an increasingly dense urban lifestyle (The World Bank Group, 2020b). A consequence of this evolution is that human crowds are getting larger and are observed more frequently in very different forms of gatherings. Developments in air and land transportation are also making it easier to [...] occurred often have lasting psychological consequences, knowing that other people passed away only a few meters from them but could do nothing to stop the collective motion under such critical conditions (especially in terms of density).
In this regard, it should be clear that preventing the occurrence of such accidents is an important goal to guarantee crowd safety, especially, but not only, during mass events. It should also be added that an improvement in safety often results in greater comfort, unless extreme safety standards are set that would lead to the cancellation of any event, as risks can never be fully eliminated. Specifically, flow lines for moving crowds need to be carefully planned to reduce the probability of stagnation (eventually leading to deadly densities) and to avoid collisions among participants, with the aim of achieving smoother motion and reducing perceived stress.
To better understand the mechanisms of collective crowd motion and what leads to an increase in density (the ultimate cause of tragedies), research on pedestrian dynamics has boomed in recent decades (Haghani, 2021). A large number of simulation models are proposed every year (Yang et al., 2020) and, recently, the number of experiments has also been increasing at a fast pace (Haghani and Sarvi, 2018; Corbetta and Toschi, 2023). However, quite surprisingly, although several studies mention the prevention of crowd accidents as a motivation for their research, to the authors' knowledge a systematic review of crowd accidents is missing to date. Lists of crowd accidents can be found in some books (Still, 2014; Feliciani et al., 2021), online (Wikipedia, 2023; Still, 2019) or in some reviews, which however often cover crowd accidents only marginally and usually focus on a specific aspect. For example, reviews of crowd accidents can be found in the context of the use of the term "panic" (Rogsch et al., 2010), for religious events in India (Illiyas et al., 2013), in discussions of health issues at the Hajj (Ahmed et al., 2006), of safety standards in the garment industry in Bangladesh (Akhter et al., 2010) or of accidents which occurred during soccer games (Elliott and Smith, 1993; Darby et al., 2005). In addition, case studies covering a specific accident can also be found in the literature (Nicholson and Roebuck, 1995; Bowley et al., 2004; Wise, 2004; Zhen et al., 2008; Vendelo and Rerup, 2009; Helbing and Mukerji, 2012; Hsu and Burkle, 2012; Wagner et al., 2013), but a complete review covering the subject of crowd accidents from a historical, geographical and macroeconomic perspective is not to be found.
A review of crowd accidents is important because research on crowd dynamics has reached a degree of maturity such that the methods it has produced can and should be transferred to stakeholders and policy makers involved in crowd management. Also, it is of foremost importance to get an overall picture of whether accidents are on the rise or not, in order to judge whether the efforts of the last decades have paid off in helping the implementation of preventive measures that reduce accident probability. Several policies have been created to make crowd facilities safer, the best known probably being the so-called "Green Guide" (Great Britain. Department for Culture and Media and Sport, 2008), which sets guidelines for sports grounds (soccer in particular). More recently, efforts have been made to create standards to validate crowd simulators, typically for fire evacuations, such as the ISO 20414 standard (International Organization for Standardization, 2020), but also extending the scope to more "general" evacuations, as in the case of the RiMEA guidelines (Rogsch et al., 2014). Earlier, Fruin (1971) and Weidmann (1993) also proposed some standards to be used for pedestrian facilities, although their focus was mostly on guaranteeing comfort rather than safety. In light of the above, considering that both knowledge and codified guidelines are starting to emerge to regulate crowd management, it is necessary to assess which types of accident occur most often, to help focus on the most urgent aspects and deliver the message to the people who are most relevant in establishing safe practices.
Finally, as already briefly mentioned, we should remark that it is not easy to get an idea of whether crowd accidents are globally on the rise or not. When large accidents occur, both the media and the scientific community focus on the subject, sometimes leading to conflicting messages in public perception. On the one hand, it is often mentioned that such large accidents are relatively rare; on the other hand, previous accidents are brought to attention, making people aware that similar events occurred in the recent past. It is also not difficult to find scientific works stating that crowd accidents are on the rise and thus justifying the need for more research on crowd dynamics. Therefore, although challenging, this work will also try to provide some initial evidence on whether worldwide crowd accidents are on the rise or not.
In light of the above, this work has a twofold goal, namely: (1) determining whether crowd accidents are on the rise and providing a general picture of the types of crowd accident occurring at the highest rate, to inform future research; (2) providing a general picture of the geohistorical changes in crowd accidents, to understand which changes in attitude or policy have been beneficial in reducing their occurrence.
It is worth noting that, in the current work, we follow a top-down approach in which crowd accidents are studied by examining past events with the aim of identifying trends and informing policymakers of the recurring patterns and circumstances in which accidents happen. A bottom-up approach describing the mechanisms leading to crowd accidents is instead presented in several sources (see for example Still (2014) or Feliciani et al. (2021)). In other words, in the current work, we focus on the macroscopic trends of crowd accidents as opposed to discovering root causes, addressing an existing gap in the former area. We believe that causal analysis of crowd accidents is equally important. There is currently a limited number of studies that have tried to reconstruct some of the best-known crowd accidents and understand their underlying causes (Helbing and Mukerji, 2012; Jiayue et al., 2014; Sieben and Seyfried, 2023), and we believe that more such analyses are needed in relation to other (less investigated) accidents.
This paper is organized as follows. Section 2 presents the methods and criteria used to collect and classify data on crowd accidents. Numerical and geohistorical results of the analysis of the collected dataset are presented in Section 3. Section 4 provides a more general discussion based on the results, where limitations of this work are also listed. Section 5 concludes the paper, summarizing the main findings and providing advice for future research.

Data collection and analytical approach
In this section, we detail the methods used to prepare the crowd accident dataset and outline the main principles on which the analysis is based. To keep the presentation simple, only the aspects necessary to understand the results and the related discussion are outlined here, with details provided in the appendices of this work.

Data collection
The most important part of this work is the dataset, which includes the information for all crowd accidents. As a consequence, the methods and criteria employed to prepare the dataset play a more relevant role than the analytical tools used in the analysis. This section briefly explains the methodology used to search for details about crowd accidents, the sort of data which was extracted and the criteria used to select information among several sources. The content presented here should be sufficient to understand the general approach; readers interested in details may refer to Appendices A and B.
The approach taken to generate the dataset is summarized in Fig. 1 and is explained in the following. The starting point is represented by already existing lists of crowd accidents. For instance, a Wikipedia page is available on the subject and several lists are provided on specific topics (see Appendix A for a detailed list of references). Because of the large number of such accidents, several works review soccer tragedies or focus on accidents in religious events in India. Making use of this existing material, we collected several lists and compared them to create an initial dataset composed of unique accidents reported in at least one source.
In the next step, we looked individually for details on each accident to check whether it satisfied the inclusion criteria to be considered a "crowd accident". Accidents which were not related to crowd motion were excluded from the dataset. In general, we consider as crowd-related those accidents where fatalities or injuries were ultimately caused by the crowd itself (regardless of responsibility or the reason why that happened). In particular, we excluded accidents where people were injured or killed by weapons or where there is evidence that all victims were due to fire or smoke intoxication. On the other hand, we considered accidents related to overcrowding regardless of whether a structural failure occurred or not (if a wall breaks under crowd pressure, it means that either the crowd was larger than estimated or the wall was not designed to withstand the pressure; both are misjudgments in terms of crowd management). In addition, we excluded accidents resulting in no fatalities and fewer than ten people injured.
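The inclusion criteria described above can be summarized as a simple filter. The following is only an illustrative sketch: the record type `CandidateEvent` and all of its field names are hypothetical and do not come from the published dataset, and in practice each flag would require a manual judgment based on the press reports.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvent:
    """Hypothetical record for one candidate accident (illustrative field names)."""
    fatalities: int
    injured: int
    crowd_caused: bool               # harm ultimately caused by the crowd itself
    weapons_involved: bool           # victims injured or killed by weapons
    all_victims_fire_or_smoke: bool  # evidence that all victims were due to fire/smoke


def is_crowd_accident(event: CandidateEvent) -> bool:
    """Return True if the event passes the inclusion criteria sketched in the text."""
    # Events unrelated to the crowd itself are excluded.
    if not event.crowd_caused:
        return False
    # Weapon-related events and events where all victims were due to fire/smoke are excluded.
    if event.weapons_involved or event.all_victims_fire_or_smoke:
        return False
    # Events with no fatalities and fewer than ten people injured are excluded.
    if event.fatalities == 0 and event.injured < 10:
        return False
    return True
```

Note that structural failure does not appear as a field: as stated above, overcrowding accidents are included regardless of whether a structure failed.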
In the process of confirming and documenting an accident (more on data extraction will be presented later), it is not uncommon to find a different accident which was not yet included in the main list. For example, local newspapers tend to report previous similar accidents which occurred in the region (if any), and soccer- or religious-related accidents are often accompanied by a table of past tragedies. Investigating an already known accident could therefore lead to the discovery of a new one. Although many new accidents emerged through this mechanism in the early stages of the investigation, toward the end of the data collection work all mentioned events were already in the list. This is clearly not a definitive proof of validity, but it at least suggests that further investigation would bring only very few additional results, and thus that the dataset is fairly complete. When no new accident emerged despite further research, we deemed the dataset complete and focused on the analysis.
So far we have only discussed the methods used to prepare a list of crowd accidents; after confirming the existence and the nature of each accident, information was extracted. A summary of the data used in this work is given in Fig. 2. Among the most important information are the number of people killed, the number injured and the estimated crowd size. Among the three, fatalities are typically the most accurate, with limited differences among the sources. But even in this case, it is usually difficult to verify which source is the most reliable and, especially in accidents with political significance, differences in reported fatalities may be large. To take a consistent approach, we therefore decided to use the highest value reported for fatalities, number of people injured and crowd size. Correspondingly, the reported number of people injured was used regardless of whether victims required (prolonged) hospitalization or were treated on-site. Similarly, crowd size was taken regardless of whether it refers to the number of people in a wide open space or confined in a building or an enclosed structure. Other than the numbers relative to the victims, date, country and location (latitude, longitude) were also extracted for each event. In addition, each accident was labeled based on the purpose of gathering to understand the nature of the event, e.g. sport event, religious gathering, etc. Finally, the income level of the country where the accident occurred was obtained from the categorization of the World Bank (The World Bank Group, 2020a).
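The rule of taking the highest reported value across sources can be illustrated with a short sketch. The dictionary keys and the function name `consolidate` are assumptions made for this example only; the published dataset may be organized differently.

```python
def consolidate(reports):
    """Merge several source reports of one accident by keeping the highest value.

    `reports` is a list of dicts with optional keys "fatalities", "injured"
    and "crowd_size" (hypothetical names). A key that is missing or None in
    every report stays None in the result.
    """
    merged = {}
    for key in ("fatalities", "injured", "crowd_size"):
        values = [r[key] for r in reports if r.get(key) is not None]
        merged[key] = max(values) if values else None
    return merged


# Two hypothetical sources disagreeing on the number of fatalities.
example = consolidate([
    {"fatalities": 100, "injured": 50},
    {"fatalities": 109, "crowd_size": 10_000},
])
```

In this example the merged record keeps 109 fatalities, 50 people injured and a crowd size of 10,000, since each quantity is maximized independently of which source reported it.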
The final result is a dataset including 281 events from 1900 to 2019, based on more than 800 individual sources. Especially due to the large number of sources (and their nature: many are internet pages), we decided to provide both the dataset and the sources in an electronic format, available at Feliciani (2023).
It is of course impossible to determine whether the dataset contains all accidents which occurred in the surveyed period, and we will try to address this issue later on. It is nonetheless possible to state with some confidence that it probably contains all accidents which are accessible through the internet using mostly English as the main working language (although other languages were also used).

Analytical approach
Before discussing the analysis of the collected dataset, it is important to make a few considerations on the methods used. Because crowd accidents are rather rare, trends can only be observed when long time periods are used. In this work, we will mostly consider decades and summarize all data relative to each decade from the start of the 20th century. Technically, any other time period could also be used, with its choice influencing the final results. Yet, we believe that decades represent a reasonable choice given a number of considerations. Several statistical datasets relative to population or macroeconomic indicators are (or were) released at 5-year intervals, thus making comparison possible only for large time periods. Also, data relative to 2020 and 2021 may be difficult to interpret given the restrictions imposed by the COVID-19 pandemic, thus making data collection reliable only until 2019. If decades are used, the whole time period from 1900 can be divided into 12 intervals, making the analysis clear and complete. Finally, one should consider that crowd accidents have remarkably increased over the last century. So, while a time period of 20 years would be more appropriate for the beginning of the 20th century, a 5-year period is adequate to describe trends over the last 50 years (and will be used in part). To conclude, we believe that the division into decades represents a solid approach to describe trends in crowd accidents and we will present the results accordingly.
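As a minimal sketch of the decade-based aggregation, the following counts accidents per decade from a list of years. The function names are hypothetical, and the input years in the example are made up for illustration.

```python
from collections import Counter


def decade_of(year: int) -> int:
    """Return the first year of the decade containing `year` (e.g. 1946 -> 1940)."""
    return (year // 10) * 10


def accidents_per_decade(years, first=1900, last=2019):
    """Count accidents per decade, keeping empty decades with a count of zero."""
    counts = Counter(decade_of(y) for y in years if first <= y <= last)
    return {d: counts.get(d, 0) for d in range(decade_of(first), last + 1, 10)}


# Illustrative input only; 2021 falls outside the surveyed period and is ignored.
per_decade = accidents_per_decade([1902, 1943, 1946, 2015, 2019, 2021])
```

With the default bounds the result always has 12 entries (1900 to 2010), matching the 12 intervals mentioned above.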

Analysis of press reports
The results of this study are presented in this section. We follow a logical structure, starting from a generic analysis presenting trends over large periods of time at the global level, and later focusing on more specific aspects such as the purpose of gathering, macroeconomic factors and population density. Finally, we address the question of whether the number of crowd accidents is actually increasing or whether the increase could be related to a reporting bias created by the ease of information retrieval due to technological changes.

Historical trend and statistical analysis
The first and most important analysis concerns the trend in accident frequency and the related fatalities and injuries, since this gives a general picture of trends in crowd accidents. To this end, the dataset was divided into decades (the reason for this choice was explained earlier) and the numbers relative to each indicator were computed, with the results shown in Fig. 3.
The number of crowd accidents per year has been generally on the rise over the last 120 years, with even faster growth over the last decades. However, it is interesting to note that although the number of accidents shows an almost monotonic increase (especially over the last 50 years), a different trend is seen for the number of fatalities. Explaining this trend is not straightforward because the associated events are generally uncorrelated both geographically and in terms of attributes. For instance, the increase in the 1940s can be associated with two large accidents occurring during World War II, when people rushed into improvised underground shelters during air raids.¹ The peak of the 1950s is mostly associated with a large accident occurring in India during the Kumbh Mela religious event, with up to 800 people reportedly being killed. Another large accident occurred in a shrine in Japan (124 fatalities), while people were reportedly killed when a huge crowd attended Stalin's funeral in 1953 (although the number of fatalities is unclear, with the highest reported figure being 109). Finally, it is remarkable that almost all fatalities of the 1960s are related to soccer matches (three events occurring in Peru, Turkey and Argentina account for almost all of the victims during that decade). Further, it is worth mentioning that almost all accidents of the 1970s are also related to sporting events, but fatalities are lower (despite the increase in frequency). This could be related to the introduction of stricter regulations for stadium design and crowd management. For instance, the first edition of the "Green Guide", addressing safety in sport venues in the UK, appeared in 1973 (De Quidt and Thorburn, 1998), showing an increasing awareness of crowd safety around that period.
Nonetheless, the limited amount of information available before the 1970s does not allow a systematic analysis and, as such, a discussion is only possible based on the total numbers. On the other hand, after the 1980s, more than 20 accidents were reported in each decade, thus allowing a simple yet meaningful statistical analysis. We therefore considered in more detail the period between 1975 and 2019 and divided it into bins of 5 years, which were used to compute simple statistics.
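The 5-year binning of the 1975 to 2019 period can be sketched as follows. The choice of the median as the statistic is an assumption for this example (the discussion of accident size refers to median values), and the function names and input records are hypothetical.

```python
from statistics import median


def five_year_bin(year: int, start: int = 1975) -> int:
    """Return the first year of the 5-year bin containing `year` (e.g. 1979 -> 1975)."""
    return start + ((year - start) // 5) * 5


def median_fatalities_by_bin(records, start=1975, end=2019):
    """Group (year, fatalities) pairs into 5-year bins and take the median per bin."""
    bins = {}
    for year, fatalities in records:
        if start <= year <= end:
            bins.setdefault(five_year_bin(year, start), []).append(fatalities)
    return {b: median(values) for b, values in sorted(bins.items())}


# Illustrative records only; the 1970 event lies outside the analyzed period.
medians = median_fatalities_by_bin(
    [(1975, 10), (1979, 30), (1980, 5), (2019, 7), (1970, 99)]
)
```

The period 1975 to 2019 yields nine bins, the last one being 2015 to 2019.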
Results presented in Fig. 4 show that it is possible to identify a trend in the typical number of fatalities associated with accidents occurring in different time periods. For instance, the size of accidents (in terms of median number of victims) increased from 1975 until the 1990s, after which a decreasing trend is observed. This observation can be explained by two hypotheses: (1) a real transition from rare large accidents to frequent small ones (refer also to Fig. 3 for the frequency) and/or (2) a change in communication technology making it possible to easily collect information about small accidents using the internet and online (social) media. In regard to the latter hypothesis, we should remind readers that until the 1990s, media were not organized to target individuals, and thus television and newspapers typically did not report on minor remote events, limiting news to large tragedies. We can therefore speculate that information about small accidents is more likely to appear now that internet searches and online sharing are possible. The potential bias caused by the change in media reporting, along with other biases potentially affecting our analysis, is discussed in detail in Appendix C.
Fig. 4. [...] Only periods with more than 5 accidents were considered here. The median value is given as a red line inside the box showing the 25th and 75th percentiles on the bottom and the top, respectively. Minimum and maximum are represented by the extrema of the whiskers, with outliers represented using a red cross. In general, from the end of the 20th century, accidents resulting in a small number of fatalities are getting more frequent.
Next, we wish to also take crowd size into account and discuss whether it plays a role during accidents. The motivation is to investigate whether larger crowd sizes are associated with a higher risk of injury or death for individual people. To this purpose, we computed what we label as the fatality ratio, i.e. the number of people killed divided by the respective crowd size. For example, if 10 people get killed during an accident occurring in a crowd of one hundred thousand, then the fatality ratio will be 10⁻⁴. Similarly, the injury ratio is defined by dividing the number of people injured by the crowd size for each accident. Considering the large variation in the reported numbers and also the fact that crowd size is often estimated only in terms of order of magnitude, all results (shown in Figs. 5(a) and 5(b)) are presented using a logarithmic scale.
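The fatality and injury ratios are simple quotients, and the worked example from the text (10 people killed in a crowd of one hundred thousand) can be reproduced directly. The function names below are hypothetical; the base-10 logarithm is included only to show how such a value would sit on a logarithmic axis.

```python
import math


def fatality_ratio(killed: int, crowd_size: int) -> float:
    """Number of people killed divided by the crowd size."""
    if crowd_size <= 0:
        raise ValueError("crowd_size must be positive")
    return killed / crowd_size


def injury_ratio(injured: int, crowd_size: int) -> float:
    """Number of people injured divided by the crowd size."""
    if crowd_size <= 0:
        raise ValueError("crowd_size must be positive")
    return injured / crowd_size


# Example from the text: 10 fatalities in a crowd of one hundred thousand.
ratio = fatality_ratio(10, 100_000)
log_ratio = math.log10(ratio)  # position on a base-10 logarithmic axis
```

Accidents without a reported crowd size cannot be assigned a ratio, so in such a sketch they would have to be left out before calling these functions.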
Results for the fatality and injury ratios show an evident trend, namely that the larger the crowd, the less likely it is that a single person gets hurt in case of an accident. On the one hand, this result can be seen as a logical consequence of the fact that the number of victims must always be smaller than the size of the crowd. So, thousands of victims are only possible in large events, where usually several thousand people gather. On the other hand, this effect is also known as "safety in numbers" (Elvik and Bjørnskau, 2017; Elvik and Goel, 2019) and has also been observed, for example, in relation to bicycle or pedestrian traffic, where the ratio of accidents to the number of road users typically decreases as more users of that mode are on the streets.
This result should not be mistaken for a causal relation between crowd size and the number of victims or their ratio. In fact, only events resulting in an accident are considered here, so there is no way to conclude whether events with larger crowds are statistically safer or not. It is only possible to state that, should an accident occur, the probability for a participant of being involved is lower for larger events. There is no causal evidence to conclude that accidents are more or less likely to happen in larger crowds.
Finally, we wish to conclude the statistical analysis by comparing the number of people injured and killed in each single accident. The comparison is presented in Fig. 5(c), where the diagonal is used to indicate equality between both quantities. As clearly shown, the number of people injured is typically larger than the number killed, with most dots appearing in the top-left side of the graph. Specifically, in the dataset examined, 71.6% of the accidents reported a higher number of people injured than fatalities.²
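A share such as the one reported above can be computed as in the following sketch, which also applies the exclusion of accidents having only injuries stated in the accompanying footnote. The function name and the input pairs are hypothetical; the example data are made up and do not reproduce the 71.6% figure.

```python
def share_more_injured(accidents):
    """Percentage of accidents where more people were injured than killed.

    `accidents` is an iterable of (killed, injured) pairs. Accidents with no
    fatalities (injuries only) are excluded before computing the share.
    """
    considered = [(k, i) for k, i in accidents if k > 0]
    if not considered:
        return float("nan")
    more_injured = sum(1 for k, i in considered if i > k)
    return 100.0 * more_injured / len(considered)


# Illustrative data: the (0, 30) accident is excluded, leaving two of four.
share = share_more_injured([(1, 2), (2, 1), (1, 5), (4, 4), (0, 30)])
```

Accidents with equal numbers of injured and killed lie on the diagonal of the comparison and are not counted as having more injuries.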

Purpose of gathering and event's type
In this section, we consider crowd accidents from a historical perspective by also taking into account their context. More specifically, we distinguish each accident based on the event's type or, more generally, the purpose of gathering. For brevity, we present the results without discussing the definitions used for each event type. The labels used should nonetheless be self-explanatory, and readers are referred to Appendix B for details. The shares of the different purposes of gathering in terms of accident frequency, number of fatalities and number of people injured are shown in Fig. 6.
² Accidents having only injuries were excluded in this calculation.
Fig. 5. [...] Comparison between fatality and injury ratio, defined as the ratio of the number of people killed or injured to the total size of the crowd. (c) Comparison between the number of people injured and killed for each accident. The diagonal line is added to show the asymmetry between both values, indicating that, on average, more people are likely to get injured compared to the number of fatalities.
Fig. 6. Share in the number of accidents and relative fatalities/injuries depending on the purpose of gathering. Only the four most common events are considered in each graph and the rest are summarized as "Other/Unknown". Labels are ordered in descending order based on the total relative to each quantity. Only accidents occurring after 1980 are considered due to the limited data available for the previous time period.
Regardless of the quantity considered, it is possible to observe that accidents related to sport events (typically soccer matches) are declining in proportion (the absolute number is fairly constant). The shift is particularly strong in the share of fatalities, with sport events now playing a very marginal role. On the other hand, accidents are on the rise in religious events, especially when fatalities or people injured are considered. This may be caused by a variety of factors. As we will see later, many of these accidents occurred in India, where the population is quickly rising, thus making crowd management a challenge. But the increasing trend in religious events may also be understood when compared with the decline observed for sport events. Stadia are now often designed taking pedestrian traffic into consideration, and crowd management is built upon experience gained by staff on-site. The same approach is difficult for religious facilities, which are often old buildings that are difficult to modify and where events are held on an irregular basis with dispatched staff possibly unfamiliar with the location.
Apart from the opposite and rather clear trends shown for sport and religious events, it is hard to find any other clear trend. Accidents occurring during entertainment events (mostly concerts) are also quite common, although their proportion is rather stable regardless of the indicator considered. A slight increase is observed in giveaway events, but it is only observable in terms of frequency because this kind of gathering usually attracts a limited number of people. The slight increase can be explained by a number of accidents occurring in Islamic areas during Ramadan, when donation plays an important role and wealthy people sometimes set up improvised giveaway events with little or no crowd control. But, as we will see next, this kind of accident is more generally on the rise in developing countries, where economic disparities are widening and the urban population is exploding.

Geographical and macroeconomic factors
In this section, we take a closer look at the data relative to each accident and consider the geographical and macroeconomic context in which they occurred. It is well known that traffic-related accidents are more common in developing countries (Jarawan et al., 2004; Feliciani et al., 2020; Haghani et al., 2022), and it is therefore reasonable to expect a similar trend for crowd accidents, which are closely related to the management of pedestrian traffic. For a first evaluation, it may therefore be useful to check the location and year of each accident on a map, as presented in Fig. 7.
C. Feliciani et al.
Fig. 7. Crowd accident location, year of occurrence and number of fatalities. To simplify the visualization, the considered period has been divided into five colors based on the number of accidents occurring in each period (i.e. a shorter time period was used for recent events). As the map shows, recent accidents mostly occurred in India and in West Africa, although crowd accidents have occurred on all continents (if the almost uninhabited Antarctica is excluded).
The map of Fig. 7 shows that many recent accidents occurred in India and West Africa, which are rapidly developing regions with a quick increase in population and where infrastructure is struggling to keep pace with the inflow of people from rural to urban areas. Northern India, in particular, is a densely populated area with solid religious traditions leading people to gather in millions over short periods of time. Almost 70% of the accidents which occurred in India between 2000 and 2019 (33 out of a total of 48) were related to religious events. Many of those accidents occurred close to rivers or in areas close to water because bathing plays an important role in Hindu rituals. Accidents have frequently occurred on bridges (which act as a bottleneck), at ferry terminals (basically a dead end) or on riverbanks (where people enter the water to later reverse their direction, thus creating a complex and conflicting motion pattern). But train stations and transportation terminals have also often been the scene of disasters in India.
A large number of accidents can also be seen in the UK,³ especially in the period between 1900 and 1979, when they account for almost 40% of the accidents reported worldwide. Almost all of them were related to soccer matches in a period when spectators were allowed to stand and "simple" surges (such as for a goal) would often result in accidents. Since 1994, all clubs in the English Premier League and Championship have been required to provide all-seated accommodation,⁴ and the Hillsborough accident of 1989 resulted in the revision of stadium design and management (Woodhouse, 2021). As a result of these changes in policy and regulation, no crowd accidents in soccer games have been reported in the UK over the last 30 years.
A similar trend can be seen in China at the turn of the century. A total of 10 out of 13 accidents in China in the period between 2000 and 2010 occurred in schools, often in staircases. Although another two accidents were reported in schools between 2011 and 2019, the number is significantly lower, possibly hinting at an improvement of safety in school design, although there is no documentation pointing to a causal relationship between regulations and safety improvements in this case.
But the examples listed above also show some biases potentially affecting this study: language (English was mainly used while searching for accidents) and public awareness (the public discussion generated from a large accident may lead to the resurgence of almost forgotten events). These biases, along with countermeasures taken in this work, are discussed in detail in Appendix C.
3 Overlapping of the dots makes visualization difficult because several accidents occurred at the very same location, such as the Ibrox Stadium in Glasgow.
4 Starting January 2022, standing was allowed again in a limited number of stadiums under particular conditions (Woodhouse, 2021).
Nonetheless, despite the limitations listed above, we believe that the collected dataset allows a solid investigation of trends, especially from a macroscopic perspective. We will therefore consider macroeconomic factors, which, as already explained, are likely to be correlated with trends observed in crowd accidents. Fig. 8 presents statistics for crowd accidents by taking into consideration the income group of each country in the year when the accident occurred (the income group classification by the World Bank is used (The World Bank Group, 2020a)).
When accident frequency is analyzed, an increase in the share of lower-middle income countries can be observed (with the exception of the 1990-1995 period, which, however, contains ''only'' 12 accidents). A qualitatively similar, yet less clear, trend is also observed for fatalities, although the 2015 accident in Saudi Arabia (over 2'000 people were reportedly killed in a single accident) contributes to biasing the result for the latest time period. Data on people injured do not show a particular trend, but confirm that countries belonging to the lower-middle income group are most typically associated with crowd accidents.
The recent increase of accidents in countries with a lower-middle income can also be observed in the map of Fig. 7, where, in addition to India, another region with a large number of accidents over the last 20 years is West Africa. Crowd accidents during soccer games have also been common in African countries. In particular, there have been a number of accidents in relation to games played by the national teams for the African and World Cup (Zambia, 2007; Liberia, 2008; Ivory Coast, 2009 and South Africa, 2010). But, more importantly, the region of West Africa has experienced a rapid increase of population over the last decades, especially in urban areas.
The rise of population and, in particular, its concentration in urban areas can indeed be considered a factor that increases the probability of crowd accidents. Although it takes a very high density of people in a defined space to create a critical situation, such conditions arise more easily in cities where large crowds move and gather on a regular basis. To allow a more critical discussion, Fig. 9 compares the location of the accidents between 2000 and 2019 with population density. It becomes evident that many accidents occurred in Northern India, which is the most densely populated area of the world. Nonetheless, it should also be clear that crowd accidents may occur in remote areas too, where events can also gather a large number of people. For example, an accident occurred in Morocco in 2017 in a remote village with a population of 8'000 (thus smaller than many soccer stadia), and 15 people lost their lives. In this map, the American continent has been excluded to focus on the locations where crowd accidents have been more frequent in the last 20 years (see also Fig. 7). Accidents are more common in densely populated areas (India and West Africa in particular), although remote areas are not excluded.

Critical assessment on the trend of crowd accidents
From the results presented above it should be clear that the rise of population can play a role in the increase of accidents. But it is also necessary to remind readers that crowd events are quite rare and that small ones, in particular, do not attract attention from the media. It is, therefore, possible that many accidents occurring in the first half of the 20th century, when global communication systems were still not fully developed, were only reported by local newspapers and did not emerge in our searches. Thus, in this final section, we wish to test this hypothesis and try to answer the question of whether crowd accidents are actually on the rise or not.
To this end, different indicators were chosen to represent both the trend in population and the ''availability of information''. To describe the former trend, the total world population and the urban population were used. Urban population allows us to check whether the distribution of population over the globe (and not solely the total) could possibly play a role in making crowd accidents more common. Finding an indicator to represent the capability to retrieve information over a period of 120 years was, on the other hand, a much more difficult task. To the best knowledge of the authors, such an indicator is not available, or not in a form simple enough to be used here. We, therefore, decided to focus on two datasets which should, to some extent, describe the change in communication over the last 120 years. One is the worldwide number of post shipments. This number is the result of many factors: transportation network (from ships to jets), writing technology (handwriting, typewriter, PC, etc.) and ultimately also population. The other indicator is the number of scientific papers published per year. Although post shipments decreased after the introduction of e-mail, scientific publishing has adapted to the technological changes. What used to be printed on paper and sent across the globe for review is now digitally available and shared. Similarly, whereas one once had to go to the library to check for the latest publications, nowadays the same is done online. For this reason, the number of scientific papers published worldwide can, to some extent, describe the ease of access to information, while, again, also accounting for the increase in population.
The four indicators were compared with both the frequency of crowd accidents and the total number of victims for each decade, with the former being the main quantity used to check the hypothesis presented earlier. The correlation with fatalities is mostly given for reference and comparison. Results are presented in Table 1. Generally, both population and information-related indicators can depict the trend in crowd accidents quite well. However, it is interesting to notice that population-based indicators are better at explaining the trend in fatalities, possibly because when the (urban) population is larger, gatherings tend to attract more people and result in more victims in case of an accident.5
But, from here on, we will focus only on accident frequency. In that regard, the number of papers published fits well with the number of crowd accidents per decade. Post shipments also achieve a good correlation coefficient, but only when the period until 2000 is considered, because the later widespread use of e-mail dramatically slowed down the increase of post shipments. We can therefore conclude that (1) post shipments are possibly not a valid indicator of information sharing and (2) the number of scientific publications is possibly a better quantity for taking into consideration the change of information sharing technology over time.
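The comparison described above can be illustrated with a minimal sketch. It assumes a Pearson correlation coefficient (the text does not state which coefficient was used) and uses made-up placeholder series, not the values behind Table 1. The function names `pearson` and `correlation_until` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_until(decades, indicator, accidents, last_decade=None):
    """Correlate an indicator with accident counts per decade.

    If last_decade is given, only decades starting at or before it are used,
    which mimics restricting the analysis to an earlier period.
    """
    rows = [(i, a) for d, i, a in zip(decades, indicator, accidents)
            if last_decade is None or d <= last_decade]
    xs, ys = zip(*rows)
    return pearson(xs, ys)

# Made-up placeholder series (NOT data from the study): accident counts keep
# growing, while the indicator flattens in the last two decades.
decades_demo = list(range(1900, 2020, 10))            # 12 decades
accidents_demo = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
indicator_demo = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 90, 80]

r_full = correlation_until(decades_demo, indicator_demo, accidents_demo)
r_truncated = correlation_until(decades_demo, indicator_demo, accidents_demo,
                                last_decade=1990)
print(f"full period: {r_full:.3f}, until 2000: {r_truncated:.3f}")
```

With these placeholder series, the coefficient drops once the decades after 2000 are included, which is the qualitative behavior reported for post shipments.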
Given the discussion above, we will consider the number of scientific publications as the best indicator for information availability and conduct a more detailed analysis. The graphs in Fig. 10 compare both quantities for the whole period of our dataset (120 years) and the last 30 years. The graph of Fig. 10(a), where decades are used, shows that the exponential increase in crowd accidents corresponds to the same trend seen for scientific publications, hinting at the fact that a larger number of people on the planet sharing information more efficiently could have contributed to the rise in reported crowd accidents.
When the last 30 years are analyzed using a smaller bin size6 (five years), it is possible to note that the number of papers published stops increasing exponentially and that crowd accidents declined over the last 10 years. When the preliminary data regarding deadly accidents between 2020 and 2022 are included (20 accidents reported at the moment of writing, 10 in 2022 alone (Wikipedia, 2023)), we can venture that the figure for the current five-year period is not going to represent an exponential increase compared to the previous periods,7 although the restrictions which applied during the COVID-19 pandemic are hard to take into account.
The discussion above is intended to hint at the observation that the number of reported crowd accidents likely increased over the last century due to a mixture of factors: total and urban population, technology used for information sharing and, ultimately, because such accidents have indeed been on the rise. However, the number of reported accidents has been fairly steady since the internet and SNS became widespread all over the world (roughly after 2010), while the global population increased by ''only'' 10% in the last 10 years. Also considering that crowd accidents on a global scale are no longer rare events (an accident is reported every 1-2 months), thus making it viable to consider trends over smaller periods, we can conclude that although it is not possible to say whether there is a slow increase or the trend is steady, we can definitely exclude an on-going exponential increase.

Discussion
A study like the one presented here does not come without limitations, and in this section we will briefly discuss them to get a better understanding of the impact they could have on the results (a more structured analysis of biases is also presented in Appendix C).
An important aspect to consider is that the source of information is represented by press reports compiled by people who are not necessarily experts on crowds and who possibly employ different methods to estimate crowd size. The issue was partially overcome by taking the largest figure available, ensuring that if there is a conflict between different parties in minimizing/maximizing a number, the effect on the results is minimal, since the criteria employed are consistent through the whole dataset. However, there are also cultural/political aspects to consider in reporting sensitive events like crowd accidents. Cultural background influences how an accident is reported, with some cultures taking a dramatic and/or exaggerated approach and others minimizing potential uneasiness among readers. Also, some political regimes are more likely to systematically reduce numbers of fatalities and, if possible, hide accidents, thus making the ''highest-value approach'' biased (for instance, there is no confirmation about the crush which occurred during Stalin's funeral, and the number of victims remains a mystery). Nonetheless, we believe that, considering the wide distribution of accidents over the globe and the large time period employed, the errors created by partially biased sources would compensate and are acceptable in the frame of the analysis performed here.
Another aspect to discuss concerns the method used to determine whether the trend in crowd accidents is apparent or real. The number of scientific publications was used to perform a comparison given the absence of an ad hoc indicator of ''information availability''. The high correlation coefficient obtained should not be mistaken for a validation of the method; it simply indicates that both trends are similar. However, we should remind readers that the goal was to identify possible factors contributing to the increase of accidents in order to monitor future trends. Our analysis hints that population and information sharing both contributed. But, although information sharing has reached a speed and a geographical penetration which will likely not continue to increase exponentially, the population is expected to keep growing at a fast pace over the next few decades. In such a context, should the number of crowd accidents stay constant, we could infer that proper measures are being taken to ensure safety during mass events. This is possibly a safe statement, despite the several limitations. The data presented here could allow more detailed predictions, but those should be seen as inaccurate speculations. On the other hand, should the number of crowd accidents increase, it may be hard to tell whether the rise is in line with past trends or not. But even in this case, the methods presented here could help make a more systematic and critical analysis.

Conclusions
In this work, information about crowd accidents occurring from 1900 to 2019 was collected into a dataset, which was later analyzed. For each event, the date, country, location, number of fatalities, number of people injured and estimated crowd size were extracted. In addition, the purpose of gathering and the income group of the specific country were obtained to enable a more specific analysis.
The total number of accidents per decade was found to be steadily rising for the whole time period, with fatalities showing a first peak in the 1950s and later quickly increasing from the 1980s. In general, ''small'' accidents (fewer than 10 fatalities) are on the rise, with big accidents still occurring, despite being statistically less frequent (in proportion). Further, statistical analysis showed that when ratios are considered, by dividing fatalities and the number of people injured by crowd size, a decreasing trend is observed in relation to crowd size. This result is similar to what has already been reported in other contexts and has been dubbed ''safety in numbers''.
When the specifics of each accident are considered, we noted opposite trends in the shares of sport- and religion-related accidents, with the first type in decline and the second on the rise. On the other hand, accidents are on the rise in lower-middle income group countries, especially in areas having a high population density. Northern India and West Africa are the areas where crowd accidents have been more common over the last decades, with accidents on the American continent being more limited in number.
Finally, we provided some initial evidence to conclude that the sharp rise in crowd accidents observed over the last century is likely only partially related to a real increase. Technological advances have made information sharing more effective and made it easier to collect information about events previously not dramatic enough to attract the worldwide attention of the media. The frequency of crowd accidents has stopped rising exponentially and the numbers have been fairly stable over the last years, despite the almost global reach of the internet.
The conclusions of this study hint at the fact that it is now possible to monitor crowd accidents on a steady basis to determine whether regulations in force are sufficient or stricter regulations are needed to ensure the safety of crowd events. In this sense, we hope that the lessons learned in the UK, where crowd accidents used to be common in the past and led to better practices, could be applied on a global basis, although we are perfectly aware that many countries lack the financial means for such improvements.
Nonetheless, because awareness is one of the first steps toward safety, we wish this work could help in promoting the consideration of pedestrian traffic when designing infrastructure accommodating large crowds or planning for mass events.
In an upcoming work, we are extending the current analyses and discussions on press reports by providing in-depth analyses of the lexical aspects of the reports. More specifically, we will present evidence as to how reporting has changed over time and how it varies across geographical areas as well as sources (i.e., press, scientific article, or Wikipedia page). In doing so, we pay special attention to controversial terms such as ''panic'' and ''stampede''.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data are shared through the Zenodo platform with details provided at: https://doi.org/10.5281/zenodo.7523480

As already mentioned in the main text, data collection on past crowd accidents started by considering already existing lists provided by several sources, to later individually verify each event and gather more detailed information about it. In this work, we considered events which occurred between 1900 and 2019, thus covering a period of 12 complete decades. Some reports are also available for accidents occurring in the 19th century (e.g. the opening event of Brooklyn Bridge in New York in 1883), although material relative to that period is only partially digitized and often available only in the local language, thus making systematic research on a global scale difficult and possibly very incomplete.
After the emergence of the internet, information started to be abundant and available in a digitally editable form (although some old newspaper reports are only available as images, thus requiring additional steps to convert them into analyzable text and numbers). In addition, automatic translation services make it possible to grasp relevant information also in languages of which one does not have a formal understanding. When the precise dynamics of an accident are sought or legal documents are to be translated through automatic services, details may turn out to be inaccurately translated, but, if the number of fatalities, people injured and estimates for crowd size are sought, reliable data can usually be obtained through automatic translation tools.
In light of the above, we would like to stress that our list might be incomplete for older accidents, since the original source might be reported only in a local newspaper available in the local language in regional library archives. On the other hand, the newest accidents may be inaccurate in their description, as information rapidly circulates over the internet and information added by non-informed individuals may become mainstream as a number of people start sharing it. For instance, accidents occurring at stadiums named after an important day (e.g. ''Estádio 4 de Janeiro'', translated into ''January 4th Stadium'') are often mistakenly reported in different forms (confusing date and place in this example).
While considering the limitations presented above, the process to prepare a first-hand list of crowd accidents can be summarized as follows.
2. Later, the different lists were compared to create a final list which would drop duplicates (obviously, well-known accidents are typically contained in every sort of list covering crowd accidents, thus requiring a cross-check).
3. Finally, each item of the list has been investigated individually to verify its authenticity by cross-checking between multiple sources or looking for pieces of evidence in trusted newspapers or official documents.
4. If the accident was confirmed, we proceeded by extracting the relevant information provided in Appendix B.
The operation described above resulted in the creation of a database containing 281 accidents,8 which has been used as the main source for the analyses presented in this work.
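The merging and de-duplication step can be sketched as follows. This is only an illustration under stated assumptions: the record layout (a dict with an ISO `date` and a `country`), the matching key and the function names are hypothetical, and the individual verification of each event was a manual step that no code here replaces.

```python
def normalize_key(record):
    """Hypothetical matching key: same date and same country (case-insensitive)."""
    return (record["date"], record["country"].strip().lower())

def merge_candidate_lists(*source_lists):
    """Merge several candidate lists, keeping one entry per (date, country).

    'listed_in' counts how many source lists mention the event, which can help
    prioritize the later manual cross-check.
    """
    merged = {}
    for source_list in source_lists:
        seen_in_this_list = set()
        for record in source_list:
            key = normalize_key(record)
            if key in seen_in_this_list:
                continue  # do not count a repeat within the same list twice
            seen_in_this_list.add(key)
            entry = merged.setdefault(
                key,
                {"date": record["date"],
                 "country": record["country"].strip(),
                 "listed_in": 0},
            )
            entry["listed_in"] += 1
    return sorted(merged.values(), key=lambda e: (e["date"], e["country"]))

# Placeholder records, not entries from the actual dataset.
list_a = [{"date": "2001-01-01", "country": "Country A"},
          {"date": "2005-06-30", "country": "Country B"}]
list_b = [{"date": "2001-01-01", "country": " country a "}]

candidates = merge_candidate_lists(list_a, list_b)
for entry in candidates:
    print(entry)
```

Note that such a key would wrongly merge two distinct accidents occurring on the same day in the same country, which is one reason why each item still needs to be checked individually.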
For the reasons listed below, we refrained from providing the sources for each accident in the form of references of this work. Yet, the full list of our sources is available at Feliciani (2023) along with the dataset containing information for each accident.
• For each accident we relied on one or more sources, which would result in a list of references exceeding 800 items and taking a considerable amount of space.
• Most of the sources are internet pages: some of them were only available through internet archives and some are not accessible anymore. It is therefore highly likely that, even if a list of references is given, many items could be incorrect or inaccessible in the near future.
• We found that for most accidents a search based on date and country, to which keywords like ''stampede'', ''crush'', ''disaster'' or ''tragedy'' are added, would be sufficient to find some reports, thus making this approach more robust to changes over time.
• We nonetheless kept an offline copy of all reports used in the analysis and based on which the dataset was created (reports additionally contain a description and images for each accident).9

Inclusion criteria
For some of the accidents analyzed while preparing the final list, it was questionable whether they should be regarded as ''crowd'' accidents or not, and some presented too little detail to allow a meaningful analysis. Therefore, to allow reproduction of our results (i.e. independently obtaining a similar list) and to clarify why some accidents were discarded while others were not, the inclusion criteria are given below. Each accident was kept in the list only when it satisfied all of the following criteria.
• Number of fatalities: Only accidents resulting in one or more fatalities have been included in the list. We focused on casualties caused by the crowd itself (i.e. death caused by asphyxiation or thoracic compression, typically observed in crowd accidents), or due to the motion of the crowd (e.g. people falling on top of others due to a wall collapsing under crowd pressure), but excluded those caused by weapons such as knives or guns (more on violence will be discussed below).
• Number of people injured: In addition to deadly accidents, we also included those which resulted in more than 10 people getting injured. We did not differentiate on the severity of the injuries or whether people had to be taken to the hospital or were treated on-site. The criterion for the number of injuries simply relied on the figure reported by the media. For example, a statement like ''dozens of people required medical treatment'' would be sufficient to lead to inclusion in the list.
• Violence, smoke or non-crowd intrinsic causes: In some cases violence (sometimes caused by hooliganism) or smoke intoxication had been the main or only cause of fatalities. In this work, we are interested in accidents caused by a collective crowd motion which could potentially have been prevented by employing a different design or through proper crowd management. When smoke is the main cause of death, usually the use of improper material in the construction is to blame. Similarly, killings by armed individuals have little to do with collective crowd motion and could have been prevented only by proper inspection by the police or investigations by counter-terrorism agencies. As such, accidents in which deaths were clearly related to fire or violence have been excluded from the list. However, in case a crowd tried to escape from apparently violent individuals or from the threat of a fire and people got fatally injured in the act of escaping (for example because a door was locked), we consider the accident as caused by negligence in crowd management, and it was thus included in the list.
• Structural failure: Quite a few accidents have occurred due to structural collapse. In this case, we tried to understand whether the collapse was caused by crowd pressure and whether the number of people had been higher than the maximum allowed, or whether it was the result of a constructional failure (Bruno and Corbetta, 2017). If the second case is true, then the only way to prevent the accident would have been by proper structural engineering calculations, and therefore the accident falls outside the scope of this work. But in the first case, failure to limit the number of people or to estimate the pressure exerted by the crowd could be said to be the main cause of the collapse and, as such, it is considered a crowd accident.
• Reliable source of information: We only retained in the list those accidents for which we had a reliable source of information, such as news articles from press agencies (Reuters, Associated Press, etc.) or established news organizations (BBC, New York Times, etc.). In addition, we also considered as reliable those reports that had similar pieces of evidence from apparently independent sources. For example, in the case of soccer accidents, official internet pages from clubs or fan clubs typically provide quite accurate information, sometimes also presenting images or articles from their archives.
• Language used: We mainly considered reports provided in English or in a language in which the authors are fluent. In a few cases, machine translation was used, but we nonetheless discarded those reports for which translation would be too inaccurate and usually asked a colleague or friend familiar with that language to confirm important details. Nonetheless, considering that the information analyzed here is generally insensitive to grammar or writing style, even a semantically imperfect translation should not be considered an issue (e.g. the number of fatalities is typically translated correctly regardless of the translation tool used).
• Amount of information available: Finally, we discarded those (few) reports that, although confirmed, contained too little information to be useful in the analysis. For example, in case a report mentioned an accident occurring in a shopping mall with people hurt, regardless of whether the same information had been independently provided elsewhere with some pictures, little can be said about the gravity or the size of the crowd.
Obviously, most if not all of the criteria listed above cannot be judged with complete certainty. For example, there is no systematic method to determine whether an accident should be excluded because smoke played an important role, or retained because the smoke was not dense enough for it to be considered a fire accident. Nonetheless, we believe that the methods used in the analysis and the countermeasures explained in Appendix C should have helped minimize issues related to potentially missing information.
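To make the quantitative part of the criteria above explicit, they can be expressed as a simple filter. This is a minimal sketch, not the procedure actually used: the qualitative criteria (cause, reliability, language and level of detail) were judged manually, so they appear here only as hypothetical boolean fields already decided by a human reader, and one plausible reading of the two severity thresholds is assumed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    """Hypothetical record for an accident under review."""
    fatalities: int
    injured: Optional[int]   # None when no figure was reported
    crowd_caused: bool       # not due to weapons, fire/smoke or a construction fault
    reliable_source: bool    # press agency, established outlet or independent reports
    enough_detail: bool      # report is detailed enough (and translatable) to be useful

def is_included(c: Candidate) -> bool:
    """Keep an accident only if it meets a severity threshold AND all other criteria."""
    # Assumed reading: at least one fatality, or more than 10 people injured.
    severe_enough = c.fatalities >= 1 or (c.injured is not None and c.injured > 10)
    return severe_enough and c.crowd_caused and c.reliable_source and c.enough_detail

deadly = Candidate(fatalities=3, injured=None, crowd_caused=True,
                   reliable_source=True, enough_detail=True)
minor = Candidate(fatalities=0, injured=10, crowd_caused=True,
                  reliable_source=True, enough_detail=True)
print(is_included(deadly), is_included(minor))
```

Vague statements such as ''dozens of people required medical treatment'' would have to be converted into a number (or a flag) by hand before such a filter could be applied.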

Appendix B
In this appendix, the type of information extracted for each accident and later used in the analysis is explained in detail. Except for a few incontestable facts (such as the date of the accident), many pieces of information are not uniquely determined, and a discussion of the approach taken in this work is needed.

Date and country
Date and country are usually easy to find, and sources almost always agree on these aspects. It is probably only worth mentioning that the classification of the country is based on the historical context (e.g. USSR was used for accidents occurring during the Soviet era). In determining the date, the time of the accident is taken as reference, so accidents occurring during nightlong events are classified based on the moment when the accident occurred (before or after midnight).

Location (latitude/longitude)
For each accident, we tried to obtain the geographical position to facilitate the creation of maps and to distinguish individual events occurring within the same country. The task has been relatively easy for accidents which occurred in the past few years in structures still existing, or in the case of historical buildings which have not been moved from the original location (e.g. Hillsborough Stadium, Bethnal Green tube station, etc.). However, in some cases, either the information provided was approximate in geographical terms (e.g. ''a theater on Baghdad's outskirts'') or the building did not exist anymore (e.g. Shiloh Baptist Church, USA). In those cases, we provide a location representative of the accident, also considering that the analysis is performed on a global scale and even an error of a few dozen or even hundreds of kilometers does not affect the results.

Number of fatalities and people injured
For some accidents, even the number of fatalities is not easy to retrieve, or it is not clear which figure should be taken among the ones provided by the different sources. This is sometimes related to political reasons, especially when governments do not want to release the real number of fatalities or the disclosed figure is believed to be too low by external sources. In these instances the discrepancy between the official and unofficial reports can be quite large, as for the 2015 accident in Mina (Saudi Arabia), when officially 769 people lost their lives but most media report over 2'000 fatalities (Wikipedia, 2021a).
In addition, the number of fatalities is sometimes difficult to determine since victims of crowd accidents often pass away after days, weeks or even years, thus making it necessary to determine the point in time to which a report is referring. For example, in the accident which occurred in Turin (Italy) in 2017, three people lost their lives. But none of them were killed at the location of the accident, with the first victim passing away after two weeks in hospital and the remaining two losing their lives after 1.5 and 2.5 years spent in hospital battling for their lives (Wikipedia, 2021b).
To be consistent in the scientific method while paying respect to the people who lost their lives in the tragic accidents and remaining neutral toward political positions, in this work the highest number of reported fatalities is taken for a given accident. It should be remarked that disputed cases are relatively few and, therefore, even a different approach would have had little influence on the results, but such a radical approach is needed if a systematic analysis is sought in which each accident is treated equally. The same approach taken for fatalities has been taken for the number of people injured. As already mentioned in the main text, in extracting the number of injured we did not distinguish between light and traumatic injuries and considered only the total number, regardless of whether hospitalization was needed or injuries could be treated on-site.
To conclude the discussion on the number of fatalities and people injured, we would like to stress once again that the figures used here are to be considered as an estimate of the magnitude of an accident, and we are not trying to make claims on the validity of each individual value. Long periods of time and statistical methods are used in the analysis, thus minimizing the numerical importance of a single accident. Nonetheless, numbers are needed if trends are to be studied, and in that sense a systematic and consistent method had to be chosen.
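The ''highest-value approach'' described in this section amounts to taking, for each quantity, the maximum over all figures reported by the different sources. A minimal sketch is given below; the function names and the dictionary layout of a report are assumptions made for illustration, and the value 2000 merely stands in for the ''over 2'000'' reported by the media for the Mina accident.

```python
def highest_reported(values):
    """Return the largest reported figure, ignoring sources that give none."""
    known = [v for v in values if v is not None]
    return max(known) if known else None

def reconcile(reports):
    """Combine several source reports of one accident into a single record.

    Each report is a dict that may contain 'fatalities', 'injured' and
    'crowd_size'; missing keys are treated as 'not reported'.
    """
    fields = ("fatalities", "injured", "crowd_size")
    return {f: highest_reported(r.get(f) for r in reports) for f in fields}

# Official figure versus media figure for the 2015 Mina accident
# (2000 is a stand-in for "over 2'000").
mina_reports = [{"fatalities": 769}, {"fatalities": 2000}]
mina_record = reconcile(mina_reports)
print(mina_record)
```

The same function applies unchanged to the number of people injured and to crowd size, for which the highest reported value is also used.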

Crowd size
Although uniquely determining the numbers discussed above can already be a challenging task, crowd size is arguably the quantity that varies most among sources. For non-confined structures, estimation of crowd size can be challenging also from a technical point of view, and it is therefore not surprising that estimates vary from source to source. However, even the number of attendees is often reported with remarkable differences. Especially when accidents occur (which is what is being investigated here), organizers tend to report numbers below or equal to capacity to deny any wrongdoing, while media often blame overcapacity as the main cause, reporting numbers well above the maximum allowed.
For this reason, although caution is generally needed for all numbers on which this analysis is based, crowd size requires particular attention. Nonetheless, even here, we decided to use the highest reported value, mainly because it is usually correct enough to represent the magnitude of the crowd. For example, regardless of whether the estimate is correct or not, the half-a-million people who reportedly gathered to see the Pope in Zaire in 1980 surely represented a huge crowd. Similarly, regardless of whether 150'000 or 100'000 is the correct number, a large number of people could be said to have attended the Fonte Nova Stadium reopening in 1971 in Brazil. In terms of magnitude, there is no doubt that both crowds were much bigger than, for example, the 2'000 shoppers who were involved in an accident at a store in the USA in 2008, and this regardless of the accuracy of the figures reported.

Purpose of gathering
Finally, each accident was categorized based on the purpose of gathering, i.e. the reason people had to be in the given place at the given time. Inspired by the work of Asgary (2023), we considered the following nine categories.
• Religious: People visiting a temple, a mosque, a church or a sacred place in the frame of a religious tradition. Typically, events of this sort are held following a calendar, which results in the same event having varying importance in different years.
• Sport: Events where people gather to watch a match or a sport competition. Although they generally fall in the category of ''entertainment'' discussed next, sporting events deserve to be considered separately, as they historically represent a particular case and the match result can play a dramatic role in determining crowd motion (especially in soccer games).
• Entertainment: Gatherings whose main purpose is to be entertained by the event, such as music concerts or firework displays. As already stated, sport events are considered separately.
• Giveaway: This type of gathering is characterized by the distribution of free goods. The reasons for holding such an event can differ: provision of emergency relief after natural disasters, free food for impoverished people, or gift-giving in the frame of a religious tradition (such as during Ramadan for Muslims).
• Educational: This category includes accidents which occurred in educational facilities such as schools or universities during or between regular classes. Accidents of this sort have been particularly common in China around the turn of the century. If an event, like a concert, is hosted in a school, it is categorized according to the type of event.
• Political: Gatherings in which people assemble for political reasons, such as rallies or valedictory events. If a religious leader is the reason for people to gather, the event is considered to be mainly of a religious nature.
• Transportation: This includes accidents which occurred at transportation facilities during normal operation. Accidents which occurred, for example, at train stations in the frame of a particular event are considered according to the nature of the event itself. For example, if an accident occurs in a train station as people move en masse to a sacred place, it is considered a religious gathering, despite the accident occurring in a transportation facility.
• Application: Although not frequent, some important accidents occurred when people attempted to apply for a job or enter a facility where applications would be processed. Several accidents occurred in different cities in Nigeria in 2014 as tests were scheduled for job-seekers applying for public positions.
• Shopping: Finally, accidents which occurred at events such as bargain sales or due to discounted items are considered in this category. If items were given away for free, the event is considered a ''giveaway'' and counted accordingly.
Again, in this case too, some accidents did not have a uniquely identifiable purpose of gathering, and it could be argued that a different category should have been used. In this regard, the above should be seen as a proposed categorization, and alternative approaches are clearly possible.
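The precedence rules stated in the category descriptions (the nature of the event prevails over the venue, a religious leader makes a gathering religious, and free items turn a sale into a giveaway) can be summarized in a short sketch. This is only an illustration under stated assumptions: the function, its parameters and the venue table are hypothetical and do not reproduce the procedure actually used to label the dataset, which relied on reading the press reports.

```python
from typing import Optional

# Hypothetical default category for venues during normal operation.
VENUE_DEFAULTS = {
    "school": "Educational",
    "university": "Educational",
    "train station": "Transportation",
}

def categorize(venue: str,
               event_category: Optional[str] = None,
               items_free: bool = False,
               religious_leader: bool = False) -> Optional[str]:
    """Assign a purpose of gathering following the precedence rules in the text.

    event_category: category of a particular event held at the venue, if any.
    Returns None when no rule applies (the case must then be judged manually).
    """
    # A religious leader as the reason to gather makes the event religious.
    if religious_leader:
        return "Religious"
    # Items given for free are counted as a giveaway, not as shopping.
    if event_category == "Shopping" and items_free:
        return "Giveaway"
    # The nature of the event prevails over the facility hosting it.
    if event_category is not None:
        return event_category
    # Otherwise the venue in normal operation determines the category.
    return VENUE_DEFAULTS.get(venue)

print(categorize("school", "Entertainment"))      # concert hosted in a school
print(categorize("train station", "Religious"))   # pilgrims moving to a sacred place
print(categorize("train station"))                # normal operation
```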

Conversion of word expressions into numbers
Excluding the number of fatalities, which is usually reported as a precise value, several expressions are used to report the number of people injured and crowd size. Considering that our analyses focus on decades, and that total values are thus rather high, it is fairly accurate to translate each expression into the specific values needed in the calculations.
Table 2 reports the conversions used in this study. We generally tried to follow a logical order, taking into account that overestimating expressions for small quantities has a limited influence when totals are sought, while large numbers may play a bigger role. The conversion system is also based on the experience gained by the authors while reading all press reports, noting that, when reported, numbers with several digits tend to have a lower first digit than numbers smaller in magnitude. In other words, it is more common to find expressions like ''700 people'' than ''700'000 people''; in the latter case ''almost one million'' would more commonly be used instead. Finally, we should remark that expressions such as ''close to'', ''over'', ''exceeding'', etc. have been neglected and only the words relating to numbers were used.
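The conversion procedure can be sketched as follows. This is a minimal illustration and not the implementation used in this study: the function name is hypothetical, the list of neglected qualifiers extends the three quoted in the text with a few assumed ones, and the lookup values are placeholders that do NOT reproduce Table 2 (only the lakh, equal to 100'000 in the Indian numbering system, and one million are exact).

```python
import re

# Qualifiers neglected before conversion. "close to", "over" and "exceeding"
# are quoted in the text; the remaining ones are assumptions for this sketch.
QUALIFIERS = ("close to", "over", "exceeding", "almost", "more than", "nearly")

# PLACEHOLDER values for illustration only: these are NOT the values of Table 2.
PLACEHOLDER_TABLE = {
    "dozens": 24,
    "hundreds": 200,
    "thousands": 2_000,
    "one lakh": 100_000,      # exact: 1 lakh = 100,000
    "one million": 1_000_000, # exact
}

def convert_expression(expression, table):
    """Convert a reported quantity into a number, or return None if unknown."""
    text = expression.lower().strip()
    # Drop qualifiers so that only the words relating to numbers remain.
    for qualifier in QUALIFIERS:
        text = re.sub(rf"\b{re.escape(qualifier)}\b", " ", text)
    text = " ".join(text.split())
    # Plain figures, possibly written with thousands separators.
    digits = text.replace("'", "").replace(",", "")
    if digits.isdigit():
        return int(digits)
    # Word expressions are looked up in the conversion table.
    return table.get(text)

for reported in ["over 700", "almost one million", "hundreds", "700'000"]:
    print(reported, "->", convert_expression(reported, PLACEHOLDER_TABLE))
```

Passing the table as an argument keeps the parsing step separate from the chosen values, so the sensitivity of decade totals to a different conversion can be checked by swapping the table.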

Table 3
Potential biases to be considered in regard to the work presented here and countermeasures taken to limit their effect on the results.

Bias: Numbers (in general) differ among media reports.
Background and potential effect on the results: Disputes over the number of people killed and, more commonly, over crowd size are not rare in media reports, especially when an accident could have political consequences. Consequently, the results presented in this work depend on the numbers selected for each accident and on the criteria used to determine the most ''reliable'' value. If the most reliable value is chosen, the selection may be influenced by the way an accident is reported and by the individual background of the person taking the decision (e.g. one may be more likely to trust a known institution than one heard of for the first time).
Countermeasure taken: The highest value reported for each accident was taken, thus assuming that there will always be someone trying to ''overestimate''. Also, the analysis performed here is comparative and takes long time periods into consideration, so potential biases in this respect are likely to cancel out.

Bias: The definition of (legal) death is not universal and the number tends to rise over time.
Background and potential effect on the results: The legal definition of death may change from country to country and may not even be consistent within a single nation (Lewis et al., 2017). Also, people suffering severe injuries during crowd accidents may pass away after a long stay in hospital, thus making the death toll rise slightly over time. As such, news published just after an accident tends to report lower fatalities than press reports published several years later.
Countermeasure taken: The highest value is taken to ensure a better comparison among sources. Also, the increase in reported fatalities tends to be limited to the first days following an accident, and we always tried to also check reports published weeks, months or years after the occurrence to ensure that the number is more accurate.

Bias: Injuries are typically provided as rough estimates and in different forms among media.
Background and potential effect on the results: Some press reports describe as injured those requiring treatment in hospital, while other sources simply provide an estimate of those who got injured on-site. Thus, the definition is not always consistent and, in addition, numbers for injuries are rarely reported as values but often as word expressions, e.g. dozens, hundreds, etc.
Countermeasure taken: Again, the highest value was used to employ a consistent approach, and we also tried to focus our discussion mostly on fatalities and frequency of accidents, spending only a few words on the trend regarding injuries. Also, we employed a solid approach to convert word expressions into numbers, also considering the general context in which they are used.

Bias: Crowd size is often provided as a very rough estimate.
Background and potential effect on the results: Crowd size is without doubt the most difficult number to estimate in regard to crowds. Even for stadia with a limited number of seats it is difficult to provide an estimate if individuals entered without a valid ticket, and sometimes the crowd outside the stadium also needs to be accounted for. This very rough estimate makes it questionable whether crowd size can be considered a valid indicator.
Countermeasure taken: In addition to using the highest value among sources, a logarithmic scale was used whenever crowd size was employed. This drastically reduces potential issues related to reporting bias.

Bias: Some types of accidents may be over-reported compared to other types if they become the focus of the press or researchers (for example, soccer accidents in the UK).
Background and potential effect on the results: The soccer accidents reported in the UK and those occurring in schools in China show that, when a number of similar accidents occur in a specific area, media and academics are more likely to focus on the topic, consequently resulting in the emergence of minor events that were previously largely unreported. A number of lists can be found for soccer-related accidents, and many scholars in the UK investigated the topic of crowd accidents following the tragedy of Hillsborough. It is therefore likely that, should similar events occur in a delimited geographical area over a short time period, previous, almost forgotten reports could reemerge in the press.
Countermeasure taken: We employed a strategy to look for details on each accident which also tried to promote the emergence of similar accidents in the search process. In fact, local news reports often described similar accidents which occurred in the area and were not in our list. But this only occurred in the early stages of research. The fact that no new accidents emerged despite the discovery of new lists should be convincing evidence of the completeness of our dataset.

Bias: Research was mostly limited to English, with only a few other languages considered.
Background and potential effect on the results: Although the authors also performed searches in non-English languages (specifically Chinese, French, German, Italian, Japanese and Spanish), English was the primary working language, and therefore accidents occurring in English-speaking areas are possibly over-reported compared to other areas (for instance, both India and the UK are countries where English is commonly used). Also, machine learning was used in a few cases and its accuracy may be questionable.
Countermeasure taken: English is a language spoken by almost 1 billion people; in addition, searches were also done in Chinese (another billion speakers) and partially in Spanish (half a billion), French (more than 200 million speakers) (Julian, 2020) and Japanese (more than 100 million speakers). The total accounts for one third of the world population. In addition, many countries have state press agencies reporting in English (e.g. Al Jazeera, Xinhua), thus making important news also accessible to people not reading the local language. Thus, we believe that the language issue was sufficiently tackled and should play a limited role for recent press reports.
also taking an analytical approach minimizing potential issues. Biases, their potential effect on the results and the countermeasures taken are presented schematically in Table 3.

Fig. 1. Flowchart of the process used to create the dataset on crowd accidents analyzed in this work.

Fig. 2. Information extracted for each crowd accident listed in the dataset.

Fig. 3. Total number of crowd accidents and related number of fatalities and people injured for each decade from the beginning of the 20th century until the last complete decade (i.e. 2019 included). The number of accidents shows a steady increase, while a peak around the 1950s is seen for fatalities.

Fig. 4. Distribution of the number of fatalities for accidents that occurred over a time period of 5 years. Only periods with more than 5 accidents were considered here. The median value is given as a red line inside the box, which shows the 25th and 75th percentiles at the bottom and the top, respectively. Minimum and maximum are represented by the extrema of the whiskers, with outliers represented by red crosses. In general, from the end of the 20th century, accidents resulting in a small number of fatalities have become more frequent.

Fig. 5. (a) and (b): Comparison between fatality and injury ratio, defined as the ratio of the number of people killed or injured to the total size of the crowd. (c) Comparison between the number of people injured and killed for each accident. The diagonal line is added to show the asymmetry between the two values, indicating that, on average, more people are likely to get injured than killed.

Fig. 8. Share in the number of accidents, fatalities and people injured by the income group of the countries where accidents occurred.The vast majority of the accidents of the last 35 years occurred in countries with limited financial resources while the share between low-income and high-income countries has been fairly constant.Results are presented starting from 1990 since income group data were not available earlier.

Fig. 9. Location of crowd accidents (red dots) from 2000 to 2019 compared with population density (relative to 2018; Center for International Earth Science Information Network - CIESIN - Columbia University, 2018). In this map, the American continent has been excluded to focus on the locations where crowd accidents have been more frequent in the last 20 years (see also Fig. 7). Accidents are more common in densely populated areas (India and West Africa in particular), although remote areas are not excluded.

Fig. 10. Comparison between the number of papers published and the number of crowd accidents over the same time period.

Table 1
Correlation coefficient between indicators showing the change in world population and information sharing. Each indicator is compared with both the number of crowd accidents and the number of fatalities by decade. Data collected by the Universal Postal Union are used for post shipments (Universal Postal Union, 2007). Urban population data are obtained from the ''World Urbanization Prospects'' of the UN (United Nations Population Division, 2018). For the scientific publications, the ''Microsoft Academic Graph'' is used (Microsoft Academic, 2022).

Table 2
Conversion between word expressions and numeric values used in this work. The lakh (equal to 100'000) is commonly used in Indian media, including in English-language press reports.