What can we learn from corporate sustainability reporting? Deriving propositions for research and practice from over 9,500 corporate sustainability reports published between 1999 and 2015 using topic modelling technique

Organizations are increasingly using sustainability reports to inform their stakeholders and the public about their sustainability practices. We apply topic modelling to 9,514 sustainability reports published between 1999 and 2015 in order to identify common topics and, thus, the most common practices described in these reports. In particular, we identify forty-two topics that reflect sustainability and focus on the coverage and trends of economic, environmental, and social sustainability topics. Among the first to analyse such a large amount of data on organizations’ sustainability reporting, the paper serves as an example of how to apply natural language processing as a strategy of inquiry in sustainability research. The paper also derives from the data analysis ten propositions for future research and practice that are of immediate value for organizations and researchers.


Introduction
Growing legislative pressure and increasing public concern about the global climate and the carrying capacity of the earth have led to increasing demands for organizations to act in sustainable ways [1]. Consequently, the number of organizations that publish information on their sustainability practices has grown steadily [2]. One way in which organizations communicate these practices to stakeholders is through sustainability reports, usually published annually with financial reports [3], that report on the organization's "economic, environmental and social impacts caused by its everyday activities" [4].
For the last fifteen years, researchers have sought to shed light on the publication of organizational sustainability practices in sustainability reports in order to determine how organizations interpret the challenge of sustainability. Some researchers have focused on the frequency of reporting and other high-level information in order to gain insights into the general development of sustainability reporting [2,5], while others have used qualitative content analysis techniques to provide an overview of certain organizations' reporting practices [1]. Other research has examined the content of sustainability reports in a more quantitative way through text-mining techniques, focusing in particular on the frequency of certain terms that are related to sustainability practices [1,3,6,7]. Another study explored the references made to ecological limits by analysing the context in which a predefined list of terms related to ecological limits is used [8]. Except for the last study, all of these studies have taken only a limited number of reports into consideration.
In contrast, the present study employs text-mining techniques to conduct topic modelling on 9,514 sustainability reports published between 1999 and 2015. In particular, we apply Latent Dirichlet Allocation (LDA), which is used to identify themes and their distribution in large collections of documents [9].
We extend the current research on sustainability reports in three ways. First, we use a recent data set that includes sustainability reports published as recently as the beginning of 2015, while also extending the time frame back to 1999. Second, we analyse a large number of sustainability reports (9,514). By extending the time frame and the number of reports, we include a high diversity of reports in terms of sector and year of publication, which allows us to show the development of topics over time and their distribution among sectors. Third, to the best of our knowledge, we apply a methodology (LDA) that has not yet been used to examine sustainability reports. This method allows us to examine the documents without a predefined list of terms and thus provides us with a broader view of the content of the sustainability reports than other studies were able to gain.
In seeking to shed light on organizations' sustainability reporting, we focus on identifying (1) sustainability practices and their development over time; (2) the coverage of economic, environmental, and social aspects in sustainability reports; and (3) the differences in sustainability reporting (and practices) among certain sectors [3].
The paper proceeds as follows. The next section provides background on corporate sustainability and sustainability reports. Then we describe the data and methods used. After presenting and discussing the results of our analysis, we conclude in the last section.

Research background

Corporate sustainability
Through the Brundtland Commission's publication of the report, Our Common Future, the concept of sustainability and particularly the definition of sustainable development as "meet(ing) the needs of the present without compromising the ability of future generations to meet their own needs" [10] has gained popularity [11]. The term corporate sustainability is often used in the context of organizations, but it has no commonly accepted definition [11]. Some authors focus on the environmental aspects of sustainability, others on the social aspects, and others take an integrated view, combining sustainability's environmental, social, and economic aspects without prioritizing any one dimension [11][12][13]. We see corporate sustainability as lying at the interface of economic contribution, environmental performance, and social responsibility [14]. Further, we agree with Dyllick and Hockerts [15] that these three dimensions of corporate sustainability can be seen as distinct on an operational level but should be integrated on a strategic level.
Definitions of sustainable practices differ among the sectors in which organizations operate, but sustainable practices can generally be divided into environmental, social, and economic sustainability practices. The environmental practices refer to the consumption of natural resources and the release of emissions, both of which should be below a rate that ensures the health of the ecosystem [15]. Thus, the environmental practices are concerned with reducing

Sustainability reports
Organizations' reporting of non-financial data started in the 1970s with "social balance sheets" [25]. At first, organizations reported on the social benefits they paid to their employees quantitatively [25]. Later they also included information on product quality and social engagement [25]. After several environmental catastrophes in the 1980s, organizations started to report on the environmental aspects of their efforts as well [25], with the first publication of a separate environmental report in 1989 [2]. In the following years, the focus shifted solidly to environmental reports and from somewhat argumentative reporting to proactive reporting with competitive elements [25]. Consequently, today's sustainability reports are often seen as marketing instruments. Involvement of public relations departments and third parties in the compilation of sustainability reports, as well as industry-specific foci because of industries' differing stakeholders, also suggest that sustainability reports are often used as marketing instruments [1,26]. Around 2000, the focus shifted again to include more social and financial aspects of companies' sustainability efforts [1][2][3][25]. While in 1999, 98 percent of the reports published by the largest 250 multinationals were concerned only with environmental issues, by 2002 this percentage had declined to 71 percent [2]. In addition, the names of the reports changed from corporate citizenship report (emphasizing the social aspect), to corporate (social) responsibility report and then finally to sustainability report [25]. The number of organizations that report on their sustainability activities has steadily increased to the point at which sustainability reports have become standard procedure [1].
Today many reports follow the format published by the Global Reporting Initiative (GRI) [2], but even though the reports increasingly focus on performance indicators, improvement in the credibility of these figures is needed, as organizations often report only a few indicators, sometimes provide only summarized figures, and do not indicate whether the figures are estimates or measures or how changes were made [1]. Many organizations follow the GRI standard to increase the credibility of their reports [27], and stakeholders are often directly involved in determining the content of the reports [28].
The number of companies that publish sustainability reports differs from sector to sector [2]. In the past, in industrial sectors like chemicals, computers and electronics, cars, utilities, oil and gas, and food and beverages, the number of organizations that publish sustainability reports was higher than average [2], while financial companies, trade and retail, services, communications, and media were less active in reporting their sustainability activities [2]. Because the number of companies publishing a sustainability report for the first time has been decreasing since 2003 [1], one might expect that this distribution among sectors still holds today.

Method
We employ a semi-automated text-mining technique on publicly available sustainability reports to determine the topics they address. Such techniques usually represent documents as vectors. In the simplest form, such a vector contains, for each term in the document, the number of times it appears. However, such a vector has a high number of dimensions (each one reflecting one term). Thus, we need to reduce the dimensionality of the resulting vector [29] in order to be able to handle this huge amount of data.
To this end, we use LDA because, in the resulting vector of LDA, each dimension corresponds to one topic or concept [29]. A topic is a probability distribution over all of the terms that co-occur in the underlying documents [29], and each document is itself a probability distribution over all topics in the corpus [30,31]. That means that, when describing a topic, the author takes words with a certain probability from the pool of terms related to that topic [31]. For instance, when writing about the topic of climate change, terms like climate, CO2, emissions, GHG, warming, or temperature have a high probability of appearing, while terms like employee benefits, social responsibility, or gains have a lower probability.
Topics are identified by considering which terms often occur together; it is assumed that the more often terms occur in the same document, the more likely it is that they belong to the same topic. Each sustainability report consists of several topics. The probability distribution of a document shows how prominent the identified topics are in that specific report.
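The term-count representation mentioned at the start of this section can be illustrated with a minimal sketch. The two example texts and the helper name count_vector are invented for illustration and are not taken from the study; the sketch only shows why the vector has one dimension per distinct term.

```python
from collections import Counter

# Two toy "reports"; the texts are invented for illustration only.
documents = [
    "emissions energy emissions waste",
    "employee safety training employee",
]

# Vocabulary: one dimension per distinct term in the corpus.
vocabulary = sorted({term for doc in documents for term in doc.split()})


def count_vector(text, vocabulary):
    """Return the number of times each vocabulary term appears in text."""
    counts = Counter(text.split())
    return [counts[term] for term in vocabulary]


vectors = [count_vector(doc, vocabulary) for doc in documents]
```

Even in this toy corpus, each vector has as many dimensions as there are distinct terms, which is why a reduction step such as LDA becomes necessary for thousands of reports.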
We excluded 1,152 encrypted documents that could not be extracted easily. To avoid issues related to translation, we limited the documents to those written in English, so we used another Python script to identify those documents. The remaining 9,514 sustainability reports written in English serve as the basis for all further analyses.
Next, we tokenized the documents (i.e., split them into tokens that include words and special symbols like punctuation marks) and cleaned up the text by converting all characters to lower case and removing special characters and numbers. Then we lemmatized the words using the WordNetLemmatizer and eliminated standard stop words (i.e., general-purpose words like articles, pronouns, and conjunctions). For this, we used the stop word list "English" provided by the NLTK package of Python. Further, we removed terms that appeared in fewer than two documents. We manually checked the remaining vocabulary to exclude other irrelevant terms like country names.
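The cleaning steps described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the scripts used in the study: the stop word set is a small invented stand-in for NLTK's English list, lemmatization with the WordNetLemmatizer is omitted to keep the sketch free of external data, and the function names and example sentences are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop words only; the study used NLTK's English list instead.
STOP_WORDS = {"the", "a", "an", "and", "of", "our", "we", "in", "to", "by"}


def preprocess(text):
    """Lower-case the text, keep alphabetic tokens only, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def drop_rare_terms(tokenized_docs, min_docs=2):
    """Keep only terms that appear in at least min_docs documents."""
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    return [[t for t in doc if doc_freq[t] >= min_docs] for doc in tokenized_docs]


# Invented example sentences.
raw_reports = [
    "We reduced emissions by 12% in 2014.",
    "Emissions and energy use: our targets.",
    "Energy efficiency of the plant improved.",
]
tokenized = [preprocess(report) for report in raw_reports]
cleaned = drop_rare_terms(tokenized)
```

In this toy example only emissions and energy survive the document-frequency filter, because every other term appears in a single document.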

Latent Dirichlet Allocation (LDA)
The purpose of the LDA process is to find in each document a mix of topics, where each topic is described by a mix of terms [31]. Thus, the probability distribution of the mix of topics differs from that of the mix of terms [31]. The hyper-parameter α describes the shape of the per-document topic distribution, and the hyper-parameter β describes the shape of the per-topic word distribution [32]. The distributions are estimated by the algorithm using Dirichlet priors [31]. Gensim (for Python) and Mallet (for Java) are among the extant efficient and effective implementations of LDA [29].
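To make the roles of α and β more concrete, the following minimal sketch simulates the generative assumption behind LDA: each topic draws a term distribution from a Dirichlet prior shaped by β, each document draws a topic mix from a Dirichlet prior shaped by α, and every word is produced by first picking a topic and then a term. It illustrates the model only, not the inference procedure that Mallet or Gensim perform; the vocabulary, the parameter values, and the function names are assumptions chosen for illustration.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalised Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(vocabulary, topic_word, alpha, length, rng):
    """Draw a topic mix for one document, then a topic and a term per word."""
    topic_mix = sample_dirichlet(alpha, len(topic_word), rng)
    words = []
    for _ in range(length):
        topic = rng.choices(range(len(topic_word)), weights=topic_mix)[0]
        words.append(rng.choices(vocabulary, weights=topic_word[topic])[0])
    return topic_mix, words


rng = random.Random(0)
vocabulary = ["emissions", "energy", "employee", "safety", "share", "tax"]
alpha, beta, num_topics = 0.5, 0.5, 3  # illustrative values, not the study's

# One term distribution per topic, drawn from the prior shaped by beta.
topic_word = [sample_dirichlet(beta, len(vocabulary), rng) for _ in range(num_topics)]
topic_mix, words = generate_document(vocabulary, topic_word, alpha, 8, rng)
```

Fitting LDA runs this story in reverse: given only the observed words, the algorithm estimates the topic mix of each document and the term distribution of each topic.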
We use the Mallet implementation, which automatically estimates the hyper-parameters α and β, to conduct the LDA analysis. The number of topics, defined in advance, depends on the intended level of topic specialization [31]. We wanted to assign labels to each of the resulting topics, but when there are too few dimensions, the topics tend to be more general, as they are a broad mixture of terms that makes it difficult to assign specific labels, and when there are too many, the topics become too specific. We employed the algorithm on three, five, ten, twenty, fifty, seventy, and one hundred dimensions and compared the results, deciding to focus on seventy dimensions, as this number gave us a broad variety of topics without going into too much detail.
The algorithm produced two result sets per topic. The first result set consists of all terms of the corpus and the degree to which they are likely to contribute to the topic [31]. The second result set contains all documents in the corpus and the probability that the corresponding topic occurs in the document.
In interpreting the results, the five to twenty most probable terms for each topic are usually examined in order to identify the degree of commonality and, thus, specify the label of the topic [29]. Our analysis focused on the terms with the twenty highest probabilities. Five researchers, including the first author of this work, examined the seventy topics and classified each as describing environmental sustainability, social sustainability, economic sustainability, sustainability in general, or no sustainability at all. All researchers were provided with definitions of environmental, social, and economic sustainability based on the practices described in the research background of this paper. Since sustainability reports also contain information that is not related to sustainability, we expected to find several topics that were not relevant to our further analysis.
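Inspecting the most probable terms of a topic, as described above, amounts to sorting the first result set by probability. The sketch below assumes that this result set is available as a mapping from term to probability; the helper name top_terms and all numbers are invented for illustration.

```python
def top_terms(term_probabilities, n=20):
    """Return the n most probable (term, probability) pairs of one topic."""
    ranked = sorted(
        term_probabilities.items(), key=lambda item: item[1], reverse=True
    )
    return ranked[:n]


# Hypothetical topic-term probabilities; the numbers are invented.
topic_terms = {
    "emissions": 0.30, "energy": 0.25, "waste": 0.20,
    "water": 0.15, "share": 0.06, "tax": 0.04,
}
most_probable = top_terms(topic_terms, n=3)
```

A ranked list of this kind is what a researcher would read in order to decide on a classification and a label for the topic.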
In the next step, we continued the examination of the topics that are relevant to sustainability by analysing the prominence of industries in each topic in terms of their mean probability of occurring in the topic. We excluded a few topics that consisted of terms that seemed to describe sustainability practices but instead described the business of the most probable sectors. For instance, one topic contained words like oil, gas, and energy, which appear to describe energy sources as a topic of environmental sustainability. However, analysis of the most probable sectors showed that this topic is used primarily by the energy sector, so it probably describes their business activities. Therefore, we excluded this topic. For the remaining topics, we analysed the mean probability per year, per country, per continent, and per organization size. We found all information except that for the continent in the meta-data provided by the GRI database and assigned countries to their continents based on a map from the United Nations Statistics Division (http://unstats.un.org/unsd/methods/m49/m49regin.htm).
We also assigned to each topic that was relevant to sustainability a label that describes the topic's content. To this end, the first author of this study made one proposal based on the twenty most probable terms of each topic, discussed it with the second author, and resolved any disagreement. Labels were selected to represent the twenty most probable terms (those terms that are used with high probability when describing the topic) of the specific topic. Table 1 provides an overview of the steps conducted as well as the main decisions that had to be made in order to conduct the analysis.

Results
We analyse 9,514 sustainability reports published between 1999 and 2015 by 3,906 different organizations. The most common industries were financial services, followed by the energy sector, the mining sector, and food and beverage products.
We find forty-two topics that are related to sustainability. We conduct several analyses for each topic, including its development over time and its distribution over industries, countries, continents, and organization sizes. Based on this analysis, we derive ten observations that are summarized in Table 2 and described in the following. Further, the Appendix contains an overview of all seventy topics, including the most prominent terms of each one as well as the probability of occurrence of these terms (i.e., the chance that the specific term appears in the context of this topic).
Observation 1: Organizations report on environmental, social, and economic sustainability
During our first interpretation phase, we looked at each topic (i.e., collection of terms) and assigned it to environmental sustainability, social sustainability, economic sustainability, general sustainability, or not related to sustainability. The corresponding assignment can be found in the table in the Appendix. We find topics that are related to environmental sustainability, as well as topics that are related to social or economic sustainability. Thus, we state that organizations report on environmental, social, and economic sustainability.

Observation 2: Topics on environmental, social, and economic sustainability are equally distributed
In total, we identify forty-two topics that are related to sustainability, of which we assigned thirty-one to environmental, social, or economic sustainability: eight topics relate to environmental sustainability, thirteen to social sustainability, and ten to economic sustainability. Thus, all three dimensions are covered by roughly the same number of topics.

Observation 3: Economic sustainability topics are of increasing importance for organizations
For each topic, the LDA algorithm provides us with the probability that this topic appears in a specific document, and for each document we know the year in which it was published. In order to understand how the probability of a topic changed over the years, we calculated the mean probability for this topic in all documents of a specific year. We further calculated this mean probability not only for one specific topic but also for a group of topics, e.g., all topics that we previously assigned to economic sustainability. Fig 1 provides an overview of the development of the mean probabilities of each of the three dimensions. As the linear trend line shows, the probability of environmental sustainability topics is slightly decreasing, while that of social sustainability topics is more or less stable. The trend line of economic topics shows a constant increase. In particular, the probability that an economic topic appeared in a sustainability report strongly increased from 2010 to 2011. Thus, we observe that economic sustainability topics are of increasing importance for organizations.
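The yearly aggregation and the linear trend line described above can be sketched as follows. This is an illustrative sketch, not the study's code: it assumes that each report is available as a record with a publication year and its topic probabilities, that the probability of a group of topics in a report is the sum of the probabilities of its member topics, and that the trend line is an ordinary least-squares fit. The topic numbers and probabilities are invented.

```python
from collections import defaultdict


def mean_probability_by_year(reports, topic_ids):
    """Average, per publication year, the summed probability of topic_ids."""
    by_year = defaultdict(list)
    for report in reports:
        group_prob = sum(report["topics"].get(t, 0.0) for t in topic_ids)
        by_year[report["year"]].append(group_prob)
    return {year: sum(v) / len(v) for year, v in sorted(by_year.items())}


def linear_trend_slope(points):
    """Ordinary least-squares slope of y against x for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator


# Invented document-topic probabilities for four reports.
reports = [
    {"year": 2010, "topics": {5: 0.10, 7: 0.05}},
    {"year": 2010, "topics": {5: 0.20, 7: 0.05}},
    {"year": 2011, "topics": {5: 0.30, 7: 0.10}},
    {"year": 2012, "topics": {5: 0.35, 7: 0.15}},
]
economic_topics = [5, 7]  # hypothetical topic numbers
yearly = mean_probability_by_year(reports, economic_topics)
slope = linear_trend_slope(list(yearly.items()))
```

A positive slope corresponds to an increasing trend line for the group of topics, a slope near zero to a stable one.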
Table 2. Summary of observations.
No. Observation
1 Organizations report on environmental, social, and economic sustainability.
2 Topics on environmental, social, and economic sustainability are equally distributed.
3 Economic sustainability topics are of increasing importance for organizations.
4 As to environmental sustainability, organizations report on emissions and energy consumption.
5 Biodiversity and renewable energy sources receive little attention in reports by organizations.
6 Regarding social sustainability, organizations report on labour practices.
7 Customer orientation is in organizations' focus.
8 Sponsorship activities for social sustainability focus on schools and education.
9 Economic sustainability reporting is based on financial data.
10 Sustainability actions are both general and context-specific in nature.

Observation 4: As to environmental sustainability, organizations report on emissions and energy consumption
We find eight topics that are related to environmental sustainability. Two topics (no. 16 and no. 32) refer to environmental sustainability performance and environmental sustainability data, respectively. Four topics are concerned with environmental sustainability in the supply chain. Topic no. 10 (green supplier) focuses on the supplier as part of the global supply chain. Its environment-related terms are energy and emissions. Topic no. 21 (production) and topic no. 45 (green production) are, of course, concerned with production. Among the most probable terms are safety, emissions, and fuel in topic no. 21 and material, energy, waste, reduction, recycling, emissions, and impact in topic no. 45. Topic no. 35 (production and packaging) broadens the focus by including packaging, particularly recycling. The most probable terms for this topic are recycling, waste, water, safety, and health. Two topics focus on environmental sustainability in certain contexts: Topic no. 53 summarizes terms from building construction, and its environmental-sustainability-related terms are energy, green, sustainable, environmental, water, and material. Topic no. 63 summarizes terms that relate to water management, including wastewater treatment. Table 3 provides an overview of all these topics, including the most probable terms for each topic. The most probable terms are those terms that have the highest probability (noted as a percentage after each term) of appearing if a text is about the specific topic.
For most of the environmental sustainability related topics, the terms energy and emissions as well as related terms such as gas, ton, waste, or consumption are among the twenty most probable terms, showing that organizations focus on energy consumption and emissions (including waste) specifically.

Observation 5: Biodiversity and renewable energy sources receive little attention in reports by organizations
While energy is well covered in the topics related to environmental sustainability, we could not find evidence that renewable energy was discussed in the reports as well. Further, the probability of the term biodiversity was close to zero, meaning that the chance of it appearing in one of the sustainability reports is very low. Consequently, we conclude that biodiversity and renewable energy sources receive little attention in the analysed sustainability reports.

Observation 6: Regarding social sustainability, organizations report on labour practices
We find thirteen topics that are related to social sustainability. Six of these topics are concerned with employees and labour practices, each with a unique focus. Table 4 shows the most probable terms of each of these topics. Employee safety and work time belong to the five topics with the highest mean probability over all analysed topics, which further shows the relevance of these topics for organizations.

Observation 7: Customer orientation is in organizations' focus
Two topics related to social sustainability focus on stakeholder involvement. Topic no. 69 refers to stakeholder information and consists of probable terms like program, community, organization, management, performance, government, work, people, information, reporting, and development. Topic no. 20 (customer orientation) focuses on one specific stakeholder, the customer. Probable terms in this topic include customer, service, product, responsibility, satisfaction, information, online, and survey. Analysing the development of the mean probability of both topics between 2000 and 2014, we find that the mean probability of topic no. 20 is slightly increasing over time, while the trend for topic no. 69 shows a slight decrease (see Fig 2).
To summarize, the customer is the only stakeholder that is mentioned in a separate topic, and this topic is of increasing prominence in the sustainability reports, showing its importance for organizations.

Observation 8: Sponsorship activities for social sustainability focus on schools and education
One topic related to social sustainability (topic no. 60) specifically addresses sponsorship, more precisely, school sponsorship. Among the most probable terms are school, project, education, child, development, support, foundation, initiative, and partnership. After a peak in 2004, the probability of occurrence of this topic in a sustainability report remained more or less stable.

Observation 9: Economic sustainability reporting is based on financial data
Of the ten topics related to economic sustainability, six are related to financial data. Table 5 shows the most probable terms for each of these financial data topics. Common probable terms are share, board, risk, tax, consolidated, shareholder, asset, euro, cost, and loss. The trend analysis of the probability of occurrence for these topics over time (Fig 3) shows that all financial data topics increase in probability over time, meaning that it is more likely that they are mentioned in sustainability reports.
Observation 10: Sustainability actions are both general and context-specific in nature
For each topic, we also analyse how it is distributed over industries and over countries or continents. That is, for each topic, we calculate the mean of the probabilities of all sustainability reports that were published by organizations that belonged to one specific industry or country. We obtained the information about industry and country from the meta-data that we downloaded together with each sustainability report. Many topics are roughly equally distributed over the available industries and countries; however, several topics are remarkably prominent in specific industries or countries. In the following, we highlight only those topics that differ from the average.
For the environmental sustainability topics, we find several prominent industries. For instance, in topics no. 10 (green supplier), no. 21 (production), and no. 35 (production and packaging), two industries are prominent: in topic no. 10, the computer industry and the technology hardware industry; in topic no. 21, the chemicals industry and the construction material industry; and in topic no. 35, the food and beverages industry and the household and personal products industry. In topic no. 45 (green production), the technology hardware industry, the consumer durables industry, and the equipment industry are the most prominent sectors. The construction and the real estate industries are concerned with building construction, while the water utilities industry and the waste management industry report on water management. Regarding the country, we find further significant differences between the two general environmental sustainability topics, as environmental sustainability performance is prominent mainly in North America, particularly in the United States. We also find differences among the supply chain topics. Green supplier is prominent in North America, while the production topics are most likely to occur in Asian reports. Production and packaging is most probable in Europe, followed by North America and Africa.
Analysing the topics related to social sustainability, topic no. 11 (social sustainability data) is particularly probable in the forest and paper products industry and in the agriculture industry, while the topic work time is most probable in reports from the toy industry. Topic no. 25 (sustainable development) is most probable in reports from Morocco or France. Topics no. 11 (social sustainability data) and no. 44 (corporate social responsibility) are most probable in the countries of South America; topic no. 11 is particularly common in Brazil. Employee safety is particularly prominent in North America and Asia, while employee diversity occurs primarily in the reports of multinational enterprises from North America. In Europe, employee responsibility is the most probable topic, while management is the most probable topic in reports from Asia. The topic work time is most probable in reports of small and medium-sized enterprises headquartered in Ecuador, while stakeholder information is more probable in small and medium-sized enterprises in Australia and New Zealand.
Fig 3. https://doi.org/10.1371/journal.pone.0174807.g003
The topics related to economic sustainability are the least focused on specific industries and countries. Only topics no. 39 (financial data 4) and no. 66 (financial data 6) are particularly probable in reports from Africa, as is investment.
We also find some topics that are related to sustainability in general. These topics are rather specific to certain industries and countries. For instance, topic no. 30 (sustainability activities) is most probable in reports from the energy utilities sector, while nuclear power is, understandably, of particular interest to the energy industry. Stakeholder issues are mainly a topic of the tobacco industry. Topic no. 33 (CSR activities) is most probable in reports from the toy industry and the technology hardware industry. Topic no. 56 (development) is highly probable in reports from the mining industry. Further, topics no. 14 (organizational sustainability), no. 31 (sustainability program), and no. 33 (CSR activities) are most probable in reports from Asia. The probability of topic no. 30 (sustainability activities) is particularly high in Italian reports, while topic no. 41 (general sustainability) is more probable in reports from Oceania, and topic no. 50 (corporate sustainability) is more likely to appear in European reports. Sustainability projects are prominent only in Europe, and there particularly in Germany and Austria. The analysis of the companies' nationality shows distinct results for the nuclear power topic, which is prominent mainly in Europe, but particularly in Belarus and the Russian Federation. The topic stakeholder issues is probable mainly in reports from Uganda, while the topic annual meeting has a high probability of appearing in reports from Africa, particularly South Africa and Namibia. Topic no. 56 (development) is most probable in reports from North America.

Discussion
Our analysis applies topic modelling to more than 9,000 sustainability reports in order to identify sustainability practices. We identify forty-two topics that are related to sustainability from which we make ten observations. In the following, we discuss these observations and develop ten related recommendations for organizations and researchers.
Observation 1: Organizations report on environmental, social, and economic sustainability Coding the topics identified in the sustainability reports confirms the notion of the so-called triple bottom line [33], in that topics relate to environmental, social, and economic sustainability. From the forty-two topics, we assigned thirty-one to one of these dimensions. Even though the triple bottom line has been criticized for being difficult to implement [34], our results suggest that the three dimensions fit to structure organizations' sustainability topics in practice, confirming previous results that organizations report on all of these dimensions [1][2][3]25]. The remaining eleven topics that were not assigned to one of the three dimensions are related to general sustainability topics that consist of a mix of terms that belong to all three dimensions, thus representing the integration aspect of the sustainability definition [35]. Hence, our results show that organizations report on the distinct dimensions of sustainability, but their reports also reflect the required integration of environmental, social, and economic sustainability. Particularly in the general sustainability topics, the focus seem to be more on strategic elements, for instance, terms like business, group, and management appear among the most probable terms. This would be in line with our understanding of sustainability that sustainability can be seen distinct on operational level, but should be seen integrated on strategic level. We recommend that organizations keep this distinction on their operational level but focus on integrating the three dimensions on a strategic level [15].

Observation 2: Topics on environmental, social, and economic sustainability are equally distributed
While the common definitions of sustainability highlight the integration of its environmental, social, and economic dimensions [35], these dimensions have historically been seen as distinct. For instance, corporate sustainability has its origins in environmental sustainability, while corporate social responsibility, which today is often used synonymously with corporate sustainability, has its origins in social sustainability [12]. Before these two terms gained prominence, profit maximization and, therefore, economic value were seen as the core business functions [36]. Our study shows that the topics organizations report on can be more or less equally assigned to all three dimensions. Of the forty-two topics, slightly less than a quarter relate to environmental sustainability, slightly more than a quarter relate to social sustainability, and a quarter relate to economic sustainability. Thus, at least based on the number of topics and despite their different origins, the three dimensions are nearly equally covered in the reports. However, this does not mean that the three dimensions are also equally covered in terms of depth or occurrence. Previous studies also found that organizations report on all three dimensions; however, the dimensions are not equally covered, and efforts should be made to balance all three [37].

Observation 3: Economic sustainability topics are of increasing importance in organizations
The mean probability of most economic sustainability topics is increasing, indicating that, while economic, social, and environmental sustainability overall are covered equally in sustainability reports, mentions of economic sustainability are increasing. This finding confirms previous results concerning an increasing relevance of economic topics in sustainability reports [2] that might be a consequence of the 2008 financial crisis [1]. The mean probability of all environmental sustainability topics and the mean probability of all social sustainability topics have been largely stable since 1999; however, these two areas' mean probabilities across all of the reports in our analysis are higher than that of all economic sustainability topics. One reason for the increase might be economic pressures such as the crises in Europe [38,39], the concerns about the Chinese economy [40], or organizations' growing interest in digital innovation and transformation in their businesses [41]. In this regard, the data might confirm that organizations prioritize economic concerns during crises and that ecological and social interests are more likely to be considered in stable economic times [42,43]. However, research indicates that sustainability transformation can also offer economic potential for organizations and that digital technologies in particular can open up new business opportunities and business models in areas of environmental and social sustainability [37,44,45], such as smart houses and energy supply solutions [46]. Consequently, organizations should leverage the economic potential of including environmental and social sustainability in their activities.
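The trend reading above rests on averaging each topic's probability over all reports of a year and then looking at how that average moves over time. The following is a minimal sketch of that computation, not the code used in the study: the function names, the topic labels, and all numbers are hypothetical, and the least-squares slope is only one possible way to summarize a trend.

```python
from collections import defaultdict


def mean_topic_probability_by_year(reports):
    """Average each topic's probability over all reports published in a year.

    `reports` is a list of (year, topic_probabilities) pairs, where
    topic_probabilities maps a topic label to that report's probability.
    A topic missing from a report counts as probability 0 for that report.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for year, probs in reports:
        counts[year] += 1
        for topic, p in probs.items():
            sums[year][topic] += p
    return {
        year: {topic: total / counts[year] for topic, total in topics.items()}
        for year, topics in sums.items()
    }


def trend_slope(yearly_means, topic):
    """Least-squares slope of a topic's yearly mean probability against the year.

    Requires at least two distinct years.
    """
    points = sorted((year, means.get(topic, 0.0)) for year, means in yearly_means.items())
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical toy data, not taken from the study.
toy_reports = [
    (2013, {"financial results": 0.10, "emissions": 0.30}),
    (2013, {"financial results": 0.20, "emissions": 0.30}),
    (2014, {"financial results": 0.25, "emissions": 0.30}),
    (2015, {"financial results": 0.35, "emissions": 0.30}),
]
yearly = mean_topic_probability_by_year(toy_reports)
economic_slope = trend_slope(yearly, "financial results")
environmental_slope = trend_slope(yearly, "emissions")
```

In this toy data the economic topic has a positive slope while the environmental topic stays flat at a higher level, which mirrors the pattern described above: a rising economic share alongside stable, and on average higher, environmental coverage.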

Observation 4: As to environmental sustainability, organizations report on emissions and energy consumption
Diverse measures for environmental sustainability have been discussed in the literature, including air emissions, energy use, resource depletion, waste, and water use [14]. Our study reveals that organizations predominantly report on their emissions and energy consumption data. Emissions and consumption are also the most frequently mentioned environmental issues found in a previous analysis, but in that analysis consumption data for energy and water were reported equally often [1]. In our results, energy was more probable than water in the context of environmental sustainability performance. We found a specific topic on water, but this topic focused on waste water. Consequently, we conclude that organizations should increase their range of measures for environmental sustainability by, for instance, additionally reporting on fuel and paper consumption, waste, and emission of certain gases [1].

Observation 5: Biodiversity and renewable energy sources receive little attention in reports by organizations
Research has identified both biodiversity and renewable energy sources as important aspects of environmental sustainability [14,47], but these topics were absent or rare in the sustainability reports we analysed. The probability of the term biodiversity is close to zero, and we found no evidence that the term occurs in the context of general environmental sustainability reporting. One reason for this rarity might be the complexity of biodiversity, as the impacts of some of an organization's actions are often distant in time and space [48]. Furthermore, apart from some well-known threats to biodiversity, such as pollution, many threats are not yet fully understood [48], making it difficult for organizations to address the issue. Another explanation might be that loss of biodiversity is a result of environmentally unsustainable behaviour, including over-abstraction of water, increasing demand for resources, and rising consumption levels [48], topics that are well covered in the sustainability reports. Terms that relate to renewable energy are also rare in the reports. Our study suggests that organizations have not taken significant action to invest in renewable energy, nor are they reporting on projects to come. It would be useful to investigate why this is the case, in particular whether organizations see sufficient potential in renewable energy and biodiversity and plan to address them in the future.

Observation 6: Regarding social sustainability, organizations report on labour practices
The literature has discussed diverse issues concerning social sustainability, such as employee training programs, health and prevention programs, stakeholder involvement, customer satisfaction, and sponsorship [13,14]. Our study also supports the importance of topics like safety, work time, diversity, and development. These topics tend to cover the indicators mentioned in the global reporting principles and standards under the sub-category of labour practices and decent work, and organizations seem to cover the most important human rights in their reports. Among all forty-two topics we analysed, two of the topics related to employees are among the five most probable topics (those with the highest mean probability of occurring in a sustainability report), so employees play an important role in organizational sustainability reports. However, organizations should scope their sustainability initiatives beyond legal requirements [37] in order to differentiate themselves from their competitors. In particular, continuing with efforts concerning labour practices beyond existing regulations can greatly improve an employer's profile and attract top talent on the competitive global job market [49].

Observation 7: Customer orientation is in organizations' focus
The literature has identified several motives for organizations to engage in sustainability transformations, including regulations [14,20,21], pressure from customers [21], and new market creation [45]. Against this background, our analysis revealed a dominant topic related to customer orientation. The most probable terms (customer, responsibility, satisfaction, information, online, and survey) show that organizations are concerned with customer satisfaction and use online surveys to measure it. These findings suggest that customer orientation is a valid strategy for organizations, as they report on their sustainability initiatives from their customers' perspective. Drawing from the data, we conclude that organizations should scope sustainability initiatives so as to consider a wider range of stakeholders. Stakeholder theory, in particular, shows that multiple views should be balanced in order to achieve business success over the mid- to long-term [50]. Sustainability research has also shown that corporate sustainability requires that one considers interactions with and value creation for all stakeholders [37,51].

Observation 8: Sponsorship activities for social sustainability focus on schools and education
Sponsorship activities are part of organizations' social sustainability practices [14]. Since our analysis reveals that such sponsorship focuses on funds for schools and education, decision makers appear to believe in the role of education in improving (social) sustainability over the long term [52]. Building on our data, we conclude that, in addition to their current activities, organizations should also invest in their own employees' education in order to achieve bottom-up support of their sustainability activities, which has been shown to help organizations adopt sustainable practices [23].

Observation 9: Economic sustainability reporting is based on financial data
The topics on economic sustainability focus on financial data, particularly data that is part of the corporate balance sheet. Our study did not find topics on practices like compliance or on codes of conduct, both of which have been identified as valuable in the effort to sustain economic success. We argue that sustainability reports should be enriched by statements on how to sustain and develop economic results in pursuit of economic sustainability [37].

Observation 10: Sustainability actions are both general and context-specific in nature
Research on sustainability transformations has identified a number of action potentials, such as guiding behaviour by sense-making and sustainable practices [53]. Building on the data from the sustainability reports of 3,900 companies, our study reveals a broad range of topics covered in sustainability reports, some of which are related to certain industries or certain geographic regions, while others are well distributed among industries and regions. In this regard, our study confirms previous assumptions about industry-specific practices [3]. We further assume that the differences between industries and regions might be due to different stakeholders, e.g. different national initiatives. Previous research has also shown that the content of sustainability reports is influenced by the organization's stakeholders [1]. Therefore, we argue that future research and practice should be more specific in characterizing and understanding the context of sustainability behaviour.
In several cases, we can make assumptions concerning why a certain industry or region focuses on a particular topic. For instance, green production is most likely to be a topic in reports from companies in Asia, especially those in the technology hardware, consumer durables, and equipment industries, with which companies in China, Japan, and Taiwan are typically associated. Therefore, we see a link among the continent, the industry, and the most probable terms in the topic. The production and packaging topic is most likely to occur in reports from companies in Europe, possibly because of the European directive on packaging and packaging waste that aims to improve packaging's environmental performance. Another example is sustainability projects, which are most likely to occur in German reports, possibly because of the German Energy Transition ("Energiewende"), a movement toward alternative energy sources that started in the 1970s and gained popularity in 2011 [54]. Therefore, we agree with the statement from Liew et al. [3] that sustainability practices are industry-specific. We also show how different stakeholders, particularly governments through regulations, influence the content of sustainability reports [1].
We summarize these ten observations and ten recommendations in Table 6.

Table 6. Summary of observations and recommendations.

| # | Observation | Propositions |
|---|---|---|
| 1 | Organizations report on environmental, social, and economic sustainability. | Organizations should distinguish among the three dimensions of sustainability on the operational level but focus on integrating the three dimensions on a strategic level. |
| 2 | Topics on environmental, social, and economic sustainability are equally distributed. | Organizations should balance social, environmental, and economic dimensions of sustainability in their choice of sustainability-related activities. |
| 3 | Economic sustainability topics are of increasing importance for organizations. | Organizations should leverage the economic potential of including environmental and social sustainability in their activities. |
| 4 | As to environmental sustainability, organizations report on emissions and energy consumption. | Organizations should increase their range of measures of environmental sustainability by, for instance, additionally reporting on fuel and paper consumption, waste, and emission of certain gases. |
| 5 | Biodiversity and renewable energy sources receive little attention in reports by organizations. | Researchers should investigate whether organizations see sufficient potential in renewable energy and biodiversity and plan to adopt it in the future. |
| 6 | Regarding social sustainability, organizations report on labour practices. | Organizations should continue their efforts concerning labour practices beyond existing regulations in order to improve their profiles as employers. |
| 7 | Customer orientation is in organizations' focus. | Organizations should consider their interaction with and value creation for all stakeholders. |
| 8 | Sponsorship activities for social sustainability focus on schools and education. | Organizations should invest in their employees' sustainability-related education. |
| 9 | Economic sustainability reporting is based on financial data. | Organizations should enrich their reports with statements on how to sustain and improve economic results. |
| 10 | Sustainability actions are both general and context-specific in nature. | Researchers should be more specific in characterizing and explaining the context of sustainability behaviour. |

Conclusion
Increasing numbers of organizations are publishing sustainability reports about their sustainability practices [2]. The present work used topic-modelling techniques to analyse 9,514 sustainability reports published by organizations between 1999 and 2015 and derives ten specific propositions to guide future research and practice. More specifically, we identified forty-two topics related to sustainability that are distributed approximately equally in the areas of environmental, social, economic, and general sustainability. We showed that topics related to environmental sustainability consist mainly of emissions and consumption, particularly related to energy; biodiversity and renewable energy do not appear in our results. The focus of social sustainability is on employees, but customer orientation and sponsorship are also covered. In addressing economic sustainability, organizations tend simply to present their financial data. We also show the influence of industry and country on the content of the topics. We advise organizations to balance their activities in and to use the potential of all three dimensions of sustainability, to increase their measures for environmental sustainability, to continue with their efforts concerning labour practices, to consider their interactions with all stakeholders, to invest in their employees' education concerning sustainability, and to provide information in their sustainability reports on how to sustain and develop their economic results. Researchers are advised to investigate why organizations have not focused on biodiversity or renewable energy and to be precise on the contexts of the sustainability behaviours they examine.
Our work is not without limitations. We used text-mining techniques that reduce the content of the documents to simple collections of terms, so our findings depend on our interpretation of the results. In particular, the labelling of the topics is based on the subjective opinion of the authors, and other researchers might come up with different labels. Still, we are convinced that our labels represent the content of each topic well (based on the most probable terms that describe the topic), so different labels would not have a large influence on our findings and observations. Nevertheless, we encourage other researchers to evaluate our results through an in-depth qualitative analysis of sustainability reports. In addition, our data sources are sustainability reports that organizations publish to report on their sustainability activities, but these reports are also used as marketing instruments [1], so they might not reflect corporate sustainability practices in all details. Furthermore, we obtained these reports from a single source, the GRI database, so there might be a certain bias in the data. Future research can address this limitation by complementing our findings with interviews of those who participate in sustainability activities in organizations or by using another data source such as the Corporate Register, a database with more than eighty thousand corporate responsibility reports (http://corporateregister.com). To the best of our knowledge, our application of LDA in the context of sustainability reports is new, so questions could arise regarding its reliability. While the use of LDA is not without risk, we are confident that, in this case, it provided valuable insights.
Despite its risks, we propose that other researchers use LDA in their research on sustainability reports or organizational reports in general, as its use in these contexts has several advantages compared to manual coding techniques. First, LDA allows the researcher to take a large amount of data into consideration, and, as opposed to manual coding, the data analysis costs are minimal [32]; in our case, it allowed us to provide a broad picture of sustainability practices over several industries and years. Second, LDA requires no restrictions on the content of the topics, such as the requirement that one focuses on certain indicators, as previous studies [8] have done; we had only to restrict the number of topics to be modelled. Third, LDA allows the resulting topics to be used for further analysis, as we did in analysing the topic distribution over industries, years, and regions, restricted only to the nearly forty industries that published their sustainability reports on the GRI database. For example, the small number of reports from the toy industry might otherwise have kept it from being the focus of any sustainability research, but using LDA allowed it to be included, and our analysis revealed two topics, work time and corporate responsibility activities, that were highly probable in reports from this industry. Fourth, applying LDA provides a broad overview of the many definitions and conceptualizations of sustainability that exist in organizations.
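The third advantage, reusing the fitted topics for further analysis, can be illustrated with a small sketch that groups per-report topic probabilities by industry and ranks the topics within each group. This is an illustration under stated assumptions rather than the procedure used in the study: the function name `top_topics_by_group` is hypothetical, the two toy-industry topic labels come from the text above, and all probabilities are invented for the example.

```python
from collections import defaultdict


def top_topics_by_group(reports, top_n=2):
    """Rank topics by mean probability within each group (e.g. an industry).

    `reports` is a list of (group, topic_probabilities) pairs. Ties are
    broken alphabetically so that the ranking is deterministic.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for group, probs in reports:
        counts[group] += 1
        for topic, p in probs.items():
            sums[group][topic] += p

    ranking = {}
    for group, topics in sums.items():
        means = {topic: total / counts[group] for topic, total in topics.items()}
        ranked = sorted(means, key=lambda topic: (-means[topic], topic))
        ranking[group] = ranked[:top_n]
    return ranking


# Hypothetical document-topic probabilities; only the labels echo the text.
toy_reports = [
    ("toys", {"work time": 0.40, "corporate responsibility activities": 0.35, "emissions": 0.05}),
    ("toys", {"work time": 0.30, "corporate responsibility activities": 0.45, "emissions": 0.10}),
    ("energy", {"work time": 0.05, "corporate responsibility activities": 0.10, "emissions": 0.60}),
]
top = top_topics_by_group(toy_reports)
```

Because the grouping only needs each report's topic probabilities and a label, even an industry with very few reports still receives a ranking, although means over so few documents should be read with caution.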
Our research can also guide practitioners in their sustainability activities, as the ten recommendations provide a comprehensive overview of potential sustainability-related efforts.
The study contributes to research in two ways. First, researchers can use our results either to explore the topics that resulted from the analysis or to explore the reasons for missing topics. Second, we propose a new technique for analysing sustainability reports or corporate reports in general that other researchers might use to analyse other types of documents.