NASA’s carbon monitoring system (CMS) and arctic-boreal vulnerability experiment (ABoVE) social network and community of practice

The NASA Carbon Monitoring System (CMS) and Arctic-Boreal Vulnerability Experiment (ABoVE) have been planned and funded by the NASA Earth Science Division. Both programs focus on engaging stakeholders and developing science useful for decision making. The resulting programs have funded significant scientific output and advancements in understanding how satellite remote sensing observations can be used not just to study how the Earth is changing, but also to create data products that are of high utility to stakeholders and decision makers. In this paper we focus on documenting the diversity of research themes and methods used, and how the CMS and ABoVE themes are related. We do this by fitting a Correlated Topic Model to the 521 papers produced by the two programs and plotting the results in a network diagram. Through analysis of the themes in these papers, we document the relationships between researchers and institutions participating in the CMS and ABoVE programs and the benefits of sustained engagement with stakeholders made possible by recurring funding. We note an absence of policy engagement in the papers and conclude that funded researchers need to be more ambitious and explicit in drawing the connection between their research and carbon policy implications in order to meet the stated goals of the CMS and ABoVE programs.


Introduction
The United States National Aeronautics and Space Administration (NASA) has for decades invested in creating freely available satellite-based Earth observation data which can be used to generate scientific knowledge. Its programs support scientific research that translates data into an understanding of the dynamics of the carbon cycle and terrestrial ecosystems through interdisciplinary collaborations and research. Information on ecosystem function, land cover change, leaf area, stress and biomass has been derived from satellite data since the 1970s, starting with the launch of Landsat 1 in 1972 (Perry and Lautenschlager 1984).
More recently, satellite data have been used to monitor and map changes in carbon emissions which result in rising greenhouse gases (Defries et al 1999, Houghton 2018, Allen et al 2018). For example, deforestation and degradation of tropical forests have been shown to account for up to 30% of anthropogenic carbon emissions (Goetz et al 2009). In remote regions such as the Arctic, ecosystems are changing rapidly, but with few inhabitants and direct observations being expensive to obtain, satellite data integrated with models are essential and effective ways of measuring and responding to change (Fisher et al 2018). Direct observations of carbon emissions from new sensors such as the Orbiting Carbon Observatory (OCO-2) (Boesch et al 2011) and from the Greenhouse Gases Observing Satellite (GOSAT) (Butz et al 2011) have shown how satellite data can be used to directly observe carbon emissions from specific locations.
Carbon monitoring is the sustained measurement of carbon dynamics, including capabilities that can be useful for management of emissions and decision making (West et al 2013). Designing observations, models and engagements together with stakeholders and boundary organizations (Gustafsson and Lidskog 2017) has been shown to be much more effective in delivering this wealth of information to policy makers than when scientists work in isolation (Moser and Ekstrom 2010). Engaging with stakeholders while developing new methods, models and datasets using Earth observation data is a central goal of both NASA's Carbon Monitoring System (CMS) and Arctic-Boreal Vulnerability Experiment (ABoVE) programs. To achieve these goals, NASA has worked to develop a community of practice across both programs to rapidly produce new science and data products that will be needed for stakeholders as we face increasing impacts from climate change (Brown et al 2016).
Social scientists have used the concept of a community of practice across a variety of domains, but the origin of the concept comes from learning theory (Wenger-Trayner and Wenger-Trayner 2015). The complex set of social relationships among individuals in a community is an important source of learning for all its members. Given the challenge of considering diverse perspectives while delivering complex science and algorithmic advances, this learning is central to the success of the programs. Three components define a community of practice:
• the community needs to share a commitment to a specific domain of interest and to developing new relationships and connections, enabling coauthorship, work teams and shared institutions;
• the domain or topical area the group works on should evolve, allow for learning to increase competence and to define success, and allow for shared use of models, satellite data, geographic extent and topics of interest; and
• the practice the community conducts allows for working towards similar goals, developing a shared repertoire of methods, vocabulary, and resources through experiences, stories, tools, and ways of addressing recurring problems (Wenger 2011).
It is the combination of these three elements that constitutes a community of practice. Lemos and Morehouse (2005) find that a diversity of topics, research approaches and models that span the producer-user divide are needed to best meet the diverse needs of both stakeholders and scientists.
The CMS and ABoVE communities are similar in funding societally relevant analyses and data products, investing in an applications program coordinator who provides support to investigators, and focusing on using remote sensing observations together with modeling. Both programs began with significant participation of scientists in crafting the original objectives and scope of the program, have articulated a focus on 'societal drivers, consequences and responses research', and have shared leadership and support throughout their period of activity through management from the Carbon Cycle & Ecosystems Office at NASA Goddard Space Flight Center.
The objective of CMS is to apply NASA capabilities to support national and international needs for carbon Monitoring, Reporting and Verification (MRV). Since the program began in 2010, researchers have emphasized the use of NASA satellite data and scientific expertise with ground capabilities in order to better understand the carbon cycle (Hurtt and Kang 2014). To be relevant to U.S. government agencies and state-level programs that map carbon stocks and biomass for regulatory purposes, CMS products must reflect the timing, resolution, and quantities set forth in the legal and regulatory frameworks in which these agencies work. By interacting with Federal and state agencies, non-profit organizations, and other stakeholder institutions that are working within regulatory frameworks, CMS investigators can maximize the utility of their data products for MRV (Hurtt et al 2019).
During the past decade, a focus of the CMS science team has been to work with funded scientists to communicate science data and model outputs in ways that make sense to these stakeholders, and to iteratively develop data products that are useful for decision makers. Thus, in the project proposal, every CMS project must identify a user of its data and a potential community that is interested in the research. The themes of evaluation, accuracy and user community should be evident in the project descriptions and in the papers written about the projects. For example, data products that provide annual estimates of aboveground biomass density in the Tapajos Forest region of Para, Brazil (Treuhaft et al 2017), are used in quantification of carbon pools through the REDD+ (Reducing Emissions from Deforestation and Forest Degradation) process. Other CMS products, such as landscape-level forest biomass products for a variety of regions within the United States, are used by local, regional, state and national decision makers such as the US State Department, US Forest Service, and the US Agency for International Development. Details on users and data products can be found at the NASA CMS website under Applications and Data & Products.
Significant effort has been devoted to rigorous evaluation of the quality of data being produced, as well as to the characterization and communication of errors and uncertainties in those data to stakeholders. An example of the use of CMS data is a 2015 CMS project that provides methane emission data for the South Coast Air Basin to the California Air Resources Board (CARB) and the California Energy Commission (CEC) using proven airborne imaging spectrometers such as the Airborne Visible InfraRed Imaging Spectrometer-Next Generation (AVIRIS-NG). The project showed that a third of California's methane emissions were traced to a few specific point sources that could be mitigated with direct action (Duren et al 2019).
Started in 2013, NASA's ABoVE is a Terrestrial Ecology Program research and field campaign conducted in Alaska and Western Canada whose objective is to understand environmental change and its implications for social-ecological systems (Kasischke et al 2014). The program focuses on research objectives that benefit from the unique capabilities provided by remote sensing data. Data products from new and existing satellite and airborne remote sensing systems allow for the study of seasonal and inter-annual variability over large geographic regions characteristic of the boreal zone. At landscape to regional scales, these data products are critical to the spatial and temporal scaling of observations made from field studies. ABoVE's science objectives are broadly focused on (1) gaining a better understanding of the vulnerability and resilience of Arctic and boreal ecosystems to environmental change in western North America through field observations, and (2) providing the scientific basis for informed decision-making to guide societal responses at local to international levels (Goetz et al 2016). A key aspect of the ABoVE program is overcoming the challenges of obtaining field data in this region, whose remoteness and harsh conditions make remote sensing particularly important for understanding environmental change. Although there are over 10 million acres of forests in Alaska, the US Forest Service has only been able to estimate total biomass for this region because of the ABoVE project's novel use of satellite remote sensing and field data (Ene et al 2018).
In this paper, we seek to show, through a textual analysis of research papers and project descriptions created as a result of the funding, that CMS and ABoVE together form a community of practice. By analyzing the shared vocabulary, topics studied, methods used, datasets incorporated, and ways of addressing recurring problems, we can demonstrate that a community of practice has emerged with characteristics that will support and encourage improved use of scientific information by their collaborators. Moreover, we can compare the current coverage of the research output of ABoVE and CMS with the stated research goals of the programs to identify current gaps. To achieve this, we use a topic modeling approach to assess the various topics and themes that have been addressed in CMS and ABoVE publications over the past decade. This approach allows us not only to explore the key topics in the literature and how they change over time, but also to examine their thematic inter-relationships. We supplement this analysis with an exploration of the individuals and institutions involved in CMS and ABoVE projects to better understand the extent to which projects were connected by individual researchers.
Brown et al (2016) used network analysis to evaluate the scientific community of practice of the North American Carbon Program (NACP). The NACP was formed to further the scientific understanding of the sources, sinks, and stocks of carbon in the Earth's environment, with a particular focus on those in the North American continent. The paper sought to determine how well the social and physical sciences have been integrated in the work of the NACP, and whether the necessary interdisciplinary research, set out in its 2011 strategic plan, was being acted upon by its members (Michalak et al 2011). Results of the analysis showed that the NACP has formed a tightly connected community with many social pathways through which knowledge may flow, and that it has also expanded its network of institutions involved in carbon cycle research over the past seven years.

Previous work
Here we extend this work to connect the NACP analysis to the CMS and ABoVE programs and their impact over the past decade. A community of practice can be defined as a community that develops when people have a common interest in a subject or area, and collaborate over an extended period of time in a process of social learning (Wenger et al 2002). Unlike many other Earth science research programs funded by NASA's Earth Science Division, all researchers submitting proposals to CMS were asked to:
• explain the societal relevance of the proposed research and scientific analyses;
• provide justification regarding the importance of their work to U.S. national interests in current or potential carbon monitoring for science, management, and policy; and
• address stakeholder interests in their studies and contribute to CMS science team activities to understand and engage the user community for carbon monitoring products.
By engaging with the social, political and scientific agendas that drive decision making on carbon pollution, CMS scientists can design products and models that can be used in decision making. The programs attract those scientists willing and interested in interacting with institutions connected with current or potential carbon monitoring for science, management and policy in order to design models, experiments and new data products that can eventually be used to support decision making (Michalak et al 2011). The CMS program supports projects at a variety of levels of applications readiness, from discovery and feasibility, to development, testing and validation, through to integration into a partner's system (NASA 2017). Here we document the coherence of the topics being studied, which will show how these different communities are working together and how they extend their influence and connectivity across multiple disciplines through stakeholder engagement. For ABoVE solicitations, applicants were asked to engage in collaborations with interested parties and stakeholders to advance the ABoVE implementation plan. Similar to CMS, the ABoVE solicitation requires that projects examine the societal impacts of changes to Arctic and boreal ecosystems, and integrate these results into a coherent modeling framework for diagnosing and predicting ecosystem dynamics and the consequent societal impacts of changes to ecosystem services. The solicitations list a number of potential collaborators, which require engagement across policy, decision making and organizational boundaries.
Research that analyzes links between peer-reviewed publications can provide evidence about how knowledge is shared among researchers. Issac and Thomas (2019) show that when analyzing how knowledge moves from one person to another, knowledge can be traced both via who the researcher knows (human capital) and via what they know, as demonstrated through research papers and databases (structural capital). In our analyses, we provide evidence of human and structural capital, and analyze the role of institutional learning across a decade of funding. Behara et al (2014) show how co-authorship in research papers is a form of social networking in research collaborations and can be used to understand relational linkages among individuals, organizations, and nations. Research on using co-authorship in publications to understand a social network has been extended to technology and to innovations such as dataset provision (Moody 2004, Van Der Valk and Gijsbers 2010). Social networks are an inherent part of organizations and affect collaborations and decision making, particularly in long-term research which engages with both technology and social decision making, such as in monitoring, reporting and verification of carbon pollution.
By connecting these two NASA-funded research programs, we seek to demonstrate how researchers, faculty, graduate students and practitioners are benefitting from engaging with social networks to access funds, information and influence (Hult et al 2003, Garvin et al 2008). Carbon cycle and Arctic research are multidisciplinary and include models, data and field data. Our hypothesis in this research is that because of the similar focus on the societal relevance of the physical systems being studied, the two NASA programs have formed a single community of practice that uses similar vocabulary, shares data and methods, and enables community learning. To demonstrate this, we use co-authorship and a correlated topic model from published research in both communities.

Data
The set of papers for the CMS and ABoVE projects was selected from self-reported research papers describing research conducted using funds provided by NASA. The papers were reported by funded projects to their respective programs as part of the reporting process to NASA. The corpus used in the topic model included 521 peer-reviewed papers: 319 papers from CMS, which had a total of 2.6 million words, and 202 papers from ABoVE, with 1.6 million words represented in the analysis.
In addition to the papers, for the analysis of research collaboration across proposals, we used abstract summaries of funded proposals, summaries of project data and research papers published by funded scientists from both the CMS and ABoVE programs as the basis of this analysis. These projects are described by the project title, the project abstract and names and affiliations of principal investigators of the project.

Topic modeling
To connect these papers and proposal documents we used a topic modeling algorithm to detect topics in the literature and to visualize their interrelationships. The Correlated Topic Model, or CTM (Blei et al 2007), is similar to the more popular Latent Dirichlet Allocation, or LDA, model (Blei et al 2003). Both can be viewed as unsupervised classification models that use words as their basic units of analysis. In these models, words occur in documents and, in this case, each document is an entire research paper, including the title, abstract, text, figure captions and all references. Nothing was excluded from the paper in the analysis. Based on the distribution of words occurring across documents, the CTM identifies groups of words that occur together across all documents, and these are analyzed as topics, where one document can contain multiple topics. Before running the model, we ran several typical text pre-processing steps, including removing numbers and punctuation, removing common English words, also known as stopwords, and removing suffixes of words, a process known as stemming, which ensures that two words derived from the same root, like 'climate' and 'climatic', are counted the same. We also created bigrams, which treat two commonly adjacent words as one word, so the words 'climate' and 'change' would also be modeled as 'climate_change'. Finally, we removed words that occurred in fewer than 10% or more than 80% of documents. Of the words that appeared in over 80% of the papers, the 15 most common were 'use', 'science', 'differ', 'can', 'also', 'model', 'estimate', 'studies', 'provide', 'system', 'universe', 'refer', 'time', 'avail', and 'include'. The other common words can be found in table S1.
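The pre-processing steps described above can be sketched as follows. This is a minimal illustration only, not the pipeline used for the analysis: the stopword list, the toy suffix stripper `crude_stem` (standing in for a real stemmer), the bigram threshold `min_count`, and the sample sentences are all assumptions introduced here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; the paper does not give its list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to", "on", "for"}

def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ic", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, keep alphabetic runs only (drops numbers and punctuation),
    remove stopwords, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]

def add_bigrams(docs, min_count=2):
    """Append 'a_b' tokens for adjacent pairs seen at least min_count times
    in the whole corpus; the single words are kept as well."""
    pair_counts = Counter(p for d in docs for p in zip(d, d[1:]))
    frequent = {p for p, c in pair_counts.items() if c >= min_count}
    return [d + [f"{a}_{b}" for a, b in zip(d, d[1:]) if (a, b) in frequent]
            for d in docs]

def filter_by_doc_freq(docs, min_frac=0.10, max_frac=0.80):
    """Drop terms found in fewer than min_frac or more than max_frac of documents."""
    n = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d))
    keep = {t for t, c in doc_freq.items() if min_frac <= c / n <= max_frac}
    return [[t for t in d if t in keep] for d in docs]

# Hypothetical sentences standing in for full papers.
texts = [
    "Climate change alters boreal forests in 2019.",
    "Climatic drivers of climate change in the Arctic.",
    "Lidar maps of forest biomass.",
]
docs = add_bigrams([tokenize(t) for t in texts])
print(filter_by_doc_freq(docs))
```

In this sketch 'climate' and 'climatic' collapse to the same stem, and the adjacent pair for 'climate change' is added as a single token alongside the individual words, mirroring the description in the text.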
Using this basic framework, LDA models assume that topic proportions are drawn from a Dirichlet distribution, which assumes near-independence of the components of the proportions (Blei and Lafferty et al 2007). CTMs, on the other hand, avoid the strong independence assumptions associated with LDA models by using a logistic normal distribution, which, unlike a Dirichlet, can represent correlations between topics across papers. While the CTM is much more computationally challenging, it has the advantage of identifying topics in corpora where topics are inter-related. Furthermore, it allows the analyst to explore relationships between topics, as we do.
A key hyperparameter in CTMs is the number of topics to identify (k). Because of the computational complexity of CTMs, we estimated topic models in increments of 10 topics (i.e. 10 topics, 20 topics, 30 topics, etc.). Typically, the number of topics is chosen by optimizing an evaluation metric, such as log-likelihood or model perplexity. However, for this data set, the metrics did not reach an optimum at fewer than 100 topics (figure S1 (available at stacks.iop.org/ERL/15/115014/mmedia)), an unwieldy number of topics to evaluate. Moreover, there is much subjectivity in selecting the number of topics in a model, because evaluation metrics do not always capture the semantic validity of a topic model (Chang et al 2009). We therefore used these metrics as a guide to the number of topics that led to a generally well-fit model rather than as the sole determinant of the final model. We manually examined the results from multiple models to determine how the output matched theoretical expectations, assessed the semantic validity and coherence of the topics identified by each model, and then manually selected the numbers of topics that best represented the literature. We present results from the 10-, 20- and 60-topic models here.
We created a label for each topic based on the individual words as well as the abstracts most associated with each topic, and we further grouped these topics into a set of four broader themes present in the literature. Additionally, we give the mean topic proportions across papers associated with CMS and ABoVE, to show the average proportion of a CMS and ABoVE paper that contains the topic. One advantage of a CTM over simpler topic modeling methods, such as the common LDA model, is that a CTM can model the covariance between topics, rather than assuming they are orthogonal (Blei and Lafferty et al 2007). Based on this covariance, a graph of linkages between topics can be derived from the covariance matrix of the topic proportions by modeling each topic as a function of the others in a regularized regression; two topics are linked when their coefficients in their mutual regressions are both greater than 0 (Blei and Lafferty et al 2007). Because the sparsity of a regularized regression is determined by a tuning parameter (λ₁), this parameter will also determine the connectivity of the derived graph. We therefore weight each edge based on the size of the tuning parameter that maintains connectivity between every pair of topics. Thus, in our resulting network diagram, the size of the line connecting two topics is proportional to the strength of the correlation between those topics, and the size of the box holding the topic name is related to the proportion of the total corpus that topic represents.
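The contrast between the two priors can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not part of the analysis: it draws three-topic proportions from a symmetric Dirichlet and from a logistic normal whose covariance matrix `sigma` (an illustrative choice) ties the first two topics together, then compares the sample correlation between those two topics.

```python
import math
import random

def cholesky(a):
    """Lower-triangular factor L of a small symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def softmax(eta):
    """Map real-valued scores to proportions that sum to one."""
    m = max(eta)
    e = [math.exp(x - m) for x in eta]
    z = sum(e)
    return [x / z for x in e]

def sample_logistic_normal(mu, sigma, rng):
    """Draw eta ~ N(mu, sigma), then push it through the softmax."""
    L = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1))
           for i in range(len(mu))]
    return softmax(eta)

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(0)
# Illustrative covariance: topics 0 and 1 tend to rise and fall together.
sigma = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
ln = [sample_logistic_normal([0.0, 0.0, 0.0], sigma, rng) for _ in range(2000)]
di = [sample_dirichlet([1.0, 1.0, 1.0], rng) for _ in range(2000)]
ln_corr = correlation([t[0] for t in ln], [t[1] for t in ln])
di_corr = correlation([t[0] for t in di], [t[1] for t in di])
print(f"logistic normal: {ln_corr:.2f}  Dirichlet: {di_corr:.2f}")
```

Under the Dirichlet, any two components can only be negatively correlated, because they compete for a fixed total. The logistic normal can place two topics in positive correlation, which is the property the CTM relies on.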
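One way to read the graph construction described above is as a lasso regression of each topic on all the others, computed directly from the topic covariance matrix. The sketch below is a simplified illustration under that assumption. The coordinate-descent solver, the toy covariance matrix `S`, the grid of tuning values, and the choice to weight an edge by the largest tuning value at which it survives are all choices made here for illustration, not the procedure or software used in the paper.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam (the lasso proximal step)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso_from_cov(S, j, lam, n_iter=200):
    """Regress topic j on all other topics using only the covariance matrix S.
    Coordinate descent on 0.5*b'S b - b'S[:, j] + lam*|b|_1 over the other topics."""
    others = [k for k in range(len(S)) if k != j]
    beta = {k: 0.0 for k in others}
    for _ in range(n_iter):
        for k in others:
            r = S[k][j] - sum(S[k][l] * beta[l] for l in others if l != k)
            beta[k] = soft_threshold(r, lam) / S[k][k]
    return beta

def topic_graph(S, lam):
    """Link two topics when both mutual regression coefficients exceed 0."""
    p = len(S)
    betas = [lasso_from_cov(S, j, lam) for j in range(p)]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if betas[i][j] > 0.0 and betas[j][i] > 0.0}

def edge_weights(S, lam_grid):
    """Weight each edge by the largest tuning value at which it is still present."""
    weights = {}
    for lam in sorted(lam_grid):
        for edge in topic_graph(S, lam):
            weights[edge] = lam
    return weights

# Toy covariance for three topics: 0 and 1 co-vary, 2 is unrelated.
S = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print(topic_graph(S, 0.1))
print(edge_weights(S, [0.1, 0.3, 0.5, 0.7, 0.9]))
```

A larger tuning value forces more coefficients to zero and therefore removes edges, so an edge that survives heavy regularization reflects a stronger relationship between the two topics.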

Institution modeling
Beyond our topic modeling analysis, we analyzed how CMS projects were related, both in terms of which projects specifically succeeded previous projects as well as which projects shared a project lead. We used database information on institution and year funded to connect funding to the evolution of research themes through time, and via institutions who have sustained funding.
Authors, institutions, and project descriptions for all years of NASA CMS are available on the website https://carbon.nasa.gov. We used a network diagramming approach to illustrate the connectedness of each project and the institution that the lead author is affiliated with. Two kinds of connectivity are illustrated: either projects with the same project lead scientist, or two projects that self-identified as 'successor' projects in the CMS database even if they have different lead scientists.
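The two kinds of links can be sketched as an edge list built from project records. The records, the field names (`id`, `lead`, `succeeds`) and the edge labels below are hypothetical and do not reflect the schema of the CMS database.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical project records; names and fields are illustrative only.
projects = [
    {"id": "P1", "lead": "Lead A", "succeeds": None},
    {"id": "P2", "lead": "Lead A", "succeeds": None},
    {"id": "P3", "lead": "Lead B", "succeeds": "P2"},
    {"id": "P4", "lead": "Lead C", "succeeds": None},
]

def build_edges(records):
    """Return {(project_a, project_b): link_type} for both kinds of connection."""
    edges = {}
    by_lead = defaultdict(list)
    for rec in records:
        by_lead[rec["lead"]].append(rec["id"])
    # Link every pair of projects that share a lead scientist.
    for ids in by_lead.values():
        for a, b in combinations(sorted(ids), 2):
            edges[(a, b)] = "same_lead"
    # Link a project to the project it declares itself a successor of,
    # even when the lead scientists differ.
    for rec in records:
        if rec["succeeds"] is not None:
            pair = tuple(sorted((rec["succeeds"], rec["id"])))
            edges.setdefault(pair, "successor")
    return edges

edges = build_edges(projects)
for pair, kind in sorted(edges.items()):
    print(pair, kind)
```

In this toy example P4 has no edges, so it would appear as an isolated node in the network diagram.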

Results
Our results show that the scientists involved in the ABoVE and CMS programs have created a community of practice. In this section, we first present results relevant to the domain of the science produced by the programs, the practice that can be demonstrated across both programs, and analyses that demonstrate community relationships. Finally, in the discussion we will present the results in terms of what these programs have accomplished and conclusions we draw from the evidence on the effectiveness of the programmatic approach.

Domain connections
Our results show that the ABoVE and CMS programs have published papers on topics that are coherent and connected. Figure 1 shows that the CMS and ABoVE programs are connected in their approach and use similar scientific analyses and datasets. The topics being studied involve models of quantities such as gross primary production, and include words such as 'prediction', 'parameterization', and 'covariation'. Both communities engage in research and publish papers that mention these terms. The colors in the figure represent the CTM-calculated distances between CMS- and ABoVE-associated papers, shown by the values in table 1.
Our resulting topics show groups of words that occur together and are missing together throughout the corpus; the presence or absence of a word associated with a topic strongly predicts whether the other words will occur in a given publication. When examining all words in the corpus of 521 research papers, we show in table 1 that 18 of the 20 CTM topics represent themes on which both programs published research. Some topics have greater representation from one program than the other, such as papers on biomass topics and on methane emissions, which together represent 21% of the CMS corpus, but only 6% of the ABoVE corpus. Both programs have funded work which has been instrumental in improving the United States government's knowledge of the forest inventory in Alaska, which previously had been beyond the ability of the US Forest Service to implement due to the extreme remoteness and cost of conducting traditional biomass estimates (Taylor-Rodriguez et al 2018). Merging of field data, models and satellite remote sensing in the ABoVE region of interest is also critical for CMS research ('biomass; plot; tree' 10.8% of ABoVE papers, 4.1% of CMS papers).
Table 1 also shows that two topics were associated only with the ABoVE program, with no papers from CMS; these are either geographical in nature (arctic, tundra, Alaska) or specific to the topical focus of the ABoVE program (permafrost, snow, ice). For CMS, there is only one topic which does not also have papers from the ABoVE corpus represented, which is on wetlands, stock and carbon inventories. Figure S2 shows a topic model diagram with 60 topics instead of the 20 represented in figure 1. This more detailed diagram splits many of the topics seen previously into many more topical themes, with labels showing the three most representative keywords. This more complex diagram repeats many of the themes seen in figure 1, but with more detail.
Most of the literature revolves around land-atmosphere interactions or is related to biomass. For example, in the 10-topic CTM (table S2), the topic that captures the largest share of the literature is remote sensing Light Detection and Ranging (lidar)-related applications (14% of all papers), with polar research (30% of ABoVE literature), and deforestation and degradation (11.5% of CMS literature) also being very well represented. Both CMS and ABoVE research papers include information and results on integrating satellite data, particularly lidar data, with ground observations of forest inventory analyses, which is how governments monitor and manage both public and private forest ecosystems (15.8% of CMS, 11.7% of ABoVE literature from table S2).
Both programs state that their objectives include providing decision makers with information on how land ecosystems are changing and on ways that satellite remote sensing products can be used to monitor the impact of government regulation and policies on conservation (management, urban, population, policy). However, only a small fraction of the literature discusses policy on these topics (5% of CMS, 3% of ABoVE research, table 1). This discrepancy is a common problem for scientific programs relevant to complex societal decision making with significant political and economic consequences of policy making (Moser and Ekstrom 2010, Termeer et al 2011). However, some effort has been made in CMS to publish on how decisions can be improved with high quality scientific datasets (West et al 2013, Hurtt et al 2019). Atmospheric flux topics include papers on flux inversion and atmospheric transport models (8% of CMS, 2% of ABoVE literature, e.g. Chen et al 2015, Liu et al 2016), as well as papers that use field data and satellite data to drive flux and transport models of other greenhouse gases such as methane, nitrous oxide and other species (11% and 2%). In addition, connecting models driven by satellite remote sensing of atmospheric concentrations to ground inventory data is an important theme of this topic (e.g. Chen et al 2016). Please see table S3 for the top 10 abstracts for each topic.
Only one topic shown in table 1 focuses on how rivers, oceans and water are changing due to climate change (8% of CMS, 1% of ABoVE literature) (Guo et al 2012, Huang et al 2015a). Only six research projects were funded in previous years of CMS that capture ocean biomass or lake biomass, and the oceans are not a focus of the ABoVE activity, although the impact of melting permafrost on hydrology is represented in CTM results (1% of the ABoVE literature studied). How oceans incorporate greenhouse gases as a sink, and the improvement of terrestrial-ocean carbon fluxes in areas that have been subject to perturbations, have been emphasized as important topics of new research being solicited in the 2020 CMS funding opportunity.

Practice connections
Our results demonstrate that the ABoVE and CMS programs have developed a shared repertoire of interests, experiences, tools, and ways of addressing recurring problems, or a shared practice. Table S1 lists the vocabulary shared across all papers in the corpus; the list is remarkably short given the millions of words and diversity of language in the 521 papers.
Table 1. Results from a 20-topic CTM using populations from both ABoVE and CMS. Each row is a topic and includes the words most associated with that topic that are not associated with any other topic, as well as the percentage of the CMS papers, ABoVE papers, and all (CMS + ABoVE) papers that consist of that topic. The first three words associated with each topic have been bolded, which are also the label for the network diagram in figure 1. We also provide a reference of a representative paper that consisted largely of the associated topic. gpp-gross primary production; nee-net ecosystem exchange. Truncated words, such as 'satellit', denote inclusion of both singular and plural forms.
The topics that have been researched are remarkably consistent through time. Figure 2 shows different topics over time and the year the papers were published, from 2010 through to 2019, for both programs, derived from the 10-topic CTM, also presented in table S2. The figure shows that the research produced by the two programs is represented consistently through time and that the body of knowledge, methods used, and tools developed to produce the knowledge have been consistent and have grown through time, particularly on new approaches to characterizing vegetation height as can be captured by LiDAR data (Dubayah et al 1997). The dip in research production in 2019 is due to the fact that the papers were assembled in the summer of 2019.
The decline in publications seen in figure 2 in 2017 is explained by the fact that there were no projects selected for CMS in 2015. There were projects started in 2016 and 2017, but these would not be expected to have publications in 2017. There is usually a 2-year lag between a project receiving funding and when it is most productive in terms of publications. Furthermore, there was a hiatus in 2017 and 2018 when CMS had no science team meetings, due to changes in federal priorities. Working group meetings lagged during this period as well. Congress re-authorized CMS in the 2019 federal budget.
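One way to read the topic percentages in table 1 and the yearly pattern in figure 2 is as averages of per-paper topic proportions within a group (a program or a publication year). The sketch below illustrates that reading only. It assumes each paper carries a vector of topic proportions, here called `theta`, of the kind a CTM produces; the function name, field names and toy values are hypothetical, and this is not necessarily how the percentages reported in this paper were calculated.

```python
from collections import defaultdict


def topic_shares(papers, key):
    """Average per-paper topic proportions within each group.

    papers: list of dicts with a grouping field (e.g. 'program' or 'year')
            and 'theta', a list of topic proportions that sums to 1.
    key:    name of the grouping field.
    Returns {group: [percentage for each topic]}.
    """
    sums = {}
    counts = defaultdict(int)
    for paper in papers:
        group = paper[key]
        theta = paper["theta"]
        if group not in sums:
            sums[group] = [0.0] * len(theta)
        # Accumulate this paper's proportion for every topic.
        for k, proportion in enumerate(theta):
            sums[group][k] += proportion
        counts[group] += 1
    # Convert summed proportions to mean percentages per group.
    return {
        group: [100.0 * s / counts[group] for s in totals]
        for group, totals in sums.items()
    }


# Hypothetical toy corpus: three topics, four papers (values are invented).
papers = [
    {"program": "CMS", "year": 2015, "theta": [0.7, 0.2, 0.1]},
    {"program": "CMS", "year": 2016, "theta": [0.5, 0.4, 0.1]},
    {"program": "ABoVE", "year": 2016, "theta": [0.1, 0.1, 0.8]},
    {"program": "ABoVE", "year": 2017, "theta": [0.2, 0.2, 0.6]},
]

by_program = topic_shares(papers, "program")  # analogous to the table 1 columns
by_year = topic_shares(papers, "year")        # analogous to the figure 2 series
```

Because each paper's proportions sum to one, the percentages within any group also sum to 100, so a group with few papers (such as a partial publication year) still yields a complete topic profile even though its paper count is low.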

Community connections
In pursuing their objectives, both programs hold periodic science team meetings and work together to address issues such as how to communicate uncertainty to stakeholder organizations (Lemos and Morehouse 2005). The terms 'uncertain', 'signific', and 'statist' are in the list of terms removed from 100% of the sample, and are presented in table S1. Engagement on research methods, models and approach has resulted in the development of relationships between the scientific team members across institutions and between institutions.
Both programs welcome new members after every funding cycle, hear from stakeholders who use data products developed by funded researchers, and learn from each other through collaboration between and among institutional researchers during their Science Team meetings. Figure 4 shows the number of institutions and principal investigators from funded CMS and ABoVE programs each year. When ABoVE scientists are added to the CMS PIs, we add an additional four institutions to the four from CMS, including Boston University, Woods Hole Research Center, Oregon State University and the National Oceanic and Atmospheric Administration (NOAA) Earth System Research Laboratory. All the other scientists are from institutions with much smaller representation in the number of investigators participating in funded research.
To show the connection between research funded under CMS and that funded under ABoVE, figure 5 shows a network diagram illustrating all the authors who have submitted funded proposals during the entire history of both programs (also figure S3). Each dot is a project, and projects that share a scientist are connected with a line. The network diagram shows how well connected the two groups of scientists are, although only three projects were funded by both CMS and ABoVE (orange dots). Of all the funded projects over the past 10 years, only two were unconnected to the broader community, both of which focused on using commercial off-the-shelf technology to measure total column methane and CO2 to better measure carbon elements in the atmosphere.
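The construction described above, in which each project is a node and two projects are linked when they share a scientist, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used to produce figure 5: the function name, project identifiers and scientist names are hypothetical, and 'unconnected' is taken here to mean a project with no shared scientist at all.

```python
from collections import defaultdict
from itertools import combinations


def project_network(projects):
    """Link projects that share at least one scientist.

    projects: {project_id: set of scientist names}.
    Returns (edges, isolated), where edges is a set of sorted project-id
    pairs and isolated is a sorted list of projects with no links.
    """
    # Invert the mapping: which projects does each scientist belong to?
    by_scientist = defaultdict(set)
    for project_id, people in projects.items():
        for person in people:
            by_scientist[person].add(project_id)

    # Every pair of projects sharing a scientist gets one undirected edge.
    edges = set()
    for project_ids in by_scientist.values():
        for a, b in combinations(sorted(project_ids), 2):
            edges.add((a, b))

    connected = {project_id for edge in edges for project_id in edge}
    isolated = sorted(set(projects) - connected)
    return edges, isolated


# Hypothetical toy input (names are placeholders, not real projects).
projects = {
    "CMS-A": {"Scientist 1", "Scientist 2"},
    "CMS-B": {"Scientist 2", "Scientist 3"},
    "ABoVE-A": {"Scientist 3", "Scientist 4"},
    "CMS-C": {"Scientist 5"},
}

edges, isolated = project_network(projects)
```

In this toy input, the CMS and ABoVE projects are joined through a single shared scientist, while the project with no shared personnel is returned as isolated.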

Discussion
The central feature of embedding engagement with stakeholders into ambitious, topically focused NASA Earth science research initiatives, where scientists are required to propose new methods of data acquisition and use satellite data and modeling outputs to inform key decisions, has resulted in significant scientific contributions:
• CMS researchers created a novel sub-hectare tree canopy map for the State of Maryland, which has been used to demonstrate that when urban and suburban trees are included in 'forest' biomass, the total above ground biomass for the region increased by ∼30% (Huang et al 2015b);
• CMS researchers used airborne instruments to map methane emission hotspots in California to meet the needs of new legislation seeking to reduce the state's greenhouse gas emissions (Duren et al 2019);
• ... the Global Surface Water dataset in the region, and meet the needs of communities relying on roads that can be easily submerged by melt water (Carroll and Loboda 2017);
• ABoVE researchers are creating datasets that can document and measure factors that control bigger, hotter and more frequent wildfires across Alaska and the Arctic zone, and communicate these findings to threatened communities, such as Fairbanks (Miller et al 2016, Fresco 2019).
These examples are just a few represented in the literature and published in Environmental Research Letters Special Issues for both programs, recording critical advances (Duncan et al 2020). By engaging with potential users of data products early in the research process, sustaining the engagement during product development, and being able to mature relationships between users and producers of scientific data over time, these programs have seen success in generating impact. Although evaluating the use of CMS and ABoVE data products within decision making processes is beyond the scope of this paper, the research and new knowledge produced by these programs is substantial.
Results presented (figures 4 and 5) show that approximately half of all projects that receive funding are affiliated with just a few institutions. There are both advantages and disadvantages in concentrating resources in a few institutions. In the case of CMS and ABoVE, these institutions bring researchers and existing relationships that have enabled highly productive and impactful engagements, resulting in new methods, new datasets and new scientific advances. An example of this is the US Forest Service, the University of Maryland and NASA Goddard, who have worked together since the start of the Landsat science program (Bryant et al 1980). CMS- and ABoVE-funded research has enabled the operational integration of satellite data into the FIA system, which will result in substantial reductions in cost while improving accuracy over remote forest areas. This result built on long-standing relationships, trust and understanding of the needs of the Forest Service and its procedures. In the complex relationship between science and policy, trust and personal relationships are critical (Hunt and Shackley 1999). However, when projects become over-concentrated in a few institutions, this can stifle innovation (Yin et al 2018) and unfairly favor well-established scientists at the expense of junior scientists and under-represented groups (Osterloh and Frey 2020).
More broadly, the carbon cycle plays a key role in regulating Earth's global temperature and climate. Michalak et al (2011) set out three fundamental carbon cycle science questions, which the work reported here should respond to, given that NASA CMS is one of the primary ways that the US Carbon Cycle Science program funds research that responds to these questions. The questions are:
• How do natural processes and human actions affect the carbon cycle on land, in the atmosphere, and in the oceans?
• How do policy and management decisions affect the levels of the primary carbon-containing gases, carbon dioxide and methane, in the atmosphere?
• How are ecosystems, species, and natural resources impacted by increasing greenhouse gas concentrations, the associated changes in climate, and by carbon management decisions?
Figure 5. Relationship between coauthors in CMS and ABoVE programs. Each link between dots represents a project that links the two scientists. An 'active' graph is available in the supplemental materials, where each author is shown when the dot is clicked, along with their project name.
Our results show that the research being produced by funded projects is focused on these questions, particularly Question 1 regarding how processes affect the carbon cycle. For example, 12.3% of the literature relates to phenology and productivity, 11% to deforestation and degradation and 7.8% to wildfire topics, all of which contribute to changes in the carbon cycle (table S2). Question 3 is also represented, with significant effort (14.2% of the corpus) being put by the community into connecting satellite remote sensing observations to models that involve processes that will result in carbon sequestration or emissions.
Question 2 is more policy oriented, with substantially less research being published by either program, with the exception of research on MRV (West et al 2013, Hurtt et al 2019). We found no specific topic in the 10-topic model outcomes that highlighted policy engagements (table S2), while for the 20-topic model, 'policy' was only a low-ranking keyword for one of the less prominent topics (table 1). How carbon management and policy decisions affect changes in the carbon cycle and emissions in North America is central to our ability to rapidly and effectively reduce greenhouse gas emissions. For example, understanding how agricultural practices affect soil carbon, and modeling these impacts across agroecosystems, is an important contribution to better understanding how policy affects carbon sequestration (Spencer et al 2011). However, the CMS program does not include economic or policy analysts, and research on these is typically interdisciplinary and often led by social scientists and researchers focused on understanding the carbon cycle and its anthropogenic constituents. Papers published by CMS researchers should be making a connection to 'policy' or 'management', so those keywords should appear at least a few times in their papers, although we did not find this. Our results show that funded researchers need to be more ambitious and explicit in drawing the connection between their research and policy implications. In subsequent rounds of funding, more effort should be put into connecting basic research to policy outcomes.

Conclusions
The support and engagement provided by NASA through funding, website building, organizing meetings and providing stakeholder engagement has engendered a vibrant and active social network and community of practice across the CMS and ABoVE programs. Although significant effort has been made to create and distribute satellite-derived data products in both programs, more work is needed in documenting the use of these data products and their impact on policy and decision making. To create the most useful information, data products need to be created using repeated, iterative feedback from stakeholders. More research with scholars across multiple fields, such as decision science, political science, legal fields and others would enhance NASA's ability to ensure broad interest and participation in its carbon and arctic science agendas.