A protocol to determine the least cost supply of coal to China with an installation-level optimization model

Summary This protocol provides instructions on how to run a linear optimization model that determines the cost-optimal supply of coal, from Chinese and foreign mines, to satisfy a given demand for coal in Chinese power and steel plants. We describe how to download the code and required software and detail steps to insert third-party data. We further provide instructions to adjust and define the scenario settings, build and solve the optimization problem file, and visualize the model outcomes. For complete details on the use and execution of this protocol, please refer to Gosens et al. (2022).1

coal demand in another node. These models include information on coal production capacity and cost for each producing node (in coal mines), transport capacity and cost for each link (railways, roads, waterways, and trans-shipment points), and coal demand levels in consuming nodes.
In comparison to these previous models, the IL-CCM stands out because of its very high geo-spatial resolution. The most detailed previous analysis with this modelling method represented China's coal production and demand conflated into provincial-level nodes, and the transport system conflated into inter-provincial links, for a total of 31 nodes and about 150 links. 7 The IL-CCM has installation-level location and technical detail for each individual coal mine, power and steel plant, as well as for all connecting rail, highway, UHV transmission, and shipping networks, for a total of 12,161 nodes and 40,396 links (for more detail see ''Methods S1''). A backcasting exercise shows that this high level of detail greatly helps model accuracy. The model predicts levels of Chinese coal imports from different suppliers that very closely match historically observed levels ( Figure 1).
The IL-CCM model uses input data stored in spreadsheet files. Data preparation, processing of the solution file created by the optimization software, and visualization of results is done in R. The file with the linear problem written out as a set of mathematical expressions (a .lp file), for use in linear optimization, is created in Jupyter on a cloud computing instance, whilst solving the problem file is done locally with the CPLEX solver.
We provide all code and data used in this protocol in a public repository, whereas earlier work with similar methods, at best, provided only mathematical descriptions of the optimization model, 7,8 or in one exception, model descriptions and data, 10 and is the first Star Protocol reporting a process for an energy system optimization model. This repository further includes scripts for editing the networks, and local and freeware alternatives for the optimization steps. The user will need a dataset on mine-level production costs and capacity. This is commercial data that could not be provided with the public repository. For further model details including a full mathematical description, see ''Methods S1''.

Download IL-CCM data and code repository
Timing: 10 min The IL-CCM repository includes all the required code and data to run the model. When editing files in this model, only change the values of input data, and do not change the names of files, or sheet or column names in spreadsheet files.
1. Download the repository with fully functional code, data, and folder structure from Zenodo: https://doi.org/10.5281/zenodo.7222764. 2. Extract the repository to any location on your computer.
Note: Most of the scripts used below use relative file paths. Any absolute file reference in the below will use ''/China-coal-model-star-protocol/'' to indicate the location of the extracted repository on your machine.
Get third-party coal mine data Timing: 1-2 h The repository described in the previous step does not include mine-level production costs, production capacity, and coal qualities, as this is proprietary third party data. The user can either use the same third-party data as was originally used in creating this model, or alternatively they may use their own coal mine data, or from any other commercial supplier, although it is substantially more work to insert such data into the model.
3. Purchase or prepare mine-level coal-mine data. a. Option 1: Purchase the ''China and Mongolia coal supply'' and the ''Seaborne Export Thermal Curve'' and ''Seaborne Export Metallurgical Curve'' from Wood Mackenzie (https://www. woodmac.com/). b. Option 2: Use any other data supplier or own assumptions. Data requirements are: i. A unique mine name for each mine.
ii. Locational data: what province the mine (cluster) is in for Chinese mines, or what country a mine (cluster) is in for foreign mines. iii. A technical production capacity and total cash cost in US$/t. iv. Coal quality data, specifically: whether it is thermal, hard coking, semi-soft coking, or pulverized coal for injection, as well as calorific value (in kcal/t) for thermal coals.
Note: the repository described in the step ''Download IL-CCM data and code repository'' still includes the coal mine data sheets with the required layout. These files have been censored by replacing data with arbitrarily picked numbers for key technical and economic indicators. That means the model will run fine, but will produce unrealistic results until these coal mine data sheets have been updated with real data.
Note: We recommend using a ''compute optimized'' instance, with at least 16 virtual CPU and 32 GB of memory, i.e., an instance of type ''ml.c5.4xlarge'' or better.
Note: this step is optional but recommended. The cloud computing service is used to run the code that builds the problem file for use in the optimization studio. This can be done on a desktop computer but will take considerable time. Below are instructions for two alternative popular providers of these cloud computing services.
Sign up for a geocoding api The IL-CCM code and data repository is complete apart from the coal-mine level data, which is proprietary and has been censored. This step is to replace the censored data with real data.

Deposited data
The The original model was created with the coal supply dataset from Wood Mackenzie. 19 Inserting this same data is straightforward as the layout and variable naming from this dataset have been preserved.
The user can also use their own coal mine data, or from any other commercial supplier, but inserting such data into the model will require additional steps, as described below.
1. Option 1: when using the coal supply dataset from Wood Mackenzie. 19 a. Browse to the Wood Mackenzie data dashboard, and download the full set of production cost data, production data, and coal quality data by mine and year in csv format, and download the global supply curve data.
Note: We have provided spreadsheet files that allow the user to copy paste the entirety of the original Wood Mackenzie data into a sheet called 'input'. The spreadsheets have formulas that automatically find the correct values on those 'input' sheets and transform these into a data layout as expected by the model.
b. For the files in ''/China-coal-model-star-protocol/1 input/WoodMac data -template'': i. Copy the entirety of production cost data into the sheet 'input' in the file ''selected china coal mine cost data woodmac -CENSORED.xlsx''. ii. Copy the entirety of production data into the sheet 'input' in the file ''selected china coal mine prod data woodmac -CENSORED.xlsx''. iii. Copy the entirety of coal quality data into the sheet 'input' in the file ''selected coal quality data by mine and type -CENSORED.xlsx''. iv. Copy the entirety of the global cost curve data into the sheet 'input' in each of the following 3 files ''Seaborne Export Thermal -non energy adjusted Worksheet -CENSORED.xlsx'', ''Seaborne Export Met Worksheet -CENSORED.xlsx'', ''Australia met coal worksheet -CENSORED.xlsx''. v. Copy all of the files in the folder ''/China-coal-model-star-protocol/1 input/WoodMac data -template'' into the folder ''/China-coal-model-star-protocol/1 input/WoodMac data'', overwriting the files there in the process. 2. Option 2: when using mine-level data from other sources.
Note: We have provided spreadsheet files with templates including a small set of example entries for each sheet that needs to be adjusted. Each of these template files contains a sheet named 'possible values' to explain restrictions on the values in each column. These spreadsheets are included in ''/China-coal-model-star-protocol/1 input/alternative coal datatemplate''.
a. Insert coal mine data as follows: i. In the file ''selected coal quality data by mine and type -CENSORED.xlsx'': for every mine, define its name. Define whether its product type is thermal or metallurgical coal.
Note: For metallurgical, also define product sub-type (hard coking coal, semi-soft coking coal, or pulverized coal for injection. For thermal coal, a product subtype can be included but this information is not used. For thermal coals, define a calorific value either as net as received (SE nar) or as gross as received (SE gar), or both. For metallurgical coals define their coking strength after reduction (CSR) number.
ii. In the file ''selected china coal mine prod data woodmac -CENSORED.xlsx'': for every mine, define its name. Define whether its product type is thermal or metallurgical coal. Note: For metallurgical, also define product sub-type (hard coking coal, semi-soft coking coal, or pulverized coal for injection. For thermal coal, a product subtype can be included but this information is not used. For each year considered in your analysis, define an operating status and a production capacity for each mine.
iii. In the file ''selected china coal mine cost data woodmac -CENSORED.xlsx'': for every mine, define its name. Define whether its product type is thermal or metallurgical coal.
Note: For metallurgical, also define product sub-type (hard coking coal, semi-soft coking coal, or pulverized coal for injection. For thermal coal, a product subtype can be included but this information is not used. For each year considered in your analysis, define the total production cost, and where this data is available, royalties, product transport cost, and port fees.
iv. In the file ''Seaborne Export Thermal -non energy adjusted Worksheet -CENSOR-ED.xlsx'', in the sheet ''worksheet navo mines connectors'': for every thermal coal mine supplying the seaborne market, define its name, and the country and 3 letter iso country code it is located in. Also define its production capacity, production cost, and the calorific value of its output.
CRITICAL: The destination node name (column dest-node_name) has to list the node that connects these foreign mines to the rest of the node-and-link network. There is one such node for each country, with values provided in the 'possible values' sheet. Do the same for all Russian thermal coal mine connected to the Chinese market via rail or road connections, in the sheet ''worksheet ovld mines connectors''.
v. In the file ''Seaborne Export Met Worksheet -CENSORED.xlsx'', do the same as in last step, but for metallurgical coal mines. Define their coal subtype (hard coking, semi-soft coking, or PCI), and not their calorific value. Exclude Australian mines in this sheet. vi. In the file ''Australia met coal worksheet -CENSORED.xlsx'', do the same as in last step, but for Australian metallurgical coal mines. vii. Copy all files in the folder ''/China-coal-model-star-protocol/1 input/alternative coal data -template'' into the folder ''/China-coal-model-star-protocol/1 input/WoodMac data'', overwriting the files there in the process. b. Connect the mines defined in the above step to the rest of the node-and-link network.
Note: For this step, we have provided template files with a small set of example entries. Each of these template files contains a sheet named 'possible values' to explain restrictions on the values in each column. Adjust as follows: i. From each of the files created in step a above, collect all values in the column 'orig_no-de_name', and create a list with unique values of node names (i.e., each node appearing only once) in the file ''/China-coal-model-star-protocol/1 input/helper files/mine node list helper.xlsx''.
Note: the above mentioned file in the code and data repo contains exactly such a list for the node names as used in all of the files described in step a above.
ii. Copy the list of mine node names from step i into the file ''/China-coal-model-star-protocol/1 input/alternative coal data -template/helper files/node location helper.xlsx'', sheet ''loclist'', pasting the list in column A, row 11438 onwards (highlighted in green in the spreadsheet). iii. For each mine node in step ii, add latitude and longitude. ll OPEN ACCESS iv. Copy the list of mine node names from step i into the file ''/China-coal-model-star-protocol/1 input/alternative coal data -template/helper files/mine name to location.xlsx'', sheet ''Sheet1'', pasting the list in column B, row 2 onwards (highlighted in green in the spreadsheet). v. For each mine node in step iv, add the location, which is either the country name, or for mines in China, the name of the province. For mines in China and Mongolia, also select the corresponding basin name. ix. For each mine node in step vii, Also define whether the connection type between the mine and destination nodes is via rail or road, and what distance the link represents. x. Copy the contents of the folder ''/China-coal-model-star-protocol/1 input/alternative coal data -template'' into the folder ''/China-coal-model-star-protocol/1 input/'', overwriting the files there in the process.

Adjust scenario inputs and create model build files
Timing: 1-2 h These are instructions on how the user may define their own scenario settings for coal production, consumption and transport infrastructure capacity. Running the build preparation script (step 9 below) gathers the scenario settings from their respective spreadsheet files and processes these into model build files which are in the correct format to be read by the optimization script used in the next step.
Note: This file contains explanations on the calculations made in each of the spreadsheets in the folder ''/China-coal-model-star-protocol/1 input'', including on the values to be adjusted for scenario settings.
Note: Demand levels can either be set at national levels, with an automatic distribution to provincial-level demand calculated by these sheets, or the user can adjust provincial-level demand directly.

Run the script that creates the optimization problem file
Timing: 10 h This step will take your scenario build files, and process it these with the Linear Programming modeler Package ''PuLP'' (https://pypi.org/project/PuLP/). You can do this on a desktop machine, but it is recommended to be done using an online cloud computing service. These steps will create a problem file (extension ''.lp'') to be used in a solver in the next step. f. Press F5 to run the file. g. The problemfile created by these steps will appear in the folder ''/China-coal-model-starprotocol/3 problemfiles" after a given period of time, depending on the computing power of your desktop machine, between circa 10 h to many days.

Solve the optimization problemfile
Timing: 10 m This step will take your problem file, which is a text file describing the optimization problem, and find the minimal cost solution to the problem. This solution file will be saved to '' /China-coal-modelstar-protocol/4 solutionfiles''.

Prepare visualizations of updated scenario runs
Timing: 10 m This step will take the spreadsheets with results from last step, and create visualizations of your scenario settings.
17. Open and run each of the six R scripts in the folder ''/China-coal-model-star-protocol/ 6 figure prep scripts''.
Optional: Expanding or adjusting the network in the model

Timing: 1-10 h
This step is only needed when the user wants to expand the node-and-link model. If the user simply wants to adjust transport capacities of links currently included in the model, this can be done by following the instructions in step 6. If the user wants add nodes or links to the network, it is important that these are included into the full node-and-link model in the correct way. We cannot provide precise protocol steps for every possible expansion of the model, but provide a number of generic steps and scripts to do this below.
As an example, we have included the exact data and code that we used to identify driving distances between each city and its nearest neighboring cities, in the folder ''/China-coal-model-star-protocol/9 geocoding network additions''.
c. Run this script; this saves location names with latitude and longitude data into the file ''/China-coal-model-star-protocol/9 geocoding network additions/locations geocoded.xlsx''. 19. Determine geodesic distances.
Note: This step takes the geocoded locations from step 18, and creates a matrix with a combination of each location with all other locations in list of nodes from step 18. This is used to calculate distances from each node to every other node, using the latitude and longitude data of each point determined in step 18, as geodesic or ''as the crow flies'' distances. For each node, it retains links with the 12 nearest other nodes. For nodes even further away, we consider that these can be reached via other nodes. This ensures that each node is connected, whilst reducing the total number of links.
Note: The example files currently included with the data and code repository contain a list of Chinese cities and their latitude and longitude.
Note: This step takes the pairs of nodes that are relatively close to each other, as determined with ''as the crow flies'' distances in step 19, and finds the real world driving distance between these node pairs. a. Open script ''/China-coal-model-star-protocol/9 geocoding network additions/geocode distances.R'' and copy your Google Maps api credentials into line 21: b. Run this script; this saves location names with geodesic as well as driving distances into the file ''/distances 12 nearest geodesic and geocoded.csv''. 21. Insert these new connections in the appropriate file in the folder ''/China-coal-model-star-protocol/1 input/network connector files''.

EXPECTED OUTCOMES
The expected outcome with each run is 1) a problem file; 2) a solution file; 3) a set of spreadsheets with coal suppliers and coal flows; and 4) a set of visualizations of these model outcomes. The IL-CMM was built specifically to determine levels and origin of coal imports, at given levels of Chinese demand for coal, and with constraints on mine-level production capacity and transport capacity of different types of transport infrastructure. Model outputs are therefore focused on visualizing the origin of supply, the networks used to get to the final source of consumption, and changes in flows of coal under different scenario settings. For thermal coal, these results shows that imports have been fairly stable over the period 2015-2019, but are expected to come down sharply over the period to 2025, even in the scenario of continued high Chinese coal demand. Indonesian imports of thermal coal are expected to fall most sharply, both in total volume as well as in percentage terms, followed by Australian imports. This is due mostly to improvements in transport infrastructure, in particular rail lines that have become operational over 2020-2022, and some that are in currently in advanced stages of construction. These new rail projects strongly lower the cost of delivery of coal from domestic mines. The reduced demand for imported coal is further due a continuing improvement in power plant efficiency. For more details, see the original analysis in Gosens et al. 1 For coking coal, there has been a steady rise of imports that have tracked closely with total volumes of Chinese steel output. Over the period through to 2025 and beyond, total Chinese coking coal consumption is expected to fall, as crude steel production is expected to have peaked. A simultaneous increase in the availability of steel scrap improves the supply of secondary steel (made from recycled scrap) and reduces the need for primary steel (made with iron ore and coking coal). Total imports of coking coal are still expected to rise, as a new rail line connects the steel making heartland of China directly with a new coking coal mine in Mongolia, which produces high quality and low cost coking coal. The additional imports form Mongolia are expected to push out imports from Australia ( Figure 2).

OPEN ACCESS
key coal mining region around Ordos in Inner Mongolia, to China's central provinces (so flowing north to south). This means less coal needs to flow west towards Bohai Bay and then south via ship, drastically reducing the cost of coal deliveries, as was the intention of the project. 7 The flows of seaborne imports of coal appear very light red, as these changes are small compared to changes in coal from domestic sources, but still represent a significant change to the total volume of imported thermal coal (compare Figure 2). For coking coal, there is a clear reduction in flows of coal throughout most of China in both the scenarios with high (plot b) and low (plot d) demand. This illustrates that reduced demand for primary steel leads to lower consumption and therefore shipments of coking coal. Reductions in shipments from both domestic and seaborne suppliers are further exacerbated by increases in coking coal shipments from neighboring Mongolia, showing up as the only green streak in both plots b and d in Figure 5.

LIMITATIONS
The current model is a highly detailed model of China's coal market. The demand and supply conditions in the rest of the world are included in very simplified form only, and cannot be adjusted. The model is also purely focused on coal consumption; it does not include cost-optimization between different sources of electricity and instead uses exogenous input from user defined demand scenarios to determine cost-optimal supply of that demand.

TROUBLESHOOTING
Problem 1 A package throws a certain error in R or Python.

Potential solution
It is recommended to use the environment files for both R and python provided with the code and data repository. These have been tested to run as a self-contained package. Instructions on how to use either of these environment files are provided under step ''install the required software''.

Problem 2
The solver reports that the problem is infeasible.

Potential solution
This is a generic issue that could happen when defining problem files. It indicates an incompatibility between different constraints in the model. Investigate by opening the problem text file and finding the definition of the constraint that the CPLEX Optimization Studio reports as causing the infeasibility. A number of likely causes are:

OPEN ACCESS
You've set demand levels for a certain type of coal at level exceeding the production capacity of all the mines in the model. This is an issue especially with PCI, which has the least supply slack in the model. You've set provincial level demand for coal-fired power or steel at levels exceeding the provinciallevel production capacity of coal-fired power plants or steel plants.
You've set a reduction in transport network capacities that disconnects different parts of the network from each other. This is unlikely as almost all nodes can also be reached by trucking networks.
A special case is that the provincial-level city of Beijing has no operational power plants in the dataset after 2029; any coal-fired power demand must be delivered via UHV networks. Removing the UHV network whilst simultaneously setting a non-zero coal-fired power demand in Beijing will cause infeasibility.
The solution is to investigate whether and which demand, supply, and transport constraint cause the infeasibility and adjust scenario settings.

Problem 3
When running the script in AWS Sagemaker it reports an error loading the files from the S3 bucket. This is likely due to a version of boto3 (the software package that allows python to communicate with other components of the AWS infrastructure such as the file storage system), as installed on AWS Sagemaker by default, that is incompatible with the provided script.

Potential solution
Uninstall the incompatible version of boto3 and its dependencies, and install a version that does work. To do so, run below code in a new cell in your jupyter notebook:

RESOURCE AVAILABILITY
Lead contact Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Jorrit Gosens (jorrit.gosens@anu.edu.au).

Materials availability
This study did not generate new unique reagents.