Tracking the Source of Solar Type II Bursts through Comparisons of Simulations and Radio Data

The most intense solar energetic particle events are produced by coronal mass ejections (CMEs) accompanied by intense type II radio bursts below 15 MHz. Understanding where these type II bursts are generated relative to an erupting CME would reveal important details of particle acceleration near the Sun, but the emission cannot be imaged on Earth due to distortion from its ionosphere. Here, a technique is introduced to identify the likely source location of the emission by comparing the observed dynamic spectrum observed from a single spacecraft against synthetic spectra made from hypothesized emitting regions within a magnetohydrodynamic (MHD) numerical simulation of the recreated CME. The radio-loud 2005 May 13 CME was chosen as a test case, with Wind/WAVES radio data used to frame the inverse problem of finding the most likely progression of burst locations. An MHD recreation is used to create synthetic spectra for various hypothesized burst locations. A framework is developed to score these synthetic spectra by their similarity to the type II frequency profile derived from Wind/WAVES data. Simulated areas with 4x enhanced entropy and elevated de Hoffmann Teller velocities are found to produce synthetic spectra similar to spacecraft observations. A geometrical analysis suggests that the eastern edge of the entropy derived shock around (-30, 0) degrees in heliocentric coordinates was emitting in the first hour of the event before ceasing emission, and that the western/southwestern edge of the shock centered around (6, -12) degrees was a dominant area of radio emission for the 2 hours of simulation data out to 20 solar radii.


INTRODUCTION
It has long been known that the solar corona is capable of producing a variety of intense radio bursts in narrow frequency bands with different characteristic variations in frequency, time, and intensity depending on the source (Wild et al. 1963). Among these bursts are type II and type III radio bursts, which were first observed in the 1950s with ground based radio observatories in frequencies slightly above the 15 MHz ionospheric cutoff (Wild 1950). These observations continued into the space age with single spacecraft observations like Wind (Bougeret et al. 1995) that observed these bursts in the lowest frequencies below 15 MHz. A Wind spectrum in Figure 1 illustrates typical examples of both type II and III bursts, identified by their narrow-band structure that drifts to lower frequencies, with type II bursts drifting more slowly than type IIIs. Type II and III solar radio bursts have been observed to be associated with the main types of Solar energetic particles (SEP) events: gradual and impulsive respectively (see review by Reames (2013)). Type III bursts are more numerous than type II bursts (Cane et al. 2002), but last a shorter amount of time and have a lower SEP fluence than the gradual event associated type II bursts (Reames 1999).
SEPs consist of high-energy ions and electrons, generally in the keV to GeV range, and are thought to be produced by solar flares and coronal mass ejections (CMEs) (Tsurutani et al. 2009). The most intense SEP events are caused by CMEs as they accelerate away from the Sun and plow into the extended solar corona (Reames 2013). These SEPs are a form of space weather that poses a threat to both satellites and humans in space. Energetic particles can access the Earth system through the polar regions, potentially causing harm to satellites and passengers on jets in polar routes. The higher the flux of incoming energetic particles, the further down in latitude their effects can spread. In near-Earth space, energetic particles can cause satellite damage and harm astronauts that are not protected by Earth's atmosphere or other means (Council 2008). Interplanetary CMEs (ICMEs) and their shock and sheath regions are also the leading drivers of geomagnetic storms (Ontiveros & Gonzalez-Esparza 2010;Kilpua et al. 2019). Understanding the physics behind these events is crucial to predicting their impact ahead of time to protect human lives and property, and untangling the relationship between SEPs and type II and III radio bursts is instrumental in this endeavor.
Electron beams associated with type III bursts were tracked in the 1970s by groups such as Frank & Gurnett (1972) and Alvarez et al. (1972) who found Langmuir waves associated with the electron beams were causing the type III emission (Gurnett & Anderson 1976, 1977. In addition, Lin et al. (1981Lin et al. ( , 1986 showed that these plasma oscillations occurred alongside a bump-on-tail electron energy distribution, indicating that a plasma emission mechanism was behind the type III radio emission. This same mechanism is thought to be behind the emission of type II bursts as well (Thejappa et al. 2007).
The prevailing physical theory for the plasma emission mechanism was first put forward in 1958 by Ginzburg & Zhelezniakov (1958), which has since been refined by them and many others over the years, as reviewed by Melrose (2008) and references therein. The basic idea is that the radio bursts are created in a multi-step process that starts with an electron beam, or a sub-population of electrons that are much faster than their neighbors. In the case of type III bursts, relatively small packets of jettisoned particles from flaring active regions on the surface of the Sun make up the beam with typical electron energies of 10 -100 keV (Reames 2013). For type II bursts, the beam is thought to be formed by particles somehow energized by a CME/interplanetary shock (Cane & Stone 1984;Cane et al. 1987) with Figure 1. Radio Spectral Data from Wind/WAVES. Type II and III bursts are identified by their narrow-band structure that drifts to lower frequencies, with type II bursts drifting more slowly than type IIIs. typical electron energies on the order of ∼ 1 MeV (Cliver & Ling 2007). Once the electron beam forms, it is unstable to wave-particle interactions that can generate Langmuir waves with frequencies around the local plasma frequency via the standard "bump-on-tail" instability. The waves that are the right frequency to resonate with the beamed electrons convert energy from the electron beam to the plasma waves, relaxing the distribution function to a smoother profile. Through a combination of three wave decay, ion scattering, and coalescence, a fraction of the power in the Langmuir waves can be converted into radio waves at the corresponding local plasma frequency, or its harmonic. These radio waves may then escape from their source region and propagate into space.
The local plasma frequency is only dependent on the density of electrons n e , being proportional to √ n e , and so the frequency of the burst trails down through frequency space as the source energetic electrons soar outwards from the Sun. An example of a Wind/WAVES radio spectrum (Bougeret et al. 1995) containing both types of emission is shown in Figure 1. Both types of bursts are identified by their narrow-band structure that drifts to lower frequencies, and may be differentiated by their frequency drift rate, with type II bursts drifting more slowly than type IIIs. The full formula for the local plasma frequency ω pe is shown in Equation 1, where the elementary charge of an electron e (in electrostatic unit of charge, esu) and the mass of an electron m e (grams) are constants, and n e is the local electron number density per cubic centimeter.
ω pe = 4πn e e 2 m e = 8.98kHz n e /cm 3 Type II emissions can be classified by their emitted wavelength, ranging from metric (m), to decametric-hectometric (DH), all the way to kilometric (km). Type II emissions can start at frequencies around 150 MHz, which corresponds to the plasma frequency a few solar radii from the Sun, and can drift past 1 AU where the local plasma frequency is approximately 25 kHz. Gopalswamy et al. (2005) found evidence that the kinetic energy of the CME seems to determine the lifetime of type II radio bursts, with the most energetic events having bursts that extend from the metric frequencies, down into the DH and kilometric ranges. Less energetic CMEs on the other hand have no type II emission, or emission that stops in the metric frequencies, before the DH wavelength ranges. Wide ranging surveys of events by Gopalswamy et al. (2008) found that basically all DH type II bursts are associated with SEP events at Earth if one accounts for magnetic connectivity. The intensity of the associated SEP fluxes also increases with CME speed and width (i.e. kinetic energy). Another recent broad ranging study by Winter & Ledbetter (2015) further solidifies this relationship where it is reported that every single SEP event seen by Wind WAVES from 2010-2013 with a peak > 10 MeV flux above 15 protons cm −2 · s −1 · sr −1 is associated with a DH type II burst, most of which occur alongside a type III burst.
With this general relationship cemented, the primary goal of this work is to better understand which plasma parameters are most important for type II burst generation within the context of CME structure and shock geometry. To do this, a methodology is developed to take CME simulation data and create synthetic type II bursts for hypothesized burst locations across the entire simulation time. Existing methodologies from those such as Reiner et al. (2007) for using type II radio data to profile the velocity and acceleration of a CME are extended in order to create a "stencil", which is an estimated spectral time profile employing Gaussian intensity profile fitted over the observed burst intensity. It is this Gaussian profile fit to the observed radio burst that the synthetic spectra are scored against. These two methodologies are then used as cornerstones in a framework in which the following optimization problem is posed:"What progression of simulated burst locations provide a synthetic spectrum that best matches a single spacecraft observed spectrum?" Within this framework, one may track different features of the simulated CME to test different hypotheses of what plasma parameters are causal in creating the type II bursts around CMEs. These burst locations in turn mark where the acceleration of energetic particles is happening as the CME expands throughout the heliosphere. These burst locations can then be fed to energetic particle codes like EPREM (Schwadron et al. 2010(Schwadron et al. , 2014) that track the magnetic connectivity with Earth to forecast energetic particle flux at Earth, though that is not done in this work. One promising long-term goal of applying this framework is finding a common set of thresholds of plasma parameters that create synthetic type II spectra that match observed spectra across many recreated simulations of major CMEs. Such findings would be valuable information for constraining physical theories of particle acceleration at CMEs, and would also be immediately applicable to real time space weather operations for those at NOAA's Space Weather Prediction Center and others. A set of key thresholds that mark the site of type II burst generation and particle acceleration would help enable more reliable space weather prediction, especially with the use of faster than real time simulations of incoming CMEs on the horizon.
In this work, these methodologies and optimization framework are developed with the focus on a single case study in the form of the well documented 2005 May 13 CME. A general outline of the remaining sections of the paper are as follows: in section 2, previous work in tracking CMEs and type II bursts is reviewed, with major steps in methodology and subsequent understanding being identified. Section 3 describes the simulation of the CME by Manchester et al. (2014) (performed with the Alfvén Wave Solar Model (AWSoM) developed by van der Holst et al. (2014)) that is used throughout the rest of the paper. Section 4 develops the methodology to create an event specific "stencil" of a type II burst from observed single spacecraft data. In this case, the methodology assumes CME velocity and solar wind density profiles to fit the evolution of the type II burst over time. This approach yields, for each moment in time, an idealized Gaussian spectral profile around a central frequency, which is used to create the "stencil" that synthetic spectra are scored on according to their similarity with the observed Wind/WAVES radio data. Section 5 develops the methodology to create synthetic spectra from the simulation, using the upstream electron density to determine the observed frequency profile for each moment in time. In section 6, the methodologies are applied to frame the optimization problem:"What progression of simulated burst locations provide a synthetic spectrum that best matches the single spacecraft observed spectrum?" Here a range of CME model features based on different physical thresholds and geometrical factors are tested to identify the likely site of burst generation around the CME. In section 7, possible sources of error and objections to the new methodologies are discussed and addressed to establish that the methodologies are appropriate for this optimization framework. In section 8 the results from this investigation are discussed and future work is laid out.

Theories of Particle Acceleration
Even with the general association of SEP events with CMEs and type IIs being well documented, the exact nature of the relationship is unknown in many respects. One of the largest outstanding issues is identifying which part of the CME is accelerating the particles, and how it does so. A variety of possible acceleration mechanisms have been proposed: one is diffusive shock acceleration at the CME-driven shock (Tylka et al. 2005;, second is SEP generation with CME expansion and acceleration in the low corona (Schwadron et al. 2015), third is SEP acceleration occurring at a current sheet sun-ward of the ejected flux rope (Forbes & Isenberg 1991;Forbes & Priest 1995), a fourth possibility is non-local acceleration is occurring diffusely over a broad region as coronal plasma is diverted and compressed by the expanding filament (Zank et al. 2015) and a fifth possibility is acceleration at compression regions in the CME sheath (Manchester IV et al. 2005;Kozarev et al. 2013). If the location and conditions of SEP acceleration within the CME were better understood, it would advance the capability to forecast and predict the occurrence and intensity of energetic particles at Earth. This is a very important problem, and much effort over the decades has gone into pinning down the origin of type II bursts within ICMEs.

Advances in Space Based Observations
A major methodological leap was achieved by Leblanc et al. (2001) in combining LASCO white light coronagraph data with radio observations to trace the progression of CMEs and their associated shock waves in relation to their observed type II bursts. They found that sometimes the inferred kinematics of the shock matched the observed frequency profile, but often there was some discrepancy, indicating a more complicated relationship. Reiner et al. (2007) also applied kinematic models to reproduce the type II frequency drift from the propagation of ICMEs/shocks, using SOHO LASCO white-light observations to constrain the initial speed of the ICME. Gopalswamy et al. (2009) later showed how STEREO coronagraph data from 1.4 -4 R can be used to observe CMEs at the onset of type II bursts.
Another methodological leap was enabled by the STEREO spacecraft in the form of multi-point observations for both radio and white light coronagraph observations. Reiner et al. (2009) used Wind/WAVES and STEREO/WAVES to give the first three-point measurements of type III bursts, constraining their beaming characteristics. Oliveros et al. (2012) utilized the same set of spacecraft to study a type II bust from an August 1, 2010 CME-CME interaction event. They reported a spread in type II radio source positions, indicating an complex and extended structure. Krupar et al. (2014) used both STEREO spacecraft to estimate multiple wave vector directions of type III bursts to provide estimates of their emitting regions and their corresponding source sizes. Mäkelä et al. (2016) also utilized multi-point goniopolarimetric observations from STEREO/Waves and WIND/Waves receivers in addition to LASCO white light coronagraph data to triangulate the source of type II emission in an event of CME-CME interaction in DH wavelengths. Through comparison of radio data and white light coronagraph data they found that scattering affects radio wave propagation at lower frequencies. Analysis of the radio data around 1 MHz from the Waves antennas for this event indicated that the source direction matched the front of the CME, which was imaged by the LASCO/C3 coronagraph. Studies like these that combine radio data with coronagraph images are powerful, but lack any way of observing many different plasma parameters like the magnetic field that may affect burst generation but cannot be remotely sensed. Magdalenić et al. (2014), like Mäkelä et al. (2016), used multi-point observations to pin down the likely location of the burst utilizing recent techniques of goniopolarimetric observations from STEREO/Waves and WIND/Waves spacecraft data. However, Magdalenić et al. (2014) also used an Enlil simulation and coronagraph images to compare the height of the CME with the radio data and track the propagation of the shock wave further than coronagraph data alone can. They found that combined observations of the March 5, 2012 event suggested that the southern flank of the CME was the source of the radio emission. Remote observations combined with simulated recreations of the event in question can provide additional insight into the physical parameters at the possible site of burst generation that cannot be remotely sensed, such as the magnetic field strength. However, Magdalenić et al. (2014) does not go this far, and uses the model merely to double check the shock arrival time and the kinematics of the CME. The following sections of this work will go further and use the simulation data to probe the plasma parameters at the hypothesized site of radio emission that would normally only be available as in situ observations. Progress in tracking type II bursts into lower frequencies continues with the Sun Radio Interferometer Space Experiment (SunRISE) (Alibay et al. 2018;Hegedus et al. 2019). SunRISE is a recently funded NASA Heliophysics Explorer Mission of Opportunity that will send 6 smallsats each with a 5 m dipole to observe radio frequencies between 0.1 -25 MHz, which promises to fill the gap in imaging data for type II and type III bursts in the lowest frequencies below the ionospheric cutoff. As opposed to multi-spacecraft radio observations done in Magdalenić et al. (2014) and Mäkelä et al. (2016), which use various direction finding techniques, the SunRISE receivers will be synchronized to enable true signal correlation and interferometry that has until this mission has been limited to ground based arrays.

Advances in Ground Based Radio Observations
There have also been efforts to combine ground based radio observations with space based observations of CME structure. Using radio observations from 164 -435 MHz by the Nançay radio-heliograph (Kerdraon & Delouis 1997), a number of metric type II bursts have been compared to EUV and X-ray maps in studies such as Klein et al. (1999) and Zucca et al. (2014). These metric type II events have also been studied in combination with the innermost edges of coronagraphs in works like Maia et al. (2000). These studies, while scientifically valuable, do not observe the DH type II bursts further from the Sun that are associated with the more intense SEP events . Recent progress has been made on this front with low frequency interferometers like LOFAR (van Haarlem et al. 2013) and OVRO-LWA (Eastwood et al. 2018;Chhabra et al. 2020) coming online. Zucca, P. et al. (2018) combined white light coronagraph data from SOHO and STEREO A along with radio interferometer data from LOFAR capturing type II burst harmonic emission between 50 -70 MHz. This was the first time concurrent radio positional observations could be made for the more energetic DH type II bursts. They found that the emission originated from the northern flank of the expanding CME. They employed a model named magneto-hydrodynamic around a sphere polytropic (MASP) developed in Linker et al. (1999). This model uses photospheric magnetograms from SDO/HMI to seed the simulation. One drawback to their approach is that this model utilizes a rather simple energy equation that cannot accurately describe fast and slow solar wind streams. It has also been reported by Rouillard et al. (2016) that this model can provide densities up to an order of magnitude too high. This is the same vein of methodology employed in this paper, except this work employs a more sophisticated MHD model and forms an inverse problem to find burst locations using synthetic spectra instead of using direction finding or imaging. The AWSoM model  used in this work, described more fully in the next section, uses a more realistic energy equation and more closely recreates observed plasma density (Manchester et al. 2014).
In another study combining ground based and spaced based measurements, Morosan et al. (2019) used LOFAR radio imaging in combination with EUV and coronagraph imaging to piece together the relative location of radio emission on an energetic 2017 CME. They recorded radio emission coming from the southern flank of the CME, then minutes later at the northern flank, where shocks were thought to form in both flanks in regions of relatively low Alfvén speed and high Θ Bn , where Θ Bn is the local angle between the shock normal and the magnetic field. These parameters were discerned by a relatively simple potential field source surface (PFSS) model that takes as input the observed photospheric field and outputs a model of the coronal magnetic field up to 2.5 solar radii.

Advances in MHD Simulations
Another work that employs simulations to better understand CME observations is Kozarev et al. (2013), which connects a BATSRUS MHD simulation of the same 2005 May 13 CME (also the focus in this work) to a SEP propagation code named EPREM from Schwadron et al. (2010). EPREM tracks the energization of particles, taking into account compressive acceleration around shocks and compression regions, as well as adiabatic cooling over the course of CME expansion. They employ this model to study the SEP acceleration around the CME between 1.8 -8 R , finding evidence of accelerated particle energies of 100 MeV, thought to originate in the sheath region right behind the shock. This is largely in agreement with the findings of this study, which finds that a large swath of the simulated entropy derived shock overlaps to some degree with observed radio data, though this work goes further in attempting to pinpoint where in the shock this process is happening.
Another approach to understanding radio emission around CMEs is taken in a series of papers by Schmidt & Cairns (2012a,b), where an analytic bolt-on model is described that runs on top of any MHD simulation, calculates the energy transfer from Langmuir waves generated by an electron beam to the field, and outputs predicted radio fluxes seen from Earth. This first analysis runs on top of an azimuthally symmetric 2.5D MHD model of a simple CME with no Carrington Rotation specific steady state pre-event solution. They later expanded on this work in Schmidt et al. (2013), where a full 3D MHD model of a February 15 2011 CME was used and passed into the analytic type II burst algorithm. There was good agreement of their simulated spectra with the recorded Wind/WAVES data, indicating that the framework described did a decent job at recreating the physics underlying the radio bursts. However, their results were still a factor of ∼ 10 − 30% off in frequency at some times, showing that there was still progress to be made. There is also the question of unique solutions, was this calculation just one of many that yielded similarly good recreations of the data? Applying this method across more events would help increase confidence in its results. This is just what they did in Schmidt & Cairns (2014), where they included analyses of MHD recreations of both the February 15 2011 CME and the March 7 2012 CME. Again they found significant overlap with their calculated spectra and the real data. In another more recent work Schmidt & Cairns (2016), the group demonstrates the use of their bolt on model with an MHD recreation of the 2013 Nov 29 -Dec 1 CME. They find good agreement between their model and the recorded STEREO/WAVES data, extending the comparisons out to 1 AU, with plasma frequencies below 100 kHz. This series of papers is in a very similar spirit to this work, with a key difference being that this work assumes no single burst generation mechanism, and instead creates many synthetic spectra to score and compare to the observed radio data.

Using in situ Plasma Data
There has already been some interest in determining which plasma parameters are important for generation of Langmuir waves, which act as post facto indicators of particle acceleration. Pulupa et al. (2010) analyzed Wind data of in situ measurements of Langmuir waves upstream of interplanetary shocks to see which plasma parameters were correlated with Langmuir activity. They found that the variable most correlated with Langmuir activity is the de Hoffmann-Teller speed. The de Hoffmann-Teller frame, mathematically defined in a following section, is the reference frame where the convective electric field vanishes on both sides of the shock. Θ Bn is also thought to be important in deciding the acceleration zone, and was also found to be highly correlated with Langmuir activity. However, Langmuir activity was found to be correlated with many other variables as well, including the strength of the upstream and downstream magnetic fields, but no single parameter being decisively predictive. Nevertheless, Pulupa et al. (2010) provides a useful starting point for what plasma parameters are likely to be successful in recreating observed radio spectra.
Studies like this one are useful for understanding which shock parameters may be important in SEP acceleration at CMEs, but they do not provide predictive capability since these plasma parameters are only observable in situ, and only provide information about Langmuir waves as a marker for the SEP acceleration site at 1 AU. A similar study can be done in silico, where a sophisticated simulation can provide synthetic in situ data over the entire simulated domain. Notably, this means that one can observe the plasma parameters when the CME is within 20 solar radii of the Sun, when most of the SEP acceleration happens. By combining this synthetic in situ data with the actual observed

SOHO C2 White Light Coronagraph
Wind/WAVES Radio spectrum Figure 2. Left: SOHO C2 coronagraph image of the 2005 May 13 "halo" CME that is Earth-directed. Right: Wind/WAVES radio spectrum accompanying the CME. The highly intense emission in white is the type III burst, while the secondary red stripe above it is the type II burst, which is sustained for over 33 hours for this event, all the way out to 1 AU. A white dotted line has been plotted over the first ∼ 2 hours of the type II burst for visibility.
radio data, one can frame the inverse problem of "Which hypothesized progression of burst sites matches the observed frequency profile the best?" Since the emitting frequency is only a function of upstream plasma density, extraction of this information from an MHD simulation allows one create a synthetic spectrum for a given hypothesized source progression and compare it to the observed spectrum. Given a quantitative way to assign a similarity score to a synthetic spectrum, one may directly compare hypothesized burst locations perhaps find a progression that nicely matches the observed frequency profile. This scoring methodology is fleshed out in sections 4 and 5. This way, one may directly test different combinations of physical thresholds and geometrical simulation features to determine what factors are causal in type II burst generation, and correspondingly, SEP acceleration. Solving this inverse problem also bypasses the need for simulating the exact physical processes energizing particles creating the bursts, and directly tells us what areas of the CME are most likely to be the site of particle acceleration. These locations may then be inspected more closely with the additional data from the simulation to see which plasma parameters are important for type II burst generation to occur. In the following sections, these methodologies are applied to the 2005 May 13 radio-loud ICME to identify likely regions where the type II burst was generated, utilizing an AWSoM MHD recreation Manchester et al. 2014), and Wind/WAVES (Bougeret et al. 1995) radio spectrum.

SIMULATING THE 2005 MAY 13 CME
The CME simulation (described in Manchester et al. (2014)) was performed using the Alfvén Wave Solar Model (AWSoM) (van der Holst et al. 2014), which models the time-average ambient solar wind over a Carrington Rotation (CR) based on photospheric radial magnetic field measurements from synoptic magnetograms. This level of detail is necessary for accurate simulations of upstream solar wind density as Manchester IV et al. (2005) showed that the ambient solar wind structure strongly affects the evolution of CME structure. The simulated CME is driven by a Gibson & Low (1998) magnetic flux rope that has some common pre-event characteristics, such as a dense core surrounding the flux rope and a dense helmet streamer with a cavity. This particular CME has been well studied, with works like Bisi et al. (2010) combining data from several suites of instruments to piece together a more complete picture of the event. A SOHO C2 coronagraph showing the "halo" CME for this event alongside its corresponding Wind/WAVES radio spectrum is shown in Figure 2.
The AWSoM model, recently developed at the University of Michigan (van der Holst et al. 2014), includes a phenomenological model of low-frequency Alfvén wave turbulence to heat the corona and accelerate the solar wind. At the inner boundary, the Alfvén wave energy is injected with the Poynting flux proportional to the magnetic field strength.
Counter-propagating waves are present on open field lines due to Alfvén speed gradients and field-aligned vorticity reflecting some of the outward propagating waves backwards. Wave energy enters both ends of closed field lines to produce near-balanced turbulence at the apex of the loops. AWSoM currently has 3 temperatures, an isotropic electron temperature, and separate parallel and perpendicular ion temperatures, though the results here were obtained from a 2 temperature simulation, one for electrons and a second for isotropic ions. To distribute the heating of turbulence dissipation across the different temperatures, the model uses the linear wave theory and nonlinear stochastic heating as presented in Chandran et al. (2011). AWSoM also incorporates both collisional and collisionless heat conduction into its treatment of electron thermodynamics. This all leads to a more realistic simulation that better matches real data (Sachdeva et al. 2019).
Results from the simulation of the 2005 May 13 CME event are seen in Figure 3. Here, different physical quantities relevant to SEP/acceleration and radio emission are shown 20 minutes after initiation. The panel on the right shows the magnetic flux rope being expelled from the corona with a peak speed of nearly 2000 km/s, which closely matches the speed fit by Reiner et al. (2007) for this same event. The left panel shows the full extent of the region of the simulation with local densities that are at least 3.5x the pre-eruption values to identify the density derived shock, colored to show the upstream plasma frequency. The central panel shows a region of strongly-shocked plasma identified by selecting regions of the simulation with entropy values that are at least 4x the pre-eruption values, with the surface colored to show the shock angle Θ Bn . Entropy is defined as S = p n 5/3 for plasma number density n and pressure p. The right hand panel shows only the magnetic field configuration. In all panels, the magnetic field lines are colored to show the plasma frequency.
Data from MHD simulations contains a wealth of information that would be impossible to get at similar cadence and resolution with in situ or remote sensing data. Each data point contains cell averaged quantities of magnetic field, plasma density, momentum, and internal energy. These parameters may then be used to calculate second order parameters like entropy, de Hoffmann-Teller Frame velocity, angle between shock normal and upstream magnetic field Θ Bn , and more. Some of these derived plasma quantities have been found to be highly correlated with Langmuir waves upstream of shocks from in situ observations (Pulupa et al. 2010). Model data extractions (iso-surfaces and field lines) using these plasma parameters along with geometrical sub-selections will be used in the following sections to test different hypotheses of burst locations by comparing their synthetic spectra with the observed Wind/WAVES spectrum.

METHODOLOGY FOR CREATING A SCORING STENCIL
Synthetic spectra do not have physical units such as V 2 or spectral brightness like spacecraft data, as they are not made by directly simulating the physics of burst generation. Briefly put, a synthetic spectrum is created by taking a suspected source region from an MHD simulation, and calculating the upstream plasma frequency for each point, and plotting the resulting distribution in a histogram for each time step, with the frequency bins being the same as the Wind/WAVES data. So it is not feasible to directly compare synthetic spectra with spacecraft observed spectra, as they have dimensionless units, consisting only of raw counts of simulation points with a certain plasma frequency, compared to power received at an antenna in a given frequency. Thus it is necessary to create a custom "stencil", i.e, a distribution over frequency and time derived from the observed data, so one may analytically assign a similarity score between the synthetic spectrum and the observed spectrum, using the stencil as a proxy. This methodology gives one the flexibility to be agnostic to the actual physics generating the type II burst, assuming only that the emission frequency is the upstream plasma frequency. This forms the basis of a framework where one can find test hypothetical burst locations by quantifying how well the formed synthetic spectra match the stencil.
Note that this framework would not be possible without the assumption of the emission frequency being upstream plasma frequency. It has been suggested by Bastian (2007) that some events have type II like emission that comes not from the plasma emission mechanism, but from incoherent synchrotron radiation from near-relativistic electrons around the CME or its sheath. If this were the case, one can potentially back out the distribution of energetic elections, but probably not constrain their location, since the emitting frequency is not tied to a location like it is for the plasma emission mechanism, where the upstream plasma frequency is affected by the distance from the Sun.
Since every CME is unique, having a different geometry and speed profile, each event requires a custom stencil to be created that synthetic spectra can be compared to. In the current literature, there are examples of using basic parameterizations of type III bursts to fit the underlying kinematics of the electron packet driving the burst. Dulk et al. (1987) simply fit a constant speed of the outgoing energetic electrons, while Reiner & MacDowall (2015) included a constant deceleration term alongside an electron density and Parker spiral model to accurately capture the radio wave travel times. They argue that the additional degree of freedom allows their model to better fit the frequency drift from spacecraft observed spectrum as well as match the arrival time of Langmuir waves or local emissions. Reiner et al. (2007) characterized the kinematics of CMEs from type II frequency drift, using SOHO LASCO white-light observations to constrain the initial speed of the ICME. One of the events they analyzed was actually the same 2005 May 13 event studied in this work. Here, this methodology is extended to create a physically motivated stencil based on CME kinematics.
The initial kinematics of the 2005 May 13 CME were derived from their recorded times in which the halo CME was detected with SOHO LASCO C2 (17:22 UT) and C3 (17:42). These times correspond to an initial speed of 1690 km/s and an extrapolated liftoff at 16:47. This initial speed however is not the true speed; it is only a plane of sky projected speed. Since the event in question is a halo CME, the true speed is higher than the plane of sky estimate. The event was detected in situ by the Wind spacecraft at 1 AU on 02:13:30 UT on 2005 May 15, implying an average speed of 1246 km/s. In addition, Reiner et al. (2007) found the measured radial shock speed at 1 AU to be 778 km/s. This large deceleration between liftoff and 1 AU is expected for fast CMEs; Woo et al. (1985); Sheeley et al. (1999); Manchester IV et al. (2004) have shown that this deceleration increases with the initial speed of the CME, explaining the rather narrow range in speeds of shocks observed at 1 AU.
As Reiner et al. (2007) describes, there are many possible simple kinematic solutions satisfying these constraints, with higher initial speeds being matched with quicker deceleration for shorter periods of time before settling at 778 km/s at 1 AU. They also note that the structure of the type II emission may be used to find the "true" velocity profile. By assuming a solar wind density model, one can convert a kinematic velocity profile for a CME-driven shock to a predicted radio profile over frequency and time. Reiner et al. (2007) opted for a simple r −2 density model determined by a prescribed density at 1 AU. They then adjusted the parameters of the kinematic profile until they found a good fit by eye to the center of the apparent type II event as the fundamental or harmonic emission. This kinematic model had v 0 = 2500 km/s, a deceleration of 26.7 m/s/s, and a deceleration time of 17.9 hours. The frequency profile described by this solution will be central in our own scoring stencil below. With this kinematic solution, one can see in Figure  4 that in the first 2 hours the type II emission is primarily at the fundamental plasma frequency, while after 19:30 or so the harmonic emission is dominant, continuing for another 31 hours before the shock arrives at 1 AU, coinciding with in situ observations of the shock by multiple spacecraft like Wind and ACE Bisi et al. (2010). This arrival of the shock may also be seen in the Wind/WAVES data as an enhancement in the type II burst at around 50 hours on the bottom panel of Figure 4.
One may also extend this characterization of the emission, fitting by eye a customized envelope created by manually choosing three time-frequency profiles using this quadratic model to partition the initial fundamental emission from 17:00-19:00. This can be seen in Figure 5 a, where the 3 lines surround the strongest part of the primary type II emission. The center frequency line from the Reiner et al. (2007) fit is used as the target frequency for each time step, with the lines surrounding it used to impose a Gaussian profile over the observed burst. This Gaussian profile is   normalized so that the middle frequency has weight 1. For every moment in time, each half of the identified emission in frequency space is given a different sigma value to determine its frequency width, and is defined with 3 sigma being the upper/lower end of the by-eye fit of the data. The simulation data used for this work is constrained to the first 2 hours of the event, which coincides with the period of the event where the fundamental emission is dominating. Further studies would require a straightforward extension to the stencil to include the harmonic frequencies.
The resulting Gaussian profile over frequency and time is shown in Figure 5 b, and can be thought of as a stencil to which a synthetic spectrum from an MHD simulation may be compared to and scored from, enabling a template matching technique that may be used to solve the inverse problem of finding the site of type II burst generation. The stencil is dimensionless, having no physical units, and thus may be directly compared to synthetic spectra if they have the same time and frequency structure. An example of the Wind/WAVES spectrum over-plotted with the described Gaussian stencil is seen in Figure 6 for 10 minutes into the type II event at 17:10. One can see in this figure that the type II burst is far from the brightest feature in the noisy data, making manual partitioning necessary for the creation of the stencil. A global element-wise multiplication between the synthetic spectrum and stencil followed by a global sum of all the channels over frequency and time will yield a similarity score. This similarity score is linearly proportional to the fractional amount of data points in a simulated CME feature whose upstream plasma frequency in its synthetic spectrum overlaps with the scoring stencil. The generation of these synthetic spectra will be further explained in the following section.

METHODOLOGY FOR CREATING SYNTHETIC SPECTRA
The primary goal of this work is to better understand which plasma parameters are most important for type II burst generation within the context of CME structure and shock geometry. Making this connection will lead to greater understanding of how gradual SEP events are created by CMEs. By employing state of the art AWSoM simulations, one can recreate historical major space weather events. Sophisticated simulations can provide synthetic in situ data over the entire simulated domain. Notably, this means that one can derive the plasma parameters when the CME is within 20 solar radii of the Sun, when most of the SEP acceleration happens. By taking different data extractions of the simulation based on physical thresholds for different physical processes or based on CME geometry, synthetic spectra can be made and compared with the actual radio data. This radio data needs to come from space, since the lowest parts of the type II and III bursts are emitting below the Earth's ionospheric cutoff. By matching synthetic spectra to the spacecraft-observed data, one may attempt to determine which physical thresholds must be exceeded in order for burst generation to occur. Confirmation of similar physical thresholds applied to synthetic spectra that generally match observations across multiple events would pin down the specific physics triggering the burst generation as well as the location of particle acceleration. This work is focused on a single radio loud event from 2005 May 13 at 16:47 as a test case for this new framework.
To this end, the first 2 hours of an MHD simulated recreation of the radio loud CME is compared with the Wind/WAVES radio spectrum to identify likely regions where the type II bursts are generated. Data extractions from the simulation are made based on combinations of physical thresholds, as well as CME/shock geometry, including solar longitude and latitude. From these data, synthetic radio spectra are made and compared to the stencil described in the previous section.
Subsets are extracted from the 3D data files of the AWSoM simulation, typically iso-surfaces that meet some physical criteria. These data files contain some basic plasma parameters that can be used to calculate secondary plasma parameters at the included simulation points, including the upstream plasma density used to calculate the upstream plasma frequency. The four main data types used here: iso-surfaces where points with at least a 3.5 MK temperature used to identify the CME current sheet, shocked plasma with a density compression factor of 3.5 or more, and points where the entropy changed by a factor of 4 or more, indicating the strongest (highest Mach number) shock. Each of these data features is produced with a time cadence of 2 minutes for the first 40 minutes of the simulation, and later to transitioning to every 5 minutes producing a total of 35 data files over 120 minutes. Two hours of simulation time brings the CME out to 20 solar radii, corresponding to a local upstream plasma frequency around 300 kHz. The one exception to this simulation sampling cadence is the selection for Fast Magnetosonic Mach # ≥ 3 sub-selection, which due to a lack of resources only combines the six times of [10,20,30,40,60,90] minutes after the start of the simulation. This results is a coarser data set that may still be used to inquire into the usefulness of this parameter, defined as the ratio of the shock speed to the fast magnetosonic speed, which has been shown to be associated with energetic particle acceleration at values > 3 Rouillard et al. (2016). With these broad data sets in hand, finer extractions can then be applied based on plasma parameters considered to be important for SEP acceleration such as upstream velocity magnitude and de Hoffmann-Teller frame velocity (Pulupa et al. 2010), as well as subsets based on the geometry and spatial location of the data. Synthetic spectra are created by calculating the upstream plasma frequency for each point and then plotting the resulting distribution in a histogram. This is done for each time step using the same frequency bins as the Wind/WAVES data. For every time step of data saved, some preliminary processing must be done. Depending on the simulation domain, different coordinate transformations must be performed.
The Solar Corona (SC) component of the MHD simulation goes out to 20 R and is done in a rotating reference frame; SWMF calls it Heliographic Rotation (HGR), while SunPy calls it Heliographic Carrington (HGC). The Inner Heliosphere (IH) simulation domain starts at 20 R and can go out to several AU. The IH domain is done in an Figure 7. Synthetic spectra from the MHD simulated entropy derived shock. Left: Uninterpolated 2D histogram for a subselection of the shock nose. Right: Synthetic spectrum interpolated to the 1 minute cadence of Wind/WAVES. In both plots, the columns are sum normalized to 1.0 for a maximum scoring potential of 1.0/minute. inertial reference frame; SWMF calls it Heliographic Inertial (HGI), while SunPy calls it Heliocentric Inertial (HCI). Data from both of these domains are transformed into the Heliocentric Earth Equatorial (HEEQ) frame, where the X axis is the Sun-Earth line, and Z is the Solar north pole. These coordinate systems are described fully in Thompson (2006).
The HEEQ data for each time step is then sorted by x coordinate, and a routine is applied to cull any low frequency points that appear to be behind higher frequency points from the Earth's perspective. This is done efficiently by employing fast algorithms for global nearest neighbor calculations. Starting with the largest x coordinate (closest to Earth), the plasma frequency is checked, then compared to any Sunward neighbors within a given angle on the sky, chosen here to be .001 solar radii from the vantage of Wind at L1, corresponding to 1 arc second in the planeof-sky from Earth. Any neighboring data points within this obscuration angle with a lower plasma frequency value than the Earthward neighbor is culled. This reflects the physical fact of absorption of these lower frequency waves when passing through higher density areas. Wind/WAVES was parked at L1 in the Sun-Earth line during this time period, so this line of sight culling is best done in this HEEQ frame to match the observed spectral data. Secondary scattering effects that can enable lower frequency emission to take non-line of sight paths around higher density areas that would normally absorb them are not considered in this work. The remaining points can then be further reduced using customized physical cutoffs to investigate the resulting distribution in upstream plasma frequency. There is also a special correction done on the entropy sub-selection to remove artifacts in the data that are much closer to the Sun than the rest of the extracted data. These points have upstream magnetic field magnitudes on the order of 1 Gauss, as opposed to the rest of the data with values of less than 0.1 Gauss, and constitute around 1% of the total data in the sub-selection. This correction is done for all figures using the entropy sub-selection in the paper.
To form the base of a synthetic spectrum, a histogram is created for each time step with the same frequency bins as the Wind data. This histogram makes a single column of the synthetic spectrum. The entire synthetic spectrum is a 2-D histogram, with each column sum normalized to 1. This synthetic spectrum is then interpolated to the same time cadence as the underlying data (1 minute for Wind/WAVES), while still enforcing the column sum normalization. An example of this interpolation process is shown in Figure 7. Then by employing an element-wise multiplication between the stencil and the interpolated synthetic spectrum, and then globally summing the resulting array, one receives a score for how well the synthetic spectrum from the selected simulation feature matches the spacecraft-observed data. Since the column sum of the synthetic spectrum is 1.0, and the stencil maximum is normalized to 1.0, a maximum score of 1.0/time step is achievable. The current system most rewards synthetic spectra that match the center frequency, with a Gaussian profile only marginally rewarding close by frequencies, but a constant 1.0 weight can also be applied throughout the entire stencil envelope for a less picky algorithm. Synthetic spectra for the strong shock region are shown in Figure 7. The left panel shows the synthetic spectrum as an uninterpolated 2D histogram, while the right panel shows the synthetic spectrum interpolated to Wind/WAVE cadences, plotted over the stencil envelope. The similarity score is seen in the title of this right panel, here a 29.26 out of a possible of 107. The maximum possible score is 107 because there is only 2 hours, or 120 minutes of simulation data, but the simulation starts at 16:47, and the type II stencil starts at 17:00, leaving a maximum overlap of 107 minutes. The slight distortion in the synthetic spectrum around 1 MHz is due to the transition from the Wind RAD1 receiver below 1 MHz that has a frequency resolution of 4 kHz to the RAD2 receiver above 1 MHz which has a frequency resolution of 50 kHz. When plotting these sets of frequency channels with different resolutions, an apparent distortion at their interface near 1 MHz can be seen where a greater portion of points per frequency channel are seen above 1 MHz with the increase in frequency bin width.

FINDING THE TYPE II BURST LOCATIONS
By analyzing the plasma parameters in detail from an MHD simulation of a radio-loud event, one may attempt to solve the inverse problem of determining the location of the burst generation site. One may extract different subsets from the simulated CME to create time-varying synthetic radio spectra using the upstream plasma density, and compare these to actual Wind/WAVES radio data to determine which hypothesized source region best matches the observed spacecraft data. This bypasses the need for simulating the exact physics that are creating the bursts, and tells us what areas of the CME are most likely to be the site of particle acceleration. These locations may then be inspected more closely with the additional data from the simulation to see which plasma parameters are important for type II burst generation to occur. Upper right: Synthetic spectrum made from the simulation with density enhanced (× 3.5) sub-selection. Lower left: Synthetic spectrum made from the Entropy enhanced (× 4.0) sub-selection. Lower Right: Synthetic spectrum made from the Temperature > 3.5 MK sub-selection. All synthetic spectra have in their title the calculated similarity score to the stencil, and also have the outer envelope of the stencil plotted in white dashed lines. The entropy enhancement sub-selection has the highest overlap with the stencil, reflected by the fact that its similarity score is the highest of the four, even after taking into consideration the shorter time period of the Mach # sub-selection. The slight distortion in the synthetic spectra around 1 MHz is due to the transition from the Wind RAD1 receiver below 1 MHz that has a frequency resolution of 4 kHz to the RAD2 receiver above 1 MHz that has a larger frequency resolution of 50 kHz, which captures a greater portion of points per frequency bin.
It is known from Aguilar-Rodriguez et al. (2005) statistical analysis of type II bursts that these bursts have a small bandwidth to frequency to ratio (BFR) of between 0.1 -0.4 in the frequency range 0.03 to 14 MHz of the RAD1 and RAD2 antennas. However, the MHD recreation of this 2005 May 15 event shows if the entire density enhanced shock was emitting a Type II burst, the frequency spread would be far larger. This effect can be seen in Figure 3 left and Figure 8 top right where the density enhanced region from the CME has upstream plasma frequencies spanning over an order of magnitude. In this section, thresholds of plasma parameters will be used to identify and track smaller parts of the CME through space and time to create synthetic spectra that can better match the spacecraft observed radio spectrum.

CME data extractions
The starting point for this analysis are the four sub-selections of data points from the CME simulation: first, points where the temperature lies at 3.5 MK, which occurs at the CME current sheet, second, points that are shock compressed to a factor of 3.5 in their density compared to the ambient level, third, points that have their entropy changed by a factor of 4 or more, and fourth, points that have a Fast Magnetosonic Mach # ≥ 3. These data, extracted as iso-surfaces from the 3D domain, are coming from the first 2 hours of simulation, with a time cadence of 2 minutes in the beginning of the run, transitioning to every 5 minutes later in the run for a total of 35 time points over 120 minutes. The extent of the simulation brings us out to 20 solar radii, corresponding to a local upstream plasma frequency around 300 kHz. The one exception to this simulation sampling cadence is the selection for Fast Magnetosonic Mach # ≥ 3 sub-selection, which due to a lack of resources only combines the six times of [10,20,30,40,60,90] minutes after the start of the simulation. This results is a coarser data set that may still be used to inquire into the usefulness of this parameter. With these data extractions in hand, finer selections can then be applied based on plasma parameters thought to be important for SEP acceleration, or sub-selections based on the geometry of the CME. Figure 8 shows the synthetic spectra from these four model source regions alongside the actual radio data in the top panel along with the stencil envelope. One can see that the bottom left panel with the entropy-derived shock front matches the actual frequency spectrum best, aligning with the gradual downward curve in frequency over time. The current sheet spectrum in the lower right stays mostly in the high frequency portion of the figure, reflecting the fact that the current sheet stayed close in to the sun even as the flux rope driving the shock moved outwards. The shape of this emission roughly matches that of a solar type IV radio noise storm (Takakura 1963). This may be proof that these radio storms originate from the current sheet formed behind a large solar shock wave. The synthetic spectrum on the upper right for the density enhanced region shows the correct general shape of the type II burst, but is far too wide when compared to the relatively thin bandwidth-frequency ratio (BFR) of the actual spectrum. This could be a problem with the shock compression region being too inclusive, but it does mostly overlap with the correct spectrum. The synthetic spectrum on the upper left for the sub-selection based off Fast Magnetosonic Mach # ≥ 3 also shows the correct general shape, and is more focused around the stencil than the density enhancement sub-selection. However, the data is not as closely aligned with the stencil as the entropy derived data. This sub-selection criterion will be a promising parameter to check on in future studies, but will not be the focus of the remainder of the paper.
In the following subsections further restrictions are applied to the entropy derived shock data, as it has the highest initial overlap with the type II stencil. This choice between the four initial sub-selections is easy to make just by eye, but one can also look at the similarity scores between the sub-selections to see quantitative evidence that the entropy shock sub-selection has the highest level of overlap with the observed type II burst data, even after taking into consideration the shorter time period of the Mach # sub-selection. Already one can see from Figure 8 that looking at entropy enhancements could be a part of a set of key thresholds that can be used to predict where type II burst generation and particle acceleration is happening in CME simulations, possibly leading to improvements in space weather forecasting.
When analyzing the speed of this entropy identified shock region, it is found that the outermost extent of the shock goes from 3.92 R at 8 minutes, to 23.99 R at 120 minutes. This corresponds to an average speed of about 2080 km/s, which is close to the expected average speed from the Reiner et al. (2007) fit of the CME kinematics from the frequency drift, which predicts an average speed of about 2400 km/s for this time period. The fit velocity profile of the simulated entropy derived shock is also decelerating, with an initial speed of over 2500 km/s, lending further confidence to the simulation's results.

Geometrical Data Selections
All (non-artifact) entropy enhanced data included Only data points with distance > 0.7*maxr included Only data points with distance > 0.8*maxr included Only data points with distance > 0.9*maxr included Figure 9. Synthetic spectra generated from subsections of the entropy derived shock front of the 2005 May 13 CME. Results are shown for data sub-selections that are above progressively larger fractions of maxr, the maximum radial distance of any point in that simulation time step. These outer data points naturally have a lower upstream plasma frequency, yielding a higher similarity to stencil derived from actual observed Wind/WAVES data.
One may further refine these data extractions to test the similarity of synthetic spectra derived from different geometrically defined areas in the CME. One can consider for each moment in time only keeping data points above some set fraction of the furthest out point with some radial distance maxr and discarding the rest. This would in effect only leave the portion of the shock furthest out from the Sun, which would typically have lower upstream plasma densities, and therefore lower plasma frequencies. One can see from Figure 9 that increasing the inner radial cutoff to 0.7*maxr (upper right), 0.8*maxr (lower left), or 0.9*maxr (lower right) in this manner can lead to a synthetic spectrum with a higher similarity score, with more of its remaining data within the envelope of the stencil. The fact that this is true for the entropy based sub-selection and not the density or magnetosonic mach # sub-selections is significant. It signals that the outer edge of the entropy derived shock is a potential site of radio burst emission. Here we take a moment to emphasize that this framework is a diagnostic tool, and is not expected to provide a guarantee of particle acceleration around simulated regions that provide high scoring synthetic spectra.
Each of the regions across the shock surface can then be subdivided even further to better isolate potential locations of radio emission. By looking at sub-selections based on solar longitude and latitude to create synthetic spectra, one can test hypotheses of stationary (with respect to the outward moving shock structure) burst production sites. Synthetic spectra from two geometrical simulation features performed on the entropy enhanced data with no global radial cutoff can be seen in Figure 10. The left panel shows the synthetic spectrum from the portion of the shock within −15 • -−12 • longitude, −9 • -−6 • degrees latitude that clearly lies inside the defined stencil envelope. The right panel shows the synthetic spectrum from the portion of the shock within −33 • -−30 • degrees longitude and Good geometrical Sub-selection, no radial sub-selection, 3x3 sector around (6 • , −12 • ) Bad geometrical sub-selection, no radial sub-selection, 3x3 sector around (−33 • , −9 • ) Figure 10. Comparing two geometrical simulation features of the entropy derived shock showing how certain areas of the shock better fit the stencil of the type II burst determined from spacecraft observed data. The scores for each of these geometrical sub-selections make up the similarity score histogram in Figure 11, the upper left panel of which has a white circle and a red circle to indicate the good and bad regions (respectively) used to generate the synthetic spectra in this figure.
−9 • -−6 • degrees latitude that clearly lies mostly outside the defined stencil envelope. This comparison shows how certain areas of the shock better fit the stencil of the type II burst determined from spacecraft observed data.
In addition to studying synthetic spectra from single sectors of the entropy based shock, looking at the geometrical distribution of scores of the event can be enlightening as well. Figure 11 shows a 2D spatial color contour graph where each cell corresponds to the similarity score for a synthetic spectrum created for a 3x3 degree geometrical simulation feature of the simulation, normalized to the maximum score of 107 over 120 minutes of simulation data. The upper left panel of Figure 11 shows this similarity score distribution for all the non-artifact data in the entropy feature, showing that the area of the CME to the southwest of the Sun -Earth line around 6 • Longitude and −12 • Latitude has a robust maximum. This panel has a white circle and a red circle to indicate the good and bad regions (respectively) of the shock used to make the synthetic spectra in Figure 10.
The other panels are also 2D similarity score graphs, but with progressively stricter global radial sub-selections, where only data points furthest out from the Sun are counted, with a time dependent radial threshold of 70%, 80%, or 90% of the maximum distance traveled by any point in the entropy enhanced feature at that time. All of these similarity score distributions with various radial cutoffs share a global maximum around (6 • , −12 • ). One can see that the similarity score distributions across the shock for global radial cutoff r > 0.7maxr maintains the maximum normalized score achieved with the no global radial cutoff subset of roughly 0.73, seen in the top 2 panels of Figure 11. However, the r > 0.7maxr radial cutoff also has a region in the northwestern portion of the shock where the similarity scores have improved relative to the no global cutoff subset. This implies that the r > 0.7maxr cutoff is getting rid of some secondary population of points slightly behind the main shock with slightly higher plasma frequencies, reducing the similarity score for those regions in the uncut subset. This r > 0.7maxr cutoff subset will be used for the remainder of the plots unless stated otherwise. Also apparent in these panels is that the feature of the shock around (−30 • , 0 • ) that has a constant normalized similarity score of around 0.3 across the different cutoffs, indicating that this area of the shock is at the leading edge. These eastern and southwestern edges of the shock will be more closely investigated in the following sections.

Derived Plasma Parameters
One can also use lists of potential key parameters from Pulupa et al. (2010) that are known to be correlated with upstream Langmuir waves to guide selection of simulation features. This work focuses on the upstream plasma data, starting with examining the upstream velocity magnitude and plasma frequency across the surface of the entropy enhanced MHD feature. Figure 12 shows the 2D spatial color contour graphs of upstream plasma frequency and upstream velocity magnitude across the entire surface of the shock at 20, 40, and 90 minutes after the start of the simulation, displaying the mean values of the parameters for each longitude & latitude bin. The plasma frequency was All (non-artifact) entropy enhanced data included Only data points with distance > 0.7*maxr included Only data points with distance > 0.8*maxr included Only data points with distance > 0.9*maxr included Figure 11. Score distribution of synthetic spectra made from different geometrical sub-selections of entropy derived shock over 120 minutes of simulation data. Each cell in the histogram corresponds to a 3x3 degree subset of the simulation data whose synthetic spectrum is created and similarity score calculated, using that score as the value on the histograms here, normalized to the maximum score of 107. The upper left panel has a white circle and a red circle to indicate the good and bad regions (respectively) of the shock used to make the synthetic spectra in Figure 10.
calculated with the upstream simulated electron density, and is used in creation of the synthetic radio spectra. The upstream velocity magnitude is shown as an example of a variable that is fairly well correlated with in situ observations of Langmuir waves, ranking 9/25th of the plasma parameters tested in Pulupa et al. (2010). The general pattern of upstream velocity magnitude across the shock front of having higher values on the southwestern side of the shock, smoothly transitioning to lower values on the eastern edge of the shock, is shared with the upstream magnetic field magnitude and upstream Alfvén speed, other plasma parameters that are fairly well correlated with in situ observations of Langmuir waves. The 2D spatial color contour graphs for upstream magnetic field magnitude and upstream Alfvén speed are not pictured here to save space. Pulupa et al. (2010) found that shocks with lower upstream plasma densities are more likely to exhibit Langmuir wave activity, indicating that the upstream solar wind structure is important in forming the shape of the shock and in any acceleration of particles nearby. This is another reason that it has taken this long for simulations to get to a point where studies like this are possible; only recently have simulations, like the one done in Manchester et al. (2014), accurately simulated the steady state solar wind structure of the inner heliosphere specific to the period preceding a simulated CME, before launching said CME and propagating it though the heliosphere.
Another plasma parameter classically thought to play an important role in particle acceleration at CMEs (and then Langmuir wave creation, followed by type II burst generation) is Θ Bn , the angle between the shock normaln and the upstream magnetic field vector B. Pulupa et al. (2010) has shown Θ Bn to be fairly well correlated with energetic electrons, with higher angles near 90 • having a higher level of association with electron beams near shocks. Another correlated parameter is de Hoffmann-Teller frame velocity, V HT , which was found to be the plasma parameter most correlated with energetic electrons by Pulupa et al. (2010). V HT is defined by the following equation.
V HT is closely related to Θ Bn , with the denominator, B u ·n = cos(Θ Bn )|B|, approaching zero as Θ Bn nears 90 • , causing V HT to become very large. Here, V u is the upstream velocity of the plasma in the shock rest frame. An approximation for V u is made by taking V u = V sim − 1500n, for simplicity assuming a constant shock speed of 1500 km/s in the direction of the shock normal. Here, V sim is the simulated upstream plasma velocity in the regular HEEQ framewhose magnitude is shown in Figure 12. Figure 13 shows the 2D spatial color contour graphs of Θ Bn and V HT across the surface of the shock at 20, 40, and 90 minutes after the start of the simulation. There is more structure within these variable contour graphs than the graph of V sim in Figure 12, with higher values around the edges of the shock that match well with the structure in the similarity score contour graphs. The area of maximum similarity around (6 • , −12 • ) coincides with an area of consistently high V HT over the 2 hours of simulation time. The other feature of note is the area of highly elevated V HT around the eastern edge of the shock at (−30 • , 0 • ), with V HT falling off after about an hour into the simulation.
One may use these correlated plasma parameters to provide additional restrictions on the MHD features and examine the resulting synthetic spectra to ascertain which thresholds produce the best match to the actual Wind/WAVES spectrum. This is done in the following subsection with V HT , as its structure across the entropy derived shock seems to have the best match to the features seen in the uncut similarity score distribution in Figure 11.

Sub-selections based on both Geometry and Plasma Parameters
Starting at the broadest scales and working down, Figure 14 shows synthetic spectra from a further reduction of the global radial cutoff subsets by imposing a minimum threshold of V HT > 2000 km/s across the entire shock. This value was chosen as it seems to partition nicely the areas of elevated V HT , and corresponds to roughly 1.3 times the simplified 1500 km/s shock speed used to calculate the V HT values. Since V HT is inversely proportional to cos(Θ Bn ), adopting a threshold of 1.3 times the shock speed means that 1/cos(Θ Bn ) > 1.3 =⇒ Θ Bn > 40 • , which will cut out most of the low Θ Bn data, but still be inclusive of elevated regions of Θ Bn , not just including the very highest. In situ shock analysis in Pulupa et al. (2010) Figure 3 shows that the cumulative distribution function for shocks that exhibit intense upstream Langmuir wave activity only really starts going up past 0.1 after Θ Bn = 40 • . These synthetic spectra show a broad overlap with the stencils, suggesting that the process creating the type II burst may be non-localized, with multiple points all across the shock passing some physical thresholds such as V HT > 2000 km/s to start the burst generation process. However, analyzing the similarity distribution as a function of geometry may reveal that the emission is concentrated to distinct regions on the shock. The following figures analyze these trends in the distribution of similarity scores in synthetic spectra that take both geometry and a threshold of V HT > 2000 km/s into account.
One may zoom in on that area of the shock that does best in stationary burst hypotheses, and see if it holds up with under various sub-selections based on thresholds of plasma parameters. This is done with a cut of V HT > 2000 km/s in Figure 15 where on the left is the synthetic spectrum for the 3x3 degree sector of the simulation around (−30 • , 0 • ) that matches the area of high V HT in the first hour of the simulation, and on the right is the synthetic spectrum for the 3x3 degree sector of the simulation around (6 • , −12 • ) with a high degree of similarity to the Wind data. On the left, one can see that this eastern edge of the shock matches the stencil quite well before tapering off after about an hour. On the right, one can see that the similarity score around the (6 • , −12 • ) edge is nearly unchanged from the left panel of Figure 10, lowering from 82 to 67, only a 20% dip, indicating that a population of points within this geometrical region of the shock with high V HT could be where at least part of the burst generation occurred.
One can also do a geometrical analysis of similarity scores while also including the variable threshold sub-selection V HT > 2000 km/s, the results of which are shown in Figure 16. The top 2 panels show similarity score 2D spatial color contour graphs calculated using only the first hour of simulation data, and are normalized as such. The top left  panel shows the distribution using the cutoffs r > 0.9*maxr and V HT > 2000 km/s, while the top right shows the distribution using the cutoffs r > 0.7*maxr and V HT > 2000 km/s. The bottom 2 panels show similarity score contour graphs calculated using the same respective cutoffs, but using all 2 hours of simulation data, and are normalized as such. So when comparing scores of the top row panels and bottom row panels, one must take roughly half of the upper scores to compare them on the same scale with the bottom scores.
From these contour graphs one may surmise a few things. First, the eastern edge of the shock around (−30 • , 0 • ) which coincides with a large swath of high V HT , is at the leading front of the shock for the first hour of the event, and matches the observed burst envelope quite closely in this period (see Figure 15 a). By looking at the two left panels, one can see that after 1 hour, the eastern edge of the shock does not gain any additional points to its similarity score, indicating that after that an hour, not much type II emission is originating from there. This can also be clearly seen in the synthetic spectrum for this area of the shock in the left panel in Figure 15. This area of the shock also has high Θ Bn and V HT values for the first hour of the simulation, as seen in the variable 2D spatial color contour graphs in Figure 13 a. This eastern edge of the shock can also been seen as a feature in the similarity score histograms, being most prominent in the r > 0.9maxr sub-selections. One can also see an obvious maximum around (6 • , −12 • ) on the Figure 15. Left: derived synthetic spectrum for the 3x3 degree sector of the simulation around (−30 • , 0 • ) using the r > 0.9*maxr cutoff, and a VHT > 2000. km/s cutoff. Right: derived synthetic spectrum for the 3x3 degree sector of the simulation around (6 • , −12 • ) using the r > 0.7*maxr cutoff, and a VHT > 2000. km/s cutoff. bottom two panels, and a feature at that area in the top right panel, coinciding with the previous similarity score contour graphs around the southwestern edge of the CME shock. This corresponds to areas where there is a strip of consistently high Θ Bn and V HT on the southern and western edges of the shock. A picture emerges where the key emitting parts of the shock are likely those with high V HT , with the eastern edge of the shock in the fist 40 minutes or so dominating the emission, then falling off, not being shocked strongly enough past a certain point, or some change in upstream magnetic field decreasing the local V HT . The western and southern edges of the shock are also seen to have points with consistently high V HT , with a maximum similarity centered around (6 • , −12 • ), roughly the southwestern corner of the main shock. This high V HT signature along much of the western edge of the shock persists through all but the beginning of the 2 hours of simulation time, achieving a high similarity score across a wide area in longitude and latitude before the V HT based sub-selection. After the V HT based sub-selection, the maximum around (6 • , −12 • ) is more pronounced, but the area around it remains relatively high, indicating that there may be some level of type II emission from much of the shock or shock edge where there exist areas of high V HT .

POSSIBLE SOURCES OF ERROR
MHD simulations have many limitations and simplifying assumptions and this one is no exception. A significant source of error arises in how the flux rope is inserted into the simulation to recreate a solar eruption and coronal mass ejection. The CME is initiated by inserting a Gibson-Low magnetic flux rope (Gibson & Low 1998) into the corona that is line-tied to the bottom boundary at the location of active region 10759 at 16:03 UT. The specific deformation of the flux rope (a mathematical stretching to draw the flux rope towards its origin, while preserving its angular size) leaves the flux rope in a state of force imbalance causing an instantaneous eruption. This treatment of the flux rope fails to capture the energy buildup and early acceleration phase of the eruption. There is also the matter of magnetic reconnection, which is done in the model through numerical diffusion, which is likely leading to reconnection rates faster than what would actually happen. More recent simulations by Török et al. (2018) capture the liftoff phase. Nonetheless, the important aspects that the simulation needs to capture for this study are the speed profile of the CME and the "upstream" mass density within 20 solar radii to calculate the plasma frequency. As discussed in Section 6, the initial speed of the simulated CME is just above 2500 km/s, close to the estimate of Reiner et al. (2007), which used the LASCO observations as an initial plane of sky estimate of the CME speed, before fitting the kinematic profile with the radio data. In some sense, the best test of the simulation's accuracy of the density structure close to the Sun is how well the synthetic spectrum generated by the shock, or whatever the emitting structure is, match the real data. As is seen in the previous sections, the synthetic spectra from simulations of the entropy derived shock largely match the observed radio data, lending confidence to the simulation's accuracy near the Sun.
There is also an argument to be made that it is not appropriate to compare a synthetic spectrum, which is really a column normalized 2D histogram of the plasma frequency of points in a selected feature over the simulation domain, to an actual radio spectrum which has physical units of spectral flux density. This would be a legitimate argument if this framework were trying to deduce the exact physical mechanism behind particle acceleration and type II burst generation and propagation all at once in this single work. This is not the case. This work is trying to provide physical thresholds of plasma parameters that when surpassed, seem to create spectra similar to entropy enhanced those actually observed. If these thresholds are consistent across simulations and observations of multiple events, that will be a significant takeaway that can inform forecasting models of Earthbound energetic particles. It would also provide heuristic checks that can be performed on fully fleshed out theories of particle acceleration and type II burst generation, potentially ruling out entire classes of theories.
There is also the issue that the model being used includes an adaptive mesh refinement (AMR) scheme. This implies that each point being counted may be of a different size. In this case, the blocks are refined in a spherical wedge centered on the trajectory of the CME, with blocks closer to the Sun being more highly refined. We argue that this fact is not an issue for the purpose of type II burst template matching. Many of the points identified in an initial phase of the eruption come from areas of high entropy that may be more highly refined than regions of the weaker shock. In general, the relatively small frequency bandwidth observed from type II bursts suggest that the emission source is localized or occurs from a region of uniform density and thus uniform radial distance from the Sun. Therefore it is likely that the model emission sources that are successful in recreating the observed radio spectrum come from localized areas of similar density and plasma frequency that are likely to be of the same level of refinement. Thus, for the proposed purposes of identifying useful physical thresholds or geometrical criteria that correspond to type II burst generation, apples to oranges comparison between synthetic spectra and spacecraft observed spectra is not a fatal flaw.

DISCUSSION AND FUTURE WORK
In this work, the simulated structures of the 2005 May 13 radio-loud ICME were compared with the Wind/WAVES recorded radio spectrum to identify likely regions where the type II burst was generated. This work attempts to solve the inverse problem of the location of the burst generation site by analyzing the plasma parameters in detail from an MHD simulation of the real event. Different physical structures (shock and current sheets) of the MHD simulations were extracted to create time-varying synthetic radio spectra and were compared to actual Wind/WAVES radio data to determine which hypothesized source region best matches the spacecraft observed data.
An event specific "stencil" was developed and used to assign a similarity score to the synthetic spectra based on how much of its data is aligned with the observed Wind/WAVES data. The stencil is an estimated spectral time profile employing a Gaussian intensity profile fitted over the identified type II fundamental emission. This fundamental emission was identified using a kinematic profile of the CME made by Reiner et al. (2007). The larger the portion of data from a given MHD feature whose upstream plasma frequency lies within this stencil, the higher similarity score it will receive. This scoring stencil is described in depth in section 4. This methodology bypasses the need for simulating the exact physics that are creating the bursts, and outputs what areas of the CME are most likely to be the site of particle acceleration. These locations were then inspected more closely with the additional sub-selections to the simulation data to see which plasma parameters are important for type II burst generation to occur. This methodology combines remotely sensed type II radio data alongside virtual in situ measurements through simulation data, enabling a detailed look at the plasma parameters at the best fit for the site of emission.
This technique was employed with several different model extractions to test the feasibility of different classes of particle acceleration mechanisms. The four main broad features used here were points with at least a 3.5 MK temperature used to identify the current sheet, points that were shocked to a factor of 3.5 or more in their density compared to before the shock wave, points that had their entropy enhanced by a factor of 4 or more compared to before the event, and points that have a Fast Magnetosonic Mach # ≥ 3. Each of these data sources had 2 hours of simulation data, with a time cadence of 2 minutes in the beginning of the run, transitioning to every 5 minutes later in the run for a total of 35 snapshots over 120 minutes. The one exception to this simulation sampling cadence is the selection for Fast Magnetosonic Mach # ≥ 3 sub-selection, which due to a lack of resources only combines the six times of [10,20,30,40,60,90] minutes after the start of the simulation. Synthetic spectra showing the upstream plasma frequency of these four broad MHD features alongside the spacecraft observed radio data reveal that of the four broad synthetic spectra, the one from the entropy enhanced feature best matches the Wind/WAVES observations both in spectral width and overall shape. This similarity is reflected in the computed similarity scores of each of these synthetic spectra, with the entropy feature having the highest similarity score.
With these broad MHD features in hand, finer sub-selections were also applied based on plasma parameters thought to be important for SEP acceleration, starting with sub-selections based on the geometry of the CME. The framework created here explores the optimization problem with a stationary burst assumption, one where the burst location has a constant longitude and latitude and continues outward with the outer edge of the entropy derived shock. An analysis of the similarity scores for different geometrical simulation features from the entropy enhanced data over the 2 hours of simulation time indicates that the area of the shock near (6 • , −12 • ) has the greatest overall similarity with the main band of fundamental emission in the stencil identified by the kinematic solution of the CME.
In addition to investigating the similarity of synthetic spectra generated from different geometrical regions across the entropy derived shock, further constraints on the data are considered, namely a threshold based on de Hoffmann Teller speed (V HT ). This plasma parameter is chosen both for its high correlation with in situ observations of Langmuir waves (Pulupa et al. 2010), and also for the fact that the distribution of this parameter across the surface of the shock generally matches the elevated features seen in the similarity score distributions around the eastern and southwestern edges of the shock, more so than other parameters like upstream velocity magnitude. The spatial distribution of the similarity scores after an additional constraint of V HT > 2000km/s was also examined, where it is found that the key areas of high similarity are unchanged, the matching peaks lending confidence to the suitability of using enhancements in de Hoffmann-Teller speed as a predicting factor of particle acceleration at CME driven shocks. Overall, the areas with the highest similarity in their generated synthetic spectra coincide with areas of high Θ Bn and V HT on the southern and western edges of the shock, as well as the eastern edge for the first hour. This shared structure was not present in other plots of plasma parameters across the entropy derived shock structure. The fact that the structure of de Hoffmann-Teller speed has the best match to the similarity structure of synthetic spectra across the shock is not particularly surprising, since Pulupa et al. (2010) found that this same parameter was most highly correlated with in situ observations of Langmuir waves at 1 AU.
In the future, additional studies utilizing this analysis framework across other historical events can be used to find common physical thresholds that create synthetic spectra that match the observed spectrum. If these thresholds are consistent across simulations and observations of multiple events, that will be a significant takeaway that can inform forecasting models of Earthbound energetic particles. It would also provide heuristic checks that can be performed on fully fleshed out theories of particle acceleration and type II burst generation, potentially ruling out entire classes of theories.
This analysis framework would also become more powerful with the addition of multiple spacecraft for imaging or localization at these low radio frequencies. Single antenna observations can measure the total flux density of solar radio bursts, but cannot pinpoint where the emission is coming from. Previous efforts such as Krupar et al. (2014) have been made in localizing low frequency emission from type III radio bursts using the two STEREO spacecraft, applying goniopolarimetric analysis techniques to deduce a wave vector direction and apparent source size at each spacecraft. Combining these measurements from two or more spacecraft allows approximate triangulation of the source. However, these direction finding techniques are not a substitute for true coherent imaging that is enabled at low frequencies by interferometry, as imaging yields information of the sky brightness pattern up to the resolution of the array. The imaging approach will be taken by the Sun Radio Interferometer Space Experiment (SunRISE) (Alibay et al. 2018;Hegedus et al. 2019), which will utilize 6 spacecraft equipped with dual-polarization low frequency radio antennas to form a radio interferometer in space. If reliable localizations of the source of the type II emission are available from SunRISE or other means, this MHD analysis framework allows probing of the plasma parameters at the observed location of the emission, enabling in depth studies of the physics of acceleration. At least one future paper will be dedicated to the combination of this MHD analysis framework with a simulated SunRISE data analysis pipeline to prove the feasibility of such an array, and hopefully more studies with actual data after SunRISE launches in 2023. The combination of observations of multiple type II events with SunRISE alongside simulated MHD recreations utilizing this framework will hopefully lead to a satisfying answer to the question of how CMEs generate type II bursts and accelerate particles.