The Evolution of Flare Activity with Stellar Age

Using a recent census of flare stars from the Kepler survey, we have explored how flare activity evolves across stellar main sequence lifetimes. We utilize a sample of 347 stars with robust flare activity detections, and which have rotation periods measured via starspot modulations in their Kepler light curves. We consider three separate methods for quantifying flare activity from optical light curves, and compare their utility for comparing flare activity between stars of differing ages and luminosities. These metrics include: the fractional luminosity emitted in flares, the specific rate of flares emitted at a given energy, and a model for the entire flare frequency distribution. With all three approaches we find that flare activity decreases for all low-mass stars as they spin-down, and thus with age. Most striking is the evolution of the flare occurrence frequency distributions, which show no significant change in the power law slope with age. Since our sample is preferentially constructed of younger, more active stars, our model over-predicts the super-flare rate previously estimated for the Sun. Finally, we parameterize our best-fit model of the flare frequency distribution for ease in predicting the rates of flares and their associated impacts on planet habitability and detection.


INTRODUCTION
Magnetic activity comes in a wide range of observable phenomena, including UV and Xray luminosity, cool starspots, and flares. Each of these signatures is observed to decline over time for low-mass stars on the main sequence. As the star loses angular momentum via stellar winds, the rotation velocity decreases and the internal magnetic dynamo is quieted. This ageactivity connection was outlined in the seminal work by Skumanich (1972), which has been directly connected with flare activity as well (Skumanich 1986). Magnetic activity evolution has been confirmed for low-mass stars (late F through early M type) in many studies using Xray luminosity (Wright et al. 2011;Núñez et al. 2017), UV emission (Shkolnik & Barman 2014), Zeeman-Doppler imaging (Vidotto et al. 2014), and Hα emission (West et al. 2015), to name but a few.
This surface activity can have significant impact on the evolution of habitable zone planets orbiting active stars. Flares in particular have been studied as a possible threat to a planet's ability to retain a habitable atmosphere (e.g. Segura et al. 2010;Luger et al. 2015;Tilley et al. 2017). For example, UV flux and high energy particles impacting a terrestrial planet's atmo-sphere from frequent stellar flares can significantly deplete ozone, and result in a potentially uninhabitable planetary surface (Tilley et al. 2017). Under more extreme scenarios, stellar activity could strip large portions of a planet's atmosphere away over timescales of a few 100 Myr (Luger et al. 2015). The duration of high flare activity early in a star's life may therefore be a fundamental property in defining potential planetary habitability.
Stellar activity also makes planet searches more difficult, adding both slow variability (e.g. starspots) that can impact radial velocity studies, and fast stochastic variability (e.g. flares) that can impede transit searches (e.g. Kipping et al. 2017). Further, though stellar activity is less prominent at redder wavelengths, even observing in the infrared does not eliminate the potential for flares from active stars to impact transit searches, particularly for the lowest mass stars (Davenport 2017a). To understand planet occurrence rates for nearby low-mass stars, as well as the evolution of planetary habitability, we must constrain flare rates and properties as a function of stellar age.
Measuring the evolution of flare activity with stellar age has typically been limited to comparisons between flare stars in young clusters or stellar associations with known ages. Studying flare stars in young nearby moving groups and clusters reaches back at least five decades (Haro & Chavira 1966), with the conclusion that all low-mass dwarf stars undergo an evolution in flare activity (Ambartsumian & Mirzoian 1975). Pre-main sequence stars have also been known to exhibit high levels of flare activity compared to field-aged dwarfs (e.g. Feigelson 2001). However, in most previous studies field stars are assumed to be approximately Solar-aged, and therefore generally flare-inactive, since few reliable age indicators for isolated field stars exist.
Space-based exoplanet transit missions such as Kepler (Borucki et al. 2010) have provided a revolutionary dataset for statistical studies of flare stars, particularly for field dwarfs (Walkowicz et al. 2011). Such missions are ideally suited for large scale studies of flares, as they produce high precision, continuous light curves for months to years in duration. The Kepler survey has been used to explore "superflare" activity (events with energies more than 10 times larger than those observed on the Sun) from solar-mass stars (Shibayama et al. 2013), as well as from K and M dwarfs (Candelaresi et al. 2014). White-light flares in Kepler light curves have been detected across the main sequence, from massive A and F stars (Balona 2012) down to L dwarfs (Gizis et al. 2013), and produced the most detailed catalogs of flares for individual active stars to date (Hawley et al. 2014;Davenport et al. 2014a).
In this paper we present an ensemble analysis of flare activity in the Kepler field, based on the flare sample amassed in Davenport (2016a). This sample was generated using an automated processing of the entire Kepler light curve database, and produced a sample of over 4000 candidate flare stars. The Davenport (2016a) catalog of flares provides the most complete census of flare activity from a large sample of field stars to date, and is the ideal dataset to study the evolution of flare rates with stellar age.
Here we explore the relationship between flare occurrence rates and stellar ages (derived from rotation periods) for stars in the Kepler field. Our sample of flare stars with measured rotation periods is detailed in §2. We present three approaches for quantitatively tracing changes in flare activity over time, including modeling the flare frequency distribution as a function of both stellar mass and age in §3. In §5 we explore the empirical evolution of flare activity for our Kepler sample, and provide an analytic prescription for use in other studies. Finally, a short discussion and comparison to other studies is given in §6.

FLARE STAR SAMPLE
Our sample of flare stars comes from the automated search of Kepler light curves from Davenport (2016a). This study produced the most comprehensive analysis of the Kepler field for stellar flares, processing every short (1-minute) and long (30-minute) cadence light curve in search of flares. Davenport (2016a) produced an open-source Python flare analysis codebase named appaloosa (Davenport 2016b), designed to detrend (model) Kepler light curves of both instrumental and astrophysical noise, detect flare candidates (positive, significant outliers), and determine the reliability of the detected flares via artificial flare injection and recovery tests. These completeness tests were run on each continuous local segment of light curve available for every star (i.e. sections of the light curve without gaps larger than 3 hours), resulting in variable completeness limits for each star as the light curve noise properties change. Davenport (2016a) made both the appaloosa code, and the code to generate the figures and results in the paper available online. The results presented in this paper directly extend the work of Davenport (2016a), and so we also make our analysis code available as an update to the appaloosa project online. 1

Selecting a Robust Sample
Since we are focused here on quantifying changes in flare activity with stellar age, we limit our analysis to stars with measured rotation periods so that stellar rotation periods can be used as a proxy for age. We note that while the rotation-age connection may be more complex than most gyrochronology prescriptions, e.g. the weakened rotational braking found in 1 https://github.com/jradavenport/appaloosa Solar-age stars from van Saders et al. (2016), rotation nonetheless increases continuously with age (i.e. stars nominally do not speed up while on the main sequence). Rotation is therefore a good means to sort stars by their age, even if the specific age derived from gyrochronology relations is not accurate. Our work here is further insulated from these effects, as the sample of flare star candidates from Davenport (2016a) predominantly have rotation ages of 1 Gyr. We specifically adopt for our analysis the rotation period catalog of McQuillan et al. (2014), who measured 34,040 periods from Kepler data using an Autocorrelation Function analysis of each light curve, and did extensive testing against other period-finding approaches such as the Lomb-Scargle Periodogram.
The Davenport (2016a) sample of ∼4,000 candidate flare stars contains several types of contamination from variable stars that do not exhibit flaring activity. For example, we found some eclipsing binaries and pulsating stars were able to fool the appaloosa code, due to sharp features in their light curve, particularly at the 30-minute cadence. The worst cases of this failure appear to be caused by insufficient modeling of periodic signals by appaloosa, namely by using too few sine curves to fit each significant period found and leaving sharp or "peaked" structures in the model residuals. Future versions of appaloosa will improve on this detrending algorithm, and we urge caution when adopting the Davenport (2016a) flare sample blindly. Our analysis is largely free from such contamination, as we require each star to have a rotation period identified by McQuillan et al. (2014), who rejected such eclipsing and pulsating targets.
In Figure 1 we show light curves and cumulative flare frequency distributions (FFDs) for three example flare stars from the Davenport (2016a) sample that also have rotation periods measured in McQuillan et al. (2014). These example stars will be followed through the anal- Bottom row: cumulative flare frequency distributions (FFDs) from theappaloosa flare finding analysis of Davenport (2016a) for the same three stars. The FFD from each short cadence (blue lines) and long cadence (red lines) dataset is included, as well as the mean FFD (black) with Poisson uncertainties shown. Each star's mean FFD is fit with a power law (heavy navy line), whose slope and intercept in log-log space (α and β) are noted.
ysis of this paper, and were selected to demonstrate a range of flare rates and rotation periods. Each star selected for inclusion in the Davenport (2016a) sample had, within all available Kepler data, at least 100 candidate flares of any energy, and 10 flares with energies above the 68% completeness threshold determined by automated flare injection tests. These conservative thresholds potentially eliminate real flare stars with fewer significant flare events from our analysis, as noted by Van Doorsselaere et al. (2017), and may bias our sample towards active (younger) stars. In total we study the flare rate evolution from 347 stars with rotation periods from McQuillan et al. (2014) that pass the Davenport (2016a) selection criteria for sufficient numbers of recovered flare event candidates. We use this conservatively selected, robust sub-sample of 347 stars from the Daven-port (2016a) catalog throughout this paper to explore in detail various approaches to quantifying and comparing flare activity as a function fo stellar age and mass.

Flare Stars Among the Kepler Field
As an aside, the thresholds for flare star selection from Davenport (2016a) can instead be relaxed to consider the activity from a larger sample of stars, and to better explore the incidence of flare activity in the population of Kepler field stars. As a demonstration of this, in Figure 2 we show the fraction of flaring stars found in six bins of g − i color as a function of their gyrochronology age. In this example we analyzed 27,127 stars from McQuillan et al. (2014) that had rotation periods between 0.1 and 30 days, and colors redder than g − i ≥ 0.5 mag. Stars were considered "active" here if they had at least 3 flare events above the 68% Fraction flaring stars as a function of their gyrochronology ages for 27,127 stars in six bins of g − i color. The number of stars within each color bin is noted in the legend. Each star has a rotation period determined by McQuillan et al. (2014), and had artificial flare injection tests run by Davenport (2016a). Flaring stars were selected here as having at least 3 flare events above the 68% recovery completeness threshold in Davenport (2016a). We find a trend of increasing total fraction of active flare stars, and an increasing apparent lifetime for flare activity, with decreasing mass (redder stars). completeness threshold determined by the automated flare injection tests from Davenport (2016a), and had no requirement for the total number of flare event candidates. This figure emulates the Hα activity fraction work typically done for M dwarfs, which shows an increasing lifetime of magnetic activity for stars with decreasing mass (e.g. West et al. 2008). Previous studies such as Kowalski et al. (2009) andHilton et al. (2010) have explored the fraction of flaring M dwarfs as a function of their height above the Galactic disk (a dynamical proxy for age), and have found flare activity decreases at earlier ages than Hα emission activity.
The lower mass (redder) stars in Figure 2 exhibit both a higher overall fraction of flare activity at all ages, and a qualitatively longer lifetime of flare activity. Note this sample has not been cleared of contaminants from e.g. subgiants, which have been shown by Davenport (2017b) to significantly contaminate the rotating G dwarf sample from McQuillan et al. (2014), and which may lower the overall flaring fraction for bluer stars plotted here. A more detailed census of stellar rotation periods and ages for all Kepler and K2 stars (e.g. see van Saders et al. 2018), as well as an exploration on the robustness of identifying flare stars from small numbers of events is needed to fully understand these flare activity lifetimes. Still, we believe this is the first demonstration of flare activity fractions (directly related to activity lifetimes) for G, K, and M field dwarfs together, and warrants further study.

QUANTIFYING FLARE ACTIVITY
To accurately measure the evolution of flare rates, we must find a suitable metric to characterize flare activity for an ensemble of stars. Typical magnetic activity indicators, such as Hα or X-ray flux, are used as disk-integrated measures of magnetically-driven emission from the chromosphere or corona. These quantities are often presented as relative luminosities, normalized either to a continuum flux, or to the stellar bolometric luminosity. While the overall rate and maximum intensity of a given star's flares appear related to the star's global magnetic field strength, the specific properties of individual flare events (e.g. duration, amplitude, morphology) are dependent on small-scale magnetic active regions. Since flare energy is inversely proportional to the event occurrence frequency, the flare properties measured for a given star will also be dependent on the photometric depth and temporal baseline of the observations. Our flare activity metric must therefore represent the integrated properties of many individual flare events to accurately model a star's magnetic activity state.
In this section we outline three methods for quantifying the photometric flare activity between stars, as well as the merits and challenges of each approach. These metrics include: 1) the fractional energy emitted in the Kepler band by flares (L f l /L Kp ), 2) the cumulative flare rate evaluated at a specific energy, and 3) an analytic model of the entire flare frequency distribution (FFD). While all three of these metrics have utility, we believe the latter will be of most value to future investigations, for example in studies of planetary habitability and atmosphere evolution.
We note that the pedagogical discussion here of comparing flare metrics between Kepler stars is similar in many ways to the review on flare activity by Kunkel (1975). This excellent review explored two different methods for quantifying and comparing flare activity from heterogeneous photometric studies: the integral of the light curve, and the rate at a given specific energy level, which are directly analogous to the first two approaches advanced here. The third approach, comparing the entire FFD between stars, also has a long history (e.g. see Fig. 17 of Lacy et al. 1976), and has even been used for comparing Kepler flare stars (e.g. Ramsay et al. 2013;Hawley et al. 2014).

Fractional Flare Luminosity
An intuitive metric for quantifying magnetic activity strength via flares is the total luminosity emitted by flares relative to the nominal quiescent stellar luminosity, written originally as L f l /L Kp by Lurie et al. (2015). This quantity is inspired by traditional magnetic activity measures for low-mass stars, such as L Hα /L bol (Walkowicz et al. 2004) or L X /L bol (Pallavicini et al. 1981), which are normalized relative to the bolometric stellar luminosity. In this case, as presented in Lurie et al. (2015), the flare luminosity is normalized to the star's quiescent luminosity, but only in the Kepler bandpass. This metric was used by Davenport (2016a), and also recently by Yang et al. (2017).
The relative flare luminosity has many advantages as a magnetic activity strength indicator. First, it is algorithmically simple to compute by taking the integral of the flaring portion of the light curve, as described by Kunkel (1975). For Kepler light curves, this is done by de-trending the non-flaring (quiescent) light curves, including starspot variations, normalizing them to their average relative flux, and then integrating all the identified flares. Integrating the relative flux of a single flare results in a quantity known as the "equivalent duration" (e.g. see Hunt-Walker et al. 2012), which has units of time (typically seconds). By integrating the relative flux from all flares in a Kepler light curve, we would again have units of time, and so as defined by Lurie et al. (2015), L f l /L Kp is simply computed as the integral of the relative flux of all flares, divided by the total observation duration of the light curve, resulting in a unit-less ratio.
A second appealing aspect of L f l /L Kp as a flare activity indicator is that it compresses the entire observed flare activity of a star, regardless of the duration of the observation window, into a single number. This results in a quantity that has higher signal-to-noise than a specific flare rate, for example. This makes L f l /L Kp ideal for comparing flare activity between stars, even with different observing baselines (e.g. comparing flare activity between active stars with differing numbers of quarters observed by Kepler).
Thirdly, the light curve data does not need to be flux calibrated, or have accurate distances determined to measure L f l /L Kp . Instead the metric is defined totally by the relative flux increases of the flares. This is especially useful for datasets like Kepler, where the light curves have incredible short-term precision designed to detect small amplitude exoplanet transits, but suffer from large-scale systematics that typically . Flare Ψ, the correction factor (black line) that can be used to convert the observed fractional flare energy L f l /L Kp in to the metric for comparing between stars of different masses, L f l /L bol . This Ψ was computed using the bolometric and Kepler absolute magnitudes from a 600Myr PARSEC isochrone (Bressan et al. 2012). Here we show this correction factor versus g − i color as a proxy for mass or spectral type (e.g. see Covey et al. 2007;Davenport et al. 2014b). A polynomial fit to the Ψ factor is also shown (blue line), and is described in the text below. prevent flux calibration. As in the exoplanet transit application, the % change of the light curve is the only quantity required. This also results in L f l /L Kp being easily compared for many stars at once. Lurie et al. (2015), for example, used L f l /L Kp to measure flare activity for the M5+M5 binary system GJ 1245 AB, and to relate this flare activity to other mid-to-late type M dwarfs such as GJ 1243 (Davenport et al. 2014a).
As Lurie et al. (2015) note, to correctly compare flare activity between stars of varying spectral types (or effective temperatures), the measured quantity L f l /L Kp requires a bolometric luminosity correction. Specifically, a correction must be made for the varying portion of the bolometric flux that is observed within the Kepler bandpass. This is akin to the "χ" parameter, first developed to convert Hα equivalent width measurements into L Hα /L bol , devel-oped by Walkowicz et al. (2004). Douglas et al. (2014) recently produced a thorough discussion on developing a χ factor using model spectra, and produced an updated table of χ values as a function of photometric colors in many bands. In extending this concept to the calculation of the L f l /L bol ratio, we used the letter Ψ as it follows χ in the Greek alphabet.
In Figure 3 we demonstrate a similar parameter that can be used to convert measured L f l /L Kp values into L f l /L bol , and thus more accurately compare the flare activity level between stars of different masses. The parameter Ψ was determined using the Kepler and bolometric luminosities computed for main sequence stars in a 600 Myr isochrone from the PARSEC model grid Bressan et al. (2012). The very wide bandpass of the Kepler filter means the Ψ factor is relatively close to 1 for most stars, and doesn't change much between spectral types. For ease of use, we also provide a simple polynomial fit to the curve shown in Figure 3: +0.004 (1) Note that the Ψ parameter assumes a "gray" spectral response for the flare itself over the Kepler bandpass. While Hα is effectively emitted at a single wavelength, flares emit energy over all observed wavelengths. In optical wavelengths, the shape of this emission is typically characterized as a hot blackbody with T ef f ≈ 10, 000 K, but has been observed to have significant excess emission in both the blue and red (e.g. see Kowalski et al. 2013). This effective temperature also changes throughout the flare event in a manner that is not well characterized, particularly for complex, multipeaked flare events (e.g. Hawley & Pettersen 1991;Kowalski et al. 2012). Davenport (2016a) discussed this in terms of estimating the energy of a single flare event, assuming a gray flare response across the Kepler band. Some other studies assume a flare spectral model, which can in turn imply higher energies for specific events (e.g. Gizis et al. 2013;Maehara et al. 2015). The Ψ parameter shown in Figure 3 will similarly underestimate the flare luminosity. However, as in Davenport (2016a) we believe the best approach is to assume a gray response and not assume a single flare spectrum for all moments of every flare event. Davenport (2016a) found that the fractional flare luminosity, L f l /L Kp , decreased with increasing Rossby number, which is defined as the rotation period divided by the convective turnover timescale (Ro = P rot /τ ). This result indicates that the flare activity is decreasing with stellar age, as expected from other tracers of magnetic activity. In Figure 4 we reproduce the result of Davenport (2016a), coloring each point by the stellar mass. The center panel of Figure 4 shows the Ψ corrected relative flare luminosity, L f l /L bol . Since the Ψ correction factor is near 1 for most stars, the change between these panels is modest. Interestingly, a slight gradient in the L f l /L bol as a function of mass appears for very rapidly rotating stars, particularly those within the "saturated dynamo" regime (Ro < 0.1). The higher mass (near Solar mass) stars show a larger fraction of their luminosity emitted through flares in this regime. Though our sample of slower rotators is small, this gradient in L f l /L bol seems to disappear for larger Rossby numbers.
An additional complication worth mentioning in the use of the relative flare luminosity metric is due to the varying distances and luminosities of stars in a given magnitude-limited sample. Since detection of flare events (particularly the small-amplitude, lower energy events) depends on the signal-to-noise ratio of a light curve, the distances to stars can impact the resulting L f l /L Kp measurement. For example, given two identical mass stars with the same  underlying flare activity level placed at different distances, the more distant star's lower amplitude flares will be obscured by photometric noise, and thus returning a lower L f l /L Kp measurement. Similarly, for two stars at the same distance, the lower-mass (fainter) one will have a lower signal-to-noise, and again the flare activity will be under-estimated. As a result, when comparing stars using L f l /L Kp (or L f l /L bol ), a uniform range of flare event energies must be considered. This presents a major limitation in comparing the flare activity between stars of different masses. We therefore generally recommend L f l /L Kp only be used when comparing stars within a narrow mass range.
In Figure 4 we also show the total luminosity emitted in flares as a function of the Rossby number, again with points colored by the estimated stellar mass. L f l here is computed as L f l /L Kp times the quiescent luminosity in the Kepler band (L Kp ), which is used in Davenport (2016a) in converting individual flare events from the observed "effective duration" into energy. A strong gradient in total flare luminosity is seen as a function of mass (i.e. as a vertical color gradient in the lower panel of Figure 4). The total flare luminosity (L f l ) is an estimator of the total magnetic activity strength in flares (over the Kepler band). Like the fractional flare luminosities described above, this is useful in comparing flare activity between stars, particularly for considering e.g. the total incident flare flux an exoplanet might experience.

Specific Flare Rate
Rather than integrating the relative flux from all detected flares within a light curve, as outlined in §3.1, flare activity can be expressed as a specific occurrence frequency. Since observable flares occur with a wide range of energies for a given star, such a frequency should be reported for flares of a given energy or larger (e.g. 10 flares per year with energies of 10 32 erg or larger). This is equivalent to evaluating the flare frequency distribution (FFD; see Figure 1) at a given energy. Here we present this metric as R 32 , with the subscript denoting the log energy that rate is evaluated at, and with units of cumulative number per day.
The specific flare rate has been used in some capacity for many years for comparing flare activity levels between stars (e.g. Lacy et al. 1976), and was noted by Davenport (2016a) as an effective metric for comparing flare activity levels between stars at different distances. Since the specific flare rate is generated from the FFD, it can in principle be evaluated at energies not observed for a given star by fitting and extrapolating a powerlaw function to the FFD. Note this implicitly assumes that the flare rate for a star is governed by a single power law at all energies of interest, which is not always supported by observations. For example, as Davenport (2016a) highlight for KIC 11551430, a "break" in the FFD power law is observed for superflare stars at high event energies.
Projecting the specific flare rate using the single FFD power law is very useful for comparing stars with different observing conditions, such as 1) stars with very different observing durations where one star may not have produced many large flares to compare to, or 2) comparing stars at significantly different distances where small amplitude flares from the fainter object are not detectable. What is required for projecting a flare rate to a new energy range, however, is a sufficient number of flares be observed to adequately measure the power law distribution in the FFD.
This specific flare rate metric is also appealing due to its easily understood units, i.e. number of flares per day, and can be useful when considering the impact of flares on other observable properties. For example, Davenport (2016c) report for Proxima Cen a specific rate for superflares of R 33 = 8 per year that may impact exoplanet habitability, and a rate of R 28 = 63 per day for events having amplitudes comparable to an exoplanet transit signal.
In Figure 5 we demonstrate the specific flare rate R 35 for the same sample of stars shown in Figure 4. Here we explore both the dependence on mass and Rossby number (top) and the observed color and rotation period (bottom). The energy of 10 35 erg was selected as the average event energy observed in the Davenport (2016a) flare census. For stars at a given mass or color the specific flare rate declines with increasing rotation period (or Rossby number), consistent with the results from the previous section. The specific flare rate is therefore a simple and effective way to compare flare rates between stars at different distances and under vastly different observing conditions, provided the stars are comparable in color (mass).
However, Figure 5 reveals a gradient in R 35 as a function of either stellar mass or color, where lower mass stars have smaller specific flare rates at a given energy and rotation period (or Rossby number). This manifests as an unintuitive trend where lower-mass stars appear less active as measured by their specific flare rate when compared to higher mass stars at a given rotation period. This reproduces the previously known effect that while low-mass stars produce a high rate of observable flares and can produce a large fraction of their luminosity in flares, the actual rate of events with solar-type flare energies is low.
To overcome this, we can correct the specific flare rate for the difference in quiescent stellar luminosity between stars. Rather than pick a single event energy to evaluate all FFD's at, we can instead pick a flare energy that scales with the star's luminosity. A flare event with an equivalent duration of P = 1 second has, by definition, an energy equal to the star's quiescent luminosity integrated for 1 second. A sensible energy to choose for evaluating the FFD would therefore be the star's quiescent luminosity in the Kepler band 2 . We will denote this adjusted specific flare rate for events with an equivalent duration of 1 second as R 1s . In Figure 6 we show the rotation-color space for the same sample of stars as a function of their R 1s specific flare rate. This adjusted specific flare rate exhibits the same decrease in flare activity with age (rotation), but the low-mass stars now display a higher flare activity level relative to their quiescent luminosity.

Modeling the Flare Frequency Distribution
The previous two metrics presented (L f l /L bol and R 1s ) have the useful property of reducing the complexity of a star's observed flaring behavior into a single quantity. While these metrics are useful for easily comparing flare activity between stars (given the caveats discussed above), such simple metrics neglect the important structure that is present in the full FFD (flare frequency distribution). Rather than try to fully reduce the complexity of the FFD down to a single number, here we demonstrate how to quantitatively use the full FFD to describe the evolution of stellar flare activity.
Comparing evolution of the entire FFD between stars requires considering a four dimensional space: cumulative flare frequency (the ordinate variable) as a function of flare event energy, stellar mass (or color), and age (or rotation period). Figure 7 shows the FFDs for our sample of 347 stars with measured rotation periods, separated into six bins based on their observed g − i colors (as a proxy for mass). Each line shows the mean FFD for a single star using all available quarters of Kepler data, corresponding to the average FFDs shown as heavy black lines in the bottom panels of Figure 1. Each FFD line is colored by the observed rotation period reported by McQuillan et al. (2014), with rotation periods increasing from red to blue. Thus the panels of Figure 7 represent slices in mass of the four dimensional data space we are interested in (stellar mass, age, flare event energy, flare frequency).
As presented in Section §3.2, the range of flare energies we can detect in each panel of Figure  7 is affected by observational and astrophysical biases and limits. The low-energy event cutoff for each star is determined by the automated flare injection and recovery tests described by Davenport (2016a). A bias is also seen where rapid rotators have slightly higher energy cutoffs due to increased starspot amplitudes that were not perfectly modeled out by appaloosa, which made flare recovery more difficult for the iterative algorithm of Davenport (2016a). Figure 7 shows, in every color bin, a decrease in the observed flare frequency as a function of rotation period. This monotonic decrease in flare rate as stars lose angular momentum is the evolution we wish to model. Qualitatively, the flare activity decreases more dramatically for bluer stars, consistent with results from e.g. Figure 6, and our expectations from other measures of stellar magnetic activity over time (e.g. Shkolnik & Barman 2014;Núñez et al. 2017). As Davenport (2016a) highlighted for one flaring G dwarf, a "break" in the FFD power law is apparent at the highest flare event energies. This break energy also seems to scale with the energy range of events observed within each g−i color bin (i.e. lower-mass stars have a lower break energy). Though we do not fully explore this feature here, we emphasize that it may represent an important limit on the maximum en- ergy budget of stellar active regions, or their formation timescales as a function of the total stellar magnetic field strength.

ANALYTIC FLARE EVOLUTION MODEL
To produce a general model for the FFD in physical units, we convert the observed colors and rotation periods for each star in our sample into masses and ages, respectively. The g − i color for each star was converted to stellar mass using a 600 Myr isochrone, as described in Davenport (2016a). Similarly, producing an age for each star requires adopting a "gyrochrone" (gyrochronology isochrone) model prescription. Considerable effort has been made to explore the accuracy of such models for main sequence stars at a range of ages in the Kepler era (e.g. Meibom et al. 2011;Angus et al. 2015;Douglas et al. 2016). Most models produce similar agerotation estimates for stars between roughly 500 Myr and a few Gyr. However, significant problems appear to exist for older, slower rotators, where a break in the angular momentum loss mechanism results in old stars (Solar age and older) with anomalously fast rotation periods (van Saders et al. 2016). As our sample of flare stars is biased towards the more active, younger main sequence stars, we do not expect this spindown break to affect our results. We calculate ages using the Mamajek & Hillenbrand (2008) gyrochrone model. Future gyrochronology models will produce more accurate ages as a function of observed rotation periods, at which point we can update our FFD evolution relation accordingly. In the interim, this model will produce robust relative ages, and a reasonably accurate distribution of absolute ages as a function of the observed rotation period and stellar color.
To model the time dependent FFD we have adopted a power law decrease in the flare ac-tivity rate over time. Our chosen parameterization is supported by independent observations of the decline in other magnetic activity indicators over stellar age, including the fractional flare luminosity discussed in §3.1. As a reminder, the FFD for a single star is defined (e.g. Eqn. 18 from Lacy et al. 1976) as: where ν is the cumulative rate of flares observed at a given energy, ε is the flare event energy, and α and β are the linear coefficients representing the power-law fit 3 . To parameterize our timeevolving FFD as a function of stellar mass and flare energy, we have used the following model to fit the FFDs of all stars in our sample simultaneously: Note the first part of this equation again defines a the FFD power-law distribution for the reverse cumulative number of flares per day (ν) as a function of flare event energy in erg (ε), written to take a linear form in log-log space with terms added to a and b that include a linear dependence on the log of the stellar age (t, in Myr), as well as the stellar mass (m, relative to solar). This equation allows us to fit the ensemble of individual stellar FFDs to determine a 1 , a 2 , a 3 , b 1 , b 2 , and b 3 , and thus make predictions for the rate of flares as function of ε, t, and M . We explored other parameterizations of this FFD evolution model, including additional cross terms as a function of mass and age, e.g. a 4 (m × log t). The Bayesian Information Criterion (BIC) did not yield a significant increase in the quality of the fit to the data given these extra degrees of freedom. Further, we currently have no theoretical basis for using a more complicated model for flare evolution as a function of time or mass. The possible break in the FFD power law for large energy flares, noted by Davenport (2016a) and in our Figure 1, was not included in our parameterization of Equation 3. Instead we focus on reproducing the single dominant power law FFD relationship, which has been reported to characterize the distribution of flares on the Sun and solar-type stars over more than 10 orders of magnitude in event energy (e.g. Fig. 9 from Shibayama et al. 2013). We also did not attempt to include the "saturated" activity regime, suggested by Davenport (2016a) for rapidly rotating stars with Rossby numbers lower than ∼0.03, in our FFD model. As Davenport (2016a) notes, the break does not occur at the same Rossby number as other magnetic activity indicators (typically Ro=0.1) and the support for this break in the Kepler flare data is currently tenuous. Exploring these additional model parameters is not technically difficult, but additional flare stars from missions like K2 and TESS are needed to determine if they are necessary.
Our model fitting in Python took as inputs stellar ages in log(t/Myr) and masses (in units of Solar mass), and all flare event energies in log(ε/ erg), and produced the cumulative flare rate in log(ν/day) over all flare energies measured for each star. Uncertainties in the flare rate were included in our fit, using the Poisson event counting uncertainty approximation from Equation 12 of Gehrels (1986). No errors for the specific flare energies, or in our mass and age estimates were used in the fit. The model in Equation 3 was fit to the entire flare star sample using an error weighted least-squares minimiza-  tion to generate an initial guess for the six free parameters (a 1 , a 2 , a 3 , and b 1 , b 2 , b 3 ). The initial fit for Equation 3 was then refined using the Affine Invariant Markov Chain Monte Carlo (MCMC) ensemble sampler, emcee (Foreman-Mackey et al. 2013). We first ran emcee for a "burn-in" phase of 500 steps using 100 walkers, seeded around the initial leastsquares solution. The sampler was then run for 100,000 steps to explore parameter space. The MCMC chains were well converged for all 6 free parameters after the 100,000-step run, having an autocorrelation time of ∼60 steps for each parameter. The final coefficients used for our FFD evolution model were determined using the median of the converged portion of the MCMC chains, and are given in Table 1. In Figure 8 we also present the standard corner plot (Foreman-Mackey 2016), which shows the 1-and 2-dimensional posterior probability density distributions for each free parameter for Equation 3 from the MCMC sampler.
The 2-dimensional posterior distributions shown in Figure 8 demonstrate that degeneracies are apparent between several parameters in our model. These can be identified as very narrow distributions in three panels of Figure  8: (a 1 , b 1 ), (a 2 , b 2 ), and (a 3 , b 3 ). This can be interpreted as indicating that the model adopted in in Equation 3 has unnecessary complexity, or that perhaps it could be re-cast into a more fundamental parameter space. As noted above, we have chosen the FFD evolution model in Equation 3 for ease of implementation, and for lack of a better theoretical basis. The posterior distributions shown in Figure 8 make it clear that other parameterizations should be explored in future studies.
In Figure 9 we demonstrate our FFD evolution model's ability to reproduce the observed FFDs for several stars. Here we show the combined FFD for the same three flare stars as in Figure  1, but with the predicted FFD from Equation 3 overlaid. As discussed previously, we do not attempt to model any break in the FFD slope at high flare energies. The model reproduces the specific flare rates for each stars well, as well as the dominant slopes for the FFDs. These examples represent the ability of our flare evolution model to describe the FFD for a low-mass star at a given mass and age, and we anticipate this model will be helpful for future studies of the flare activity, such as exoplanet host evolution, or updating the predicted flare yields from surveys like LSST (Najita et al. 2016).
We briefly note our lack of including null detections in this analysis. While our sample contains FFDs for 347 stars, there were nearly two orders of magnitude more stars with good colors (and therefore mass estimates) from Davenport (2016a) and rotation periods from McQuillan et al. (2014) that did not meet our strict flare activity requirements. The flare injection tests from Davenport (2016a) do also provide conservative estimates of the lowest energy flare that could have been detected in each object's Kepler light curve with appaloosa. We experimented with including these null detections in our model fitting as upper limits, using an observed flare rate of zero, and an uncertainty on the flare rate of 1 event per the total duration of observation (∼4 years for most targets). However, we found two practical challenges that prevented us from using these targets in our model fitting: 1) it was not clear over what energy range in the FFD this upper limit should be considered for each object, and 2) these upper limits dominated the sample, and thus drove the model fitting to flatten the FFD evolution to- wards zero, systematically under-predicting the flare rates observed in our entire "good" sample of 347 stars. As such, while our model provides a useful parameterization of flare activity over time, it is only constrained for active stars and may over-estimate flare rates from inactive stars (e.g. Hawley et al. 2014).
Further, while our use of emcee seeded with a least-squares fit follows standard approaches for fitting models to data in modern astronomy, it assumes Gaussian uncertainties to the data. The uncertainties in flare rates, however, are based on Poisson counting errors from Gehrels (1986), and so Poisson regression approaches such as generalized linear modeling (GLM) may be better suited in. These approaches should give nearly identical fits for large samples, but will disagree for rare events (such as large-energy flares). Indeed, this may partially account for our over-estimation of flare rates for less active stars, as demonstrated in the next section.

EXPLORING THE FLARE RATE EVOLUTION
Using our FFD evolution model generated from a sparse sample of field stars with varying masses and ages, we can now explore how a single star's flare activity evolves over its lifetime. In Figure 10 we show the FFD for a theoretical 0.5 M star (approximately an M0 dwarf), and for a 1 M star, both evaluated at four logarithmically spaced ages from 10 Myr to 1 Gyr. The solar-mass star's specific flare rate decreases more rapidly with time than the model M dwarf. The slope of the solar type star's FFD appears constant with time. In contrast, a very small change in the FFD slope is seen for the M dwarf, with the FFD getting steeper (fewer large flares) over time. However, given the small number of M dwarfs in our sample with old ages, the reliability of this change in slope is questionable. We find overall the FFD slope is effectively constant for stars at all ages.
Extending the FFD for a theoretical solarmass star to the age of the Sun (here assumed to be 4.6 Gyr), we find our model over-predicts the rate of superflares determined for "average Sunlike stars" in Kepler by Shibayama et al. (2013). Instead, our model closely fits their "most active Sun-like" sample. We believe this is due to our sample having insufficient numbers of slowly rotating G dwarfs to accurately predict the behavior of older stars. We also note a potential Malmquist bias in our flare rate determination exists due to the possible existence of activity cycles. As Shibayama et al. (2013) and Clarke et al. (2018) note, flare activity might vary by an order of magnitude over their activity cycles (see also Veronig et al. 2002), and so our sample may preferentially include stars near their highest activity states. Our FFD evolution model can also be used to predict other flare activity metrics described in this paper. For example, in Figure 11 we display the mass + age dependence our models predicts for R 35 , the specific flare rate at an absolute event energy of log ε = 35 erg, and R 1s , the metric outlined in §3.2 that scales with the quiescent luminosity of the star. The specific flare rate here is determined by evaluating the FFD at a given energy over a dense grid of ages and masses, allowing us to draw this phenomenological surface of flare activity as a function of mass and age. Encouragingly, the R 1s surface shown in Figure 11 demonstrates the expected behav- Our best-fit flare evolution model showing the predicted FFD evaluated at four age bins for a 0.5M (top) and 1.0M star (bottom). While the change in FFD slopes as a function of age is negligible, the difference as a function of mass is significant. Note: These FFDs are drawn at the same energy ranges for ease of comparison, but do not correspond to the specific flare energies detected at these masses.
ior described in §3.2, where low-mass star's flare activity declines more slowly with age.
Using our model to compute the fractional flare luminosity, L f l /L Kp , requires more care due to the considerations described in §3.1. Our best-fit FFD evolution model evaluated over a grid of masses and ages, showing the cumulative flare rate R 35 for a fixed energy of 10 35 erg (top), and at an equivalent duration of 1 second, i.e. R 1s = 1 sec ×L quies (bottom). The latter clearly shows that low-mass stars produce more flares relative to their quiescent luminosity, while all stars show a decrease in their specific flare rate over time.
the model FFD is integrated over must be chosen at every age and mass. To match observations it is important to pick an energy range of flares that scales with each star's quiescent luminosity. The fractional flare luminosity was computed using a preliminary version of our FFD evolution model for Clarke et al. (2018), which examined the change in L f l /L Kp between stars of varying mass ratios over a 1 Gyr timespan. This was computed at each age by integrating the FFD over six orders of magnitude in flare event energy, a larger dynamic range of events than typically observed for stars in Kepler, even very active stars like GJ 1243 (Hawley et al. 2014). However, the same range of energies was used for both stars, regardless of the mass ratio considered. Clarke et al. (2018) found that the ratio of L f l /L Kp between stars was roughly constant over time (up to 1 Gyr), which is again due to the model FFD slope remaining effectively the same at all ages for a given star.
Our results from Figures 10 and 11 show that flare activity in Solar-mass stars decreases faster than for late-type stars (e.g. M dwarfs). As the surface rotation rate is directly connected to the strength of the stellar magnetic dynamo (Skumanich 1972), this finding is in broad agreement with the age-rotation-activity paradigm (gyrochronology) that shows faster spin-down for higher mass stars. Indeed, flares may play a very central role in the age-rotation-activity landscape. Large flares and their associated coronal mass ejections may be a key factor in driving stellar winds and thus stellar angular momentum loss (Johnstone et al. 2015). This may explain why, for example, flare activity does not appear to "saturate" at Ro∼0.1 (Figure 4).

DISCUSSION AND SUMMARY
Here we have presented a thorough discussion of three methods for quantifying and comparing the flare activity of stars from optical light curves. Each method has advantages and disadvantages in terms of ease of implementation and computation, robustness to incomplete or low signal-to-noise data, and in the ability to compare between stars of varying masses and distances. The specific flare rate and the fractional flare luminosity are recommended for applications needing a simple, single metric, while modeling the entire flare frequency distribution (FFD) is preferred for accurately describing the details of a single, well studied flare star. All methods we studied, however, show definitively that flare activity decreases for all low-mass field stars as their rotation rates decrease (i.e. increasing age). This is qualitatively in agreement with recent studies of declining flare activity with age shown in young open clusters (e.g. Ilin et al. 2018).
In the development of the fractional flare luminosity metric, we have also introduced the "Ψ factor". This quantity was described first by Lurie et al. (2015) as a bolometric correction for the fractional flare luminosity observed in a given passband, and has been demonstrated here for the Kepler band using an isochrone to estimate the bolometric luminosity. Since the Kepler band cover a wide range of the optical regime, this correction is fairly insignificant for correcting flare activity in this sample. However, such an approach will be critical for comparing the flare activity between observations from various traditional optical bands (e.g. U -, B-, and V -bands).
In our ensemble modeling of the FFD, we have assumed a simple parameterization to continuously describe the evolution of flare activity as a function of both stellar mass and age (Eqn. 3). This model is a simple extension of the FFD power law commonly observed, with extra terms added for log(age) and mass. We have identified several interesting aspects of the FFD that have not been described by this model, including the potential break in the power law at high event energies, as well as regime of saturated activity for rapidly rotating (young) stars. As the sample of robust flare stars from Kepler, K2, and soon TESS grows, the parameterization of the FFD evolution model should be revisited, with these features added a additional free parameters. The axes of this model should also be reexamined, such as parametrizing the evo-lution in terms of Rossby number rather than age.
We find the FFD slope does not significantly change for stars as a function of their age. This suggests our FFD evolution model could be simplified in the future by keeping the slope fixed. This result also has potentially deep meaning for the physics of flares, indicating for example the same rules of self-organized criticality are at work regardless of the star's total magnetic field strength. It is not clear from our work at this time, however, over what exact energy range this slope is applicable, nor how that energy range changes with stellar age. Unfortunately Kepler's red wavelength coverage makes detecting very small energy flares difficult, and so it is not clear if this single power law slope extends to low enough energies to be an important component of support for the lower stellar atmosphere (e.g. "Ellerman Bombs"; Hansteen et al. 2017). However, if the nano/micro-flare rate evolves coherently with the larger events and super-flare rates observed here (i.e. if the power law is preserved to very low energy events), this atmospheric support mechanism must change as the total flare rate decreases. Otherwise the lower stellar atmosphere support would decrease, and the stellar radius could change over time.
We have also introduced a new metric for quickly quantifying flare activity, R 1s . Since this metric estimates the specific flare rate scaled to the quiescent luminosity of the star, it is well suited for comparing flare activity between stars of varying masses. R 1s is also useful for considering the impact of flares on exoplanet atmospheres and habitability, as it provides the rate of flares having approximately the same incident flux that a habitable zone exoplanet would experience.
Finally we note that all temporal evolution implied in this work relies on the assumption that gyrochronology accurately sorts stars as as a function of their age. This neglects, for example, any dynamical evolution in binaries that can affect the present day rotation (e.g. Lurie et al. 2017;Clarke et al. 2018). We have also not utilized the stellar open clusters available in the Kepler field, which are ideal benchmarks for stars at fixed ages. The sample of flare stars found in these clusters in Kepler is too small to be used for a statistical analysis. However, with a number of nearby young clusters available in K2, future work should focus on comparing the flare activity between stars of known ages (e.g. Ilin et al. 2018).
We thank the anonymous referee and the AAS Statistics Consultant whose comments helped us clarify the impact and limitations of this work.
JRAD is supported by an NSF Astronomy and Astrophysics Postdoctoral Fellowship under award AST-1501418.
JRAD acknowledges support from the DIRAC Institute in the Department of Astronomy at the University of Washington. The DIRAC Institute is supported through generous gifts from the Charles and Lisa Simonyi Fund for Arts and Sciences, and the Washington Research Foundation Kepler was competitively selected as the tenth Discovery mission. Funding for this mission is provided by NASA's Science Mission Directorate.  (Davenport 2016a)