A Detection Metric Designed for O'Connell Effect Eclipsing Binaries

We present the construction of a novel time-domain signature extraction methodology and the development of a supporting supervised pattern detection algorithm. We focus on the targeted identification of eclipsing binaries that demonstrate a feature known as the O'Connell effect. Our proposed methodology maps stellar variable observations to a new representation known as distribution fields (DFs). Given this novel representation, we develop a metric learning technique directly on the DF space that is capable of specifically identifying our stars of interest. The metric is tuned on a set of labeled eclipsing binary data from the Kepler survey, targeting particular systems exhibiting the O'Connell effect. The result is a conservative selection of 124 potential targets of interest out of the Villanova Eclipsing Binary Catalog. Our framework demonstrates favorable performance on Kepler eclipsing binary data, taking a crucial step in preparing the way for large-scale data volumes from next-generation telescopes such as LSST and SKA.

by its targeting of eclipsing binaries that demonstrate a feature known as the O'Connell effect.
We have selected O'Connell effect eclipsing binaries (OEEBs) to demonstrate initially our detector design. We highlight OEEBs here because they compose a subclass of a specific type of variable star (eclipsing binaries). Subclass detection provides an extra layer of complexity for our detector to try to handle. We demonstrate our detector design on Kepler eclipsing binary data from the Villanova catalog, allowing us to train and test against different subclasses in the same parent variable class type. We train our detector design on Kepler eclipsing binary data and apply the detector to a different survey-the Lincoln Near-Earth Asteroid Research asteroid survey [LINEAR,2]-to demonstrate the algorithm's ability to discriminate and detect our targeted subclass given not just the parent class but other classes as well.
Classifying variable stars relies on proper selection of feature spaces of interest and a classification framework that can support the linear separation of those features. Selected features should quantify the telltale signature of the variability-the structure and information content. Prior studies to develop both features and classifiers include expert selected feature efforts [3,4,5,6,7,8,9], automated feature selection efforts [10,11], and unsupervised methods for feature extraction [12,13]. The astroinformatics communitystandard features include quantification of statistics associated with the time-domain photometric data, Fourier decomposition of the data, and color information in both the optical and IR domains [14,15]. The number of individual features commonly used is upward of 60 and growing [16] as the number of variable star types increases, and as a result of further refinement of classification definitions [17]. We seek here to develop a novel feature space that captures the signature of interest for the targeted variable star type.
The detection framework here maps time-domain stellar variable observations to an alternate distribution field (DF) representation [18] and then develops a metric learning approach to identify OEEBs. Based on the matrix-valued DF feature, we adopt a metric learning framework to directly learn a distance metric [19] on the space of DFs. We can then utilize the learned metric as a measure of similarity to detect new OEEBs based on their closeness to other OEEBs. We present our metric learning approach as a competitive push-pull optimization, where DFs corresponding to known OEEBs influence the learned metric to measure them as being nearer in the DF space. Simultaneously, DFs corresponding to non-OEEBs are pushed away and result in large measured distances under the learned metric.
This article is structured as follows. First, we review the targeted stellar variable type, discussing the type signatures expected. Second, we review the data used in our training, testing, and discovery process as part of our demonstration of design. Next, we outline the novel proposed pipeline for OEEB detection; this review includes the feature space used, the designed detector/classifier, and the associated implementation of an anomaly detection algorithm [20]. Then, we apply the algorithm, trained on the expertly selected/labeled Villanova Eclipsing Binary catalog OEEB targets, to the rest of the catalog with the purpose of identifying new OEEB stars. We present the results of the discovery process using a mix of clustering and derived statistics. We apply the Villanova Eclipsing Binary catalog trained classifier, without additional training, to the LINEAR data set. We provide results of this cross-application, i.e., the set of discovered OEEBs.
For comparison, we detail two competing approaches. We develop training and testing strategies for our metric learning framework, and finally, we conclude with a summary of our findings and directions for future research.

Eclipsing Binaries with O'Connell Effect
The O'Connell effect [21] is defined for eclipsing binaries as an asymmetry in the maxima of the phased light curve (see Figure 1). This maxima asymmetry is unexpected, as it suggests an orientation dependency in the brightness of the system. Similarly, the consistency of the asymmetric over many orbits is also surprising, as it suggests that the maxima asymmetry has a dependence on the rotation of the binary system. The cause of the O'Connell effect is not fully understood and additional data and modeling are necessary for further investigation [22]. Our focus in this work is in the application of an automated detector to OEEBs to identify systems of interest for future work.

Signatures and Theories
Several theories propose to explain the effect, including starspots, gas stream impact, and circumstellar matter [22]. The work by [23] outlines each of these theories and demonstrates how the observed effects are generated by the underlying physics.
• Starspots result from chromospheric activity, causing a consistent decrease in brightness of the star when viewed as a point source. While magnetic surface activity will cause both flares (brightening) and spots (darkening), flares tend to be transient, whereas spots tend to have longer-term effects on the observed binary flux. Thus, between the two, starspots are the favored hypothesis for causing long-term consistent asymmetry; often binary simulations (such as the Wilson-Devinney code) can be used to model O'Connell effect binaries via including an often large starspot [24].
• Gas stream impact results from matter transferring between stars (smaller to larger) through the L1 point and onto a specific position on the larger star, resulting in a consistent brightening on the leading/trailing side of the secondary/primary. • The circumstellar matter theory proposes to describe the increase in brightness via free-falling matter being swept up, resulting in energy loss and heating, again causing an increase in amplitude. Alternatively, circumstellar matter in orbit could result in attenuation, i.e., the difference in maximum magnitude of the phased light curve results from dimming and not brightening.
Researchers have used standard eclipsing binary simulations [25] to demonstrate the proposed explanations for each light curve instance and estimate the parameters associated with the physics of the system. [23] noted other cases of the O'Connell effect in binaries, which have since been described physically; in some cases, the effect varied over time, whereas in other cases, the effect was consistent over years of observation and over many orbits. The effect has been found in both overcontact, semidetached, and near-contact systems.
While one of the key visual differentiators of the O'Connell effect is ∆m max , this heuristic feature alone could not be used as a general mean for detection, as the targets trained on or applied to are not guaranteed to be (a) eclipsing binaries and (b) periodic.
One of the goals we are attempting to highlight is the transformation of expert qualitative target selection into quantitative machine learning methods.

Characterization of OEEB
We develop a detection methodology for a specific target of interest-OEEB-defined as an eclipsing binary where the light curve (LC) maxima are consistently at different amplitudes over the span of observation. Beyond differences in maxima, and a number of published examples, little is defined as a requirement for identifying the O'Connell effect [23,26]. [22] provide some basic indicators/measurements of interest in relation to OEEB binaries: the O'Connell effect ratio (OER), the difference in maximum amplitudes (∆m), the difference in minimum amplitudes, and the light curve asymmetry (LCA). The metrics are based on the smoothed phased light curves. The OER is calculated as Equation 1: where the min-max amplitude (i.e. normalized flux) measurements for each star are grouped into phase bins (n = 500), where the mean amplitude in each bin is I i . An OER > 1 corresponds to the front half of the light curve having more total flux; note that for the procedure we present here, I 1 = 0. The difference in max amplitude is calculated as Equation 2: where we have estimated the maximum in each half of the phased light curve. The LCA is calculated as Equation 3: As opposed to the measurement of OER, LCA measures the deviance from symmetry of the two peaks. Defining descriptive metrics or functional relationships (i.e., bounds of distribution) requires a larger sample than is presently available. An increased number of identified targets of interest is required to provide the sample size needed for a complete statistical description of the O'Connell effect. The quantification of these functional statistics allows for the improved understanding of not just the standard definition of the targeted variable star but also the population distribution as a whole. These estimates allow for empirical statements to be made regarding the differences in light curve shapes among the variable star types investigated. The determination of an empirically observed distribution, however, requires a significant sample to generate meaningful descriptive statistics for the various metrics.
In this effort, we highlight the functional shape of the phased light curve as our defining feature of OEEB stars. The prior metrics identified are selected or reduced measures of this functional shape. We propose here that, as opposed to training a detector on the preceding indicators, we use the functional shape of the phased light curve by way of the distribution field to construct our automated system.

Variable Star Data
As a demonstration of design, we apply the proposed algorithm to a set of predefined, expertly labeled eclipsing binary light curves. We focus on two surveys of interest: first, the Kepler Villanova Eclipsing Binary catalog, from which we derive our initial training data as well as our initial discovery (unlabeled) data, and second, the Lincoln Near-Earth Asteroid Research, which we treat as unlabeled data.

Kepler Villanova Eclipsing Catalog
Leveraging the Kepler pipeline already in place, and using the data from the Villanova Eclipsing Binary catalog [27], this study focuses on a set of predetermined eclipsing binaries identified from the Kepler catalog. From this catalog, we developed an initial, expertly derived, labeled data set of proposed targets "of interest" identified as OEEB.
Likewise, we generated a set of targets identified as "not of interest" based on our expert definitions, i.e., intuitive inference.
We have labeled our two populations "of interest" and "not of interest" to represent those targets in the initial training set that a user has found to either be interesting to their research (identified OEEBs) or otherwise. The labeling "of interest" here is specifically used, as we are dependent on the expert selections.
Using the Eclipsing Binary catalog [27], we identified a set of 30 targets "of interest" and 121 targets of "not of interest" via expert analysis-by-eye selection based on researchers' interests. Specific target identification is listed in a supplementary digital file at the project repository. 1 We use this set of 151 light curves for training and testing.

Light Curve/Feature Space
Prior to feature space processing, the raw observed photometric time domain data are conditioned and processed. Operations include long-term trend removal, artifact removal, initial light curve phasing, and initial eclipsing binary identification; we performed these actions prior to the effort demonstrated here, by the Eclipsing Binary catalog (our work uses all 2875 long-cadence light curves available as of the date of publication as training/testing data, or as unlabeled data to search for new OEEBs). The functional shape of the phased light curve is selected as the feature to be used in the machine learning process, i.e., detection of targets of interest. While the data have been conditioned already by the Kepler pipeline, added steps are taken to allow for similarity estimation between phased curves. Friedman's SUPERSMOOTHER algorithm [28,29] is used to generate a smooth 1-D functional curve from the phased light curve data. The smoothed curves are transformed via min-max scaling 4: where f (φ) is the smoothed phased light curve, f is the amplitude from the database source, φ is the phase where φ ∈ [0, 1], and f (φ) N is the min-max scaled amplitude (i.e. normalized flux). Note that we will use the terms f (φ) N and min-max amplitude interchangeably throughout this article. We use the minimum of the smoothed phased light curve as a registration marker, and both the smoothed and unsmoothed light curves are aligned such that lag/phase zero corresponds to minimum amplitude (eclipse minima; see [22]).

Training/Testing Data
The labeled training data are provided as part of the supplementary digital project repository. We include the SOI and NON-SOI Kepler identifiers here (KIC).   Additionally, we plot the total training and testing dataset in the ∆m and OER feature space, to demonstrate the separability of our classes ("Of Interest" vs. "Not of Interest"), see figure 2. The values presented here were generated based on phased, unsmoothed, data.
As is apparent from figure 2, the data does not exactly separate based on either of the select heuristic measures into the selected categories. For a baseline estimate of performance, we use a simple 1-NN classification algorithm with our selected two heuristic ∆m and OER values, using a randomized 50/50 split of our initial Kepler training data (see section 5.1), resulting in a misclassification rate of 23%. This error rate drops to 14% if we use k-NN (with a k = 5); larger values of k, consistently decreased performance of the classifier further.
The training data is based on expert requests, with the explicit request that the detector find new observations that are similar to those "of interest" and dissimilar to those identified as "not of interest". These targets were to have a |∆m| greater than some threshold, was not to have multiple periods, and was to have a consistent structure in the phased domain. Our objective was to construct a procedure that could find other light curves that fit these user constraints.

Data Untrained
The 2,000+ eclipsing binaries left in the Kepler Eclipsing Binary catalog are left as unlabeled targets. We use our described detector to "discover" targets of interest, i.e., OEEB. The full set of Kepler data is accessible via the Villanova Eclipsing Binary website (http://keplerebs.villanova.edu/).
For analyzing the proposed algorithm design, the LINEAR data set is also leveraged as an unknown "unlabeled" data set ripe for OEEB discovery [4,30]. From the starting sample of 7,194 LINEAR variables, we used a clean sample of 6,146 time series data sets for detection. Stellar class type is limited further to the top five most populous classes-RR Lyr (ab), RR Lyr (c), Delta Scuti / SX Phe, Contact Binaries, and Algol-Like Stars with two minima-resulting in a set of 5,086 observations. Unlike the Kepler Eclipsing Binary catalog, the LINEAR data set contains targets other than (but does include) eclipsing binaries; the data set we used [1] includes Algols (287), Contact Binaries (1805), Delta Scuti (68), and RR Lyr (ab-2189, c-737). The light curves are much more poorly sampled; this uncertainty in the functional shape results from lower SNR (ground survey) and poor sampling. The distribution of stellar classes is presented in Table 3.

Prior Research
The studies on classification of time-domain variable stars often further reduce the folded time-domain data into features that provide maximal-linear separability of classes.
These efforts include expert selected feature efforts [3,4,5,6,7,8,9], automated feature selection efforts [10,11], and unsupervised methods for feature extraction [12,13]. The astroinformatics community-standard features include quantification of statistics associated with the time-domain photometric data, Fourier decomposition of the data, and color information in both the optical and IR domains [14,15]. The number of individual features commonly used is upward of 60 and growing [16] as the number of variable star types increases and as a result of further refinement of classification definitions [17]. Curiously, aside from efforts to construct a classification algorithm from the time-domain data directly [10], few efforts in astroinformatics have looked at features beyond those described here-mostly Fourier domain transformations or time domain statistics. Considering the depth of possibility for time-domain transformations [46,47,48,49], it is surprising that the community has focused on just a few transforms. Additionally, there has been recent work in exo-planet detection using whole phased waveform data (smoothed), in combination with neural network classification [50], and with local linear embedding [51]. Similar to the design proposed here, these methods use the classifier to optimize the feature space (the phased waveform) for the purposes of detection.
Here we propose an implementation that simplifies the traditional design: limiting ourselves to a one versus all approach [31] targeting a variable type of interest; limiting ourselves to a singular feature space-the distribution field of the phased light curvebased on [52] as a representation of the functional shape; and introducing a classification/detection scheme that is based on similarity with transparent results [19] that can be further extended, allowing for the inclusion of an anomaly detection algorithm.

Distribution Field
As stated, this analysis focuses on detecting OEEB systems based on their light curve shape. The OEEB signature has a cyclostationary signal, a functional shape that repeats with a consistent frequency. The signature can be isolated using a process of period finding, folding, and phasing [53]; the Villanova catalog provides the estimated "best period." The proposed feature space transformation will focus on the quantification or representation of this phased functional shape. This particular implementation design makes the most intuitive sense, as visual inspection of the phased light curve is the way experts identify these unique sources.
As discussed, prior research on time-domain data identification has varied between generating machine-learned features [54], implementing generic features [38,30,5,3], and looking at shape-or functional-based features [55,1,44]. This analysis will leverage the distribution field transform to generate a feature space that can be operated on; a distribution field (DF) is defined as [52,18] Equation 5: where N is the number of samples in the phased data, and [ ] is the Iverson bracket [56], given as and y j and x i are the corresponding normalized amplitude and phase bins, respectively, where x i = 0, 1/n x , 2/n x , . . . , 1, y i = 0, 1/n y , 2/n y , . . . , 1, n x is the number of time bins, and n y is the number of amplitude bins. The result is a right stochastic matrix, i.e., the rows sum to 1. Bin number, n x and n y , is optimized by cross-validation as part of the classification training process. Smoothed phased data -generated from SUPERSMOOTHER-are provided to the DF algorithm.
We found this implementation to produce a more consistent classification process.
We found that the min-max scaling normalization-normalized flux-if applied by itself without smoothing, when outliers are present can produce final patterns that focus more on the outlier than the general functionality of the light curve. Likewise, we found that using the unsmoothed data in the DF algorithm resulted in a classification that was too dependent on the scatter of the phased light curve. Although at first glance, that would not appear to be an issue, this implementation resulted in light curve resolution having a large impact on the classification performance-in fact, a higher impact than the shape itself. An example of this transformation is given in Figure 3.
Though the DF exhibits properties that a detection algorithm can use to identify specific variable stars of interest, it alone is not sufficient for our ultimate goal of automated detection. Rather than vectorizing the DF matrix and treating it as a feature vector for standard classification techniques, we treat the DF as the matrix-valued feature that it is [52]. This allows for the retention of row and column dependence information that would normally be lost in the vectorization process [57].

Metric Learning
At its core, the proposed detector is based on the definition of similarity and, more formally, a definition of distance. Consider the example triplet "x is more similar to y than to z," i.e., the distance between x and y in the feature space of interest is smaller than the distance between x and z. The field of metric learning focuses on defining this distance in a given feature space to optimize a given goal, most commonly the reduction of error rate associated with the classification process. Given the selected feature space of DF matrices, the distance between two matrices X and Y [19,52] is defined as Equation

7
: M is the metric that we will be attempting to optimize, where M 0 (positive semidefinite). The PMML procedure outlined in [52] is similar to the metric learning methodology LMNN [58], save for its implementation on matrix-variate data as opposed to vectorvariate data. We summarize it here. The developed objective function is given in Equation

8
: where N c is the number of training data in class c; λ and γ are variables to control the importance of push versus pull and regularization, respectively. [19] define the triplet There are three basic components: a pull term, which is small when the distance between similar observations is small; a push term, which is small when the distance between dissimilar observations is larger; and a regularization term, which is small when  If M is not PSD, then the distance axioms are not held up, and therefore our similarity that we are using d(X, Y ) = X − Y 2 M would not be a true distance, specifically if M is not symmetric then d(x i , x j ) = d(x j , x i ). This projected metric is used in the classification algorithm. The metric learned from this push-pull methodology is used in conjunction with a standard k-nearest neighbor (k-NN) classifier.

k-NN Classifier
The traditional k-NN algorithm is a nonparametric classification method; it uses a voting scheme based on an initial training set to determine the estimated label [60]. For a given new observation, the L 2 Euclidean distance is found between the new observation and all points in the training set. The distances are sorted, and the k closest training sample labels are used to determine the new observed sample estimated label (majority rule).
Cross-validation is used to find an optimal k value, where k is any integer greater than zero.
The k-NN algorithm estimates a classification label based on the closest samples provided in training. For our implementation, the distance between a new pattern DF i and each pattern in the training set is found, using the optimized metric instead of the standard identity metric that would have been used in L 2 Euclidean distance. The new pattern is classified depending on the majority of the closest k class labels. The distance between patterns is in Equation 7, using the learned metric M .

Results of the Classification
The new OEEB systems discovered by the method of automated detection proposed here can be used to further investigate their frequency of occurrence, provide constraints on existing light curve models, and provide parameters to look for these systems in future large-scale variability surveys like LSST.

Training on Kepler Data
Optimized feature dimensions and parameters used in the classification process were estimated via five-fold cross-validation [61]. Our procedure is as follows: the original set of 151 was split into two groups, 76 for training and 75 for testing. The training dataset is partitioned into 5 groups of equal sizes (14), with no replication. To evaluate the optimal DF dimensions we loop over a range of both x and y resolutions. We also include a loop for number of k-Neighbors. Within these loops we evaluate the performance of three classifiers that have limited input parameters to train using our partitioned data. This cycling over the partitioned data is the 5-fold cross-validation, within this loop we cycle over each partition, leaving it out, using the other four for training, and comparing the classifier trained on the four against the one left out.
The resulting error average over the five cycles is used as the performance estimate for the selection of x/y/k resolution. After the resulting analysis, we have three error estimates, per x/y/k resolution pairing. We select the "optimal" x/y resolution pairing based on a minimization of the PMML classifier, for all classifiers trained (the optimal resolutions for the other classifier methods resulted in roughly the same resolution at the PMML one); k was optimized per classifier. These optimal resolutions are then used to generate the DF features from all of the training data. This training data is then used to train the classifier selected, and those trained classifiers are then applied to the testing data that was originally set aside. The resulting misclassification rates are provided in the Table 6.
We make the following general notes/caveats about the process used: • The selection of partitions is a random process (there is a random selection algorithm in the partitioning algorithm). Therefore the resulting errors produced as part of the cross-validation process aren't guaranteed to be the same on subsequent runs (i.e. different partitions selection = different error rates).
• Likewise, one of the alternative classification methodologies requires the use of clustering, so similar to (3)

Application on Unlabeled Kepler Data
The algorithm is applied to the Villanova Eclipsing Binary catalog entries that were not identified as either "Of Interest" or "Not of Interest," i.e., unlabeled for the purposes of our targeted goal. The trained and tested data sets are combined into a single training set for application; the primary method (push-pull metric classification) is used to optimize a metric based on the optimal parameters found during cross-validation and to apply the system to the entire Villanova Eclipsing Binary data (2875 curves).

Design Considerations
On the basis of the results demonstrated in [31], the algorithm additionally conditions the detection process based on a maximal distance allowed between a new unlabeled point and the training data set in the feature space of interest.
This anomaly detection algorithm is based on the optimized metric; a maximum distance between data points is based on the training data set, and we use a fraction (0.75) of that maximum distance as a limit to determine "known" versus "unknown." The value of the fraction was initially determined via trial and error, based on our experiences with the data set and the goal of minimizing false alarms (which were visually apparent).
This further restricts the algorithm to classifying those targets that exist only in "known space." The k-NN algorithm generates a distance dependent on the optimized metric; by restricting the distances allowed, we can leverage the algorithm to generate the equivalent of an anomaly detection algorithm.
The resulting paired algorithm (detector + distance limit) will produce estimates of "interesting" versus "not interesting," given new-unlabeled-data. Our algorithm currently will not produce confidence estimates associated with the label. Confidence associated with detection can be a touchy subject, both for the scientists developing the tools and for the scientists using them. Here we have focused on implementing a k-NN algorithm with optimized metric (i.e., metric learning); posterior probabilities of classification can be estimated based on k-NN output [61] and can be found as (k c /(n * volume)); linking these posterior probability estimates to "how confident am I that this is what I think this is" is not often the best choice of description.
Confidence in our detections will be a function of the original PPML classification algorithm performance, the training set used and the confidence in the labeling process, and the anomaly detection algorithm we implemented. Even (k c /(n * volume)) would not be a completely accurate description in our scenario. Some researchers [62] have worked on linking "confidence" in k-NN classifiers with distance between the points. Our introduction of an anomaly detection algorithm into the design thus allows a developer/user the ability to limit the false alarm rate by introducing a maximum acceptable distance thus allowing some control in the confidence of the result; see [31] for more information.

Results
Once we remove the discovered targets that were also in the initial training data, the result is a conservative selection of 124 potential targets of interest listed in a supplementary digital file at the project repository. 3 We here present an initial exploratory data analysis performed on the phased light curve data. At a high level, the mean and standard deviation of the discovered curves are presented in Figure 4.
A more in-depth analysis as to the meaning of the distribution functional shapes is left for future study. Such an effort would include additional observations (spectroscopic and photometric additions would be helpful) as well as analysis using binary simulator code such as Wilson-Devinney [63]. It is noted that in general, there are some morphological consistencies across the discovered targets: 1. In the majority of the discovered OEEB systems, the first maximum following the primary eclipse is greater than the second maximum following the secondary eclipse.
2. The light curve relative functional shape from the primary eclipse (minima) to  primary maxima is fairly consistent across all discovered systems.
3. The difference in relative amplitude between the two maxima does not appear to be consistent, nor is the difference in relative amplitude between the minima.
We perform additional exploratory data analysis on the discovered group via subgrouping partitioning with unsupervised clustering. The k-means clustering algorithm with matrix-variate distances presented as part of the comparative methodologies is applied to the discovered data set (their DF feature space). This clustering is presented to provide more detail on the discovered group morphological shapes. The associated 1-D curve generated by the SUPERSMOOTHER algorithm is presented with respect to their respective clusters (clusters 1-8) in Figure 5.
The clusters generated were initialized with random starts, thus additional iterations  Figure 6.
We note, that based on our methodology, the selection of initial training data (i.e. the expert selected dataset) will have a strong effect on the performance of the detector and the targets discovered. Biases in the initial training data-limits of ∆m for examplewill be reflected in the the discovered dataset. Between our selection of classifier, which is rooted in leveraging similarity with respect to the initial selected training set, and the anomaly detection algorithm we have supplemented our design with, this design effectively ensures that the targets discovered will be limited to morphological features similar to the initial training data provided by the expert. If there were an OEEB with a radically different shape than those stars used in the initial training data, this methodology would likely not find those stars. An increase in missed detection rate has thus been traded for a decrease in false alarm rate, a move that we feel is necessary given the goals of this design and the amount of potential data that could be fed to this algorithm.

Subgroup Analysis
The linear relationship between OER and ∆m reported in [22] is apparent in the discovered Kepler data as well. The data set here extends from OER ∼ (0.7, 1.8) and ∆m ∼ (−0.3, 0.4), not including the one sample from cluster 3 that is extreme. This is comparable to the reported range in [22] of OER ∼ (0.8, 1.2) and ∆m ∼ (−0.1, 0.05)-a similar OER range, but our Kepler data span a much larger ∆m domain, likely resulting from our additional application of min-max amplitude scaling (i.e., normalized flux). The gap in ∆m between −0.08 and 0.02 is caused by the bias in our training sample and algorithm goal: we only include O'Connell effect binaries with a user-discernible ∆m.
The clusters identified by the k-mean algorithm applied to the DF feature space roughly correspond to groupings in the OER/∆m feature space (clustering along the diagonal). The individual cluster statistics (mean and relative error) with respect to the metrics measured here are given in Table 4. All of the clusters have a positive mean ∆m, save for cluster 6. The morphological consistency within a cluster is visually apparent in Figure 5 but also in the relative error of LCA, with clusters 5 and 7 being the least consistent. The next step will include applications to other surveys.

LINEAR Catalog
We further demonstrate the algorithm with an application to a separate independent survey. Machine learning methods have been applied to classifying variable stars observed by the LINEAR survey [4], and while these methods have focused on leveraging Fourier domain coefficients and photometric measurements {u, g, r, i, z} from SDSS, the data also include best estimates of period, as all of the variable stars trained on had cyclostationary signatures. It is then trivial to extract the phased light curve for each star and apply our Kepler trained detector to the data to generate "discovered" targets of interest. The discovered targets are aligned, and the smoothed light curves are presented in Figure 7. Note that the LINEAR IDs are presented in Table 5 and as a supplementary digital file at the project repository. 5 Application of our Kepler trained detector to LINEAR data results in 24 "discovered" 5 https://github.com/kjohnston82/OCDetector/supplement/LINEARDiscovered.xlsx Figure 7: Distribution of phased-smoothed light curves from the set of discovered LIN-EAR targets that demonstrate the OEEB signature. LINEAR targets were discovered using the Kepler trained detector. Figure 8: OER versus ∆m for the discovered OEEB in the LINEAR data set. This relationship between OER and ∆m was also demonstrated in [22] and is similar to the distribution found in Figure 6.
OEEBs. These include four targets with a negative O'Connell effect. Similar to the Kepler discovered data set, we plot OER/∆m features using lower-resolution phased binnings (n = 20) and see that the distribution and relationship from [22] hold here as well (see The pairing of DF feature space and push-pull matrix metric learning represents a novel design; thus it is difficult to draw conclusions about performance of the design, as there are no similar studies that have trained on this particular data set, targeted this particular variable type, used this feature space, or used this classifier. As we discussed earlier, classifiers that implement matrix-variate features directly are few and far between and almost always not off the shelf. We have developed here two hybrid designs-off-the-shelf classifiers mixed with feature space transform-to provide context and comparison. These two additional classification methodologies implement more traditional and well-understood features and classifiers: k-NN using L 2 distance applied to the phased light curves (Method A) and k-means representation with quadratic discriminant analysis (QDA) (Method B). Method A is similar to the UCR [64] time series data baseline algorithm, reported as part of the database. Provided here is a direct k-NN classification algorithm applied directly to the smoothed, aligned, regularly sampled phased light curve. This regular sampling is generated via interpolation of the smoothed data set and is required because of the nature of the nearest neighbor algorithm requiring one-to-one distance. Standard procedures can then be followed [59]. Method B borrows from [65], transforming the matrix-variate data into vector-variate data via estimation of distances between our training set and a smaller set of exemplar means DFs that were generated via unsupervised learning. Distances were found using the Frobenius norm of the difference between the two matrices.
Whereas Method A uses neither the DF feature representation nor the metric learning methodology, Method B uses DF feature space but not the metric learning methodology.
This presents a problem, however, as most standard out-of-the-box classification methods require a vector input. Indeed, many methodologies, even when faced with a matrix input, choose to vectorize the matrix. An alternative to this implementation is a secondary transformation into a lower-dimensional feature space. Following the work of [65], we implement a matrix distance k-means algorithm (e.g., k-means with a Frobenius norm) to generate estimates of clusters in the DF space. Observations are transformed by finding the Euclidean distance between each training point and each of the k-mean matrices discovered. The resulting set of k-distances is treated as the input pattern, allowing the use of the standard QDA algorithm [61]. The performances of both the proposed methodology and the two comparative methodologies are presented in Table 6. The algorithms are available as open source code, along with our novel implementation, at the project repository.
We present the performance of the main novel feature space/classification pairing as well as the two additional implementations that rely on more standard methods. Here we have evaluated performance based on misclassification rate, i.e., 1-accuracy given by [66] as 1−correct/total. The method we propose has a marginally better misclassification rate (Table 6) and has the added benefit of (1) not requiring unsupervised clustering, which can be inconsistent, and (2) providing nearest neighbor estimates allowing for demonstration of direct comparison. These performance estimate values are dependent on the initial selected training and testing data. They have been averaged and optimized via crossvalidation; however, with so little initial training data and with the selection process for which training and testing data are randomized, performance estimates may vary.
Of course, increases in training data will result in increased confidence in performance results.
We have not included computational times as part of this analysis, as they tend to be dependent on the system operated on. We can anecdotally discuss that, on the system implemented as part of this research (MacBook Pro, 2.5 GHz Intel i7, 8 GB RAM), the training optimization of our proposed feature extraction and PPML classification total took less than 5-10 min to run-variation depending on whatever else was running in the background. Use of the classifiers on unlabeled data resulted in a classification in fractions of seconds per star. However, we should note that this algorithm will speed up if it is implemented on a parallel processing system, as much of the time taken in the training process resulted from linear algebra operations that can be parallelized.

Strength of the Tools
The DF representation maps deterministic, functional stellar variable observations to a stochastic matrix, with the rows summing to unity. The inherently probabilistic nature of DFs provides a robust way to model interclass variability and handle irregular sampling rates associated with stellar observations. Because the DF feature is indifferent to sampling density so long as all points along the functional shape are represented, the trained detection algorithm we generate and demonstrate in this article can be trained on Kepler data but directly applied to the LINEAR data, as shown in section 5.3.
The algorithm, including comparison methodologies, designed feature space transformations, classifiers, utilities, and so on, is publicly available at the project repository; 6 all code was developed in MATLAB and was run on MATLAB 9.3.0.713579 (R2017b).
The operations included here can be executed either via calling individual functions or using the script provided (ImplementDetector.m). Likewise, a Java version of all of the individual computational functions has been generated [see JVarStar, 67] and is included in the project repository. 7

Perspectives
This design is modular enough to be applied as is to other types of stars and star systems that are cyclostationary in nature. With a change in feature space, specifically one that is tailored to the target signatures of interest and based on prior experience, this design can be replicated for other targets that do not demonstrate a cyclostationary signal (i.e., impulsive, nonstationary, etc.) and even to targets of interest that are not time variable in nature but have a consistent observable signature (e.g., spectrum, photometry, image point-spread function, etc.). One of the advantages of attempting to identify the O'Connell effect Eclipsing Binary is that one only needs the phased light curve-and thus the dominant period allowing a phasing of the light curve-to perform the feature extraction and thus the classification. The DF process here allows for a direct transformation into a singular feature space that focuses on functional shape.
For other variable stars, a multiview approach might be necessary; either descriptions of the light curve signal across multiple transformations (e.g., Wavelet and DF), or across representations (e.g. polarimetry and photometry) or across frequency regimes (e.g. optical and radio) would be required in the process of properly defining the variable star type.
The solution to this multiview problem is neither straightforward nor well understood [68].
Multiple options have been explored to resolve this problem: combination of classifiers, canonical correlation analysis, postprobability blending, and multimetric classification.
The computational needs of the algorithm have only been roughly studied, and a more thorough review is necessary in the context of the algorithm proposed and the needs of the astronomy community. The k-NN algorithm dependence on pairwise difference, while one of its strong suits is also one of the more computationally demanding parts of the algorithm. Functionality such as k − d trees as well as other feature space partitioning methods have been shown to reduce the computational requirements.

Conclusion
The method we have outlined here has demonstrated the ability to detect targets of interest given a training set consisting of expertly labeled light curve training data. The procedure presents two new functionalities: the distribution field, a shape-based feature space, and the push-pull matrix metric learning algorithm, a metric learning algorithm derived from LMNN that allows for matrix-variate similarity comparisons. As a demonstration, the design is applied to Kepler eclipsing binary data and LINEAR data. The methodology proposed-DF + Push-Pull Metric Learning-is comparable to other methods presented, with respect to the OEEB detection problem given the limited Kepler dataset we have used for training. Furthermore, the increase in the number of systems, and the presentation of the data, allows us to make additional observations about the distribution of curves and trends within the population. Future work will involve the analysis of these statistical distributions, as well as an inference as to their physical meaning.
The new OEEB systems we discovered by the method of automated detection proposed here can be used to further investigate their frequency of occurrence, provide constraints on existing light curve models, and provide parameters to look for these systems in future large-scale variability surveys like LSST. Although the effort here targets OEEB as a demonstration, it need not be limited to those particular targets. We could use the DF feature space along with the push-pull metric learning classifier to construct a detector for any variable stars with periodic variability. Furthermore, any variable star (e.g., supernova, RR Lyr, Cepheids, eclipsing binaries) can be targeted using this classification scheme, given the appropriate feature space transformation allowing for quantitative evaluation of similarity. This design is directly applicable to exo-planet discovery; either via light curve detection (e.g., to detect eclipsing exo-planets) or via machine learning applied to other means (e.g., spectral analysis).