Statistical Equivalence of Metrics for Meteor Dynamical Association

We statistically evaluate and compare four orbital similarity criteria within five-dimensional parameter space ($D_{SH}$, $D_D$, $D_H$, and $\varrho_2$) to study dynamical associations, using the meteors already classified manually in the CAMS database as a benchmark. In addition, we assess various distance metrics typically used in Machine Learning with two different vectors: ORBIT, grounded in heliocentric orbital elements, and GEO, predicated on geocentric observational parameters. We also compute the optimal cut-offs for all methods for distinguishing sporadic background events. Our findings demonstrate the superior performance of the sEuclidean metric in conjunction with the GEO vector. Within the scope of D-criteria, $D_{SH}$ emerged as the preeminent metric, closely followed by $\varrho_2$. $\varrho_2$ stands out as the most equivalent to the distance metrics when utilizing the GEO vector and the most compatible with GEO and ORBIT simultaneously, whereas $D_D$ aligns more closely when using the ORBIT vector. The stark contrast in $D_D$'s behavior compared to other D-criteria highlights potential inequivalence. Geocentric features provide a more robust basis than orbital elements for meteor dynamical association. Most distance metrics associated with the GEO vector surpass the D-criteria when differentiating the meteoroid background. Accuracy displayed a dependence on solar longitude, with a pronounced decrease around 180$^\circ$ matching an apparent increase in the meteoroid background activity, tentatively associated with the transition from the Perseids to the Orionids. Considering recently identified meteor showers, $\sim$27\% of meteors in CAMS would have different associations. This work shows that Machine Learning distance metrics can rival or even exceed the performance of tailored orbital similarity criteria for meteor dynamical association.


Introduction
Within the expanse of our planetary system, remnants from its formation provide glimpses into the early stages of our cosmic neighborhood (Bottke et al., 2002; Walker & Cameron, 2006). Among these remnants, comets emerge as witnesses to the dramatic events that shaped our nearby environment. These celestial bodies can undergo processes of disruption due to various factors such as volatile sublimation when approaching the Sun, tidal forces, or impacts with other bodies. According to the theory of formation and evolution of small bodies of the Solar System (Whipple, 1951; Bredikhin, 1954; Plavec, 1954; Hughes, 1986; Babadzhanov & Obrubov, 1992), meteoroid streams are formed mainly as a result of the activity of comets or the ejection of meteoroids from cometary nuclei with various initial velocities (Chapman, 2010; Tóth et al., 2011; Gritsevich et al., 2012). Meteoroids exhibit a diverse composition, including rock, metal, or a combination of both, and span a wide range of sizes, from micrometer-scale grains to larger objects up to one meter in diameter (Trigo-Rodríguez & Llorca, 2006, 2007; Koschny & Borovicka, 2017). Despite their heterogeneous characteristics, these meteoroids share a common origin, derived from a parent body, which imparts certain similarities among them.
Additionally, though less commonly, asteroids can also generate meteoroid streams as a result of catastrophic impact events. Some associations have been found, such as that between the Geminid meteor shower and the potentially hazardous asteroid (3200) Phaethon (1983 TB), whose origin could be the nucleus of an extinct comet (Zhong-Yi et al., 2020). Multiple studies have confirmed the high probability that the Geminids are dynamically associated with this asteroid (Whipple, 1983; Gustafson, 1989; Williams & Wu, 1993). However, as stream meteoroids traverse space, the influence of planetary perturbations and non-gravitational forces gradually renders them indistinguishable from the background population (Olsson-Steel, 1986; Bottke et al., 2000; Pauls & Gladman, 2005; Brož, 2006; Koschny et al., 2019).
Eventually, the journey of meteoroids brings them into intersecting paths with the Earth's orbit, leading to captivating interactions with our planet (Brown et al., 2002; Murad & Williams, 2002; Gritsevich, 2009; Trigo-Rodríguez, 2022). As these meteoroids penetrate the Earth's atmosphere, they experience a dramatic transformation fueled by the intense heat generated through air molecule friction. The high-speed entry produces enormous amounts of heat, causing the outer layers of the meteoroids to rapidly vaporize (Popova et al., 2019). This process, known as ablation, leads to the formation of a glowing plasma sheath surrounding the meteoroid (Ceplecha et al., 1998; Silber et al., 2018). The energy released during atmospheric aerobraking causes the visible phenomenon known as a meteor, which is called a fireball or bolide if its brightness surpasses that of the planet Venus. When a meteoroid stream intersects the Earth's path periodically, it gives rise to the phenomenon of meteor showers (Jenniskens, 1994, 1998, 2006; Vaubaillon et al., 2019; Jenniskens, 2023). The meteors within them share common features, including their time of occurrence, apparent origin in the sky, known as the radiant, and their geocentric impact velocity, as well as their orbital elements in an equivalent manner.
Determining the point at which a meteor shower transitions from a cohesive entity to a collection of unrelated meteoroids (sporadic background), or establishing the criteria to accurately associate meteors with a specific shower, poses a significant challenge. To tackle the issue of orbital dynamical association, multiple endeavors have been undertaken to define similarity criteria or D-criteria.
These criteria aim to effectively differentiate between events that are associated with a specific meteoroid stream and those that are unrelated to other objects or swarms. Ultimately, analyzing the impact features can aid in associating meteorites with their parent bodies (Carbognani & Fenucci, 2023).
In this study, we assess the rank correlation, efficacy, and equivalence of four five-dimensional similarity criteria designed for quantifying dynamical associations between meteor orbits, as well as various distance metrics with two different vectors (one shared with the D-criteria). The evaluation is conducted using a comprehensive meteor database and extends to exploring alternative metrics for orbit association, as well as computing the optimal thresholds for each method. The objective is to elucidate the statistical strengths, limitations, and similarities of each approach, thereby providing a robust framework for future research in meteor associations with parent bodies or meteoroid streams.
In Section 2, we detail the database utilized and the methodology applied. Section 3 presents our findings, and Section 4 provides a summary of the key outcomes of our study.

Data and Procedures
The methodology presented herein is designed to analyze multiple meteor dynamical association approaches by comparing five-dimensional orbital similarity criteria and various vector-based distance metrics typically used in Machine Learning. For the latter, we use as a vector (1) the same parameters utilized by the similarity criteria, defined by some heliocentric orbital elements, which we term ORBIT, and (2) the four-dimensional vector proposed by Sugar et al. (2017), named here GEO. It should be noted that while the term "metrics" may be appropriate to describe the D-criteria to a certain extent, in this work we use the term "metrics" exclusively to refer to vector-based distance metrics, which are explained further below. This section is divided into subsections. Subsection 2.1 elaborates on the data sources utilized. Subsection 2.2 presents D-criteria for comparing the orbital elements of two orbits. In Subsection 2.3, we introduce the two vectors that will be used along with the distance metrics. In Subsection 2.4, we explain the theoretical background used for calculating the rank correlations, comparing the performances with the Top-k accuracy method, and estimating the equivalence with the Kolmogorov-Smirnov test and Top-1 event-by-event agreement. Finally, in Subsection 2.5, we detail our strategy to determine the optimal thresholds for distinguishing between sporadic background and meteor showers. All statistical analyses were implemented using the SciPy library (Virtanen et al., 2020).

Databases
CAMS, short for the Cameras for All-Sky Meteor Surveillance project (Jenniskens et al., 2011), is an international initiative sponsored by NASA and managed by the Carl Sagan Center within the SETI Institute, located in California, USA. Its primary objective is to monitor and map meteor activity through nighttime optical video surveillance, employing triangulation techniques.
It annually records an average of half a million meteor orbits, although the publication of this data stopped in 2016. The last release was the Meteoroid Orbit Database v3.0, which includes 471,582 events registered since 2010.
While there are other automated meteor detection networks, CAMS stands out as the primary and most widely recognized repository of meteor data. Nevertheless, it has been noted that its performance in accurately detecting fast meteors falls short in comparison to its detection of slower meteors (Koseki, 2017, 2022). To address this issue, we implement a filtering mechanism to exclude lower-quality detections and to reduce spurious data: we require a minimum convergence angle of 15 degrees between cameras, require an estimated velocity error of no more than 10% of the nominal value, exclude hyperbolic orbits, and select perihelion distances compatible with impacts on the Earth.
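The quality cuts listed above can be sketched as a boolean mask. This is only an illustration: the argument names are hypothetical (they are not the CAMS column names), and the perihelion condition is assumed to be 0 < q ≤ 1 au, as stated later for the ORBIT vector.

```python
import numpy as np

def quality_mask(conv_angle_deg, v_g, v_g_err, e, q_au):
    """Return True for meteors passing the four quality cuts (hypothetical inputs)."""
    conv_angle_deg = np.asarray(conv_angle_deg, dtype=float)
    v_g = np.asarray(v_g, dtype=float)
    v_g_err = np.asarray(v_g_err, dtype=float)
    e = np.asarray(e, dtype=float)
    q_au = np.asarray(q_au, dtype=float)
    return (
        (conv_angle_deg >= 15.0)          # minimum convergence angle between cameras
        & (v_g_err <= 0.10 * v_g)         # velocity error at most 10% of nominal value
        & (e > 0.0) & (e <= 1.0)          # no hyperbolic orbits
        & (q_au > 0.0) & (q_au <= 1.0)    # assumed Earth-crossing condition on q
    )

# Four toy meteors: only the first passes every cut.
mask = quality_mask(
    conv_angle_deg=[20, 10, 30, 25],
    v_g=[40, 40, 40, 40],
    v_g_err=[2, 2, 6, 2],
    e=[0.9, 0.9, 0.9, 1.2],
    q_au=[0.8, 0.8, 0.8, 0.8],
)
```

Applying such a mask before any distance computation keeps the later accuracy figures from being dominated by poorly constrained orbits.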
We rely on the classification provided by CAMS as ground truth, which may not be infallible. However, the classification within this database did not utilize any formal dissimilarity criteria. Instead, it depended on human visual clustering within Sun-centered ecliptic longitude-latitude representations, with clusters manually delineated using specific coordinates and geocentric velocity limits (Jenniskens et al., 2018). Our analysis proceeds under the presumption that the CAMS classification is accurate, a premise that, regardless, serves our primary objective of assessing the equivalence between metrics and D-criteria.
For identifying meteoroid streams responsible for meteor showers, we use the V.2 list of all known showers from the IAU Meteor Data Center, updated in January 2024 (Jopek & Jenniskens, 2011; Jopek & Kaňuchová, 2013; Jopek & Kaňuchová, 2017; Jenniskens et al., 2020). To facilitate the association of these meteor showers with entries in the CAMS database, we employ the IAU numeral code. This list includes 1484 entries, 956 corresponding to unique meteor showers. To ensure a direct comparison of association performances, we filter both CAMS and IAU meteor shower datasets to include only identical, unique meteor showers.

Orbital Similarity Criteria
Orbital elements such as inclination i, eccentricity e, longitude of the ascending node Ω, perihelion distance q, and argument of the perihelion ω allow us to determine the path of any moving object following a Keplerian trajectory in our Solar System.
Likewise, it is possible to look for the connection between a meteor shower and its parent body (or any two objects) through the similarities of their orbits. This search approach is not recent. The first attempts to measure the degree of similarity between orbits date from the second half of the last century and are known as D-criteria. The first D-criterion was introduced by Southworth & Hawkins (1963):

$$D_{SH}^2 = (e_B - e_A)^2 + (q_B - q_A)^2 + \left(2\sin\frac{I_{AB}}{2}\right)^2 + \left(\frac{e_B + e_A}{2}\right)^2 \left(2\sin\frac{\pi_{BA}}{2}\right)^2,$$

where other concepts of geometry come into play, such as the angle between the respective perihelion points ($\pi_{BA}$) and the angle between the inclinations of the orbits ($I_{AB}$). Drummond (1981) not only defined the angle between the perihelion points on each orbit ($\theta_{BA}$) using both the ecliptic longitude ($\lambda$) and latitude ($\beta$) of perihelion, but also weighted the terms $e$ and $q$ to provide a metric in which each term contributes equally to the overall sum. In this way, a new variant of the $D_{SH}$ criterion, named $D_D$ in honor of its creator, was developed:

$$D_D^2 = \left(\frac{e_B - e_A}{e_B + e_A}\right)^2 + \left(\frac{q_B - q_A}{q_B + q_A}\right)^2 + \left(\frac{I_{AB}}{180^\circ}\right)^2 + \left(\frac{e_B + e_A}{2}\right)^2 \left(\frac{\theta_{BA}}{180^\circ}\right)^2.$$

A decade later, Jopek (1993) applied a random perturbation model to several orbits, ignoring $i$, $\Omega$, and $\omega$, to analyze the $D_{SH}$ and $D_D$ criteria. He found that both depend on the $q$ and $e$ values of the reference orbit: on $q$ in the case of $D_{SH}$ and on $e$ in the case of $D_D$. To reduce these dependencies, Jopek proposed a new similarity criterion, $D_H$, defined by:

$$D_H^2 = (e_B - e_A)^2 + \left(\frac{q_B - q_A}{q_B + q_A}\right)^2 + \left(2\sin\frac{I_{AB}}{2}\right)^2 + \left(\frac{e_B + e_A}{2}\right)^2 \left(2\sin\frac{\pi_{BA}}{2}\right)^2.$$

Note that these D-criteria cannot be categorized mathematically as metrics due to their violation of the triangle inequality (Kholshevnikov et al., 2016). Instead, they are more appropriately defined as quasimetrics, as they adhere to a relaxed form of the triangle inequality (Milanov et al., 2019). Contemporary functions, such as $\varrho_2$, enable the precise quantification of orbital similarity through consistent mathematical formulations:

$$\varrho_2^2 = (1 + e_1^2)\,p_1 + (1 + e_2^2)\,p_2 - 2\sqrt{p_1 p_2}\,\left(\cos I + e_1 e_2 \cos P\right),$$

where $p = q(1 + e)$ is the semi-latus rectum, with

$$\cos I = \cos i_1 \cos i_2 + \sin i_1 \sin i_2 \cos(\Omega_1 - \Omega_2),$$

$$\cos P = \sin i_1 \sin i_2 \sin\omega_1 \sin\omega_2 + \left(\cos\omega_1 \cos\omega_2 + \cos i_1 \cos i_2 \sin\omega_1 \sin\omega_2\right)\cos(\Omega_1 - \Omega_2) + \left(\cos i_2 \cos\omega_1 \sin\omega_2 - \cos i_1 \sin\omega_1 \cos\omega_2\right)\sin(\Omega_1 - \Omega_2).$$

The limit values of such D-criteria, also called thresholds, cut-off levels, or upper limits, determine whether two objects may be associated. If, for example, A and B are a meteor and a meteor shower, respectively, and the distance $D(A, B)$ between them is greater than this limit value, the association must be discarded. The smaller this distance, the greater the possibility that there is a dynamical similarity between the two objects and, therefore, that the meteoroid belongs to the meteoroid stream.
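As an illustration of how such a criterion and its cut-off are applied, the following minimal sketch evaluates $\varrho_2$ in its usual published form (Kholshevnikov et al., 2016), which is assumed here rather than recovered from this text. The function name, the argument order, and the use of $p = q(1+e)$ with $q$ in au are choices made for this example only.

```python
import numpy as np

def rho2(q1, e1, i1, w1, O1, q2, e2, i2, w2, O2):
    """rho_2 between two orbits; q in au, angles (i, omega, Omega) in degrees."""
    i1, w1, O1, i2, w2, O2 = np.radians([i1, w1, O1, i2, w2, O2])
    p1, p2 = q1 * (1.0 + e1), q2 * (1.0 + e2)   # semi-latus rectum p = q(1+e)
    dO = O1 - O2
    # Cosine of the mutual inclination of the two orbital planes.
    cos_I = np.cos(i1) * np.cos(i2) + np.sin(i1) * np.sin(i2) * np.cos(dO)
    # Cosine of the angle between the two perihelion directions.
    cos_P = (
        np.sin(i1) * np.sin(i2) * np.sin(w1) * np.sin(w2)
        + (np.cos(w1) * np.cos(w2)
           + np.cos(i1) * np.cos(i2) * np.sin(w1) * np.sin(w2)) * np.cos(dO)
        + (np.cos(i2) * np.cos(w1) * np.sin(w2)
           - np.cos(i1) * np.sin(w1) * np.cos(w2)) * np.sin(dO)
    )
    sq = ((1 + e1**2) * p1 + (1 + e2**2) * p2
          - 2.0 * np.sqrt(p1 * p2) * (cos_I + e1 * e2 * cos_P))
    return float(np.sqrt(max(sq, 0.0)))   # clip tiny negative rounding errors

# Toy orbits (illustrative values only, not catalogue entries).
meteor = (0.95, 0.93, 113.0, 151.0, 139.5)
shower = (0.95, 0.95, 113.1, 150.4, 139.0)
d = rho2(*meteor, *shower)
associated = d <= 0.2   # hypothetical cut-off, for illustration only
```

The comparison in the last line is the whole decision rule: an association is kept only when the computed value does not exceed the chosen threshold.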
Some studies on the suitability of these criteria have already been carried out. For example, Galligan (2001) explored the performance of four similarity functions in the near-ecliptic region ($D_{SH}$, $D_D$, $D_H$, and $D_N$; Valsecchi et al., 1999), finding the $D_N$ criterion to be the most stable when a priori information on orbital inclination regimes is lacking, while $D_{SH}$, which is based on theoretical models of meteor shower dispersion, is more suitable when very different cut-off levels are used. However, $D_N$ has not been adopted in our approach because it is less straightforward to apply to the standard parameters provided in meteor databases.
Likewise, Moorhead (2015) analyzed such cut-off values to determine a chosen acceptable false-positive rate and distinguish which showers are significant within a set of sporadic meteors. Jenniskens (2008) and Rudawska et al. (2015) introduced the four-dimensional metrics $D_B$ and $D_X$, respectively. However, to maintain consistency within the parameter space domain analyzed in this study, we opt not to include these criteria.
Through these values, it has been possible to associate meteor showers with parent bodies such as the 109P ( 1862 (Matlovic et al., 2020); or the recently observed fall and recovery of the Traspena meteorite, which is posited to be linked with the potentially hazardous asteroid 1989 QF (Minos), exhibiting $\varrho_2 = 0.1059$ (Andrade et al., 2023). We note the absence of published cut-off estimates for $\varrho_2$, unlike for the traditional D-criteria.
Although the cases mentioned above demonstrate the usefulness of the similarity criteria, some limitations confirm the need to investigate these metrics. For example, Galligan (2001) found that, for the $D_{SH}$ criterion, it is necessary to use different upper limits depending on the orbital inclination of the stream. In fact, Sokolova et al. (2014), intending to improve the reliability of identification of the observed objects, recommend analyzing the $D_{SH}$ threshold values independently for each meteoroid complex. Following that approach, the comparison of four similarity criteria carried out by Rudawska et al. (2012) confirmed the difficulty of obtaining one specific threshold value that would fit all cases, concluding that the ideal threshold depends on the cluster analysis method, the meteor shower, and the sample; this latter statement is also supported by Jopek & Bronikowska (2017). Ye (2018) also pointed out that the traditional D-criteria may not necessarily reflect a shared origin of two objects due to the orbital evolution influenced by planetary perturbations.
In short, these studies are clear examples of the need to analyze the effectiveness and equivalence of the different approaches to establish dynamical associations of meteors.

Meteor Vectors and Distance Metrics
In the preceding section, we discussed five-dimensional D-criteria for associating meteors with meteor showers. While these approaches are widely used, they are not without limitations. This is an active research topic in which there is no consensus on either criteria or thresholds. To search for alternatives and compare their performance, we introduce two meteor vectors (ORBIT and GEO) to evaluate multiple Machine Learning distance metrics in meteor-shower association.
The ORBIT vector uses the same five heliocentric orbital elements ($q$, $e$, $i$, $\omega$, and $\Omega$) as the above-mentioned orbital similarity criteria, which allows for a more direct comparison of their effectiveness. Note that the database has been filtered to minimize spurious events, ensuring the inclusion of only non-hyperbolic orbits ($0 < e \leq 1$) that intersect Earth's orbit, specifically with $0 < q \leq 1$ au. The inclination, when normalized by 180$^\circ$, spans the range [0, 1]. The angles $\omega$ and $\Omega$ enter through their sine and cosine, which range over [-1, 1]; we normalize these to [0, 1] and assign half the weight to each circular component. Using sine and cosine functions for $\omega$ and $\Omega$ effectively accounts for the shortest circular distance between angles, ensuring that 358$^\circ$ is recognized as 4$^\circ$ away from 2$^\circ$, rather than 356$^\circ$. Consequently, all five independent parameters are normalized and weighted equally, constructing a five-dimensional space vector.
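One possible reading of this construction is sketched below. The exact component layout is an assumption: each angle is represented by its sine and cosine, shifted to [0, 1], and multiplied by 0.5 as the "half weight". The function name `orbit_vector` is hypothetical.

```python
import numpy as np

def _circular(x):
    """Map a sine or cosine from [-1, 1] to [0, 1], then apply half weight (assumed)."""
    return 0.5 * (x + 1.0) / 2.0

def orbit_vector(q_au, e, i_deg, w_deg, O_deg):
    """Assumed ORBIT-like vector: q, e, i/180, and sin/cos of omega and Omega."""
    w, O = np.radians(w_deg), np.radians(O_deg)
    return np.array([
        q_au,                     # already within (0, 1] after filtering
        e,                        # already within (0, 1] after filtering
        i_deg / 180.0,            # inclination normalized to [0, 1]
        _circular(np.sin(w)), _circular(np.cos(w)),
        _circular(np.sin(O)), _circular(np.cos(O)),
    ])

# Wrap-around check: omega = 358 deg is close to 2 deg, far from 180 deg.
a = orbit_vector(0.9, 0.8, 30.0, 358.0, 100.0)
b = orbit_vector(0.9, 0.8, 30.0, 2.0, 100.0)
c = orbit_vector(0.9, 0.8, 30.0, 180.0, 100.0)
near = np.linalg.norm(a - b)
far = np.linalg.norm(b - c)
```

With this encoding, the Euclidean distance between two vectors grows with the chord between the angles rather than with their raw difference, which is what removes the 0/360 discontinuity.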
The GEO vector is based mainly on geocentric observable parameters and was proposed by Sugar et al. (2017). This six-component vector (which spans a four-dimensional space, as it has only four independent parameters) inherently addresses the issue of longitude wrapping. It normalizes the six components to ensure that each variable contributes equally. The first two components represent the meteor's position as the meteoroid intersects the Earth's orbit. The next three components define the unit vector opposite to the meteor's velocity direction. The final component represents the magnitude of the geocentric velocity, normalized by the maximum velocity allowed for the study population. In this vector, $v_g$ represents the geocentric velocity in kilometers per second, $\lambda_\odot$ is the solar longitude, $\beta_g$ is the geocentric ecliptic latitude of the radiant, and $\lambda_g - \lambda_\odot$ is the Sun-centered ecliptic longitude of the radiant. All components span the range [-1, 1], except for the element related to velocity, which varies between [0, 1]. Given that velocity measurements are subject to the greatest degree of error, the authors allowed a reduced weight for the velocity.
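A minimal sketch of such a vector follows, assuming the component ordering usually quoted for Sugar et al. (2017). The function name and the default `v_max` of 72 km/s are assumptions for this example and may differ from the value used in this study.

```python
import numpy as np

def geo_vector(sol_lon_deg, lam_g_deg, beta_g_deg, v_g, v_max=72.0):
    """Assumed GEO-like vector: position (2), anti-velocity unit vector (3), speed (1)."""
    sol = np.radians(sol_lon_deg)
    dlam = np.radians(lam_g_deg - sol_lon_deg)   # Sun-centered ecliptic longitude
    beta = np.radians(beta_g_deg)
    return np.array([
        np.cos(sol), np.sin(sol),                # Earth's position along its orbit
        np.sin(dlam) * np.cos(beta),             # radiant direction, unit vector
        np.cos(dlam) * np.cos(beta),
        np.sin(beta),
        v_g / v_max,                             # normalized geocentric speed
    ])

g = geo_vector(140.0, 283.0, 38.0, 59.0)

# Longitude wrapping: 359.5 deg and 0.5 deg of solar longitude are neighbours.
wrap_gap = np.linalg.norm(
    geo_vector(359.5, 10.0, 0.0, 30.0) - geo_vector(0.5, 11.0, 0.0, 30.0)
)
```

Because the two angular quantities only enter through sines and cosines, no special handling of the 0/360 boundary is needed when distances are computed.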
Although the D-criteria are theoretically five-dimensional, the orbits of the meteors are constrained by having impacted the Earth, virtually reducing the dimensionality by one. Consequently, this dimensionality reduction enables a comparison between the performances of the GEO and ORBIT vectors.
In the quest to develop a robust methodology for associating meteors with their parent meteor showers, we explore various distance metrics typically used in Machine Learning that can quantify the similarity between the previously defined vectors. In Table 1, we introduce the distance metrics that are employed in this study.
[Table 1: distance metrics, with columns Metric Name, Formula, and Brief Explanation. Surviving entries: Euclidean, $\sqrt{\sum_i (u_i - v_i)^2}$, square root of sum of squared differences; sEuclidean; "Angle between vectors"; Canberra]
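Since the analyses rely on SciPy, the metrics named in this work can be evaluated with `scipy.spatial.distance.cdist`. The sketch below uses random placeholder vectors instead of real GEO or ORBIT vectors, and the variance vector `V` for the standardized Euclidean metric is computed here from the pooled sample, which is an assumption about how the standardization might be done.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
meteors = rng.random((5, 6))   # placeholder: 5 meteors, 6-component vectors
showers = rng.random((3, 6))   # placeholder: 3 reference showers

# Per-component variance for the standardized Euclidean metric (assumed pooling).
V = np.var(np.vstack([meteors, showers]), axis=0, ddof=1)

metrics = ["euclidean", "seuclidean", "cosine", "canberra", "braycurtis", "chebyshev"]
dist = {}
for name in metrics:
    kwargs = {"V": V} if name == "seuclidean" else {}
    dist[name] = cdist(meteors, showers, metric=name, **kwargs)

# Index of the closest shower for every meteor, per metric.
best = {name: d.argmin(axis=1) for name, d in dist.items()}
```

Each entry of `dist` is a meteors-by-showers matrix, so the same downstream ranking code can be reused for every metric.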
We select the Kendall rank correlation coefficient ($\tau$) to measure the ordinal association between the distance metrics and D-criteria. It is defined over a set of paired samples $(x_1, y_1), \ldots, (x_n, y_n)$ of the two variables.
$\tau$ is notable for its ability to measure the strength and direction of the relationship between two variables without requiring them to be on the same scale. Unlike parametric correlations such as Pearson's, which assume linear relationships and normally distributed data, Kendall's approach is based on the ranking of data points, assessing concordance and discordance in their relative ordering across two datasets. Its focus on rank rather than absolute values obviates the need for identical scales between datasets. Consequently, we can employ it to compare the results of the D-criteria and the distance metrics without applying any normalization.
We use the asymptotic method to compute Kendall's tau, which provides an efficient and scalable approximation suitable for large datasets and handles ties effectively.
The process is as follows. For each meteor in the dataset, we first compute its similarity/closeness to every meteor shower based on predefined D-criteria and distance metrics (both for GEO and ORBIT vectors). These calculations yield two separate sets of rankings for every meteor: one set derived from the D-criteria and another from the distance metrics. Each set sorts all meteor showers from the most to the least similar to the meteor in question. Once we obtain these rankings, $\tau$ is computed for each meteor, comparing the two sets of rankings to ascertain the degree of ordinal association. For more information on the Kendall rank correlation coefficient applied here, refer to Kendall (1938), Fenwick (1994), and Hollander et al. (2013).
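For a single meteor, the step above can be sketched with `scipy.stats.kendalltau`, which computes the tau-b variant by default and accepts `method="asymptotic"`. The two arrays are made-up values for five showers, not data from this study.

```python
import numpy as np
from scipy.stats import kendalltau

# Hypothetical values for one meteor against five showers.
d_criterion = np.array([0.05, 0.40, 0.22, 0.90, 0.61])   # e.g. a D-criterion
d_metric = np.array([0.10, 0.35, 0.50, 1.20, 0.80])      # e.g. a distance metric

# The raw values can be passed directly: only their relative ordering matters,
# so no rescaling between the two quantities is needed.
tau, p_value = kendalltau(d_criterion, d_metric, method="asymptotic")
```

Here nine of the ten shower pairs are ordered the same way by both measures and one pair is swapped, giving $\tau = (9 - 1)/10 = 0.8$. Repeating this per meteor yields the distribution of $\tau$ values summarized in the results.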

Top-k Accuracy
The heart of the present study centers on the evaluation of the classification accuracy of various D-criteria and distance metrics.
To address this challenge, a unified methodology is imperative for the consistent application of statistical tests across all approaches under consideration. Despite the diversity in metrics and D-criteria, they converge on a singular objective: to quantify the association between a meteor and its corresponding meteor shower. As such, the Top-k accuracy is employed as a standardizing criterion to compare the overall accuracy among the various methods (Xia et al., 2009).
It quantifies the frequency with which the correct label class is included among the first k predicted labels. In this context, the labels denote the meteor showers associated with individual meteoroid impacts as classified by CAMS. For each meteor in the dataset, the similarities and distances are calculated in relation to all reference meteor showers. These values are subsequently sorted in ascending order to generate a ranked list. A successful classification in the Top-1 category occurs when the meteor shower with the minimum similarity or distance value matches the meteor shower associated with the meteor in the CAMS dataset. Similarly, a Top-5 success is recorded if the associated meteor shower is among the top five labels in the ranked list, and this extends analogously to other values of k.
In the present study, multiple tests encompassing Top-1, Top-5, and Top-10 accuracy are performed to evaluate the efficacy of D-criteria and distance metrics in associating a meteor with its originating meteor shower. This multi-tiered approach enables both a precise assessment of the top prediction (Top-1) and an evaluation of the model's capacity to identify a broader set of correct associations (Top-5 and Top-10). While one might assume that the Top-1 accuracy is paramount for meteor association, the Top-5 and Top-10 analyses are also significant. These extended evaluations yield insights into the efficacy of various ranking methodologies, going beyond mere concurrence with CAMS classifications, and they help contrast the variability in rankings produced by different metrics. Two metrics that diverge at the Top-1 level yet converge within the Top-5 behave quite differently from two metrics that still diverge at the Top-10 level.
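The Top-k procedure described above can be sketched in a few lines. The distance matrix and labels below are toy values (4 meteors, 4 showers), and `top_k_accuracy` is a name chosen for this example.

```python
import numpy as np

def top_k_accuracy(dist, true_idx, k):
    """Fraction of meteors whose labelled shower is among the k closest showers."""
    order = np.argsort(dist, axis=1)[:, :k]              # k smallest per meteor
    hits = (order == np.asarray(true_idx)[:, None]).any(axis=1)
    return float(hits.mean())

# Rows are meteors, columns are showers; smaller means more similar.
dist = np.array([
    [0.1, 0.5, 0.9, 0.7],
    [0.6, 0.2, 0.3, 0.8],
    [0.4, 0.3, 0.9, 0.1],
    [0.9, 0.8, 0.2, 0.7],
])
true_idx = [0, 2, 0, 1]   # column index of the shower assigned by the benchmark

top1 = top_k_accuracy(dist, true_idx, 1)
top2 = top_k_accuracy(dist, true_idx, 2)
top3 = top_k_accuracy(dist, true_idx, 3)
```

In this toy case only the first meteor is matched at Top-1, the second is recovered at Top-2, and the remaining two at Top-3, so the accuracies are 0.25, 0.5, and 1.0.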

Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test (K-S test) serves as a robust, non-parametric statistical method designed to assess the goodness-of-fit and equivalence of continuous, one-dimensional probability distributions. The test is particularly advantageous due to its distribution-free nature, making it applicable to datasets without the assumption of any specific distribution. The K-S test is employed in two primary contexts: the one-sample K-S test and the two-sample K-S test. The two-sample K-S test compares two empirical distributions to determine whether the two samples come from the same distribution. The K-S statistic $D$ is:

$$D = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right|,$$

where $F_{1,n}(x)$ and $F_{2,m}(x)$ are the empirical distribution functions of the two samples of sizes $n$ and $m$, respectively. Here we follow the treatment explained in Hodges (1958).
When applying the K-S test to Top-1 test results, interpreting the results sheds light on the comparative distributions of accuracy between classification methods. Failing to reject the null hypothesis $H_0$ indicates no statistically significant difference in accuracy distributions, but it does not affirm equivalence in method performance. Conversely, rejecting $H_0$ suggests a statistically significant difference, supporting the alternative hypothesis $H_1$ that the samples originate from distinct distributions. This outcome implies that $H_0$ does not adequately explain the observed data, with the decision to reject based on the significance level $\alpha$, set here at 0.05 for 95% confidence.
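A minimal sketch of the two-sample test with `scipy.stats.ks_2samp` is given below, together with a direct evaluation of the statistic as the largest gap between the two empirical distribution functions. The samples are synthetic stand-ins for accuracy values; how the accuracy samples are actually formed in this study is not assumed here.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_statistic(x, y):
    """Largest absolute difference between the two empirical CDFs."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    f1 = np.searchsorted(x, grid, side="right") / x.size
    f2 = np.searchsorted(y, grid, side="right") / y.size
    return float(np.abs(f1 - f2).max())

rng = np.random.default_rng(1)
acc_a = rng.beta(8, 2, size=200)   # synthetic accuracies, method A
acc_b = rng.beta(8, 2, size=200)   # same underlying distribution as A
acc_c = rng.beta(2, 8, size=200)   # clearly different distribution

res_ab = ks_2samp(acc_a, acc_b)
res_ac = ks_2samp(acc_a, acc_c)

alpha = 0.05
reject_ac = res_ac.pvalue < alpha   # H0 rejected: distributions differ
```

A p-value above `alpha` for the A/B pair would only mean that no difference was detected, not that the two methods are equivalent.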

Top-1 Agreement
Consider two classifiers tested on a dataset consisting of two equally sized classes. The first classifier might excel in identifying Class A but fail to recognize Class B, whereas the second classifier achieves the opposite, accurately identifying Class B while mistaking instances of Class A. Despite both classifiers reporting an overall accuracy of 50%, their distinct performance on the individual classes reveals a divergent understanding and representation of the underlying patterns in the data. This example underscores the necessity of applying another test, as (1) Kendall's correlation assesses whether the order of rankings is similar between two sets of observations and (2) K-S is specifically focused on the shape of accuracy distributions rather than precise values.
For this reason, we also calculate the percentage of Top-1 coincidence between distance metrics and D-criteria on an event-by-event basis, which provides a direct measure of agreement on the most preferred classification outcome, capturing the extent to which different approaches concur on the single best classification. This straightforward metric offers an immediate sense of the hit-and-miss between approaches. A heatmap is a suitable visualization tool for showcasing the pairwise agreement between classification methods, using a rectangular matrix to highlight the magnitude of their coincidences.
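The event-by-event agreement can be sketched as a pairwise matrix of percentages, which is the array a heatmap would display. The Top-1 picks below are made-up shower codes for five meteors, and the method labels are only illustrative.

```python
import numpy as np

def top1_agreement(top1):
    """Pairwise percentage of meteors on which two methods pick the same shower."""
    names = list(top1)
    picks = [np.asarray(top1[name]) for name in names]
    n = len(names)
    agree = np.empty((n, n))
    for a in range(n):
        for b in range(n):
            agree[a, b] = 100.0 * np.mean(picks[a] == picks[b])
    return names, agree

# Hypothetical Top-1 shower codes chosen by three methods for five meteors.
top1 = {
    "D_SH": [3, 1, 4, 1, 5],
    "rho_2": [3, 1, 4, 2, 5],
    "GEO + sEuclidean": [3, 7, 4, 2, 5],
}
names, agree = top1_agreement(top1)
```

Unlike the overall accuracy, this matrix does not use the benchmark labels at all: two methods can agree with each other on an event even when both disagree with the reference classification.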

Differentiating the sporadic background
The last part of our work deals with the effective discrimination of the sporadic background from meteor events that are associated with specific showers. We calculate the Top-1 accuracy values across the entire (non-filtered) database and construct the Receiver Operating Characteristic (ROC) curves for each D-criterion and distance metric, utilizing both the GEO and ORBIT vectors, using binary labels from CAMS (0: sporadic; 1: associated). The ROC curve represents the diagnostic ability of a binary classifier system as its discrimination threshold is varied. Using the ROC curve output, it is possible to quantify the optimal threshold that maximizes the classifier's performance with Youden's J statistic (Youden, 1950; Schisterman et al., 2005):

$$J = \frac{TP}{TP + FN} + \frac{TN}{TN + FP} - 1,$$

where TP represents the true positives, FN the false negatives, TN the true negatives, and FP the false positives.
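The threshold search can be sketched by scanning candidate cut-offs and keeping the one with the largest Youden's J, that is, sensitivity plus specificity minus one. The distances and labels are toy values; a meteor is taken as "associated" when its smallest distance does not exceed the threshold, which is an assumption about the decision rule.

```python
import numpy as np

def youden_threshold(dist, is_shower):
    """Cut-off on a distance that maximizes Youden's J (both classes must be present)."""
    dist = np.asarray(dist, dtype=float)
    y = np.asarray(is_shower, dtype=bool)
    best_j, best_t = -1.0, None
    for t in np.unique(dist):                 # every observed value is a candidate
        pred = dist <= t                      # predicted as shower member
        tp = np.sum(pred & y)
        fn = np.sum(~pred & y)
        tn = np.sum(~pred & ~y)
        fp = np.sum(pred & ~y)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_j, best_t = j, float(t)
    return best_t, float(best_j)

# Toy data: smallest distance to any shower, and the benchmark label
# (1: associated with a shower, 0: sporadic).
dist = [0.05, 0.08, 0.12, 0.60, 0.30, 0.40, 0.55, 0.70]
label = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, j_max = youden_threshold(dist, label)
```

For these values the best cut-off is 0.12, which recovers three of the four shower meteors without admitting any sporadic one, giving J = 0.75.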
To synthesize the overall performance of each classification method in differentiating the sporadic background, we utilize the Matthews Correlation Coefficient, usually denoted by MCC or $\phi$ (Matthews, 1975). The $\phi$ offers a measure of the quality of binary classifications, encapsulating sensitivity, specificity, and the balance between them. It ranges from -1 (total disagreement between prediction and observation) to 1 (perfect prediction), with 0 denoting random guessing. The $\phi$ is defined as:

$$\phi = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$
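A small sketch of the coefficient computed from confusion-matrix counts follows, using the standard definition $(TP \cdot TN - FP \cdot FN) / \sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$. Returning 0 when the denominator vanishes is a common convention adopted here as an assumption.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0            # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

perfect = mcc(tp=4, tn=4, fp=0, fn=0)    # every event classified correctly
inverted = mcc(tp=0, tn=0, fp=4, fn=4)   # every event classified wrongly
partial = mcc(tp=3, tn=4, fp=0, fn=1)    # one shower meteor missed
```

Because all four counts enter the formula, the coefficient remains informative even when sporadic meteors heavily outnumber shower members.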

Results
Similar to Section 2, where we detailed the database and methodology in distinct subsections, the results section is also organized into subsections for clarity and depth. Subsection 3.1 examines the dataset, Subsection 3.2 presents the rank correlation estimations, Subsection 3.3 reports on the accuracy results, Subsection 3.4 explores the equivalence between distance metrics and D-criteria, Subsection 3.5 offers the level of coincidence between approaches for the Top-1 tests, and finally Subsection 3.6 provides optimal cut-offs and false positive rates.

Population Analysis
Within the extensive CAMS database, 24.6% of the entries can be directly linked to a distinct meteor shower. In contrast, 75.4% of the data points are categorized as sporadic events, implying they are part of the broader meteoroid background rather than specific meteor showers. After applying the filters mentioned in Section 2, the database is reduced to 102,680 orbits.
The number of unique meteor shower classifications is somewhat constrained, amounting to 376 distinct categories. A total of 80% of these classified meteor showers have been observed more than 10 times. A quarter of them, or 25%, boast over 100 individual recorded meteor events. An even smaller fraction, 5%, can claim over 1000 meteor instances. Four of the meteor showers stand out due to their frequent documentation: the Perseids, Orionids, Geminids, and Southern Taurids (listed in descending order of observation frequency). Meteors belonging to these showers have been observed more than 10,000 times.
Regarding the IAU meteor shower database, filtering reduces it to 724 entries, with 355 unique IDs shared with the CAMS database. Note that $\sim$30% are duplicate entries, corresponding to distinct values for the same meteor shower estimated in different studies.
A key aspect of our analysis of associations is the solar longitude, which correlates meteor activity with specific locations along the Earth's orbit. Such a correlation is instrumental in discerning patterns and understanding recurring meteor phenomena. To visually represent this correlation, Figure 1 offers a histogram that plots impacting meteoroid classifications (sporadic or associated) as a function of solar longitude. The most active meteor showers are annotated. An apparent concentration of the meteoroid background activity toward 180$^\circ$ of solar longitude can be observed.

Degree of Rank Correlation
For each of the showers listed in the IAU database, we compute the similarity/closeness between the shower and each meteor in the CAMS database using the D-criteria and all distance metric combinations. We then calculate the Kendall rank correlation between each D-criterion and each vector-metric combination. The different figures reveal particular features in the Kendall rank correlation between D-criteria and distance metrics, depending on whether the GEO or ORBIT vector is employed. The sEuclidean metric paired with the GEO vector consistently demonstrates the highest median correlation across all D-criteria, indicating a robust ordinal association. In contrast, the ORBIT vector presents a distinctive landscape. The $D_D$ criterion, when evaluated with ORBIT vectors, achieves the highest correlation values. The $\varrho_2$ criterion exhibits considerable variability in correlation, as evidenced by notably wide box plots for some distance metrics when using the ORBIT vectors. This behavior starkly contrasts with the other D-criteria, pointing to a unique response of $\varrho_2$ to the parameters captured by ORBIT vectors. While the GEO vector is characterized by a greater number of lower outliers, indicating instances of significantly divergent rankings, ORBIT vectors show fewer upper outliers. The results show a general tendency for the median correlation values to either be randomly centered or skewed across both vectors and all metrics. This variability suggests that no singular pattern of correlation prevails universally. Additionally, the maximum whisker extension observed with the Cosine distance metric, specifically when paired with the GEO vector and $D_D$ criterion, signals instances of high variability or dispersion in the degree of correlation.

Accuracy of Best Choices
Using both the D-criteria and the meteor vectors with their distance metrics, each meteor is compared with all showers, as detailed in Tables 2 and 3. This approach yields the percentage of cases in which the shower assigned in the CAMS database is the closest one (Top-1), ranks among the five closest showers (Top-5), or falls within the ten closest showers (Top-10) according to each similarity criterion or distance metric.
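The Top-k test can be sketched as follows. This is a hedged illustration, not the pipeline used in this work: the three-component vectors, the per-feature variances, and the helper names are all hypothetical, and only the shower codes (PER, ORI, GEM) are real. The standardized Euclidean distance divides each squared difference by the variance of that feature.

```python
import math


def seuclidean(u, v, variances):
    """Standardized Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 / s for a, b, s in zip(u, v, variances)))


def top_k_hit(distances, true_shower, k):
    """True if the benchmark shower is among the k closest showers."""
    ranked = sorted(distances, key=distances.get)
    return true_shower in ranked[:k]


def top_k_accuracy(records, k):
    """Percentage of meteors whose benchmark shower ranks in the top k.

    records: list of (distances_by_shower, benchmark_shower) pairs.
    """
    hits = sum(top_k_hit(dist, label, k) for dist, label in records)
    return 100.0 * hits / len(records)


# Hypothetical shower reference vectors and feature variances (assumptions).
showers = {"PER": (1.0, 0.5, 0.2), "ORI": (0.2, 0.9, 0.4), "GEM": (0.6, 0.1, 0.8)}
variances = (0.04, 0.09, 0.01)

# Hypothetical meteors with their benchmark labels; the third is far from GEM.
meteors = [((0.95, 0.55, 0.22), "PER"),
           ((0.25, 0.80, 0.45), "ORI"),
           ((0.90, 0.40, 0.30), "GEM")]

records = [({code: seuclidean(vec, ref, variances) for code, ref in showers.items()}, label)
           for vec, label in meteors]

for k in (1, 2, 3):
    print(k, top_k_accuracy(records, k))
```

Swapping `seuclidean` for any D-criterion or other distance function leaves the ranking and counting logic unchanged, which is what makes the Top-k figures comparable across methods.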
The optimal D-criterion for achieving Top-1 accuracy is $D_{SH}$ (86.23%), whereas $\varrho_2$ excels in both the Top-5 (95.67%) and Top-10 (97.93%) categories and has good accuracy in Top-1 (85.56%). Conversely, $D_D$ ranks as the least effective across all evaluations, markedly lagging behind the others, which exhibit comparable performances. The sEuclidean metric combined with the GEO vector outperforms the D-criteria in Top-1 accuracy (87.06%) and the rest of the distance metrics in all Top-k tests. When paired with the ORBIT vector, the Bray-Curtis metric delivers the highest overall accuracy (also exceeding the $D_D$ criterion in all tests), except for Top-5 accuracy, where the Euclidean metric slightly outperforms it.
Across the distance metrics evaluated, the GEO vector is found to yield better outcomes than the ORBIT vector. The Chebyshev metric exhibits the worst results with the GEO vector, while sEuclidean and Canberra show the lowest performance with the ORBIT vector. Table 4 shows the mean accuracies. The distance metrics combined with the GEO vector offer the best overall accuracy for Top-1, while the D-criteria lead in Top-5 and Top-10.
[Table 4 values: 97.4 ± 0.5, 95.9 ± 0.9, 95.9 ± 0.8]
The minimum accuracy in meteor association is found at 180$^\circ$ of solar longitude, coinciding with an apparent increase in the meteoroid background activity, as depicted in Fig. 1. This time frame also bridges the Perseids and Orionids, meteor showers renowned for their high activity and velocities above 60 km/s, for which a greater dispersion of the measured parameters is consequently expected.
Owing to instrumental constraints, measurement inaccuracies correlate with meteoroid velocity (Hajduková & Kornoš, 2020). As a result, high-velocity meteoroids are more challenging to characterize accurately. This is depicted in Figure 5, which shows a concentration of high apparent velocities within this specific solar longitude range. It is conceivable that these meteoroids were once part of such swarms but have lost their orbital affinity due to temporal decoherence, making many of them challenging to distinguish.
Furthermore, the increased activity during these periods, characterized by similar velocities, may have influenced the association process conducted by CAMS.

Statistical Equivalence
Figure 6 reveals a distinct pattern in the distribution of hypothesis testing results, particularly when evaluating the $D_D$ criterion with the GEO vector. Contrary to the other D-criteria, which generally do not reject the null hypothesis $H_0$ when paired with the GEO vector, $D_D$ stands out by predominantly rejecting $H_0$ (indicated by $H_1$), suggesting differences in distributions. This trend is reversed for the ORBIT vector, where $D_D$ results in non-rejection of $H_0$, except for the Canberra metric. This behavior is markedly different from that of the other criteria tested with the ORBIT vector, which reject $H_0$ for the same three metrics (sEuclidean, Canberra, and Chebyshev). $\varrho_2$ appears to be the criterion most likely compatible with both vectors at the same time.
The consistent failure to reject $H_0$ with the GEO vector for all distance metrics under the $\varrho_2$ criterion does not confirm that the distributions are identical; rather, it indicates that the test lacked sufficient evidence to demonstrate statistical differences. This outcome positions $\varrho_2$ as the D-criterion that is most plausibly comparable to the distance metrics in terms of meteor association when using the GEO vector. $\varrho_2$ is also the criterion with the highest probability of being compatible with both vectors simultaneously.
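The $H_0$/$H_1$ labelling can be sketched with a two-sample Kolmogorov-Smirnov test. The code below is a minimal illustration that assumes each sample is a list of Top-1 accuracies (for instance, one value per solar-longitude bin, which is an assumption) and uses the asymptotic Kolmogorov series for the p-value, an approximation that is rough for small samples. The function names and the sample values are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the p-value is ~1 here and the series converges slowly
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_label(a, b, alpha=0.05):
    """Return ('H0' or 'H1', statistic, p-value) at significance level alpha."""
    d = ks_statistic(a, b)
    p = ks_pvalue(d, len(a), len(b))
    return ("H1" if p < alpha else "H0"), d, p


# Hypothetical Top-1 accuracies (%) per bin for two methods.
method_a = [86.0, 88.5, 84.2, 90.1, 79.3, 87.7, 85.0, 83.9]
method_b = [85.1, 89.0, 83.8, 91.0, 78.5, 88.2, 84.4, 84.6]
print(ks_label(method_a, method_b))
```

A label of `H0` here means only that the test found no significant difference at the chosen level, not that the two distributions are identical. For production use, `scipy.stats.ks_2samp` also provides exact p-values for small samples.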

Event-by-Event Agreement
The heatmap in Figure 7 visualizes the level of agreement between the D-criteria and the distance metrics for the Top-1 results across the two meteor vectors. Each cell represents the percentage of Top-1 coincidence between a pair of methods, with GEO-related comparisons shown in shades of blue and ORBIT-related comparisons in shades of red, enabling a clear distinction between the two meteor vectors. The diagonal, intentionally left blank, separates the GEO and ORBIT results so that both can be analyzed within a single figure. The cross-accuracies of the D-criteria, which are independent of the GEO or ORBIT vectors, are outlined with a black frame in the figure's top left corner.
The heatmap reveals that $D_{SH}$ has a strong event-by-event alignment with $D_H$ for Top-1 (97.43%), indicating that these criteria frequently concur on their top classifications. This is closely followed by $\varrho_2$ and $D_H$ (94.69%). Within the GEO vector, Euclidean and Cosine (99.66%), along with Cityblock and Bray-Curtis (99.15%), show the highest levels of coincidence in Top-1. The sEuclidean metric generally shows good agreement for the GEO vector across the various metrics and D-criteria (∼88%), except when paired with $D_D$ (83.37%). For the ORBIT vector, Cityblock and Bray-Curtis (98.99%), as well as Euclidean and Cosine (95.62%), exhibit the highest values. There is better alignment between $D_D$ and the ORBIT vector (reaching ∼86% with various distance metrics) than with the GEO vector.
The consistency observed in the heatmap resonates with the findings from Kendall's correlation and the K-S test. These statistical measures support the identified patterns of agreement and discrepancy among the classifiers, providing robustness to the analysis and confirming the reliability of these patterns.
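The Top-1 coincidence in each heatmap cell can be illustrated with a short sketch. It assumes that each method yields one Top-1 shower code per meteor, in the same meteor order; the label lists below are hypothetical and the function name is not from the original work.

```python
def top1_coincidence(labels_a, labels_b):
    """Percentage of meteors for which two methods pick the same Top-1 shower."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * same / len(labels_a)


# Hypothetical Top-1 assignments of five meteors by two methods.
method_a = ["PER", "ORI", "GEM", "PER", "LYR"]
method_b = ["PER", "ORI", "GEM", "PER", "ORI"]
print(top1_coincidence(method_a, method_b))  # 80.0
```

Unlike the Top-k accuracy, this measure does not involve the benchmark labels: two methods can agree with each other on a meteor while both disagree with the CAMS classification.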

Thresholds and Confusion Matrices
Table 5 presents the evaluation of D-criteria and distance metrics within the CAMS database, considering both sporadic and associated meteor events, and summarizes the optimal thresholds and the effectiveness of the different methods. The standout performer among the D-criteria is $D_{SH}$, distinctly outshining the others with a $\phi$ of 0.6400. Conversely, $D_D$ emerges as the least effective.
When using the GEO vector, the sEuclidean metric takes precedence, exhibiting the highest overall accuracy and a $\phi$ value of 0.6464, closely followed by the Cityblock and Bray-Curtis metrics. The scenario shifts with the ORBIT vector, where Cityblock is the frontrunner, with Bray-Curtis and Euclidean not far behind, suggesting a competitive field with closely matched performances. The sEuclidean metric with the ORBIT vector does not mirror its GEO vector success, hinting at vector-specific behavior that influences metric efficacy.
Cityblock, while not outperforming the other distance metrics in replicating the Top-1 classifications of CAMS, was on average more effective at distinguishing the sporadic background. With the exception of Cosine, all distance metrics applied to the GEO vector surpass every D-criterion other than $D_{SH}$ in terms of $\phi$. Interestingly, despite the generally lower performance with the ORBIT vector, several distance metrics still exceed some of the D-criteria. Cityblock, in particular, comes relatively close to the superior results of $D_{SH}$ and sEuclidean.
Additionally, the observed thresholds for the traditional D-criteria ($D_{SH}$, $D_D$, and $D_H$) align with values documented in the scientific literature, reinforcing the validity of our findings. It is also noteworthy that, upon incorporating the complete list of meteor showers (not limited to those used within the CAMS database), an average of 27% of the meteors classified (Top-1) by all D-criteria and distance metrics would align better with newly recognized meteor showers. In future efforts, we aim to perform a comparative analysis by testing other databases such as GMN (Vida et al., 2021) and EDMOND (Kornoš et al., 2014).
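The threshold evaluation can be sketched as follows. The Matthews correlation coefficient is computed from the confusion matrix obtained when a meteor is called "associated" if its minimum distance to any shower does not exceed a cut-off. Selecting the cut-off by maximizing $\phi$ over the observed distances is an assumption made for this illustration, and the data are hypothetical.

```python
import math


def confusion(distances, is_associated, threshold):
    """Counts (tp, fp, tn, fn) when 'associated' means distance <= threshold."""
    tp = fp = tn = fn = 0
    for d, assoc in zip(distances, is_associated):
        predicted = d <= threshold
        if predicted and assoc:
            tp += 1
        elif predicted:
            fp += 1
        elif assoc:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def matthews_phi(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(distances, is_associated):
    """Observed distance that maximizes phi (assumed selection rule)."""
    candidates = sorted(set(distances))
    return max(candidates,
               key=lambda t: matthews_phi(*confusion(distances, is_associated, t)))


# Hypothetical minimum distances and benchmark labels (True = shower member).
dist = [0.03, 0.08, 0.12, 0.15, 0.22, 0.35, 0.40, 0.55, 0.60]
assoc = [True, True, True, False, True, False, False, False, False]

cut = best_threshold(dist, assoc)
tp, fp, tn, fn = confusion(dist, assoc, cut)
print(cut, matthews_phi(tp, fp, tn, fn), (tp + tn) / len(dist))
```

$\phi$ is less sensitive than plain accuracy to an imbalance between sporadic and associated meteors, which is one reason to prefer it as the selection score. `sklearn.metrics.matthews_corrcoef` gives the same coefficient from label arrays.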

Conclusions
This study undertook a statistical evaluation of four orbital similarity criteria (or D-criteria) within a five-dimensional parameter space to probe the dynamical associations within meteor data. Utilizing the extensive data compiled by the CAMS network, we have not only relied on D-criteria ($D_{SH}$, $D_D$, $D_H$, and $\varrho_2$) but also ventured into distance metrics commonly applied in Machine Learning (Euclidean, sEuclidean, Cityblock, Cosine, Canberra, Bray-Curtis, and Chebyshev), investigated across two distinct meteor vectors. One vector, termed ORBIT and based on heliocentric orbital elements, is essentially shared with the D-criteria; the other, GEO, based on geocentric observational parameters, was proposed by Sugar et al. (2017). Our methodology hinged on the Kendall rank correlation coefficient and Top-k accuracy tests to assess the correlation and performance of these criteria and metrics. We also applied the Kolmogorov-Smirnov test and computed the level of coincidence of individual Top-1 results to assess the statistical equivalence of the different approaches. Finally, we calculated the optimal thresholds and evaluated their performance in distinguishing the sporadic background from the meteor showers.
Our key findings can be summarized as follows:
• The sEuclidean metric paired with the GEO vector outperforms the D-criteria and the other distance metrics in Top-1 accuracy (87.06%).
• Regarding the D-criteria, $D_{SH}$ holds the upper hand in Top-1 accuracy (86.23%), while $\varrho_2$ maintains dominance in both the Top-5 (95.67%) and Top-10 (97.93%) categories (surpassed by $D_{SH}$ in Top-1 accuracy by 0.67%).
• The Bray-Curtis metric, allied with the ORBIT vector, demonstrated a consistent edge over other distance metrics, outperforming the $D_D$ criterion across all Top-k tests (83.96%, 94.10%, and 96.61%, in increasing order of k) and being only slightly beaten by the Euclidean metric in Top-5 accuracy by a negligible difference (0.07%).
• $D_D$ exhibits an opposite trend to the other D-criteria when evaluating its equivalence against distance metrics with the GEO vector.
• Among the D-criteria, $\varrho_2$ appears to be the most similar to the distance metrics with the GEO vector, being also the most compatible with both GEO and ORBIT vectors at the same time.
• In general terms, the D-criteria and the distance metrics provide similar accuracies in Top-k tests (83.7 ± 2.5%, 93.6 ± 1.3%, and 96.2 ± 1.0%, in ascending order of k), with $D_D$ and the Chebyshev metric performing worse.
• We observed moderate solar longitude-dependent deviations and a common significant decrease in accuracy around 180$^\circ$ of solar longitude. We tentatively linked these features to heightened meteoroid background activity and the interface between two of the most active, high-velocity meteor showers: the Perseids and the Orionids.
• Among the D-criteria, $D_{SH}$ distinguishes itself with a $\phi$ of 0.6400, translating to an 84.17% accuracy rate in separating the background, while $D_D$ emerges as the least effective, with a $\phi$ of 0.5877 and an accuracy of 81.87%.
• Excluding Cosine, all distance metrics associated with the GEO vector surpass the D-criteria in $\phi$ when differentiating the meteoroid background.
• Despite the ORBIT vector's generally lower performance, various distance metrics still exceed certain D-criteria in effectiveness.
• Optimal cut-offs for all D-criteria and distance metrics are provided, founded on the CAMS database classification.
• Based on these approaches, ∼27% of associated meteors in CAMS would align with showers identified after the database's release.
• Future research will concentrate on studying effectiveness, equivalences, and thresholds for a synthetic impacting population, exploring the performance and specific attributes of the methods for each individual meteor shower.
The work culminates in the significant revelation that Machine Learning distance metrics can rival or even outperform the specifically tailored orbital similarity criteria for meteor dynamical association. This opens up new pathways for the use of computational techniques in the field of meteor science, offering an opportunity to refine our approaches to classifying meteor showers and sporadic meteors alike.
) III Swift-Tuttle comet and the Perseid meteor shower, whose first connection dates from the late 19th century, when Schiaparelli calculated the orbits of the Perseids and discovered their strong similarity to that of this comet. In relation to this connection, Sokolova et al. (2014) calculated the cut-off level of $D_{SH}$, resulting in $D_{SH} \leq 0.2$. The literature provides more classical examples, such as the April Lyrids, whose extremely small value of the $D_D$ criterion ($D_D = 0.009$) suggests that these meteors have indeed come from comet Thatcher (Arter & Williams, 1997). Other recent examples are a fireball detected in the night sky over Kyoto whose likely parent, with $D_{SH} = 0.0079$, could be the binary near-Earth asteroid (164121) 2003 YT1 (Kasuga et al., 2020); the binary asteroid 2000 UG11, associated with the Andromedids ($D_{SH} = 0.183$ and $D_H = 0.176$), and the asteroid (4179) Toutatis, whose values of $D_{SH} = 0.180$ and $D_H = 0.175$ suggest an association with the October Capricornids (Dumitru et al., 2017); and the meteor shower June epsilon Ophiuchids, whose values in three D-criteria ($D_{SH} = 0.05$, $D_D = 0.03$, and $D_H = 0.06$) confirm that it is likely to originate from comet 300P/Catalina

Figure 2 displays the Kendall rank correlation between the evaluated D-criteria and distance metrics. Each column corresponds to a particular distance metric, and the plots are color-coded by D-criterion. The box plots encapsulate the quartile distribution of the samples, where each sample denotes the rank correlation between a D-criterion and a distance metric for one meteor with all meteor showers. The calculation is performed for each meteor against all meteor showers, separately for the GEO and ORBIT vectors. Points lying outside the whiskers of the box plots are classified as outliers, positioned more than 1.5 times the interquartile range away from the median (Q2, depicted by the box's central line). A homogeneous dataset would result in a compact interquartile range, with the median equidistant from the box's extremes (Q1 and Q3), indicating symmetry. The span from the box to each whisker indicates the data's variability or spread, suggesting a more concentrated distribution if the span is shorter and greater dispersion if it is longer.

Fig. 1. Histogram of the CAMS database as a function of solar longitude. Sporadic and associated meteors are depicted.

Fig. 2. Kendall rank correlation between D-criteria and distance metrics for associated meteors in the CAMS database. Each column corresponds to a unique vector (ORBIT or GEO). Each sample represents the rank correlation between the similarity criteria and the distance metrics for one meteor from the CAMS database with respect to the distinct meteor showers. Outliers lie more than 1.5 times the interquartile range from the median.

Figure 3 illustrates the variation in Top-k test accuracy as a function of solar longitude across the different D-criteria. Similarly, Figure 4 displays the Top-k results for the sEuclidean, Canberra, Bray-Curtis, and Chebyshev distance metrics. Across all evaluations, the results are of the same order of magnitude. A distinct pattern emerges: for Top-1, the accuracy variation is irregular, whereas for Top-10 it tends towards uniformity, except for a notable decrease (up to 50%) around 180$^\circ$ of solar longitude. Visually, the lower performance of $D_D$ is prominent, and $D_{SH}$ and $\varrho_2$ excel, especially at solar longitudes between 170$^\circ$ and 220$^\circ$, as well as around 70$^\circ$ in Top-5 and Top-10, and 350$^\circ$ in Top-1 (with a sudden increase in the accuracy of $D_H$). The performances of the distance metrics generally follow the same trend, albeit less uniformly in the Top-10 distribution. Besides the common dip at 180$^\circ$, they struggle to associate meteors at around 310$^\circ$, where Chebyshev (with the GEO vector) and Canberra (with the ORBIT vector) exhibit remarkably lower performances.
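The dependence of accuracy on solar longitude can be sketched by binning the per-meteor Top-k outcomes. The 10$^\circ$ bin width, the function name, and the records below are assumptions for illustration only; the binning used for the figures is not stated here.

```python
from collections import defaultdict


def accuracy_by_solar_longitude(records, bin_width=10.0):
    """Accuracy (%) per solar-longitude bin.

    records: iterable of (solar_longitude_deg, hit) pairs, where hit is True
    when the benchmark shower ranks within the chosen top k.
    Returns a dict mapping each bin's lower edge to its accuracy.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for lon, hit in records:
        start = (lon % 360.0) // bin_width * bin_width
        totals[start] += 1
        hits[start] += bool(hit)
    return {start: 100.0 * hits[start] / totals[start] for start in sorted(totals)}


# Hypothetical (solar longitude, Top-1 hit) outcomes.
outcomes = [(178.0, True), (182.5, False), (185.0, False), (183.0, True), (262.1, True)]
for start, acc in accuracy_by_solar_longitude(outcomes).items():
    print(f"{start:5.1f} deg: {acc:.1f}%")
```

Bins with few meteors produce noisy accuracies, so the number of events per bin should be reported alongside any such curve.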

Figure 6 displays classification outcomes labeled as $H_0$ or $H_1$, corresponding to the hypothesis tested in each comparison of the Top-1 accuracy results from the distance metrics and the D-criteria. Labels are determined based on p-values: instances where the p-value is less than 0.05 are marked as $H_1$, indicating the rejection of the null hypothesis ($H_0$) in favor of the alternative ($H_1$) and suggesting a statistically significant difference between the compared distributions. Conversely, instances with a p-value greater than or equal to 0.05 retain the $H_0$ label, indicating insufficient evidence to reject the null hypothesis and thus suggesting no statistically significant difference between the distributions under examination.

Fig. 3. Top-k accuracies along solar longitude of the D-criteria for associated meteors in the CAMS database.

Fig. 5. 2D histogram of sporadic meteor apparent velocities and solar longitudes at impact in the CAMS database. Darker colors denote higher density.

Fig. 6. K-S test comparing Top-1 accuracies of distance metrics and D-criteria with a 95% level of confidence for associated meteors in the CAMS database. $H_0$ indicates no statistically significant difference between distributions, while $H_1$ indicates a significant difference between the compared distributions.

Fig. 7. Heatmap of cross-coincidence between D-criteria and distance metrics using GEO (lower triangle, blue colormap) and ORBIT (upper triangle, red colormap) vectors of Top-1 accuracies for associated meteors in the CAMS database. The cross-coincidences among the D-criteria themselves are highlighted within a black rectangle in the top left corner.

Table 1. Summary of distance metrics.

Table 2. Top-k accuracies of D-criteria for associated meteors in the CAMS database.
Top-k   $D_{SH}$ (%)   $D_D$ (%)   $D_H$ (%)   $\varrho_2$ (%)

Table 3. Top-1, Top-5, and Top-10 accuracies of distance metrics for associated meteors in the CAMS database.

Table 4. Mean accuracies and standard deviations for Top-k tests across the D-criteria and distance metrics with GEO and ORBIT vectors.

Table 5. Thresholds, accuracies, and Matthews correlation coefficients for different D-criteria and distance metrics in the CAMS database, taking into account the sporadic and associated events.