An empirical "high-confidence" candidate zone for $Fermi$ BL Lacertae objects

In the Clean sample of the third catalog of active galactic nuclei detected by the $Fermi$ Large Area Telescope (3LAC), there are 402 blazar candidates of uncertain type (BCUs). The proposed analysis helps to evaluate the potential optical classification, flat spectrum radio quasar (FSRQ) versus BL Lacertae (BL Lac) object, of the BCUs, which can help to understand which is the most elusive class of blazar hidden in the $Fermi$ sample. By studying the 3LAC sample, we found critical values of the $\gamma$-ray photon spectral index ($\Gamma_{\rm ph}$), variability index (VI), and radio flux (${\rm F_R}$) that separate known FSRQs from BL Lac objects. We further use those values to define an empirical "high-confidence" candidate zone that can be used to classify the BCUs. Within this zone ($\Gamma_{\rm ph}<2.187$, $\log{\rm F_R}<2.258$, and $\log{\rm VI}<1.702$), we found that 120 BCUs can be classified as BL Lac candidates with a high degree of confidence (with a misjudged rate $<1\%$). Our results suggest that an empirical "high-confidence" diagnosis that distinguishes BL Lacs in the $Fermi$ observations is possible based only on the direct observational data of $\Gamma_{\rm ph}$, VI, and ${\rm F_R}$.


INTRODUCTION
Blazars are a particular class of radio-loud active galactic nuclei (AGNs) with a relativistic jet pointing toward the Earth. The broadband (from radio up to TeV energies) emission of blazars is dominated by nonthermal components produced in the relativistic jet (Urry & Padovani 1995). According to the strength of the optical spectral lines, blazars can be further divided into two subclasses (Stickel et al. 1991; Stocke et al. 1991; Laurent-Muehleisen et al. 1999), namely, the flat spectrum radio quasars (FSRQs; strong emission lines with equivalent width EW ≥ 5 Å in the rest frame) and the BL Lacertae objects (BL Lacs; weak or no emission and absorption lines). The multi-wavelength spectral energy distributions (SEDs) of blazars, from the radio to the γ-ray bands, normally exhibit a two-hump structure in the $\log\nu$-$\log\nu F_{\nu}$ space. The low-energy bump (peaking between the millimeter and soft X-ray range) is generally explained as synchrotron emission from the non-thermal electrons in the relativistic jet, while the high-energy bump (peaking in the MeV-GeV energy range) is attributed to inverse Compton (IC) scattering. Furthermore, based on the peak frequency ($\nu^{\rm S}_{\rm p}$) of the low-energy bump, blazars can also be classified as low- ($\nu^{\rm S}_{\rm p} < 10^{14}$ Hz), intermediate- ($10^{14}$ Hz $< \nu^{\rm S}_{\rm p} < 10^{15}$ Hz), and high- ($\nu^{\rm S}_{\rm p} > 10^{15}$ Hz) synchrotron-peaked sources (i.e., LSPs, ISPs, and HSPs; Abdo et al. 2010).
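The synchrotron-peak classification above can be written as a small rule. The following is a minimal sketch in Python; the function name `sed_class` is hypothetical, and because the text uses strict inequalities, assigning the exact boundary frequencies to the higher class is an assumption made here only so that every input gets a label.

```python
def sed_class(nu_peak_hz: float) -> str:
    """Return 'LSP', 'ISP', or 'HSP' for a synchrotron peak frequency in Hz.

    Thresholds (1e14 Hz and 1e15 Hz) follow the text; the handling of the
    exact boundary values is an assumption of this sketch.
    """
    if nu_peak_hz < 1e14:
        return "LSP"
    if nu_peak_hz < 1e15:
        return "ISP"
    return "HSP"


# Illustrative peak frequencies only (not taken from any catalog).
labels = [sed_class(nu) for nu in (1e13, 5e14, 1e16)]
```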
This work uses the third catalog of AGNs detected by the Fermi-LAT (3LAC), which is based on the first four years of Fermi-LAT data, i.e., the third Fermi Large Area Telescope (LAT) source catalog (3FGL; Acero et al. 2015). The 3LAC Clean sample (i.e., the high-confidence clean sample of the 3LAC) reports 1444 γ-ray AGNs: 414 FSRQs (∼ 30%), 604 BL Lac objects (∼ 40%), 402 blazar candidates of uncertain type (BCUs, ∼ 30%), and 24 non-blazar AGNs (< 2%). The identification of the FSRQs and BL Lacs is solid, mostly based on clear evidence for the presence or absence of emission and/or absorption lines. On the other hand, BCUs are sources without a confirmed classification because representative features are missing in the optical spectrum (BCU I), in the synchrotron peak frequency of the SED (BCU II), and/or in the broadband emission (BCU III) (see Ackermann et al. 2015; Acero et al. 2015 for the details and references therein). Studying such a large sample of BCUs is crucial to understanding the physics of the γ-ray emission of blazars (e.g., Singal et al. 2012; Singal 2015; Fan et al. 2016; Kang et al. 2018, 2019a; Zhu et al. 2020).
Estimating the possible classification (BL Lac versus FSRQ) of the BCUs can help to understand which is the most elusive class of blazar hidden in the Fermi sample. Indeed, some potential BL Lac or FSRQ candidates can be identified from the BCU sample in the 2FGL/3FGL catalogues using different approaches, such as supervised machine learning (e.g., support vector machine [SVM] and random forest [RF]; Hassan et al. 2013), neural networks (Chiaro et al. 2016), artificial neural networks (ANN; Salvetti et al. 2017), a multivariate classification method (Lefaucheur & Pita 2017), and statistical analysis of the broadband spectral properties (including spectral indices in the gamma-ray, X-ray, optical, and radio bands; Yi et al. 2017). In addition, we have identified potential BL Lac and FSRQ candidates from the 3LAC Clean sample using 4 different SML algorithms (Mclust Gaussian finite mixture models, decision trees, RF, and SVM; Kang et al. 2019a [Paper I]) and from the 4FGL catalogue using 3 different SML algorithms (ANN, RF, and SVM; Kang et al. 2019b). Nevertheless, the final confirmation of the nature of the BCU candidates in all of the above approaches is subject to optical spectroscopic observations or counterparts at other wavelengths (e.g., Massaro et al. 2014; Álvarez Crespo et al. 2016a,b,c; Massaro et al. 2016; Marchesini et al. 2016; Klindt et al. 2017; Peña-Herazo et al. 2017; Marchesi et al. 2018; Desai et al. 2019; Marchesini et al. 2019; Peña-Herazo et al. 2019), or to broadband spectral features (e.g., Fermi/LAT collaboration; Massaro et al. 2009, 2012, 2016; Álvarez Crespo et al. 2016a,b,c). If such information is missing, the classification of BCUs becomes challenging, especially when no training set is available (see, e.g., Shaw et al. 2013; Landoni et al. 2015; Ricci et al. 2015; Paiano et al. 2017a,b; Landoni et al. 2018; Kaur et al. 2019; Kaur et al. 2019).
To overcome such difficulties, in this letter we aim to evaluate the potential classification of BCUs based only on the direct observational properties in the γ-ray and radio bands. These properties are the γ-ray photon spectral index ($\Gamma_{\rm ph}$), the variability index (VI), and the radio flux ($F_{\rm R}$). By performing a detailed analysis, we confirm the existence of a high-confidence zone in which the BCUs that meet the conditions are most likely BL Lac objects.
We organize the present paper as follows. In Section 2, a brief description of the sample selection is provided, followed by the proposed analysis methods and results. Comparisons between our results and some other recent results are presented in Section 3. Our results are discussed in Section 4 and summarized in Section 5.

DATA SELECTION AND ANALYSIS
We selected the data from the 3LAC Clean sample in the 3FGL Catalog. To perform the analysis, we selected the sources with available measurements of $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$, which yields a sample of 1418 Fermi blazars, including 414 FSRQs, 604 BL Lacs, and 400 BCUs (two sources that have no radio data are excluded). To investigate whether there is a characteristic zone in the 3-parameter space (namely, $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$), we first plotted the scatterplots of the known FSRQ and BL Lac samples. In Figure 1, the 2-D scatterplots between any two of the parameters $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$ for the identified FSRQs and BL Lacs are shown in the left column. One immediately notices that the values of $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$ of the FSRQs are normally larger than those of the BL Lacs. The FSRQs feature a comparatively concentrated distribution, while the BL Lacs show a relatively wider distribution. The distributions of $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$ of the FSRQ and BL Lac groups differ significantly. For all 3 parameters (see Table 1), the two-sample Kolmogorov-Smirnov test gives the test statistic $D = 0.514$ and the p-value $p_1 = 0$; the Welch two-sample t-test gives the t-statistic $t = 32$, the degrees of freedom $df = 2455$, and the p-value $p_2 < 1.0\times10^{-6}$; and the Wilcoxon rank sum test with continuity correction gives the test statistic $W = 1826100$ and the p-value $p_3 < 1.0\times10^{-6}$ (obtained with R; R Core Team 2019). For the other parameter combinations, with either one or two parameters, the test results are also listed in Table 1. The results significantly reject the hypothesis that the two samples (FSRQs, BL Lacs) are drawn from the same distribution.

Figure 1. The scatterplots of the variability index (VI), radio flux ($F_{\rm R}$), and photon spectral index ($\Gamma_{\rm ph}$) for Fermi blazars (left column), where red solid squares represent FSRQs and blue empty points represent BL Lacs. The right column shows the scatterplots of the BCUs, where BCUs (I) are the BL Lac candidates identified using the "$a_i < X$ zone" (blue solid points) and BCUs (U) are the unidentified BCUs (red empty squares). The dot-dashed blue lines, parallel and perpendicular to the axes, indicate $\Gamma_{\rm ph} = 2.187$, $\log F_{\rm R} = 2.258$, and $\log{\rm VI} = 1.702$.

Note to Table 1: The value of the t-statistic ($t$), the degrees of freedom for the t-statistic ($df$), and the p-value ($p_2$) for the Welch two-sample t-test are listed in Column 5, Column 6, and Column 7, respectively; Column 8 and Column 9 report the value of the test statistic ($W$) and the p-value ($p_3$) for the Wilcoxon rank sum test with a continuity correction. All data are obtained with R code (https://www.r-project.org/) (see R Core Team 2019).

Note to Table 2: Column 1 shows the test dataset: the 3 parameters of the 414 FSRQs. Column 2 shows the parameter used in the one-sample test. Column 3 and Column 4 give the value of the test statistic ($D$) and the p-value ($p_1$) for the one-sample Kolmogorov-Smirnov test; the value of the t-statistic ($t$), the degrees of freedom for the t-statistic ($df$), and the p-value ($p_2$) for the Welch one-sample t-test are listed in Column 5, Column 6, and Column 7, respectively; Column 8 and Column 9 report the value of the test statistic ($W$) and the p-value ($p_3$) for the Wilcoxon signed rank test with a continuity correction. All data are obtained with R code (https://www.r-project.org/) (see R Core Team 2019).

We find that the two samples (marked red and blue) can be well separated by critical lines with the following values: $\Gamma_{\rm ph} = 2.187$, $\log F_{\rm R} = 2.258$, and $\log{\rm VI} = 1.702$. The three critical values are obtained with the following procedure:
1. We performed one-sample normality tests (e.g., KS test, t-test, and Wilcoxon test) for the $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$ of the FSRQs and found that the distributions of these parameters are consistent with a Gaussian distribution, with significant p-values (Table 2).
2. We then obtained the lowest one-sided confidence interval values ($a_1 = 2.187$, $a_2 = 2.258$, and $a_3 = 1.702$) under the assumption that the $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$ of the FSRQs are normally distributed; these values are adopted as the critical values mentioned above.
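The text does not spell out the formula behind the "lowest one-sided confidence interval value". As a minimal sketch, the Python snippet below assumes it is the sample mean minus one sample standard deviation (a one-sided 1σ lower bound under normality). That reading, the function name `lower_bound`, and the toy numbers are all assumptions for illustration, not the authors' R code or the 3LAC data.

```python
import statistics


def lower_bound(values, n_sigma=1.0):
    """One-sided lower bound mean - n_sigma * std, assuming normal data.

    ASSUMPTION: this is one plausible reading of the "lowest one-sided
    confidence interval value" at the 1-sigma level; the source does not
    give the exact formula.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (n - 1)
    return mu - n_sigma * sigma


# Toy photon indices only (hypothetical, not catalog values).
toy_gamma_ph = [2.2, 2.4, 2.6, 2.3, 2.5]
a1_toy = lower_bound(toy_gamma_ph)
```

Applying the same function to each of the three FSRQ parameters would give one boundary value per parameter under this assumption.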
We find that almost no FSRQs fall in the range $\Gamma_{\rm ph} < a_1$, $\log F_{\rm R} < a_2$, and $\log{\rm VI} < a_3$, while some BL Lacs lie in this zone (labelled $a_i < X$), where the $a_i$ ($i = 1, 2, 3$) are the boundary values. Specifically, only 3 FSRQs lie in the $a_i < X$ zone (bounded by $\Gamma_{\rm ph} = 2.187$, $\log F_{\rm R} = 2.258$, and $\log{\rm VI} = 1.702$), which gives a misjudged rate $\eta = 3/414 \simeq 0.725\%$ for the FSRQs. Here the misjudged rate $\eta$ is the probability that an FSRQ is misclassified as a BL Lac, defined as $\eta = N_{\rm err}/N_{\rm F}$, where $N_{\rm F}$ is the total number of FSRQs and $N_{\rm err}$ is the number of FSRQs that are misclassified as BL Lacs in the $a_i < X$ zone.
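The zone test and the misjudged rate can be sketched in a few lines of Python. The boundary values are the ones quoted in the text; the `Source` type, the function names, and the toy sources are hypothetical and serve only to illustrate $\eta = N_{\rm err}/N_{\rm F}$.

```python
from typing import NamedTuple


class Source(NamedTuple):
    gamma_ph: float  # gamma-ray photon spectral index
    log_fr: float    # log of the radio flux
    log_vi: float    # log of the variability index


# Boundary values a_1, a_2, a_3 quoted in the text.
A1, A2, A3 = 2.187, 2.258, 1.702


def in_zone(s: Source) -> bool:
    """True if all three conditions of the candidate zone hold at once."""
    return s.gamma_ph < A1 and s.log_fr < A2 and s.log_vi < A3


def misjudged_rate(fsrqs) -> float:
    """eta = N_err / N_F for a list of known FSRQs."""
    n_err = sum(in_zone(s) for s in fsrqs)
    return n_err / len(fsrqs)


# Toy FSRQs (hypothetical values): only the last one falls in the zone.
toy_fsrqs = [
    Source(2.50, 3.00, 2.40),
    Source(2.30, 2.80, 1.60),
    Source(2.10, 2.50, 1.50),
    Source(2.00, 2.00, 1.50),
]
eta_toy = misjudged_rate(toy_fsrqs)
```

With the counts reported in the text, the same ratio is 3/414, i.e. about 0.725%.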
To test our hypothesis, we randomly divide the FSRQs into 10 sub-samples; one sub-sample is reserved as the verification data, and the remaining 9 sub-samples are used as the training data. The proposed analysis is then repeated 10 times (the 10 folds), and the misjudged rate $\eta$ is calculated each time. Finally, by averaging the 10 misjudged rates, a 10-fold cross-validation$^4$ misjudged rate $\eta = 0.971\%$ is obtained for the FSRQs. This result suggests that the $a_i < X$ zone (i.e., the zone bounded by the lowest one-sided confidence interval values at the 1σ confidence level, with a false positive rate of $\simeq 0.725\%$ for FSRQs) can be treated as an "inviable" region for the FSRQs or as a candidate zone for the BL Lacs, called the "$a_i < X$ candidate zone" for BL Lacs.
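A 10-fold cross-validation of the misjudged rate can be sketched as follows. This is an illustration in Python, not the authors' R code: it assumes the boundaries are recomputed from the 9 training folds as mean minus one standard deviation (an assumption, since the exact formula is not given) and it runs on synthetic Gaussian data with hypothetical parameters.

```python
import random
import statistics


def lower_bounds(rows):
    """Per-parameter bound mean - std (ASSUMED form of the 1-sigma bound)."""
    columns = list(zip(*rows))
    return [statistics.mean(c) - statistics.stdev(c) for c in columns]


def fold_rate(train, held_out):
    """Fraction of held-out FSRQs below all bounds derived from `train`."""
    bounds = lower_bounds(train)
    n_err = sum(all(x < a for x, a in zip(row, bounds)) for row in held_out)
    return n_err / len(held_out)


def cross_validated_rate(rows, k=10, seed=0):
    """Average misjudged rate over k folds of a shuffled sample."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]  # k nearly equal sub-samples
    rates = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        rates.append(fold_rate(train, folds[i]))
    return sum(rates) / k


# Synthetic "FSRQs": (Gamma_ph, logF_R, logVI) drawn from hypothetical Gaussians.
rng = random.Random(1)
toy_fsrqs = [
    (rng.gauss(2.45, 0.2), rng.gauss(2.9, 0.5), rng.gauss(2.3, 0.5))
    for _ in range(400)
]
eta_cv = cross_validated_rate(toy_fsrqs)
```

Because the held-out fold never contributes to the boundaries it is tested against, the averaged rate is a less optimistic estimate than the rate computed on the full sample, which matches the direction of the difference reported in the text (0.971% versus 0.725%).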
Finally, we apply the "$a_i < X$ candidate zone" to the BCU sample. We obtain 120 BL Lac candidates that fall into the high-confidence zone with all of the following three conditions satisfied: $\Gamma_{\rm ph} < 2.187$, $\log F_{\rm R} < 2.258$, and $\log{\rm VI} < 1.702$. These 120 sources are plotted as blue solid circles in Figure 1 (right column) and listed in Table 3, while the red empty squares mark the remaining BCUs, whose optical classification is still unidentified.

$^4$ In a K-fold cross-validation, the original sample is randomly divided into K sub-samples. Among the K sub-samples, one is reserved as the verification data for testing the model, and the remaining K-1 sub-samples are used as the training data. The cross-validation process is then repeated K times, with each of the K sub-samples used exactly once as the verification data. The K results from the folds can then be averaged (or otherwise combined) to produce a single estimate.

Table 3. The identified BL Lac candidates using the "$a_i < X$ candidate zone"

Note to Table 3: ... Acero et al. 2015). Column 3 gives the SED classifications (LSP, ISP, and HSP); the radio flux ($\log F_{\rm R}$) is listed in Column 4. The γ-ray photon spectral index ($\Gamma_{\rm ph}$) and the γ-ray variability index ($\log{\rm VI}$) are shown in Columns 5 and 6, respectively. The BL Lac candidates obtained using the "$a_i < X$ candidate zone" are listed in Column 7. Columns 8-11 (M8, DT8, RF8, and SVM8) indicate the BL Lac-type ("bll") candidates identified by 4 different supervised machine learning (SML) algorithms (Mclust Gaussian finite mixture models (M8), decision trees (DT8), random forests (RF8), and support vector machines (SVM8)) with 8 parameters in Kang et al. 2019a.
Column 12 (LP17) lists the classifications ("bll" for BL Lac, "unc" for uncertain, and "-" for a mismatched source by cross comparison) in Lefaucheur & Pita (2017) ...

Table 4.
Class  N_X  M8   DT8  RF8  SVM8  Chi16  LP17  Y17  M16  3FHL  D19  M19  M18  P17  A16  4FGL  Cross-match
−      ...  ...  ...  ...  ...   ...    2     ...  ...  36    118  109  110  108  105  4     ...
bll    120  117  120  120  118   119    116   118  24   41    2    11   10   12   15   63    74
fsrq   0    3    0    0    2     1      0     2    0    0     0    0    0    0    0    1     ...
unc    ...  ...  ...  ...  ...   ...    2     ...  ...  43    ...  ...  ...  ...  ...  52    ...

Note to Table 4: Column 1 shows the classifications ("−" gives the number of sources without a match in the cross comparison; "bll", "fsrq", and "unc" indicate BL Lac, FSRQ, and uncertain type, respectively). Column 2 is the number of sources ($N_X$) obtained with the $a_i < X$ candidate zone. An ellipsis marks a cell with no recovered entry. ... Table 3) are obtained from cross-matching these results (M16, 3FHL, D19, M19, M18, P17, A16, and 4FGL).

COMPARISON WITH LITERATURE RESULTS
We then compared our 120 identified BL Lac candidates with some other recent studies. We found that our results are mostly consistent with the previous works of Chiaro et al. (2016), Lefaucheur & Pita (2017), Yi et al. (2017), and Kang et al. (2019a), which use different statistical (e.g., SML) algorithms (see Table 4 and Table 3). The exceptions are as follows: in Lefaucheur & Pita (2017), 2 sources have no matching counterpart and 2 sources have no clear classification. In addition, only 3 sources are classified as FSRQs by Mclust Gaussian mixture modelling (M8) and 2 are classified as FSRQs by the support vector machine (SVM8) using 8 parameters in Kang et al. 2019a; 1 source is classified as an FSRQ in Chiaro et al. 2016 (Chi16), whereas 2 sources are classified as FSRQs in Yi et al. 2017 (Y17). The results, provided in Table 4, indicate that the highest mismatch rate (3/120 = 2.5%) is less than 3%. Hence, the selected area (the $a_i < X$ candidate zone) shows a high degree of confidence.
Of the 120 BL Lac candidates identified in this work, 41 sources are identified as BL Lac-type in the 3FHL catalog (Ajello et al. 2017, 3FHL; see Table 4), and 63 sources are identified as BL Lac-type in the 4FGL catalog (see the 4FGL FITS table "gll_psc_v20.fit" of The Fermi-LAT collaboration 2019, 4FGL). Only 1 source is classified as an FSRQ in the 4FGL catalog (The Fermi-LAT collaboration 2019). There are 24, 2, 11, 10, 12, and 15 sources that have been identified as BL Lac-type by Massaro et al. (2016, M16), D19, M19, M18, P17, and A16, respectively (see Table 4 and Table 3). The remaining 46 sources need to be further tested and confirmed by spectroscopic observations.

DISCUSSIONS
As shown in Paper I and other similar works, the SML method can return the probabilities $P_{B_i}$ and $P_{F_i}$ (e.g., see the machine-readable supplementary material of Table 4 in Kang et al. 2019a) that a BCU $i$ belongs to the BL Lac (B) or FSRQ (F) class, respectively. These probabilities help to assign each source to a class. However, although SML algorithms (or other statistical algorithms) provide a statistical approach to the potential classification of BCUs, the test error rate (> 0.11; e.g., Paper I) is still very large, so FSRQs and BL Lacs may be misclassified. A more efficient (high-confidence) method for evaluating the potential classification of the BCUs may be necessary and needs to be further addressed. In this work, by contrast, our aim is to obtain a more precise conclusion from the fewest, most direct observations with the simplest method. Although there is still some artificiality in setting the boundary values of the "$a_i < X$ candidate zone", the result of the "$a_i < X$ candidate zone" is stable. Here, only a part of the BL Lacs, not the majority, are classified from the BCUs. The results likely provide some clues for further study. For instance, they can contribute to source selection for the spectroscopic observation campaigns needed to confirm the real nature of the candidates and, possibly, determine their redshifts (see, e.g., Ajello et al. 2014), and to population studies of the remaining unassociated γ-ray sources (e.g., see Acero et al. 2013; D'Abrusco et al. 2019 for some discussions). The result of this work may provide more samples for studying the jet physics of the population of HSP BL Lacs, or some clues for planning the main targets of rigorous analyses and multi-wavelength observational campaigns (e.g., Chiaro et al. 2019). The empirical candidate zone gives higher-confidence results, with higher probabilities $P_{B_i}$ (see Table 4 in Paper I) that a BCU $i$ belongs to the BL Lac (B) class. This can guide observers in selecting targets within limited observational resources (e.g., equipment, time). However, the empirical method may still cause misjudgments in identifying the potential (optical) classification of blazars. Optical spectroscopic observations remain the most efficient and accurate way to determine the real nature of these sources.

Note to Table 5: Column 1 shows the parameters satisfied simultaneously. Column 2 is the misjudged rate ($\eta_{1\sigma}$) at the boundary values given by the one-sided confidence interval for the 1σ confidence level. Column 3 is the number of BL Lac candidates ($N_{1\sigma}$) selected from the BCUs at the boundary values given by the one-sided confidence interval for the 1σ confidence level.

For the 120 BL Lac candidates predicted in this work with the "$a_i < X$ candidate zone", we also test whether they differ from the 414 known FSRQs using two-sample tests. The distributions of $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$ of the 414 FSRQs and of the 120 identified BL Lac candidates are significantly different. For all 3 parameters (see Table 1), the two-sample Kolmogorov-Smirnov test gives $D = 0.725$ and the p-value $p_1 = 0$; the Welch two-sample t-test gives $t = 38$, $df = 926$, and the p-value $p_2 < 1.0\times10^{-6}$; and the Wilcoxon rank sum test with a continuity correction gives $W = 416470$ and the p-value $p_3 < 1.0\times10^{-6}$. For the other (one- or two-) parameter combinations, the test results are also reported in Table 1. This indicates that there is a strong separation between the 120 predicted BL Lac candidates and the 414 known FSRQs, which further verifies our results from another perspective.
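The two-sample statistics quoted in this section ($D$ for the Kolmogorov-Smirnov test; $t$ and $df$ for the Welch test) can be computed directly from their textbook definitions. The sketch below is a minimal Python illustration with hypothetical function names and toy data; it is not the R code used for Table 1, it returns the statistics only (no p-values), and it omits the Wilcoxon rank sum test.

```python
import math
import statistics
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample KS statistic D: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # ECDF of x evaluated at v
        fy = bisect_right(ys, v) / len(ys)  # ECDF of y evaluated at v
        d = max(d, abs(fx - fy))
    return d


def welch_t(x, y):
    """Welch t-statistic and Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Toy samples only (not catalog values).
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [3.0, 4.0, 5.0, 6.0]
d_toy = ks_statistic(sample_a, sample_b)
t_toy, df_toy = welch_t(sample_a, sample_b)
```

A value of $D$ close to 1 means the two empirical distributions barely overlap, which is the sense in which $D = 0.725$ indicates a stronger separation than $D = 0.514$.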
We should note that, in Figure 1 (right column), if only two of the conditions need to be satisfied simultaneously, more sources can be selected as possible BL Lac candidates. For example, considering the lower, middle, and upper panels of Figure 1, there are an extra 57 BCUs with a misjudged rate (the probability that FSRQs are misclassified as BL Lacs) $\eta = 10/414 \simeq 2.415\%$ (see Table 5) in the range ($\Gamma_{\rm ph} < 2.187$ and $\log F_{\rm R} < 2.258$) in the $\Gamma_{\rm ph}$-$\log F_{\rm R}$ panel (the lower panel of the right column in Figure 1). There are an extra 22 BCUs with a misjudged rate $\eta = 14/414 \simeq 3.382\%$ (see Table 5) in the range ($\Gamma_{\rm ph} < 2.187$ and $\log{\rm VI} < 1.702$) in the $\Gamma_{\rm ph}$-$\log{\rm VI}$ panel (the middle panel of the right column in Figure 1). There are an extra 55 BCUs (obtained easily from the 3LAC Website version) with a misjudged rate $\eta = 10/414 \simeq 2.415\%$ (see Table 5) in the range ($\log{\rm VI} < 1.702$ and $\log F_{\rm R} < 2.258$) in the $\log F_{\rm R}$-$\log{\rm VI}$ panel (the upper panel of the right column in Figure 1). These sources (57, 22, and 55) have a larger misjudged rate ($\eta > 2.4\%$); although we did not conclusively evaluate their potential classifications (FSRQs or BL Lacs), they may be helpful for source selection in future spectroscopic observation campaigns that further diagnose their optical classifications (see, e.g., Yi et al. 2017; Massaro et al. 2013 for some discussions). In addition, if only one parameter is considered, a larger misjudged rate is introduced (see Table 5). Whether these 3 parameters ($\Gamma_{\rm ph}$, $\log{\rm VI}$, and $\log F_{\rm R}$) are the optimum combination of parameters needs to be further tested.
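The trade-off between relaxing the conditions and the misjudged rate can be tabulated by looping over parameter subsets. The following Python sketch uses the boundary values quoted in the text; the dictionary keys, function names, and toy sources are hypothetical, and the counts it produces are illustrative, not the values of Table 5.

```python
from itertools import combinations

# Boundary values quoted in the text; the key names are hypothetical.
BOUNDS = {"gamma_ph": 2.187, "log_fr": 2.258, "log_vi": 1.702}


def in_zone(source, keys):
    """True if the source lies below the boundary for every listed parameter."""
    return all(source[k] < BOUNDS[k] for k in keys)


def zone_summary(fsrqs, bcus, n_params):
    """Map each n_params-subset to (misjudged rate, number of BCU candidates)."""
    summary = {}
    for keys in combinations(BOUNDS, n_params):
        eta = sum(in_zone(s, keys) for s in fsrqs) / len(fsrqs)
        n_candidates = sum(in_zone(s, keys) for s in bcus)
        summary[keys] = (eta, n_candidates)
    return summary


# Toy sources only (hypothetical values).
toy_fsrqs = [
    {"gamma_ph": 2.5, "log_fr": 3.0, "log_vi": 2.5},
    {"gamma_ph": 2.1, "log_fr": 2.0, "log_vi": 2.0},
]
toy_bcus = [
    {"gamma_ph": 2.0, "log_fr": 2.0, "log_vi": 1.5},
    {"gamma_ph": 2.0, "log_fr": 2.0, "log_vi": 2.0},
    {"gamma_ph": 2.5, "log_fr": 2.0, "log_vi": 1.5},
]
two_param = zone_summary(toy_fsrqs, toy_bcus, 2)
three_param = zone_summary(toy_fsrqs, toy_bcus, 3)
```

In this sketch, the "extra" candidates of a two-parameter zone would be its candidate count minus the three-parameter count, since every source in the three-parameter zone also satisfies any pair of conditions.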
In addition, it must be highlighted that selection effects in this work (e.g., from the sample and the method; see Kang et al. 2018, 2019a for detailed discussions) should be treated with caution, because they may affect the source distributions and the results of the analysis. However, this work provides a simple, direct method to distinguish the BL Lacs among the BCUs based on the direct observational data. Whether the proposed analysis (the $a_i < X$ candidate zone) remains robust and effective as the sample expands needs to be further tested with a larger and more complete sample (e.g., the upcoming 4LAC).

SUMMARY
In this work, we proposed an analysis to evaluate the potential optical classification of BCUs. Based on the 3LAC Clean Sample, we collected 1418 Fermi blazars with 3 parameters: photon spectral index, variability index, and radio flux. We studied the distributions of the FSRQs and BL Lacs based on the scatterplots of these 3 parameters. We find that almost no FSRQs fall in the range $\Gamma_{\rm ph} < a_1$, $\log F_{\rm R} < a_2$, and $\log{\rm VI} < a_3$ for these 3 parameters, whereas some BL Lacs lie in this zone ($a_i < X$). Therefore, we suggest that it may be an invalid zone for FSRQs but a candidate zone for BL Lacs (called the "$a_i < X$ candidate zone" for BL Lacs). One-sample normality tests for the $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$ of the FSRQs show that these three variables are normally distributed. We adopt the lowest one-sided confidence interval values as the boundary values $a_i$ of these three parameters. At the one-sided 1σ confidence level, $a_1 = 2.187$, $a_2 = 2.258$, and $a_3 = 1.702$ are obtained. Requiring $\Gamma_{\rm ph} < 2.187$, $\log F_{\rm R} < 2.258$, and $\log{\rm VI} < 1.702$ to be satisfied simultaneously, we apply the "$a_i < X$ candidate zone" to the BCUs and obtain 120 potential BL Lac candidates. We compared the 120 potential BL Lac candidates with some other recent (statistical) results and find that almost all of them are consistent with the sources identified as BL Lacs by SML (or other statistical) methods. We also compared the 120 potential BL Lac candidates with spectroscopic identification results and find that most of the 120 sources (74) have been identified as BL Lacs by spectroscopic observations (see Table 3 and Table 4). Therefore, we suggest that the empirical candidate zone ($a_i < X$) may be a good (high-confidence) criterion for evaluating BL Lac candidates based only on the direct observational data of $\Gamma_{\rm ph}$, VI, and $F_{\rm R}$.
Although the proposed approach identifies only a part of the BL Lac candidates among the BCUs, not the majority, the results are stable and have a high degree of confidence.