Dublin Anti-Bullying Self-Efficacy Scales: Bifactor and Item Response Theory Models

Dublin Anti-Bullying Self-Efficacy Scales aim to measure the effectiveness of school anti-bullying programs in promoting five steps that victims and bystanders take against online and offline bullying behaviors. These steps correspond to anti-bullying self-efficacy beliefs to recognize bullying behaviors, comprehend the need for emergency intervention, take responsibility, know what to do, and intervene. However, when an anti-bullying program is very effective for the majority of participants, who therefore give high scores, a considerable number of participants who give low scores are very likely to be detected as outliers. This raises two measurement issues. First, high scores create highly negatively skewed data and lead to measuring a unidimensional rather than a multidimensional construct. This could be one reason why recent research has been unclear about the extent to which the scales measure a unidimensional, multidimensional, or bifactor construct. Second, should outliers be removed, or should they be considered participants for whom the program was ineffective? If the scales showed measurement invariance across outliers and non-outliers (i.e., across low and high self-efficacy), it could be concluded that the anti-bullying program was ineffective for some participants. The current research addresses these issues by testing measurement invariance as well as unidimensional and bifactor models of anti-bullying self-efficacy. Results of Pure Exploratory Bifactor (PEBI) analyses and Item Response Theory (IRT) Two-Parameter Logistic (2PL) models of data from a convenience sample of 14-year-old students in Ireland (N = 1,222) indicated sufficient psychometric properties of both the unidimensional and multidimensional scales for victim offline, victim online, bystander offline, and bystander online. Further research can use these scales to measure anti-bullying self-efficacy as a bifactor construct and to apply the cut-off score for distinguishing between low and high anti-bullying self-efficacy.

Online/offline aggressive behavior is defined as bullying when it (a) happens within a societal context, (b) causes physical, emotional, and/or indirect harm to the targeted person, and (c) depends on an imbalance of power that results from social/school/institutional norms or systems (UNESCO, 2020). A growing consensus on effective ways to prevent or intervene in bullying behaviors is that anti-bullying programs should focus more on promoting victims' and bystanders' anti-bullying self-efficacy beliefs (Sargioti et al., 2023). Anti-bullying self-efficacy refers to victims' and bystanders' confidence in their own ability, as well as in the ability of teachers, parents, and social and school environments (norms, systems, policies), to tackle online/offline bullying behaviors (Kuldas & Foody, 2022; Sargioti et al., 2023). For example, when bullied students have no confidence in teacher efficacy and attitude (e.g., believing that the teacher will make the situation worse, not care, or take no action to prevent or intervene in bullying), they are unwilling to ask for help or disclose victimization (Mazzone et al., 2021).
However, this consensus lacks theoretical and empirical support, because there has been no anti-bullying self-efficacy framework or measurement scale (Sargioti et al., 2023). A recent literature review (Sargioti et al., 2023) found only one study (Andreou et al., 2007) addressing the need for a framework and scale to assess the effectiveness of an anti-bullying program in terms of both victim and bystander self-efficacy; other studies focused on either victim self-efficacy (Salimi et al., 2021) or bystander self-efficacy (Knauf et al., 2018; Thornberg et al., 2017). Only recently, Sargioti and colleagues proposed both (a) the Anti-Bullying Self-Efficacy Scales and (b) an Anti-Bullying Self-Efficacy Theory, which synthesizes the Participant Role Approach (Salmivalli et al., 1996) and the Bystander Intervention Model (Latané & Darley, 1970). According to this theory (Sargioti et al., 2023), anti-bullying self-efficacy is a mixture of individual and social capacity, process, and outcome of person-environment (student-teacher, child-parent, or peer-to-peer) transactions. For example, when victims have a caring and supportive teacher, parent, or friend, they can demonstrate self-efficacy in tackling bullying behaviors (Kuldas & Foody, 2022). The theory therefore does not adopt the trait conception of self-efficacy defined by Bandura (1997) (the belief in one's ability to carry out a specific behavior successfully), because that conception does not account for social-ecological effects on an individual's anti-bullying self-efficacy (Sargioti et al., 2023).
Unlike the trait conception, the anti-bullying self-efficacy theory (Sargioti et al., 2023) provides a multidimensional conception and measurement scale for identifying five steps that victims and bystanders take to intervene in online/offline bullying behaviors. These five steps are defined as dimensions of victim and bystander self-efficacy to: (a) recognize online/offline bullying behaviors, (b) comprehend the need for emergency intervention, (c) take responsibility for the intervention, (d) know what to do, and (e) intervene (Sargioti et al., 2023). The effectiveness of an anti-bullying program could be measured by the extent to which it has promoted anti-bullying self-efficacy beliefs pertaining to each step (Sargioti et al., 2023). However, when a school anti-bullying program is very effective, its participants give very high scores on a scale measuring its effectiveness (see Sargioti et al., 2023). In this case, the proposed scales can be misleading because of the three chief measurement issues below.
First, school anti-bullying programs mostly measure their effectiveness by the rise or fall of bullying/victimization rates, based on students' self-reports as perpetrators, victims, or bystanders after implementation (Sargioti et al., 2023). A school anti-bullying program, which usually focuses on raising awareness, could be considered ineffective if students reported more incidents after the program (O'Moore & Minton, 2005). However, a higher rate can result from raised awareness of bullying behaviors rather than from an actual increase in bullying incidents. Hence, a higher rate does not indicate the ineffectiveness but rather the effectiveness of the anti-bullying program (Sargioti et al., 2023).
Second, the lowest rate also does not necessarily mean that an anti-bullying program was ineffective, because the program's effect could be confined to one dimension of anti-bullying self-efficacy beliefs, such as recognition or knowledge. This raises the issue of whether anti-bullying self-efficacy is measurable as a multidimensional or a unidimensional construct. The extent to which the new scales allow each dimension alone and the general factor to be measured has remained unclear, mainly because of the lack of evidence for the dimensionality of the anti-bullying self-efficacy construct. Recent results of four separate exploratory factor analyses (EFAs) of the scales (Sargioti et al., 2023) showed that (a) the eigenvalue of the first factor (e.g., 9.36 for the victim offline scale) was at least three times that of the other factors in each scale (e.g., 2.84 for the second factor of the victim offline scale), and (b) at least three factors in each scale had considerable inter-factor correlations, ranging from .50 to .67. Because the ratio of the first to the second eigenvalue exceeded 3.0 (Reise et al., 2010) and the inter-factor correlations reached the thresholds of .50 (Lorenzo-Seva & Ferrando, 2019a) or .60 (García-Garzón et al., 2020), these results could be sufficient evidence for the coexistence of the general factor and sub-factors; therefore, it is appropriate to test the hypothesis of a bifactor model fit (Reise et al., 2010). In addition, given that an accurate evaluation of school anti-bullying programs requires accounting for students' anti-bullying self-efficacy beliefs as both the general factor and sub-dimensions, a bifactor model of anti-bullying self-efficacy beliefs can be tested.
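The eigenvalue screen described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by Sargioti et al. (2023) or by the FACTOR program: the function names are hypothetical, the correlation matrix passed in could be Pearson or polychoric, and treating any single inter-factor correlation above the cut as sufficient is an assumption of this sketch.

```python
import numpy as np

def eigenvalue_ratio(corr):
    """Ratio of the first to the second eigenvalue of a correlation matrix."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))[::-1]
    return float(eigenvalues[0] / eigenvalues[1])

def bifactor_worth_testing(first_eig, second_eig, factor_corrs,
                           ratio_cut=3.0, corr_cut=0.50):
    """Hypothetical heuristic: a dominant first factor AND at least one
    inter-factor correlation above corr_cut suggest testing a bifactor model."""
    dominant_first = first_eig / second_eig > ratio_cut
    correlated_factors = any(abs(r) > corr_cut for r in factor_corrs)
    return dominant_first and correlated_factors

# Victim offline values reported in the text
print(round(9.36 / 2.84, 2))                               # 3.3
print(bifactor_worth_testing(9.36, 2.84, [0.50, 0.67]))    # True
```

The ratio and correlation cuts are exposed as parameters so that the alternative .60 threshold (García-Garzón et al., 2020) can be applied by passing `corr_cut=0.60`.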
Third, very high scores also produce highly negatively skewed data. Statistical analysis of such non-normally distributed data is very likely to identify as outliers some participants who gave the lowest scores on a scale for a single dimension (i.e., univariate outliers) or for multiple dimensions (i.e., multivariate outliers) of a latent construct (Finch, 2012). In this case, does the lowest score mean that (a) the anti-bullying program was ineffective for some participants, or (b) the measurement scale failed to distinguish between scores for low and high anti-bullying self-efficacy beliefs (i.e., it lacked measurement invariance)? If the self-efficacy scale measured the same construct across outliers and non-outliers, it could be concluded that the anti-bullying program was ineffective for some participants. To test this hypothesis, an Item Response Theory (IRT) Two-Parameter Logistic (2PL) model could be used to test item discrimination parameters and measurement invariance. The present paper reports statistical tests and results of unidimensional, bifactor, and IRT-2PL models.

The Present Study
On the basis of the anti-bullying self-efficacy theory (Sargioti et al., 2023), the present research aims to address the abovementioned measurement issues by testing unidimensional and bifactor models as well as the measurement invariance of item responses to the four separate anti-bullying self-efficacy scales for victim offline, victim online, bystander offline, and bystander online. Although O'Moore and Minton (2005) drew attention almost two decades ago to the need for school anti-bullying programs in Ireland to focus on enhancing and measuring victim and bystander self-efficacy beliefs, research addressing this need is still nascent. The effectiveness of school anti-bullying programs in Ireland is generally evaluated in terms of the prevalence rates of targets/perpetrators of bullying behaviors, raised awareness, anti-bullying policies, and a positive school climate (Foody et al., 2018). Such an evaluation leaves unclear the extent to which victim and bystander self-efficacy beliefs are effective in the prevention and/or intervention of online/offline bullying behaviors (Salimi et al., 2021). Indeed, "there is a scarcity of published research on the measurement of both victim and bystander's self-efficacy in bullying situations across countries, including Ireland" (Sargioti et al., 2023, p. 8). Research is needed to further validate the new scales measuring the five steps of victim and bystander self-efficacy in tackling both offline and online bullying behavior (Sargioti et al., 2023).
Hence, the novel contribution of the present research lies in testing (a) the dimensionality of the anti-bullying self-efficacy construct and (b) measurement invariance, distinguishing between scores for low and high anti-bullying self-efficacy beliefs. The research thereby provides insights into the following questions:
• Is anti-bullying self-efficacy a bifactor construct?
• Do the anti-bullying scales distinguish between scores for low and high self-efficacy beliefs, or do they measure the same construct across outliers and non-outliers?

Procedures
This cross-sectional research is part of a wider anti-bullying program implemented in post-primary schools in Ireland; the outline and results of that program are not the main focus of the present study. Of all the post-primary schools in Ireland that were invited (N = 730), 355 expressed interest in implementing the program, but only 197 implemented it fully (October 2021 to June 2022). Participating students were invited to complete an online survey about their self-efficacy in tackling online/offline bullying after the implementation of the program (Spring 2022). The survey link, along with instructions and consent forms, was sent to students and their parents via email. The ethics committee of the authors' university granted ethical approval before the survey was distributed and the program was implemented.

Participants and Settings
Participants were a convenience sample of 1,222 post-primary school students aged 14.

Measures of Anti-Bullying Self-Efficacy Beliefs
The unidimensional and bifactor modeling of anti-bullying self-efficacy beliefs was based on item responses (on a 6-point scale ranging from 5, Very, to 0, Not at all) to the Dublin Anti-Bullying Self-Efficacy Scales for victim offline (20 items), victim online (20 items), bystander offline (20 items), and bystander online (20 items), developed by Sargioti et al. (2023). Each scale comprises five 4-item subscales: recognition, emergency comprehension, responsibility, knowledge, and intervention. Each subscale starts with the statement "The Anti-Bullying programme has increased my confidence in my ability. . ." followed, respectively, by to recognize bullying behaviors, to comprehend the emergency for intervention, to take responsibility, to know what to do, and to intervene.

Statistical Assumptions and Data Analyses
The current research conducted three main multivariate statistical analyses, testing unidimensional factor, bifactor, and IRT-2PL models. The unidimensional factor models were tested with four separate EFAs, whereas the bifactor models were tested with four separate Pure Exploratory Bifactor (PEBI) analyses using Robust Diagonally Weighted Least Squares (RDWLS), Promin rotation, the closeness-to-unidimensionality test, and goodness-of-fit indices as implemented in the FACTOR program (Lorenzo-Seva & Ferrando, 2019a). To improve the overall accuracy of the unidimensional EFA and PEBI results by correcting for bias and skewness in the distribution of bootstrap estimates (Zhang & Browne, 2006), a bias-corrected and accelerated bootstrap with 500 samples and 95% confidence intervals (CIs) was also computed. Following the bifactor analyses, the IRT-2PL model (i.e., item difficulty and discrimination parameters) was estimated to test the construct validity of item responses to the four anti-bullying self-efficacy scales, to determine a cut-off score for low and high anti-bullying self-efficacy, and to test whether outliers were statistically unrepresentative of the research population. An initial item analysis, a check of missing data, the detection of outliers, and a normality test (boxplot) were conducted using IBM SPSS (IBM Corporation, Statistical Package for the Social Sciences, Version 27). The FACTOR program, Version 12.01 (Ferrando & Lorenzo-Seva, 2017), was used for the EFA and PEBI. Stata Statistical Software Release 17 (StataCorp, 2021) was used to test the IRT-2PL models. A Microsoft Excel-based tool (Dueber, 2017) and the BifactorCalc online software (Ventura-León et al., 2021) were also used to calculate bifactor dimensionality indices, especially the Omega-Hierarchical (ωH) and Omega-Hierarchical-Subscale (ωHS) coefficients and the Percent of Uncontaminated Correlations (PUC).
Before the statistical methods were performed, their assumptions were tested. This was followed by assessments of the criteria for factorability, factor analysis (rotation, extraction, and retention), factor reliability, factor dimensionality, and robust goodness-of-fit statistics. The following subsections present the statistical assumptions and indices, and the reasons why the three multivariate statistical methods were chosen.
Outlier detection method: regression factor scores. A dataset is suitable for EFA when it satisfies statistical assumptions for handling missing data, outliers, normality, and multicollinearity (Tabachnick & Fidell, 2013). In particular, because outliers bias the sample mean and can inflate inter-factor correlations (Brown, 2006), they should be removed before EFA when there is a theoretical reason to do so (Field, 2013). However, given that outliers are peculiar to each study using EFA, a standardized score for outlier detection may facilitate the replicability of findings. One standardized method is to weight item scores according to their relationships to each factor and thereby create a factor score for each case/participant that can be included in further analysis (Watkins, 2021). This method is commonly applied using regression factor scores, which indicate each case's/participant's relative standing on a latent common factor (DiStefano et al., 2009). Factor scores range approximately from −3.0 to +3.0, indicating standard deviations (SDs) below and above the mean (DiStefano et al., 2009). Using a specific range of regression factor scores could allow further research to test the replicability of the same number of factors. To this aim, the present research used a regression factor score of −2.0 (i.e., two SDs below the mean) as the cut-off point, retaining cases scoring ≥ −2.0, because cases below this point inflated inter-factor correlations in the current dataset. The cases it flagged were compared with univariate outliers detected through the boxplot and with multivariate outliers detected through the Mahalanobis distance (D²; Field, 2013).
Cases flagged as outliers by two tests were not included in the EFA and PEBI but were used as a binary criterion variable for testing measurement invariance. This test distinguished between two fundamental explanations for the lowest scores of some participants: that the anti-bullying program was ineffective for them, or that the anti-bullying scale could not differentiate between scores for low and high self-efficacy.
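The three outlier checks and the two-test rule can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the SPSS/FACTOR procedure used in the study: the function names are hypothetical, the loadings are assumed to come from a fitted one-factor solution, the Mahalanobis cut-off (chi-square, p < .001) and the 1.5 × IQR boxplot fences are conventional choices assumed here, flagging a case when at least two checks agree is an interpretation of "two tests", and NumPy's percentile method may differ slightly from the Tukey hinges SPSS uses for boxplots.

```python
import numpy as np
from scipy.stats import chi2

def regression_factor_scores(X, loadings):
    """Thurstone regression scores for a single factor: F = Z R^-1 lambda.
    Note: their variance is below 1 unless the scores are re-standardized."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    weights = np.linalg.solve(R, np.asarray(loadings, dtype=float))
    return Z @ weights

def boxplot_flags(x, k=1.5):
    """Univariate outliers beyond the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def mahalanobis_flags(X, alpha=0.001):
    """Multivariate outliers: D^2 above the chi-square critical value (df = items)."""
    diff = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    return d2 > chi2.ppf(1 - alpha, df=X.shape[1])

def outlier_group(X, loadings, cut=-2.0):
    """Binary grouping variable: 0 = outlier (flagged by >= 2 checks), 1 = non-outlier."""
    X = np.asarray(X, dtype=float)
    flags = np.vstack([
        regression_factor_scores(X, loadings) < cut,
        boxplot_flags(X.mean(axis=1)),     # boxplot of each case's mean item score
        mahalanobis_flags(X),
    ])
    return np.where(flags.sum(axis=0) >= 2, 0, 1)
```

The 0/1 coding mirrors the outlier (0) and non-outlier (1) variable later used as the grouping variable for DIF.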
Factorability criteria. Factorability of item responses to each scale was based on basic criteria for inter-item correlations (>.30 but <.90), strong pairwise correlations (for two or more items), and an adequate sample size (Tabachnick & Fidell, 2013). First, the Measure of Sampling Adequacy (MSA) was used; items with MSA < 0.50 should be removed because they do not measure the same domain as the remaining items in the pool (Lorenzo-Seva & Ferrando, 2021). Next, the strength and adequacy of pairwise correlations were assessed through Bartlett's (1954) test of sphericity (p < .05) and the Kaiser-Meyer-Olkin (KMO) index (> .50; Tabachnick & Fidell, 2013). Last, the sample size was considered adequate if one common factor had four or more items with loadings of at least 0.60 (Jung et al., 2020) or if each sub-factor explained a substantial proportion of variance, with ωHS ≥ .30 (Smits et al., 2014).
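For readers who want to check these factorability indices outside FACTOR or SPSS, the following minimal NumPy/SciPy sketch computes Bartlett's test of sphericity and the KMO index with Kaiser's per-item MSA from a correlation matrix. The function names are hypothetical; the sketch assumes a positive-definite correlation matrix (Pearson or polychoric) and uses the standard textbook formulas, so its values may differ slightly from a given program's implementation (FACTOR's MSA procedure, for example, may add refinements such as bootstrap intervals).

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test that the p x p correlation matrix R is an identity matrix.
    Returns (chi-square statistic, p value) with df = p(p - 1) / 2."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)          # numerically stable log|R|
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)

def kmo(R):
    """Overall KMO and per-item MSA, comparing squared correlations with
    squared partial (anti-image) correlations, off-diagonal elements only."""
    R = np.asarray(R, dtype=float)
    A = np.linalg.inv(R)
    d = np.sqrt(np.diag(A))
    partial = -A / np.outer(d, d)             # partial correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off_diag, R ** 2, 0.0)
    p2 = np.where(off_diag, partial ** 2, 0.0)
    msa = r2.sum(axis=1) / (r2.sum(axis=1) + p2.sum(axis=1))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, msa
```

Items whose `msa` value falls below 0.50 would be candidates for removal under the criterion stated above.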
Bifactor analysis method. PEBI was used for the current dataset. Unlike traditional bifactor models, PEBI allows sub-factors to be correlated and a specific bifactor model to be set a priori (Lorenzo-Seva & Ferrando, 2019a); it also estimates construct-relevant multidimensionality in a set of ordered-categorical item responses (Reise, 2012) and produces unidimensionality estimates for IRT (Reise et al., 2013).
Factor rotation method. Given that the self-efficacy dimensions are theoretically correlated, the oblique rotation method used was Robust Promin (RB), which produces pattern loading matrices that better approximate a simple unidimensional or bifactor structure (Lorenzo-Seva & Ferrando, 2019b).
Factor extraction method. RDWLS was used for the current dataset; it uses a polychoric correlation matrix, provides more accurate parameter estimates, and yields a robust model fit when a dataset lacks univariate and multivariate normality (Mîndrilă, 2010). For three or more factors, RDWLS estimates factor loadings, standard errors, and factor correlations most precisely, that is, closest to the true model (Mîndrilă, 2010).
Factor retention criteria. For the EFA and PEBI, six basic criteria were initially considered for retaining a factor: a model fit index, three or more items, adequate item loadings, convergent validity, internal consistency, and construct reliability. First, the number of factors was estimated with the Schwarz Bayesian Information Criterion (BIC) dimensionality test in the FACTOR program (Gibson et al., 2020), which favors theoretically suitable, more parsimonious models and is more robust (Neath & Cavanaugh, 2012). The smaller the BIC value, the more likely the statistical model is to fit the given data accurately (Neath & Cavanaugh, 2012). Second, the minimum number of items needed to identify one factor is three (Brown, 2006), or two in oblique rotation (Abad et al., 2017); however, four items are recommended to reflect the factor content (Robertson, 2019). Third, a minimum item loading of 0.32 is acceptable (Tabachnick & Fidell, 2013), and an item's primary loading should exceed all of its loadings on other factors by at least 0.20 (Cabrera-Nguyen, 2010). An item with a loading < 0.32, or with a difference < 0.20 between its primary loading and a cross-loading, should be removed. Similar loading values apply to a bifactor analysis. A bifactor model is considered unsuitable where (a) no item of a sub-factor shows a loading > 0.20 on the general factor, (b) the general factor shows mainly either lower or higher loadings than the sub-factor loadings, (c) there is no theoretical justification, and (d) goodness-of-fit indices show a poor model fit (Lorenzo-Seva & Ferrando, 2019a). If an item's loading on a sub-factor is high (> .80), that item should have a smaller loading on the general factor, or vice versa (Robertson, 2019). An item should not have equal or very high loadings on both the general factor and a sub-factor (Robertson, 2019). Zero cross-loadings indicate a very accurate estimation of loadings on both the general factor and the sub-factors (Reise, 2012).
Higher loadings (for most items) on the general factor than on the sub-factors suggest that using the general factor score is more appropriate (Robertson, 2019). Fourth, convergent validity was estimated with the Average Variance Extracted (AVE > .50), a common measure computed as the sum of the squared loadings divided by the number of indicators (Hair et al., 2014). The AVE of each unidimensional factor scale was estimated to assess convergent validity as one criterion for construct validity. The fifth and sixth criteria were internal consistency and construct replicability.
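The loading-based retention rules (Tabachnick & Fidell, 2013; Cabrera-Nguyen, 2010) and the AVE formula (Hair et al., 2014) stated above can be expressed as a short Python sketch. It is illustrative only: the function names are hypothetical, the pattern matrix is assumed to hold standardized, rotated loadings (items × factors), and the example loadings are invented for demonstration rather than taken from the study.

```python
import numpy as np

def retain_items(pattern, min_loading=0.32, min_gap=0.20):
    """True for items whose largest absolute loading is >= min_loading and
    exceeds every other loading of that item by at least min_gap."""
    abs_l = np.abs(np.asarray(pattern, dtype=float))
    ordered = np.sort(abs_l, axis=1)[:, ::-1]   # each row in descending order
    primary = ordered[:, 0]
    if abs_l.shape[1] == 1:                     # single factor: no cross-loadings
        return primary >= min_loading
    return (primary >= min_loading) & (primary - ordered[:, 1] >= min_gap)

def average_variance_extracted(loadings):
    """AVE: sum of squared standardized loadings divided by the number of indicators."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.sum(lam ** 2) / lam.size)

# Invented 4-item, 2-factor pattern matrix
pattern = np.array([[0.70, 0.10],
                    [0.40, 0.30],
                    [0.25, 0.00],
                    [0.10, 0.60]])
print(retain_items(pattern))                                       # [ True False False  True]
print(average_variance_extracted([0.8, 0.7, 0.75, 0.65]) > 0.50)   # True
```

In the example, the second item fails the 0.20 gap rule and the third fails the 0.32 loading rule.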
Factor reliability estimation methods. An initial estimation of the internal consistency of unidimensional factors was based on the omega index (ω ≥ .80), which is computed from the factor loadings rather than the number of items (McDonald, 1999). For bifactor models, the internal consistency and relative strength of the general factor and sub-factors were estimated with the Omega-Hierarchical (ωH) and Omega-Hierarchical-Subscale (ωHS) coefficients, respectively (Reise et al., 2010; Rodriguez et al., 2016b). ωH represents the proportion of variance in the total score that can be attributed to the general factor after accounting for all sub-dimensions (Reise et al., 2013). A value of ωH ≥ .50 is acceptable, but a value closer to .75 is preferred (Reise et al., 2013). Higher values (ωH ≥ .80) indicate that the latent construct can be considered essentially unidimensional (Reise et al., 2010; Ventura-León et al., 2021). ωHS represents the proportion of systematic variance in subscale data that can be uniquely attributed to a sub-dimension after accounting for the general factor (Rodriguez et al., 2016b). The proportion of unique variance explained by a subscale (i.e., variance not explained by the general factor) is considered substantial when ωHS ≥ .30, moderate when ωHS is between .20 and .29, and low when ωHS ≤ .19 (Smits et al., 2014). Another estimate of internal consistency is the construct replicability of the general factor and sub-factors, which measures the extent to which a latent construct is reproducible (replicable) from its own indicators (Hancock & Mueller, 2001). Hancock and Mueller's construct reliability index (H ≥ .70) was used; it indicates a well-defined latent construct that is more likely to be stable or replicable across studies.
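The reliability coefficients above follow directly from standardized loadings. The sketch below computes McDonald's ω for a single factor; ω (total), ωH, and ωHS for a bifactor solution in which each item loads on the general factor and on one sub-factor; and Hancock and Mueller's H. It assumes orthogonal factors and uniquenesses of 1 − λ²; the function names and example loadings are hypothetical, and programs such as FACTOR or BifactorCalc may use different conventions (for instance, when sub-factors are correlated, as PEBI allows).

```python
import numpy as np

def omega_unidimensional(loadings):
    """McDonald's omega: (sum lambda)^2 / [(sum lambda)^2 + sum(1 - lambda^2)]."""
    lam = np.asarray(loadings, dtype=float)
    common = lam.sum() ** 2
    return float(common / (common + np.sum(1.0 - lam ** 2)))

def bifactor_omegas(general, specific, groups):
    """Return (omega_total, omega_h, {sub-factor: omega_hs}) for an orthogonal
    bifactor solution; specific[i] is item i's loading on its own sub-factor."""
    g = np.asarray(general, dtype=float)
    s = np.asarray(specific, dtype=float)
    groups = np.asarray(groups)
    u = 1.0 - g ** 2 - s ** 2                     # unique variances
    labels = np.unique(groups)
    gen = g.sum() ** 2
    spec = sum(s[groups == k].sum() ** 2 for k in labels)
    total = gen + spec + u.sum()                  # model-implied total-score variance
    omega_hs = {}
    for k in labels:
        m = groups == k
        sub_total = g[m].sum() ** 2 + s[m].sum() ** 2 + u[m].sum()
        omega_hs[k.item()] = float(s[m].sum() ** 2 / sub_total)
    return float((gen + spec) / total), float(gen / total), omega_hs

def construct_replicability_h(loadings):
    """Hancock and Mueller's H: 1 / (1 + 1 / sum(lambda^2 / (1 - lambda^2)))."""
    lam = np.asarray(loadings, dtype=float)
    ratio = np.sum(lam ** 2 / (1.0 - lam ** 2))
    return float(1.0 / (1.0 + 1.0 / ratio))

# Hypothetical 4-item example with two 2-item sub-factors
omega_t, omega_h, omega_hs = bifactor_omegas([0.6] * 4, [0.4] * 4, ["A", "A", "B", "B"])
print(round(omega_h, 3), round(omega_hs["A"], 3))   # 0.643 0.211
```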
Factor dimensionality assessments: single factor versus bifactor. The unidimensionality assessments of both single-factor and bifactor models were based on seven criteria: Unidimensional Congruence (UniCo), Item Unidimensional Congruence (I-UniCo), Explained Common Variance (ECV), Item Explained Common Variance (I-ECV), Mean of Item Residual Absolute Loadings (MIREAL), Item Residual Absolute Loadings (I-REAL), and PUC. Values of UniCo and I-UniCo > 0.95, ECV and I-ECV > 0.85, and MIREAL and I-REAL < 0.300 suggest that the latent construct can be treated as essentially unidimensional (Ferrando & Lorenzo-Seva, 2018, 2019). An ECV < 0.70 together with higher ωHS values indicates a multidimensional model (Rodriguez et al., 2016a).
If unidimensionality is unclear (i.e., if exploratory common factor analysis suggests the presence of sub-dimensions), PUC, ECV, and ωH values need to be compared (Reise et al., 2013). Although there is no consensus on cut-off scores for this comparison, there are two suggestions for treating a latent construct as unidimensional (Ventura-León et al., 2021): when PUC is < .70, ECV should be > .70 and ωH > .80 (Rodriguez et al., 2016a, 2016b); or when PUC is < .80, ECV should be > .60 and ωH > .70 (Reise et al., 2013). When PUC is > .80, a unidimensional model can be considered, even if a bifactor model fits the data better (Ventura-León et al., 2021). A PUC between .80 and .90 indicates a high degree of unidimensionality, but not one so severe as to rule out multidimensionality, thereby qualifying the latent construct as a bifactor (Reise et al., 2013; Ventura-León et al., 2021).
When PUC is very high (>.90), "the parameter estimates in the unidimensional model are the same as the general factor in the bifactor model" (Reise, 2012, p. 688).
To identify which items contribute more to the general factor than to a sub-factor, the reference criterion was I-ECV > 0.85, which indicates that the general factor dominates the item's common variance (Stucky & Edelen, 2014). An I-ECV near 1 indicates that an item reflects the general factor alone (Ferrando & Lorenzo-Seva, 2018).
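ECV, I-ECV, and PUC can be computed directly from a bifactor pattern and the number of items per sub-factor, as in the illustrative sketch below (hypothetical function names; standardized loadings and one sub-factor per item assumed). Because PUC depends only on how items are grouped, any 20-item scale with five 4-item sub-factors has PUC = 160/190 ≈ .84: of the 190 item pairs, the 30 pairs within the same sub-factor are "contaminated" by multidimensionality.

```python
import numpy as np

def ecv(general, specific):
    """Explained common variance of the general factor across all items."""
    g2 = np.sum(np.asarray(general, dtype=float) ** 2)
    s2 = np.sum(np.asarray(specific, dtype=float) ** 2)
    return float(g2 / (g2 + s2))

def item_ecv(general, specific):
    """I-ECV: share of each item's common variance due to the general factor."""
    g2 = np.asarray(general, dtype=float) ** 2
    s2 = np.asarray(specific, dtype=float) ** 2
    return g2 / (g2 + s2)

def puc(group_sizes):
    """Percent of uncontaminated correlations: item pairs drawn from different
    sub-factors divided by all item pairs."""
    n = sum(group_sizes)
    total_pairs = n * (n - 1) // 2
    within_pairs = sum(k * (k - 1) // 2 for k in group_sizes)
    return (total_pairs - within_pairs) / total_pairs

print(round(puc([4, 4, 4, 4, 4]), 2))             # 0.84
print(item_ecv([0.7, 0.6], [0.2, 0.5]) > 0.85)    # [ True False]
```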
IRT-2PL model. To fit the IRT-2PL model, all items were dichotomized. Each point of the 6-point scale was tested as a threshold for item discrimination until a common cut-off score (between 0 and 5) that discriminated between low and high anti-bullying self-efficacy, for both the general and the specific factors, was found and displayed by the Test Characteristic Curve (TCC). The TCC relates individuals' latent trait levels to their expected true scores on a measurement scale (Baker, 2001), thereby displaying the discriminating ability (cut-off points) for determining anti-bullying self-efficacy levels. Scale points 0, 1, and 2 were recoded as 0 (low self-efficacy), whereas 3, 4, and 5 were recoded as 1 (high self-efficacy). To test measurement invariance across sample characteristics, binary variables were created for detected outliers (0) versus non-outliers (1) and for male (0) versus female (1) participants.
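The recoding rule and the 2PL curves from which a TCC is drawn can be sketched as follows. The sketch is illustrative: the function names are hypothetical, the item parameters are invented, and it uses the logistic 2PL without the 1.7 scaling constant, P(θ) = 1 / (1 + exp(−a(θ − b))), where a is the discrimination and b the difficulty of an item.

```python
import numpy as np

def dichotomize(responses, threshold=3):
    """Recode 0-5 ratings: 0, 1, 2 -> 0 (low); 3, 4, 5 -> 1 (high)."""
    return (np.asarray(responses) >= threshold).astype(int)

def icc_2pl(theta, a, b):
    """2PL item characteristic curves, one column per item:
    P(theta) = 1 / (1 + exp(-a * (theta - b)))."""
    theta = np.asarray(theta, dtype=float)[:, None]
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def tcc(theta, a, b):
    """Test characteristic curve: expected number of 'high' responses at each theta."""
    return icc_2pl(theta, a, b).sum(axis=1)

# Hypothetical parameters for a 4-item subscale
a = [1.8, 1.5, 2.1, 1.2]
b = [-0.6, -0.3, 0.0, 0.4]
grid = np.linspace(-3, 3, 7)
print(dichotomize([0, 2, 3, 5]))
print(np.round(tcc(grid, a, b), 2))
```

Plotting `tcc` over a fine θ grid gives the S-shaped curve on which a cut-off between low and high expected scores can be read.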
Four assumptions for the IRT-2PL analyses of both the general and specific factors were met. Assumption 1, unidimensionality, was based on the statistical criteria outlined in the FACTOR program (Ferrando & Lorenzo-Seva, 2018). Assumption 2, local independence, was based on Loevinger's H coefficient > 0.30, using the mokken package in R (van der Ark, 2010). Assumption 3, monotonicity, was displayed as an S-shaped curve (Yang & Kao, 2014). Assumption 4, measurement invariance across the groups of detected outliers and non-outliers as well as male and female participants, was estimated with Lord's chi-square method for Differential Item Functioning (DIF; Lord, 1980), as implemented in Stata Statistical Software Release 17 (StataCorp, 2021).
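Lord's (1980) chi-square compares an item's 2PL parameters estimated separately in two groups (e.g., outliers and non-outliers): χ² = dᵀ(Σ₁ + Σ₂)⁻¹d, where d is the difference in (a, b) between groups and Σ₁, Σ₂ are the groups' sampling covariance matrices, with 2 degrees of freedom. The minimal sketch below assumes both groups' parameters are already linked to a common θ metric; the function name and numbers are hypothetical, and the sketch does not reproduce Stata's implementation.

```python
import numpy as np
from scipy.stats import chi2

def lord_chi_square(params_ref, cov_ref, params_focal, cov_focal):
    """Lord's Wald-type DIF statistic for one 2PL item.

    params_*: (a, b) estimated separately in each group, on a common scale.
    cov_*: 2 x 2 sampling covariance matrices of (a, b).
    Returns (chi-square statistic, p value) with df = number of parameters."""
    d = np.asarray(params_ref, dtype=float) - np.asarray(params_focal, dtype=float)
    pooled = np.asarray(cov_ref, dtype=float) + np.asarray(cov_focal, dtype=float)
    stat = float(d @ np.linalg.solve(pooled, d))
    return stat, float(chi2.sf(stat, df=d.size))

# Hypothetical item: discrimination differs by 0.5 between groups, difficulty equal
stat, p = lord_chi_square((1.4, 0.2), np.diag([0.025, 0.025]),
                          (0.9, 0.2), np.diag([0.025, 0.025]))
print(round(stat, 2), round(p, 3))
```

A non-significant p value for every item, as reported below, is what supports measurement invariance across the two groups.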

Statistical Results
Statistical assumptions for the EFA of item responses to the self-efficacy scales for victim offline, victim online, bystander offline, and bystander online were met satisfactorily after removing missing data (4% of cases had a missing value on at least one variable) via listwise deletion (Field, 2013) and removing multivariate/univariate outliers, which inflated inter-factor correlations and had lower scores (M < 2.1). The outlier cases on the scales for victim offline (n = 181, 14.8%), victim online (n = 194, 15.9%), bystander offline (n = 161, 13.2%), and bystander online (n = 200, 14.4%) were excluded from the EFA of the respective dataset (N = 1,222). The sample size was adequate, as each general factor's 20 items had a minimum loading of 0.60 (Jung et al., 2020) and each sub-factor's four items had ωHS ≥ .30 (Smits et al., 2014).

Results for Exploratory Unidimensional Factor Models
Dimensionality. Assessments of the closeness to unidimensionality of the single-factor model with 95% CIs yielded sufficient values of UniCo and I-UniCo (> 0.95), ECV and I-ECV (> 0.85), and MIREAL and I-REAL (< 0.300) for all scales and their items, except Items 1, 2, 3, and 4 (Ferrando & Lorenzo-Seva, 2018, 2019). Hence, anti-bullying self-efficacy was essentially a unidimensional construct on 16 items (Items 5 to 20) of each of the four scales. Given that Items 1, 2, 3, and 4 did not appear to be unidimensional and that the BIC dimensionality test also yielded a five-factor multidimensional solution for item responses to the scales for victim offline (BIC = 967.28), victim online (BIC = 987.69), bystander offline (BIC = 1045.91), and bystander online (BIC = 1091.24), a bifactor model could be tested.

Results for Exploratory Bifactor Models
Loadings. Preliminary evidence for the bifactor model was provided by item loadings ranging from 0.38 to 0.67 on all sub-factors and from 0.60 to 0.75 on all general factors (Ferrando & Lorenzo-Seva, 2018). Factor loadings were significantly larger on the general factor than on the sub-factors (see Table 1). Only Items 1, 2, 3, 8, and 16 of the victim offline scale; Items 1, 2, and 17 of the victim online scale; Items 1, 2, and 13 of the bystander offline scale; and Items 2, 4, and 15 of the bystander online scale loaded more on their respective sub-factors than on the general factor.
Dimensionality. UniCo and I-UniCo values ≤ 0.95 and ECV and I-ECV values ≤ 0.85, with 95% CIs (Ferrando & Lorenzo-Seva, 2018, 2019), suggested that the latent construct could not be treated as essentially unidimensional. The general factor's ECV (at most 0.64) and the I-ECV values (at most 0.76) suggested that the data of all four anti-bullying self-efficacy scales were sufficiently multidimensional to warrant a bifactor model. Items with an I-ECV < 0.85 were considered to measure their respective sub-factor in addition to the general factor (Stucky & Edelen, 2014).
However, all four PUC values were .84 (> .80; Reise et al., 2013); that is, 84% of the inter-item correlations were uncontaminated by multidimensionality. In other words, 84% of the inter-item correlations reflected the general factor alone, whereas the remaining 16% also reflected the sub-factors (Ventura-León et al., 2021). The high percentage of unidimensionality and the low percentage of multidimensionality could still be considered sufficient to qualify the anti-bullying self-efficacy construct as a bifactor on each of the four scales.
Reliability. Estimates of ωH ≥ .80 indicated that the general factor was the main source of variance in each scale (Reise et al., 2013). However, the values of ωHS ≥ .30 (Smits et al., 2014) and H index ≥ .70 (Hancock & Mueller, 2001) indicated that anti-bullying self-efficacy could be treated as a sufficiently defined bifactor construct that would be replicable for each of the four scales. Single and general factor loadings (λ) were above .60.

Results for the IRT-2PL Model
The PEBI results indicated that the bifactor model was sufficiently unidimensional to be tested as an IRT model. As shown in Table 2, results of the IRT-2PL models indicated that the discrimination parameters of the 20 items for each general factor and the 4 items for each specific factor were significant (p < .05, z > 1.96). The TCC plots also displayed a cut-off score of 2.0 for each general and specific factor, differentiating between low and high anti-bullying self-efficacy. Table 2 also shows that no item had significant DIF, indicating measurement invariance across the outlier and non-outlier groups. The measurement invariance test for the gender groups also showed non-significant DIF values, which are not reported for the sake of brevity.

Discussion
The development and measurement of anti-bullying self-efficacy beliefs are central to the prevention of, and intervention in, offline and online bullying behaviors. Therefore, the effectiveness of anti-bullying programs, which is generally assessed in terms of their contributions to students' awareness of bullying behaviors (Foody et al., 2018), also depends on the development and measurement of victims' and bystanders' anti-bullying self-efficacy beliefs (Sargioti et al., 2023). However, it was unclear to what extent anti-bullying self-efficacy beliefs can be developed and measured as a general factor and as specific dimensions. To address this issue, the current research took three main steps, testing (a) unidimensional factor, (b) bifactor, and (c) IRT-2PL models. The main result was that the bifactor model fitted the data on the victim offline, victim online, bystander offline, and bystander online scales better than the unidimensional factor model. The bifactor model fitted the data on each 20-item scale sufficiently well, with anti-bullying self-efficacy as the general factor and five specific dimensions: recognition, emergency comprehension, responsibility, knowledge, and intervention. The research has thus proposed a bifactor model of anti-bullying self-efficacy beliefs and provided sufficient evidence for the psychometric properties of the scales, with the aim of facilitating further research on the accurate evaluation of anti-bullying programs in terms of their contributions to anti-bullying self-efficacy beliefs as a general factor and as specific dimensions.
Although the results, particularly ωH > .80, indicated that the data could be essentially unidimensional, multidimensionality could still be considered for five reasons. First, ωH can be high even when the data are clearly multidimensional, particularly when the number of items is large (Reise et al., 2013). Second, values of ωHS ≥ .30 (Smits et al., 2014) and H index ≥ .70 (Hancock & Mueller, 2001) for the subscales indicated that a substantial proportion of variance was explained by the sub-factors, which are likely to be replicable in further research. Third, a unidimensional factor solution is expected when an anti-bullying program is very effective, because participants then give very high scores on each scale, which substantially reduces the variance in item responses. Fourth, robust goodness-of-fit indices with 95% CIs indicated a strong bifactor model fit. Fifth, the IRT-2PL model yielded a cut-off score distinguishing low from high anti-bullying self-efficacy beliefs for the general factor and for each specific dimension, thereby making it possible to measure the extent to which the anti-bullying program was ineffective for some participants. Together, these statistical reasons suggest that the scales can help researchers assess adolescents' anti-bullying self-efficacy beliefs, as a general factor and as specific dimensions, in tackling online and offline bullying behaviors as both victims and bystanders.

Limitations
Although the present research did not address all the limitations mentioned in earlier research (see Sargioti et al., 2023), it presented a novel way to measure the effectiveness of anti-bullying programs. It provided sufficient psychometric evidence for a bifactor model of anti-bullying self-efficacy beliefs and satisfied the criteria for construct validity and for measurement invariance across outlier and non-outlier groups as well as across male and female students. However, the research provided no measurement invariance results by age, ethnicity, sexual orientation, religion, or socioeconomic status, so no empirical conclusion can be drawn about how such diversity affects scale development and the design of an effective intervention program. The research has therefore insufficiently addressed issues of diversity, and it remains unclear whether the anti-bullying self-efficacy scales allow these groups to be compared. Addressing this issue requires further tests of measurement invariance.

Implications
The present findings have implications for school anti-bullying policy and practice as well as for further research. The extent of victims' and bystanders' self-efficacy beliefs can be used as a measure of the effectiveness of school anti-bullying policies and programs across countries, including Ireland (Sargioti et al., 2023). To measure the extent to which an anti-bullying program is ineffective for some participants, or whether outliers are statistically representative of the student population attending that program, the cut-off score of 2.0 can be used to differentiate between low and high anti-bullying self-efficacy beliefs. Below-average victims and bystanders are expected to have a mean score of 2.0 or lower on the general factor and on each specific factor, whereas above-average participants would have a mean score of 2.1 or above. The proposed bifactor model allows further research to assess adolescents' anti-bullying self-efficacy beliefs in tackling both offline and online bullying situations as a general factor and as specific sub-factors. Further research can operationalize the anti-bullying self-efficacy concept as the general factor alone, focus on a single sub-dimension (recognition, emergency comprehension, responsibility, knowledge, or intervention), or use both at the same time. The research thereby provides theoretical and empirical steps forward in the anti-bullying literature, as it allows the weaknesses and strengths of both specific and general anti-bullying self-efficacy beliefs to be measured.
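Applying the 2.0 cut-off amounts to a simple rule on mean item scores. The sketch below assumes, as the text implies, that the cut-off is applied to a respondent's mean item score on the general factor (all 20 items) or on one 4-item specific factor; the function name, the optional column selection, and the example responses are hypothetical.

```python
import numpy as np

CUTOFF = 2.0  # cut-off score reported in the text


def classify_self_efficacy(responses, items=None, cutoff=CUTOFF):
    """Label respondents 'low' (mean <= cutoff) or 'high' (mean > cutoff).

    responses -- 2-D array, one row per respondent, one column per item
    items     -- optional column indices, e.g. the 4 items of one specific
                 factor; None uses every column (the general factor)
    """
    X = np.asarray(responses, dtype=float)
    if items is not None:
        X = X[:, items]
    means = X.mean(axis=1)
    labels = np.where(means <= cutoff, "low", "high")
    return labels, means


# Hypothetical responses of three students to one 4-item specific factor.
responses = [[1, 2, 3, 2],
             [3, 3, 2, 3],
             [2, 2, 2, 1]]
labels, means = classify_self_efficacy(responses)
print(list(zip(labels.tolist(), means.tolist())))
```

Because the text places low scorers at or below 2.0 and high scorers at 2.1 or above, the sketch treats any mean above 2.0 as high; how means strictly between 2.0 and 2.1 should be handled is not specified in the text.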
One essential recommendation for further research concerns using outliers as a criterion variable for distinguishing between participants for whom the anti-bullying program was effective and those for whom it was ineffective. Outliers are very likely to be participants who give the lowest or very low scores on each scale, indicating that the program was ineffective for them. Outliers can inflate inter-factor correlations, as happened in the current research; therefore, if they are not removed, further factor analyses might not yield the five-factor solution. However, because outliers are specific to each dataset used for factor analysis, a standardized criterion for outlier detection and removal may facilitate the replicability of the five-factor model. One standardized method is to use regression factor scores, as they show each participant's relative standing on a latent factor (DiStefano et al., 2009). Comparing a specific range of regression factor scores with boxplot results could allow further research to test the replicability of the five-factor model. The present research identified a regression factor score of ≤ −2.0 (i.e., two or more SDs below the mean) as the cut-off point at which cases inflated inter-factor correlations in the current dataset, compared it with boxplot results, and thereby excluded the outliers identified on both tests. This cut-off score could be considered or tested in further research.
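The outlier screen described above can be sketched as follows. The sketch assumes a single factor with known standardized loadings, computes regression (Thurstone) factor scores as Z R⁻¹λ (Z = standardized item responses, R = observed item correlation matrix, orthogonal factors assumed), standardizes them, flags scores ≤ −2.0, and compares them with Tukey boxplot fences (1.5 × IQR). It also assumes that excluding "the outliers identified on both tests" means excluding cases flagged by both rules. The function names and simulated data are hypothetical.

```python
import numpy as np


def regression_factor_scores(X, loadings):
    """Thurstone regression scores Z R^-1 L (orthogonal factors assumed).

    loadings may be a vector (one factor) or an items x factors matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    W = np.linalg.solve(R, np.asarray(loadings, dtype=float))
    return Z @ W


def boxplot_outliers(x, k=1.5):
    """Tukey fences: values beyond Q1 - k*IQR or Q3 + k*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def flag_outliers(X, loadings, z_cut=-2.0):
    """Single-factor screen: standardized score <= z_cut AND a boxplot outlier."""
    f = regression_factor_scores(X, loadings)
    fz = (f - f.mean()) / f.std(ddof=1)
    return (fz <= z_cut) & boxplot_outliers(fz)


# Simulated one-factor data (4 items, loadings .70) with five very low responders.
rng = np.random.default_rng(0)
lam = np.full(4, 0.70)
eta = rng.standard_normal(500)
X = eta[:, None] * lam + rng.standard_normal((500, 4)) * np.sqrt(1 - lam**2)
X[:5] = -4.0
keep = ~flag_outliers(X, lam)
print("flagged cases:", np.flatnonzero(~keep))
```

Requiring agreement between the two rules is the more conservative choice; flagging cases identified by either rule would remove more participants and is equally easy to express with `|` instead of `&`.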

Conclusions
The current research made novel theoretical and empirical contributions to the anti-bullying literature by making it possible to measure the weaknesses and strengths of both specific and general anti-bullying self-efficacy beliefs that might result from an anti-bullying program as a whole. The research proposed a bifactor model of anti-bullying self-efficacy beliefs and provided statistical evidence for the psychometric properties of the four scales measuring the bifactor structure. As the main implication for further research, victims' and bystanders' anti-bullying self-efficacy could be operationalized and measured as either the general factor or a specific factor alone (i.e., recognition, emergency comprehension, responsibility, knowledge, or intervention), or both at the same time.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 101026567. The data collection was undertaken under the FUSE Anti-Bullying and Online Safety Programme in Ireland and was funded by Meta, Rethink Ireland, and the Department of Education, Government of Ireland.

Ethics Approval
This study was approved by the Research Ethics Committee of Dublin City University (ethics approval number: DCUREC/2020/126).