Re-examining the youth program quality survey as a tool to assess quality within youth programming

Abstract The Youth Program Quality Survey is a 24-item measure of program quality designed to evaluate participant perceptions of experiences in short- and long-term youth programs. The Youth Program Quality Survey was developed based on the National Research Council and Institute of Medicine’s eight program setting features that can contribute to the positive development of youth. This measurement tool is quite new within the field and as such little research has been conducted to determine its validity and reliability. The current study is designed to examine three previously proposed factor structures with a sample of 391 youth between the ages of 10 and 18 who participated in 38 different youth programs (e.g. sport, leadership, in-school-mentoring). Confirmatory factor analysis results indicate model fit issues with all three proposed factor structures. Therefore, an exploratory factor analysis is performed to improve model fit, and a revised 4-factor 19-item model is proposed. An analysis of invariance by age shows that the measurement model did not vary between younger and older youth participants. Practical implications and areas of future research are offered.


PUBLIC INTEREST STATEMENT
The Youth Program Quality Survey is a measure of program quality designed to evaluate youth perceptions of experiences in short- and long-term programs. The survey was developed based on the National Research Council and Institute of Medicine's eight proposed setting features of program quality that aim to facilitate positive developmental outcomes. The current study provides a revised survey with sound psychometric properties that can be used by both researchers and programmers with youth from 10 to 18 years of age in various types of youth programming. Practical implications and areas of future research are offered.

Introduction
Each year, youth development programs serve millions of youth across North America and internationally (Eccles & Gootman, 2002; Mahoney, Larson, Eccles, & Lord, 2005; National Collaboration for Youth, 2011). Such programs have proved to be important for youth across North America, as research has indicated that more than 86% of youth in Canada and 57% of youth in the United States participate in at least one organized out-of-school time activity (Guèvremont, Findlay, & Kohen, 2014; United States Census Bureau, 2014). National youth-serving organizations (e.g. Big Brothers Big Sisters, Boys and Girls Club, Boys and Girls Clubs of Canada, YMCA, YWCA) consistently offer the most out-of-school time youth development programs (Boys & Girls Clubs of Canada, 2008; Quinn, 1999; The Bridgespan Group, 2005), with sport identified as the most popular type of program in which youth spend their time (Guèvremont, Findlay, & Kohen, 2008; Martelaer & Theeboom, 2006; United States Census Bureau, 2014). Research has shown that youth experience more positive developmental outcomes (e.g. increased relationships with adults and peers, life skills development, levels of engagement) when participating in organized out-of-school time activities compared to passive leisure activities (e.g. spending time with friends, watching TV, reading, listening to music; Barber, Eccles, & Stone, 2001; Benson, Scales, & Syvertsen, 2011; Eccles & Gootman, 2002; Employment & Social Development Canada, 2014). Therefore, because of the potential impact such programs can have on youth participants, it is critical to deliver high-quality programs for these youth.
Researchers argue that program quality is the best predictor of not only risk prevention, but also positive developmental outcomes, which are two predominant foci of youth programs (Catalano, Hawkins, Berglund, Pollard, & Arthur, 2002; Durlak, Mahoney, Bohnert, & Parente, 2010; Roth & Brooks-Gunn, 2003; Yohalem & Wilson-Ahlstrom, 2010). The Committee on Community-Level Programs for Youth within the National Research Council and Institute of Medicine (NRCIM) responded to the identified gap in which very little research exists outlining what programs can do to facilitate development. As such, research was compiled from the developmental science literature over the course of two decades and eight contextual features were identified as the most likely to promote positive developmental outcomes within youth programming (Eccles & Gootman, 2002). These elements include: (1) physical and psychological safety; (2) appropriate structure; (3) supportive relationships; (4) opportunities to belong; (5) positive social norms; (6) support for efficacy and mattering; (7) opportunities for skill-building; (8) integration of family, school, and community efforts. Some setting features have been identified as necessary components within youth programs, such as safety and structure, while others are considered higher order elements that may not be a focus within all programs, such as integration of family, school, and community efforts. Although each feature can contribute to development individually, there is often overlap between the features in which features work together to foster greater psychosocial outcomes (Eccles & Gootman, 2002). For example, programs that outline clear and consistent rules and expectations for youth demonstrate appropriate structure, yet these rules may also help to foster a physically and psychologically safe environment.
Several researchers argue that the more of the eight setting features a program has, the greater the influence it will have on the positive development of youth (McLaughlin, 2000; Merry, 2000).
A number of common features, similar to those proposed by the NRCIM, have been outlined within the literature that can contribute to program quality (e.g. Durlak, Weissberg, & Pachan, 2010; Lerner, 2004; Roth & Brooks-Gunn, 2003). However, the program quality setting features proposed by the NRCIM have been the most widely used and researched. The Youth Program Quality Survey (YPQS; Silliman & Schumm, 2013), the measure of focus in this research, was developed based on the NRCIM's eight features of program quality outlined above that have been proposed to facilitate positive outcomes in youth programming. Holt and Jones (2008) outlined that there are few valid and reliable measures for youth developmental outcomes within program evaluation and even fewer measures available to understand the processes of program implementation. To date, the YPQS is the only known measure to utilize the youth perspective in assessing program quality. Other measures do exist to examine program quality from an external evaluator or a program leader perspective (e.g. Youth Program Quality Assessment by High/Scope Educational Research Foundation, 2005; Program Quality Observation Scale by Vandell & Pierce, 2006), yet studies that assess program quality tend to be based on adult rather than youth perceptions of program quality (Urban, 2008; Yohalem & Wilson-Ahlstrom, 2010). Eccles and Gootman (2002) argue that the eight setting features need to fit well with the youth participants, as the features exist as an interaction with the program setting and are not independent from one another. Therefore, it is also important to obtain youth participants' perspectives, as youth voice is critical in program evaluation (Camino & Zeldin, 2002; Hamilton, Hamilton, & Pittman, 2004; Powers & Tiffany, 2006; Walker, 2007).
A limitation of the current field is that there are few measures that capture the youth perspective regarding the quality of experiences within programs (Vandell, Larson, Mahoney, & Watts, 2015). As such, it is necessary to develop a structurally sound measure of program quality that provides youth voice in the evaluation process.
Previously referred to as the Youth Program Climate Survey (Silliman, 2008a, 2008b, 2008c; Silliman & Shutt, 2010), the YPQS (Silliman & Schumm, 2013) is a measure that aims to examine youth perceptions of program contexts with the goal of understanding if a program provides a climate that promotes positive youth development (Silliman, 2008b). The survey was developed to provide a "reliable, research-based, user-friendly tool to help youth leaders incorporate youth voice in assessment of the features in the process of program development, monitoring, and improvement" (Silliman & Schumm, 2013, p. 648). As the YPQS was developed based on the NRCIM's eight program setting features, the original measure consisted of eight subscales including: Safety, Support, Social Norms, Social Inclusion, Skill-building Opportunities, Self-efficacy, Structure, and Synergy with Family and Community. Three items were developed for each of the eight contextual features that have been suggested to promote positive youth development (Eccles & Gootman, 2002), resulting in a 24-item measure. This measure was used for all youth regardless of age.
Prior studies using the YPQS have indicated moderate to high instrument reliability with Cronbach's alphas (Cronbach, 1951) ranging from .60 to .96 when examining the total scale and individual factors (Silliman, 2008a; Silliman & Schumm, 2013; Silliman & Shutt, 2010). Previous research has also found the measure reliable with both child and teen samples (Silliman, 2008b; Silliman & Shutt, 2010). Moreover, Silliman and Shutt (2010) indicated few differences when examining the measure by age, gender, and race. However, more recently, in a study conducted by Silliman and Schumm (2013), the authors divided the sample of youth participants into two categories based on previous research outlining that youth of different ages are at different developmental levels (Lerner, 2009). These two categories included younger youth between the ages of 10 and 13 and older youth between the ages of 14 and 17. From this, the authors conducted two exploratory factor analyses that resulted in the construction of two factor models (one for each age group). The factor model for younger youth (ages 10-13) outlined five subscales: (1) Positive Emotional Climate, (2) Empowered Skill-building, (3) Expanding Horizons, (4) Structure, and (5) Negative Experiences. The factor model for the older youth (ages 14-17) revealed similar, yet slightly modified subscales: (1) Empowered Skill-building, (2) Positive Values, (3) Expanding Horizons, (4) Adult Support, and (5) Negative Experiences. Therefore, based on Silliman and Schumm's (2013) findings, researchers utilizing this tool should distribute and analyze two different measures based on the age of participants within a program. This implies that different elements of program quality may be of importance for youth of different ages.
From this research conducted with the YPQS, three factor models using the items from the YPQS have been proposed within the literature to date (an eight-factor model for youth of all ages, a five-factor model for youth ages 10-13, and a five-factor model for youth ages 14-17). However, no further research has been conducted to test these models beyond their initial proposal as a result of exploratory factor analyses. Confirmatory factor analyses have not been conducted on any of the three proposed factor models and as a result there has been little advancement in validating the YPQS in the literature.
The previously conducted exploratory factor analyses outlined above were tested with what one may argue to be a homogenous sample in terms of the type of program context. All youth within these studies were part of 4-H camps and conferences that were of short duration, lasting between 3 and 7 days. As this survey was proposed to evaluate participant perceptions of experiences in a variety of youth programs including those of both short and long duration (Silliman & Schumm, 2013), more research is necessary to examine the YPQS with youth in various types of programs and durations. Moreover, the factor structures proposed by Silliman and Schumm were based on particular age ranges, which may not, in fact, be ideal for two reasons. First, the NRCIM's eight setting features were proposed with the purpose of encompassing program quality for youth programs globally and not by age group (Eccles & Gootman, 2002). Previous research has found that features of high-quality programs serving school-age children are similar to those for adolescents, highlighting the importance of having one program quality measure across youth of all ages. The majority, if not all, of research-based measures that examine the processes and outcomes of youth development programs are not broken down by age (e.g. Positive Youth Development Inventory (PYDI) by Arnold, Nott, & Meinhold, 2012; Youth Experience Survey (YES) 2.0 by Hansen & Larson, 2005). Second, in practice, youth programs have various combinations of ages involved. For example, some programs may have youth from ages 11 to 16 whereas others may have youth from 12 to 14 years of age. Having factor structures based on age range would make using the YPQS, or any other program quality measure, difficult, as the particular programs would have to align with the age groups proposed.
As such, it would be most appropriate to have a validated measurement tool with a strong factor structure that can be universally utilized across all youth program contexts, regardless of age, to evaluate program quality.
Therefore, the purpose of this study was to conduct confirmatory factor analyses on the three factor structures previously proposed (Silliman, 2008c; Silliman & Schumm, 2013; Silliman & Shutt, 2010) with a large sample of youth involved in programs with various contextual program features and varying degrees of duration. Validating the YPQS with such a population would provide an important evaluation tool for programmers and administrators to assess perceived program quality through the eyes of youth participants. A secondary purpose of this study was to examine potential invariance by age with respect to the emergent factor structure of the YPQS. If the invariance analysis shows that the measurement model does not vary by age, this would indicate that the model (elements of program quality) does not differ between younger and older youth. As such, the model, and the elements of program quality it identifies, would be appropriate for all youth between 10 and 18 years of age.

Context and participants
Three hundred and ninety-one youth (162 male, 229 female) who participated in 38 different programs within six youth organizations across south-eastern Ontario, Canada were involved in this study. Programs ranged from 6 weeks to 9 months in duration and ran between one and three times per week throughout their duration. Programs ran between 60 and 180 min in length (M = 110). Youth ranged in age from 10 to 18 years (M age = 14.09, SD = 2.16). The sample was distributed as follows: Caucasian (68.7%), black (6.9%), Aboriginal (2.1%), Arabic (6.6%), Asian (6.9%), and multiracial (6.4%); 2.4% of the sample did not disclose their ethnicity. Participation within a given program ranged from 6 to 29 youth, with the average program having 11.5 (SD = 4.7) youth involved. Youth who completed the questionnaire were identified by program leaders as regular participants (i.e. participating in the program 75% of the time or more). Table 1 outlines additional characteristics of programs involved within the study sample, including the organization and program types, as well as the type of participants involved in each program and within the six organizations.

Measures
The YPQS is a 24-item measure that examines participants' ratings of their experiences within youth programs (Silliman, 2008c; Silliman & Schumm, 2013; Silliman & Shutt, 2010). As mentioned above, this measure focuses on examining program quality from the perspective of youth using the eight contextual setting features of youth programs shown to promote positive youth development identified by the NRCIM as a guiding framework (Eccles & Gootman, 2002). From these eight setting features, the YPQS was originally designed with eight subscales including: Safety, Support, Social Norms, Social Inclusion, Skill-building Opportunities, Self-efficacy, Structure, and Synergy with Family and Community (Silliman, 2008c). Each subscale contained three items. Based on previous use of the scale, all items were answered on a four-point scale from 1 (strongly disagree) to 4 (strongly agree). With the current sample, Cronbach's alpha was computed for each factor of the initial 8-factor, 24-item measure proposed by Silliman (2008c). Alphas ranged from .11 to .74, indicating unacceptable to acceptable internal consistency, with the majority of factors falling within the poor to unacceptable range (below .70; George & Mallery, 2003).
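As an illustration of the internal consistency statistic reported above, the following is a minimal sketch of Cronbach's alpha for one three-item subscale. The function name and the response matrix are hypothetical and are not taken from the study data; the formula is the standard one based on item variances and the variance of the summed score.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for one subscale.

    scores: 2-D array-like, rows = respondents, columns = items.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical responses on the four-point scale (1 = strongly disagree, 4 = strongly agree)
example = np.array([
    [4, 4, 3],
    [3, 3, 3],
    [2, 1, 2],
    [4, 3, 4],
    [1, 2, 1],
])
print(round(cronbach_alpha(example), 3))
```

With only three items per subscale, alpha is sensitive to any single weakly related item, which is one reason short subscales often fall below the conventional .70 threshold.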

Procedures
Following ethical approval from the affiliated institution's Office of Research Ethics and Integrity, the lead researcher contacted various community youth organizations in south-eastern Ontario. As this was part of a larger study exploring program quality in youth programming, information about this study was communicated to interested community programmers and directors. Six organizations agreed to participate in the study and a total of 38 programs were identified based on interest. The researcher met with each program leader, and parents if available, to explain the study. The researcher then provided the parental consent forms and assured confidentiality prior to data collection. Youth who attained parental consent completed a paper version of the questionnaire at the end of the program (within the last two weeks of the program) with supervision from the first author, which provided opportunities for youth to ask questions if needed.

Data analysis
Using the structural equation modeling AMOS 23 software program (Arbuckle, 2014), three confirmatory factor analyses were conducted with a Maximum Likelihood estimation method on the data to assess fit for the three proposed models previously outlined: (1) the 8-factor, 24-item measurement model for youth of all ages, (2) the 5-factor, 24-item measurement model proposed by Silliman and Schumm (2013) for youth ages 10-13, and (3) the 5-factor, 24-item measurement model proposed by Silliman and Schumm (2013) for youth ages 14-17. However, as the main goal of this study was to create one factor structure that fit well across youth of all ages, all three of the models outlined above were tested using our entire sample (youth 10-18 years old). The subjects-to-items ratio exceeded 15:1, indicating that recommended requirements were attained (Everitt, 1975; Pedhazur, 1997). Latent factors were allowed to correlate, uniquenesses were not free to correlate, and a path from each latent variable to one of its indicator variables was constrained by assigning the value of 1.0. According to Awang (2012), in order to ensure uni-dimensionality of a measurement model, any item with a factor loading lower than .50 should be deleted. Model fit was assessed using a combination of indices: Comparative Fit Index (CFI), Tucker Lewis Index (TLI), Standardized Root Mean Square Residual (SRMR), Root Mean Square Error of Approximation (RMSEA), and the χ² statistic (Byrne, 2010; Hu & Bentler, 1998). Fit indices were deemed to indicate good model fit if: CFI and TLI > 0.90, SRMR < 0.08, and RMSEA < 0.05 (Byrne, 2010; Tabachnick & Fidell, 2013).

Table 1 (partial)
Not-for-profit: 17 (7), 197
Type of program
  Leadership: 6 (2), 72
  Mentoring: 2 (1), 24
  Sport: 30 (5), 295
    Competitive: 20 (2), 194
    Recreational: 10 (3), 101
Type of participants
  All girls: 16 (3), 159
  All boys: 6 (3), 59
  Mixed gender (boys and girls): 16 (3)
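The model specification and fit cutoffs described in the data analysis can be sketched as follows. This is a minimal illustration, not the AMOS procedure itself: both function names are hypothetical, and the degrees-of-freedom count assumes that every item loads on exactly one factor, which the text implies but does not state outright.

```python
def cfa_degrees_of_freedom(n_items, n_factors):
    """Degrees of freedom for a simple-structure CFA.

    Assumes (as described in the text) correlated latent factors,
    uncorrelated uniquenesses, and one loading per factor fixed to 1.0.
    Also assumes each item loads on exactly one factor (an assumption).
    """
    observed_moments = n_items * (n_items + 1) // 2   # unique variances and covariances
    free_loadings = n_items - n_factors               # one loading per factor is fixed
    uniquenesses = n_items                            # one error variance per item
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    free_parameters = (free_loadings + uniquenesses
                       + factor_variances + factor_covariances)
    return observed_moments - free_parameters

def meets_fit_criteria(cfi, tli, srmr, rmsea):
    """Check each index against the cutoffs stated in the text."""
    return {
        "CFI": cfi > 0.90,
        "TLI": tli > 0.90,
        "SRMR": srmr < 0.08,
        "RMSEA": rmsea < 0.05,
    }

print(cfa_degrees_of_freedom(24, 8))   # 224
print(cfa_degrees_of_freedom(24, 5))   # 242
```

Under these assumptions, the 24-item models have 224 (eight-factor) and 242 (five-factor) degrees of freedom, which is consistent with the χ²/df ratios reported for the eight-factor model and the initial five-factor model for younger youth.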

Results
Prior to analyses, data were screened for missing data, which indicated less than one percent of the data were missing. When less than 5% of data are missing, influences of missing data are negligible (Tabachnick & Fidell, 2013). Missing data were replaced with multiple imputations (Yuan, 2010). Data were then screened for outliers. No outliers were identified within the data-set. All variables were normally distributed and there were no instances of multi-collinearity (r > 0.90) throughout the data-set. Statistics for Variance Inflation Factor (VIF) and tolerance ranged from 1.074 to 2.008 which fell within the acceptable range (Hair, Black, Babin, & Anderson, 2010) for all variables.
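The collinearity screening described above can be illustrated with a short sketch. It assumes the variance inflation factors are taken as the diagonal of the inverse correlation matrix, which is equivalent to regressing each variable on all of the others; the function name and the example data are hypothetical.

```python
import numpy as np

def vif_and_tolerance(data):
    """Return (VIF, tolerance) for each column of a respondents-by-items array."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    vif = np.diag(np.linalg.inv(corr))   # VIF_j = 1 / (1 - R_j^2)
    tolerance = 1.0 / vif                # tolerance_j = 1 - R_j^2
    return vif, tolerance

# Hypothetical data: two perfectly uncorrelated variables give VIF = 1
uncorrelated = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
vif, tolerance = vif_and_tolerance(uncorrelated)
print(vif, tolerance)
```

A VIF of 1 means a variable shares no variance with the others; larger values indicate increasing redundancy among the variables.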
As mentioned, three different models have been proposed within the literature: (1) one eight-factor model for youth of all ages, (2) one five-factor model for youth ages 10-13, and (3) one five-factor model for youth ages 14-17. Confirmatory factor analyses were conducted on all three of these models to test for model fit.

Confirmatory factor analysis for eight-factor model
Summary statistics for the confirmatory factor analysis indicated some issues with the model fit: CFI = .901, TLI = .878, SRMR = .0545, RMSEA = .056 (90% CI = .049 − .062), χ²/df = 2.206, and χ² = 494.032, p < .0001. Although the model approached fitting the data adequately, there were several issues of collinearity between factors (e.g. .99) and low loading items on various factors (e.g. 8 items showed low loadings < .50; Awang, 2012). Finally, diagnostics indicated it was not possible to retain a factor because it contained fewer than three items, as a factor with fewer than three items is generally weak and unstable (Costello & Osborne, 2011). Therefore, the factor and the remaining items were removed. Based on the outlined issues, it was deemed that the eight-factor structure was not suitable.
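As a worked check on how the reported indices relate to one another, the following sketch recomputes RMSEA for the eight-factor model from its χ² value. The degrees of freedom (224) are not stated in the text; they are an assumption inferred from the reported χ²/df ratio (494.032 / 2.206 ≈ 224). The formula shown uses N − 1 in the denominator, which is one common convention; some software uses N instead.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size n."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Eight-factor model: chi-square and N from the text, df = 224 is an inferred assumption
value = rmsea(494.032, 224, 391)
print(round(value, 3))  # 0.056
```

The result agrees with the reported RMSEA of .056, which sits above the .05 cutoff adopted for this study.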

Confirmatory factor analysis for five-factor model based on youth sample (10-13 years old)
The proposed five-factor model identified by Silliman and Schumm (2013) for younger (10-13) youth was tested using our total sample to test the fit of the model. Summary statistics for the confirmatory factor analysis indicated some issues with model fit: CFI = .906, TLI = .893, SRMR = .0803, RMSEA = .052 (90% CI = .046 − .059), χ²/df = 2.059, and χ² = 498.184, p < .0001. Although there were no issues with collinearity between latent factors, there were four low loading items (< .50). After removing these items one at a time, starting with the lowest loading item, the model fit improved (CFI = .925, TLI = .911, SRMR = .0466, RMSEA = .054 (90% CI = .046 − .062), χ²/df = 2.132, and χ² = 341.168, p < .0001); however, the model resulted in two factors with only two items each, thereby highlighting an unstable factor structure (Costello & Osborne, 2011). Therefore, it was deemed that this five-factor structure was not ideally suited for youth between the ages of 10 and 18 years because of the unstable structure.

Confirmatory factor analysis for five-factor model based on youth sample (14-17 years old)
The proposed five-factor model identified by Silliman and Schumm (2013) for older (14-17) youth participants was tested using our total sample to test the fit of the model. Prior to analysis, it was believed there may be issues with the previously outlined factor structure, as one factor within the model comprised only two items, violating the recommendation to have a minimum of three items loaded on one factor (Costello & Osborne, 2011). Despite this, the model fit was tested. Results indicated that there were issues with the model fit: CFI = .887, TLI = .869, SRMR = .0678, RMSEA = .060 (90% CI = .053 − .067), χ²/df = 2.388, and χ² = 475.282, p < .0001. Moreover, three correlations between latent factors exceeded .90 (Kline, 2010), suggesting issues with multicollinearity. Merging of these three factors would have resulted in a three-factor structure comprised of a 15-item factor, a 2-item factor, and a 5-item factor. In addition, there were several low loading items (e.g. .12) on various factors. As a result of the combination of issues identified, it was deemed that this structure was not a good fit to our sample of youth between the ages of 10 and 18 years.

Exploratory factor analysis
Based on the lack of support from the initial factor analyses of the three previously proposed structural models of the YPQS, an exploratory analysis was justified to modify the measurement model. The exploratory factor analysis, using maximum likelihood extraction with an eigenvalue-greater-than-1.0 criterion, resulted in a five-factor model. However, this was reduced to a four-factor model, as only one item loaded on the fifth factor and that item did not load on any of the other four factors. Therefore, this item was removed. Attempts to extract more than five factors yielded minor factors with loadings on only one item. Extracted communalities ranged between .37 and .72. All but two items loaded onto at least one factor at > .50.
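The eigenvalue criterion used to decide how many factors to extract can be sketched as follows. This is only an illustration of the retention rule on simulated data: it counts eigenvalues of the item correlation matrix above 1.0 and does not perform maximum likelihood extraction or rotation. The function name, the seed, the loadings, and the two-factor simulated data set are all hypothetical.

```python
import numpy as np

def kaiser_factor_count(data):
    """Count eigenvalues of the item correlation matrix that exceed 1.0."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted largest first
    return int((eigenvalues > 1.0).sum()), eigenvalues

# Simulated example: 500 respondents, 6 items, two latent factors (3 items each)
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
loadings = np.zeros((2, 6))
loadings[0, :3] = 0.8   # items 1-3 load on factor 1
loadings[1, 3:] = 0.8   # items 4-6 load on factor 2
items = factors @ loadings + 0.6 * rng.normal(size=(500, 6))

n_factors, eigenvalues = kaiser_factor_count(items)
print(n_factors)
print(np.round(eigenvalues, 2))
```

The rule can retain a factor defined by a single item, as happened with the fifth factor here, which is why the retained factors were also inspected for the number of items loading on each.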
A confirmatory factor analysis was then conducted to test the model fit of the structure identified in the exploratory factor analysis. An iterative process guided by modification indices, parameter change estimates, and theoretical justifications was used to re-specify the 23-item measurement model (Garson, 2010; Kline, 2010). Each modification occurred one iteration at a time and parameter estimates were recalculated after each modification (Garson, 2010; Kline, 2010). We did not free cross-loadings; when diagnostics indicated cross-loadings, the problematic items were removed from the model one at a time. Items were considered for removal when standardized estimates fell below .50 (Awang, 2012). All scales in the re-specified model had factor loadings ranging from .51 to .72. A total of four items were removed during the re-specification process in addition to the one item removed in the previous step, resulting in a four-factor model with a total of 19 items. Results revealed a good model fit: CFI = .932, TLI = .920, SRMR = .0456, RMSEA = .037 (90% CI = .032 − .042), χ²/df = 1.861, and χ² = 543.42, p < .0001. Table 2 outlines the descriptive statistics for the individual items of the re-specified model. There was only one instance of kurtosis, which was for an item identified as a negative perceived experience and was therefore often rated a 1 on the four-point scale. Table 3 displays the descriptive statistics of the four final dimensions of the re-specified model and sample-specific correlations between subscale scores for each pair of latent factors in the final retained measurement model; correlations were below .66. Building off Silliman and Schumm's (2013) work, four subscales were identified: Appropriate Adult Support and Structure, Empowered Skill-building, Expanding Horizons, and Negative Experiences. Subscale names were used or slightly modified based on these authors' previous classifications (Silliman & Schumm, 2013).
All eight of the NRCIM's program setting features proposed to foster positive youth development outcomes were represented in the revised 19-item, 4-factor model of the YPQS (see Table 4 for the breakdown by item corresponding to the NRCIM features). The first factor loaded heavily on five items related to Appropriate Adult Support and Structure and included items from three setting features outlined by the NRCIM, including Physical and Psychological Safety, Appropriate Structure, and Supportive Relationships (Eccles & Gootman, 2002). The second factor loaded primarily on eight items and was labeled Empowered Skill-building and contained items related to Opportunities for Skill-building, Positive Social Norms, Support for Efficacy and Mattering, and Integration of Family, School, and Community Efforts. The third factor, labeled Expanding Horizons, loaded on four items, including Opportunities to Belong, Positive Social Norms, and Integration of Family, School, and Community Efforts because the items referred to broadening views of the world in the context of family and community. Lastly, the fourth factor loaded heavily on three items that were all negatively worded. These items were consistent with three of the NRCIM's features: Physical and Psychological Safety, Opportunities to Belong, and Positive Social Norms.

Discussion
The current study was conducted to explore the factor structure of the previously established YPQS by Silliman and Schumm (2013). Three confirmatory factor analyses were conducted and did not confirm previously identified models of the YPQS. As such, an exploratory factor analysis was conducted, yielding a four-factor, 19-item measure that showed good model fit and resulting in a revised form of the YPQS. McCune (1989) cautioned that factor analysis studies of the same instrument could yield different results depending on the samples used; therefore, it was not surprising that the exploratory factor analysis conducted within this study revealed slightly different factors than the results identified by Silliman and Schumm (2013). However, the exploratory factor analysis conducted in this study provides good support for the model, as there was a great deal of consistency between the current work and Silliman and Schumm's (2013) work. The results revealed the elimination of some of the same items that were also removed by Silliman and Schumm (2013), highlighting consistently problematic items. From this, the current study was able to build on the previous work conducted by these authors. Lastly, the proposed 4-factor, 19-item model was invariant across age groups for youth participants within this sample, supporting the appropriateness of using this tool as a measure of perceived program quality for youth between the ages of 10 and 18.
Eccles and Gootman (2002) have argued "the boundaries between features are often quite blurred" (p. 88), indicating that an overlap between program setting features does occur within youth programming. Findings from previous research that utilized the eight program setting features as a guiding framework in measurement development have outlined a similar notion, proposing other categorizations or subscales that encompass these features (e.g. H/SERF, 2005; Silliman & Schumm, 2013).
For example, the Youth Program Quality Assessment, an observational assessment tool that can be used internally by program leaders and administrators or externally by researchers to examine quality within youth programming, has outlined four domains to assess (Safe Environment, Supportive Environment, Interaction, and Engagement; H/SERF, 2005). Based on previous work examining program quality, it can be argued that although the YPQS and Youth Program Quality Assessment differ slightly from one another, there is alignment as they follow a similar structure whereby physical and psychological safety and program structure are seen as essential and foundational elements, and once satisfied, higher ordered elements of program quality can then be facilitated within programs (Akiva, 2005; Eccles & Gootman, 2002).
Within this study, the Negative Experiences subscale fell under the acceptable reliability coefficient level of .70, yet Cronbach's alpha is often expected to be lower when there are fewer items, and this subscale is comprised of only three items. Moreover, this subscale measures a wide range of constructs, including perceptions of negative feelings pertaining to belonging, conflicts, and embarrassment, which may also contribute to the low reliability of this subscale. However, it is important to recognize that this study, as well as previous research on program quality (e.g. Silliman & Schumm, 2013), has integrated a subscale that measures youth perceptions of negative experiences within programs. Although this has not been identified as a program setting feature by Eccles and Gootman (2002), it is important to take into consideration the potential for negative experiences within youth program contexts. The inclusion of a subscale that measures negative experiences is common among other measures of youth development (e.g. YES 2.0 by Hansen & Larson, 2005; YES for Sport by Sullivan, LaForge-MacKenzie, & Marini, 2015; Summer Program Experiences Survey by Vandell, Hall, O'Cadiz, & Karsch, 2012).
The current results extend previous work in several ways. The sample used within this study is arguably more heterogeneous than the samples used in previous measurement testing with the YPQS. Previous research conducted with the YPQS included a relatively homogeneous sample of youth from 4-H conferences and camps. In contrast, this study recruited youth from 38 different programs across multiple program contexts, thereby allowing for a more heterogeneous sample that more accurately reflects the general context of youth programming, which can include a variety of different characteristics (e.g. program length, program context, age and gender of participants, type of organization). The current study is also the first to provide support that the factor structure of the YPQS holds across programs of differing lengths, as previous samples used to test the YPQS were drawn from programs lasting 3 to 7 days, whereas this sample was recruited from programs lasting between 6 weeks and 9 months.
The factor structures previously proposed by Silliman and Schumm (2013) were based on particular age ranges, which may not in fact be ideal. Previously identified program quality characteristics, and more specifically the eight setting features proposed by the NRCIM, provide recommendations that encompass program quality for all youth programs rather than by age, outlining similar program features within both school-age and adolescent programs. Researchers who have proposed best practices for program quality have consistently identified global features as opposed to age-based features. Although the structure of the program may be adapted based on the age of participants (e.g. more emphasis on structure for younger youth, yet more focus on expanding horizons for older youth), all elements are still critical to facilitate a high-quality program. Findings from this research support the need for one measurement tool with a strong factor structure that takes into account all eight setting features of program quality proposed by the NRCIM.

Recommendations
Although Silliman and Schumm (2013) identified two different measures by classifying youth aged 10 to 17 by age, and labeled both measures as the YPQS, our analyses took into consideration one large sample of youth between 10 and 18 years of age. The World Health Organization (1989) defines 'youth' as individuals between 15 and 24 years of age, which is not consistent with the samples utilized within previous work by Silliman (2008c), Silliman and Shutt (2010), and Silliman and Schumm (2013), or within the current study. Therefore, it is suggested that the name of the measure be changed from the YPQS to the Adolescent Program Quality Survey (APQS) to fall in line with the World Health Organization's (1989) definition of 'adolescence', which categorizes individuals between 10 and 19 years. Moreover, it is recommended that future testing and use of the new proposed APQS utilize a five-point Likert scale ranging from 1 (Strongly disagree) to 5 (Strongly agree) instead of the four-point scale that has been previously used (Silliman, 2008c; Silliman & Schumm, 2013; Silliman & Shutt, 2010). A five-point scale would not only guard against kurtosis and ensure more normal distributions, but also allow for a neutral response, and it has been recommended for use with youth (Intelligent Measurement, 2007; Kline, 2010; Malhotra, 2006). These recommendations should be taken into consideration alongside the measure's items outlined in Table 2.

Limitations and future directions
The new proposed APQS has the potential to become a valuable tool for assessing program quality across a variety of youth programming contexts. It is important to note that the factor structure presented in this study was determined through an exploratory process using data acquired from a single sample of 391 youth. Youth within this sample were from several different programming contexts (e.g. sport, leadership, in-school mentoring), yet sport programming was prominently represented. Future research should continue to test the new proposed APQS using confirmatory techniques with new samples from varying types of program contexts before claims of structural or external validity can be concretely made. Participants in this study comprised a diverse sample of youth; however, it is suggested that future research continue to validate the new proposed APQS with a variety of samples, including various youth program types (e.g. sport, leadership, arts), contexts (e.g. competitive vs. noncompetitive, team/group vs. individual, mixed-gender vs. single-gender), and organization types (not-for-profit, for-profit), as well as programs that target youth of various ages, ethnicities, and socioeconomic backgrounds. Moreover, it is suggested that future studies integrate a measure of youth engagement (e.g. Snapshot Survey of Engagement tool; Busseri, Rose-Krasnor, & Centre of Excellence for Youth Engagement, 2009) or a measure of youth psychosocial outcomes (e.g. YES 2.0; Hansen & Larson, 2005; PYDI; Arnold et al., 2012) as dependent variables to understand whether indicators of quality are related to program engagement and/or outcomes.
At this time, the YPQS should be considered as an assessment tool for program quality specifically related to Appropriate Adult Support and Structure, Empowered Skill-building, Expanding Horizons, and Negative Experiences, which integrates the eight setting features proposed by the NRCIM (Eccles & Gootman, 2002). The current study proposes one factor model for all youth between the ages of 10 and 18, while Silliman and Schumm (2013) proposed two factor structures, one for youth between the ages of 10 and 13 and one for youth between the ages of 14 and 17. As no previous research or quantitative process or outcome measure within youth programming has created separate measures based on age, it can be argued that utilizing a program quality measure that can be delivered across all youth programming contexts regardless of age is appropriate. However, future research is needed to further examine this. It should be noted that pilot work is underway to continue examining the APQS within another large and diverse sample of youth involved in various youth programs. From an applied perspective, it is recommended that the proposed APQS be used in conjunction with other measures (e.g. Youth Program Quality Assessment) to further triangulate program quality within youth programs. Lastly, it is possible that there are additional program quality elements that are not represented in the modified version of the APQS, as research in this field is continually advancing. Although the eight setting features from the NRCIM are represented, future research is warranted to examine whether items need to be added to the APQS to broaden the scope of program quality assessment.

Conclusion
In conclusion, the new proposed APQS offers good psychometric properties and retains the conceptualization of program quality outlined by the NRCIM's eight program setting features (Eccles & Gootman, 2002). As such, even with the deletion of items with lower factor loadings, the four-factor structure provides support for the eight program setting features. At this time, it also appears that the APQS offers a potentially viable and psychometrically sound measure of program quality for youth between the