Development of an observational measure assessing program quality processes in youth sport


Abstract: Research has demonstrated that quality sport programs have the potential to foster the physical and psychosocial development of youth. However, there is an absence of observational measures to assess program quality related to psychosocial development within youth sport. The purpose of this paper is to report on two studies conducted to develop a valid and reliable observational measure to assess program quality processes in youth sport. Study one outlines the process of attaining content and face validity using an expert panel approach when developing the Program Quality Assessment in Youth Sport (PQAYS) observational measure through a review of literature and collaboration with expert academics and coaches. Study two outlines further steps taken to test the internal reliability, as well as convergent and predictive validity of the measure. Results from the two studies provide initial evidence that the PQAYS is a valid and reliable measure that can be used in intervention and evaluation research within youth sport.

Subjects: Sport and Leisure Studies; Sports Development; Sports Coaching; Research Methods
Keywords: positive youth development; measurement; psychometrics; coaches; program quality

ABOUT THE AUTHOR
Dr Corliss Bean is a postdoctoral fellow at the University of British Columbia Okanagan within the School of Health and Exercise Sciences. She completed her PhD at the University of Ottawa, where she worked with a team of researchers, and co-authors of this manuscript, on an Insight Grant funded by the Social Sciences and Humanities Research Council of Canada that assessed program quality in youth sport. Her research has focused on program evaluation of youth sport programs, a context where this tool can be used by researchers and practitioners. Corliss is heavily involved in research within the community and has worked with organizations at the local and national levels to develop curriculum and evaluate programs.

PUBLIC INTEREST STATEMENT
The Program Quality Assessment in Youth Sport (PQAYS) is an observational measure of program quality designed to assess Eccles and Gootman's (2002) eight program setting features that have been identified as critical to positive developmental outcomes in youth programs, and specifically sport. This measure can be used by researchers to better understand the mechanisms that facilitate or hinder youth development in this context. Practitioners can use the PQAYS for program assessment and improvement.

Youth sport is a context with much potential for fostering positive youth development (PYD; Côté & Fraser-Thomas, 2011; Eccles & Gootman, 2002; Holt, 2016). The guiding principle underlying the PYD framework is the shift from a deficit-reduction paradigm to a proactive, asset-building paradigm that sees youth as resources to be developed, rather than problems to be managed (Damon, 2004). This strength-based approach has gained popularity over the past three decades and reinforces the importance of teaching youth important life skills (Botvin, 2004). Given that approximately 65% of children between 6 and 17 years of age across Canada and the United States are involved in organized sport programs (Guèvremont, Findlay, & Kohen, 2008; United States Census Bureau, 2014), it is critical that we further our understanding of how PYD can be fostered within this context. For the present study, sport was defined as a social and competitive activity requiring specific physical skills and physical exertion, which occurs within an institutionalized setting (Coakley & Donnelly, 2009).
In past research, participation in youth sport has been shown to lead to both positive and negative outcomes (e.g., Merkel, 2013). On one hand, youth sport participation has been associated with improved physical and psychosocial development (e.g., Eime, Young, Harvey, Charity, & Payne, 2013; Fraser-Thomas, Côté, & Deakin, 2005). On the other hand, participation in sport has been associated with physical and psychosocial problems (injury, depression), as well as economic and cultural concerns (financial burden, ethnicity and gender inequality; e.g., Merkel, 2013). To best understand the outcomes emanating from youth sport, there is a need to examine the mechanisms related to psychosocial development (e.g., Gould & Carson, 2008; Hodge, Danish, Forneris, & Miles, 2016).

Program quality within youth sport
Program quality is a multi-faceted concept; thus, a universal definition of quality does not exist. Different definitions are necessary to deal with specific concepts under different circumstances (Reeves & Bednar, 1994). Quality within youth programming is dynamic (Larson & Walker, 2010); for the purpose of this study, it refers to the structures and processes within a program that relate to youth outcomes (Baldwin & Wilder, 2014). Specifically, program structures refer to an organization's capacity to deliver a program to youth (e.g., physical space, staffing, funding, community collaborations). Program processes refer to how the program is delivered (e.g., supportive relationships, opportunities for skill-building, autonomy). Researchers contend that how youth programs, including sport, are structured plays a key role in determining whether positive or negative outcomes occur (Côté & Fraser-Thomas, 2011; Petitpas, Cornelius, Van Raalte, & Jones, 2005; Roth & Brooks-Gunn, 2015). As a result, program quality has been outlined as one of the best predictors of the developmental outcomes resulting from participation in youth programs (e.g., Durlak, Mahoney, Bohnert, & Parente, 2010; Roth & Brooks-Gunn, 2015; Yohalem & Wilson-Ahlstrom, 2010). Although many factors have been identified as contributors to the quality of a program, to date, these factors have yet to be extensively examined in the youth sport context.
One of the most acknowledged and comprehensive classifications of program quality was proposed by Eccles and Gootman (2002), who worked with the National Research Council and Institute of Medicine (NRCIM) to summarize two decades of developmental psychology research. These authors proposed eight program features linked with positive psychosocial development: (a) physical and psychological safety; (b) appropriate structure; (c) supportive relationships; (d) opportunities to belong; (e) positive social norms; (f) support for efficacy and mattering; (g) opportunities for skill building; and (h) integration of family, school, and community efforts. Since their development, these eight setting features have been utilized to guide youth programming at the research and practical levels (HighScope Educational Research Foundation [HSERF], 2005; Yohalem, Wilson-Ahlstrom, Fischer, & Shinn, 2009). Although the usefulness of these features has been recognized by youth sport researchers (e.g., Côté, Strachan, & Fraser-Thomas, 2008; Povilaitis & Tamminen, 2017; Strachan, Côté, & Deakin, 2011), little empirical research has been conducted within sport utilizing these features, despite calls to do so (e.g., Côté & Mallett, 2013). One reason may be, in part, the lack of available measures to assess quality within the youth sport context that incorporate the NRCIM's eight setting features.

Measuring program quality processes
Program quality can be assessed in several ways, including the use of qualitative methods, quantitative self-report measures, and observational measures. In youth sport research, to date, most studies have relied on self-report measures, with observational research being neglected (Jones, 2015). Researchers have argued for the need to integrate observational measures of program quality as observational assessment allows for the description of behavior in natural environments, leading to greater ecological validity. It can also provide more objective evidence of the behaviors and strategies coaches are using in their programs, rather than relying solely on coaches' perceptions (Flett, Gould, & Lauer, 2012;Holt & Jones, 2008;Jones, 2015).
There are few observational measures that have been developed to assess quality within youth programming (for full details, see Yohalem et al., 2009), and none that have been developed specifically for the youth sport context. The Out-of-School Time Observation Instrument (Pechman, Russell, & Birmingham, 2008) and the Youth Program Quality Assessment (YPQA; HSERF, 2005) have both been designed for use in youth programming. The Out-of-School Time Observation Instrument is based on Durlak and Weissberg's (2007) SAFE (Sequential, Active, Focused, Explicit) features and has not been used within the academic literature. The YPQA has been used to assess program quality when conducting process evaluations within out-of-school and community programs (HSERF, 2005; Smith & Hohmann, 2005) and is loosely based on Eccles and Gootman's (2002) eight program setting features. This measure was used as a starting point for the development of the PQAYS.
The YPQA represents a valid and reliable measure for youth aged 8-18 years (Smith & Hohmann, 2005) and can be used as a self, internal, and external evaluation tool. This measure has four domains: (a) safe environment, (b) supportive environment, (c) interaction, and (d) engagement, and is measured on a 3-point Likert scale with items scored as a 1, 3, or 5 (HSERF, 2005). Given the breadth of contexts (e.g., leadership, arts, mentoring, sport) in which this measure can be used, the YPQA is not designed to decipher the contextual intricacies that define quality in various youth programming contexts.
To date, only a few studies have empirically assessed program quality in youth sport using the YPQA. For example, Flett et al. (2012) utilized the YPQA to assess quality within youth softball and baseball programs. Despite the YPQA's reported strengths in reliability and validity, the authors outlined problematic issues with score distributions and psychometric properties when using the YPQA in sport, particularly the low internal consistency of subscales. Thus, several revisions were conducted, reducing the measure from 52 items to 26 items in an attempt to improve reliability. After shortening the measure for analyses, the authors found that the sport programs studied tended to yield high scores related to providing a safe and supportive environment, but lower scores related to providing opportunities for interaction and engagement.
In another study that used the YPQA within sport and non-sport programs (Bean & Forneris, 2016a), a 5-point scale was used to aid in the variability of program quality scores. Researchers conducted 184 observations across 33 youth programs and yielded similar results to Flett et al. (2012). Specifically, the highest scores were observed for providing a safe and supportive environment, and lower scores for interaction and engagement opportunities. Issues related to low reliability persisted (Bean & Forneris, 2016a). In sum, the YPQA did not include items important to assess in sport programs (e.g., developmental opportunities for sport/physical skills), and its examples were not always relevant to sport. Therefore, in its current form, the YPQA is not optimally suited to measure program quality within the sport context. Previous empirical research has supported the notion of sport being a unique context compared to other extra-curricular activities (e.g., Zarrett et al., 2008). Specifically, sport presents unique features that are not necessarily found in other types of youth programs (e.g., opportunities to develop both physical and psychosocial skills, inherent competitiveness; Danish, Forneris, Hodge, & Heke, 2004; Fraser-Thomas et al., 2005; Pierce, Gould, & Camiré, 2017).
Recently, MacDonald and McIssac (2016) argued that a missing element in the sport psychology literature is an understanding of the processes through which PYD occurs in sport, and that no measures are currently available to assess such processes. Systematic observation is essential as a procedural step in understanding the mechanisms of psychosocial development within sport (Brewer & Jones, 2002). Given its worth not only for research and evaluation, but also for program improvement, securing funding, and retaining participants, a measure of program quality is needed so that initiatives can be examined in the context of everyday practice.

The present paper
Based on the existing literature, it appears that the field of sport psychology stands to benefit from the development of an observational measure to assess program quality within youth sport for three reasons. First, observational data have been under-utilized within evaluation research, and there has been an over-reliance on self-report measures (Flett et al., 2012). Second, quantitative observational measures are needed to understand the processes through which PYD occurs within youth sport (MacDonald & McIssac, 2016). Third, given that sport is the most popular extra-curricular activity across North America (Guèvremont et al., 2008; United States Census Bureau, 2014), there is a need to develop an observational measure specifically for this context. Therefore, the purpose of this paper is to report on two studies conducted to develop a valid and reliable observational measure to assess program quality processes in youth sport. The use of an observational measure will help address several aforementioned limitations within the current literature. The term program is used as this is the term most widely utilized within the positive youth development and program quality literature; however, it should be recognized that, in sport, this term also refers to "sport team" or "sport club." The first study summarizes the process of developing the PQAYS and presents an overview of the final measure. The second study outlines the steps taken to assess the reliability and validity of the measure. The procedures utilized in both studies followed the development and validation processes used in other observational measurement development related to coach development in the sport literature (e.g., Allan, Turnnidge, Vierimaa, Davis, & Côté, 2016; Brewer & Jones, 2002; Erickson & Côté, 2015).

Study one: measure development
The purpose of study one was to establish content and face validity for the observational measure. A series of steps were followed in its creation: (a) conducting a review of literature; (b) developing the initial measure, instructions, response format, and scoring; (c) involving expert academics and coaches to gather feedback on the measure; (d) piloting the measure; and (e) finalizing the measure.

2.1. Step one: conducting a review of literature
The first step was to review the English-language sport psychology literature, with the goal of locating and reviewing empirical studies, meta-analyses, position papers, literature reviews, book chapters, and doctoral dissertations that touched on program quality and PYD in youth sport. This was done in October 2016. The following procedures were used to locate sources: (a) computer searches of 10 electronic databases (e.g., SPORTDiscus, SCOPUS, PsycINFO) using different combinations of the following keyword search terms: youth (child, adolescent), organized sport (sport participation, physical activity), program quality (setting features, characteristics), youth development (positive youth development); (b) full scan of the reference lists of all relevant articles; and (c) manual searches of key peer-reviewed journals to find any additional relevant articles that did not arise during the database searches. The literature search yielded 195 articles. When reviewing the literature, many youth sport researchers (e.g., Côté & Abernethy, 2012; Weiss, 2008) advocated for the use of Eccles and Gootman's (2002) eight program features. These features were utilized to frame the measure, and the processes surrounding measure development are outlined in step two below.

2.2. Step two: developing the initial measure, instructions, and scoring

Developing the initial measure
The literature review helped the researchers understand current best practices for program delivery within sport, which informed the development of each subscale. As noted, it was evident that the eight setting features (Eccles & Gootman, 2002) were the most acknowledged features of program quality in general youth programming, as well as sport-based programming. Thus, these features were used to ground the instrument's initial organizing constructs, as there was sufficient evidence to support their worth in helping guide item development and the creation of additional subscales described below. Empirical and theoretical work from the youth programming and youth sport literatures, particularly from 2002 onwards, was also used to aid in measure development (see Table 1). For example, other measures, such as the YPQA (HSERF, 2005), were used to guide specific item development. One strength of both the eight setting features and the YPQA is that foundational elements of program quality (e.g., a safe environment) serve as a base upon which higher-order elements of program quality (e.g., supporting efficacy, providing opportunities for interaction and engagement) are built. This strength was maintained in the development of the new measure. Further, although they do not specifically assess program quality, existing observational measures used within sport, including the Coach-Athlete Interaction Coding System (Erickson, Côté, Hollenstein, & Deakin, 2011) and the Assessment of Coach Emotions (Allan et al., 2016), were reviewed as a starting point in measure development. See Table 1 for a breakdown of each subscale, with specific references provided at the end of each item to demonstrate how it was informed by research. Throughout this initial item development process, Smith, Quested, Appleton, and Duda's (2016) review of observational instruments within sport and physical education was consulted.
Based on the review of literature, two subscales were added (in addition to the subscales based on the eight setting features) to the measure and all the items were adapted to be sport-specific. These changes are detailed below. Finally, to ensure this was a measure of program quality and not solely a measure of coach competence, the researchers sought to achieve a balance of items that assessed the program, the coaches, and the youth. For example, in Supportive Relationships, there are two items focusing on coach behaviour, two items focusing on youth behaviour, and one item on the activities occurring within the program.
The initial version of the PQAYS comprised 64 items across 10 subscales. More specifically, from the eight setting features, two features were each divided into two separate subscales. First, based on the literature review and previous recommendations (Bean & Forneris, 2016a), safety was broken into two subscales of (a) physical safety and (b) psychological safety (see 1.1 and 1.2 in Table 1) to help ensure internal consistency, as each element measures distinct program characteristics. Second, past research outlines the importance of intentionally structuring the sport context to deliberately teach life skills in combination with sport-specific skills (Camiré, Trudel, & Forneris, 2012; Fraser-Thomas et al., 2005; Gould & Carson, 2008; Weiss, Stuntz, Bhalla, Bolter, & Price, 2013). Therefore, opportunities for skill-building was divided into two subscales to measure opportunities for (a) sport and physical skill-building and (b) life skill-building (see 7.1 and 7.2 in Table 1).
Individual items for the PQAYS were developed one at a time, with at least one academic reference associated with each item. For example, the fourth item within the seventh subscale, Opportunities for Skill-Building - Life Skills (7.2), "Coach(es) debrief how life skills can be applied and transferred outside of a specific sport context," is supported theoretically and empirically in the literature (e.g., Allen, Rhind, & Koshy, 2015; Pierce et al., 2017; Turnnidge, Côté, & Hancock, 2014; Weiss et al., 2013). This process ensured that all items were thoroughly grounded in research. Moreover, sport-specific examples and explanations are provided within each item to contextualize the measure specifically to sport.

Instructions
The following section outlines the measure's instructions. The measure commences with an introduction page explaining the purpose of the measure. Comprehensive instructions were developed to provide information regarding what is to be done before, during, and after a program observation session. For example, prior to conducting observations, an interview is to be conducted with the coach(es) to acquire an in-depth understanding of their philosophy and team objectives (see Appendix A for sample interview guide questions). In addition, it is necessary for the coach(es) to complete a Program Demographic Form before the observations begin to collect additional information (e.g., frequency and duration of sessions, types of sessions, parental involvement; see Appendix A for Program Demographic Form). Having the coach(es) complete this form is specifically designed to help the observer with scoring the eighth setting feature, integration of family, school, and community efforts. The interview, Program Demographic Form, and observations should all be used to inform the scoring of the items in the measure. Both the pre-interview with the coach(es) and the Program Demographic Form can help provide an understanding of the contextual features of the youth sport environment prior to observation, as research emphasizes the importance of understanding a sport program's context (Strachan et al., 2011).
A minimum of two observers are to be present at each program session to allow for the assessment of inter-rater reliability. This has been extensively supported within the literature on observational research (e.g., Brewer & Jones, 2002; Hallgren, 2012). Instructions on how to accurately score the program are outlined for observers (see Appendix A). Moreover, for credible conclusions on program quality to be drawn, a minimum of three observation sessions are required throughout the duration of the program. This recommendation has been supported in other naturalistic observation research (Smith & Hohmann, 2005). During observation sessions, the observers should take field notes to use as supporting evidence when conducting the objective scoring of the items. Taking field notes is a common method of documenting observations (Patton, 2002). Immediately after an observation session, observers complete the PQAYS (i.e., not during the session). Excerpts from field notes are to be included within the comment section of each subscale to provide justification for scoring.
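The instructions above do not prescribe a specific inter-rater statistic. As a minimal illustrative sketch (not the authors' procedure), two observers' 5-point item scores can be compared with simple agreement proportions; a formal analysis would typically use an index such as Cohen's kappa or an intraclass correlation (Hallgren, 2012). All names and scores below are hypothetical:

```python
# Illustrative only: the paper does not specify which inter-rater index was
# used. This sketch computes two rough agreement checks between two
# observers' 5-point PQAYS item scores.

def agreement(rater_a, rater_b, tolerance=0):
    """Proportion of items on which the two raters' scores differ by at
    most `tolerance` points (tolerance=0 gives exact agreement)."""
    assert len(rater_a) == len(rater_b), "raters must score the same items"
    hits = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return hits / len(rater_a)

obs_1 = [4, 5, 3, 4, 2, 5, 4]   # hypothetical scores, observer 1
obs_2 = [4, 4, 3, 5, 2, 5, 3]   # hypothetical scores, observer 2

print(agreement(obs_1, obs_2))               # exact agreement (4 of 7 items)
print(agreement(obs_1, obs_2, tolerance=1))  # within-one-point agreement
```

Exact agreement is strict for ordinal ratings, which is why within-one-point agreement, kappa, or ICC are often preferred for 5-point observational scores.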
Lastly, a second interview with the coach(es) is to be conducted at program end to follow-up on the observations conducted. The purpose of this interview is to further understand elements of the program that were observed during observations, which may clarify some aspects related to the program's quality. Such an approach helps to increase the quality of the interpretations that can be made by the researchers (Tracy, 2010).
The process outlined above is considered the optimal procedure that researchers should strive to follow to assess quality within a youth sport program. However, from a practical standpoint, we acknowledge that not all components may be feasible within a given context or situation (e.g., some coaches may not agree to be interviewed). In the present study, the measure was validated using this optimal procedure, which allows for the use of multiple methods and sources to create a comprehensive account (Yin, 2009).

Scoring
The measure uses a 5-point Likert scale ranging from 1 (never) to 5 (very often). This scale was chosen to improve issues of minimal variability experienced in the original YPQA, where items were scored using a 3-point scale. A 5-point scoring system is commonly used in youth programming (e.g., Search Institute, 2015) and observational measures (e.g., Nakaha, Grimes, Nadler, & Roberts, 2016). For some items, there is also a "Not Applicable (N/A)" option, which is combined with a footnote that provides justification as to when and why scoring an item may not be applicable. If an item is deemed not applicable, no score is given, and the item is not calculated within the subscale's mean score. For example, if program quality is assessed in an individual sport environment, items for peer interactions may not be applicable.
The final scoring of the PQAYS is calculated by computing averages for each of the 10 subscales. For example, the Supportive Relationships subscale has 4 items; the scores on these items are summed and divided by 4 to obtain the subscale's mean score. A total program quality score is calculated by computing the mean of the 10 subscale means. This is done so as not to weight certain subscales as more important than others based on the number of items within a given subscale.
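The scoring rules above can be sketched as follows. This is an illustrative implementation, not the authors' software; the subscale names, item counts, and scores are hypothetical, and `None` stands in for an item scored "Not Applicable (N/A)":

```python
# Sketch of PQAYS scoring as described in the text: N/A items are excluded
# from a subscale's mean, and the total score is the mean of the subscale
# means (so subscales with more items do not carry extra weight).

def subscale_mean(item_scores):
    """Mean of a subscale's items, excluding N/A (None) items."""
    scored = [s for s in item_scores if s is not None]
    return sum(scored) / len(scored) if scored else None

def total_quality(subscales):
    """Overall program quality: the mean of the subscale means."""
    means = [m for m in (subscale_mean(items) for items in subscales.values())
             if m is not None]
    return sum(means) / len(means)

# Hypothetical observation with three subscales (1-5 scale per item).
observation = {
    "supportive_relationships": [4, 5, 3, 4],
    "physical_safety": [5, 5, 4],
    "peer_interaction": [3, None, 4],   # one item scored N/A
}

print(round(total_quality(observation), 2))  # prints 4.06
```

Averaging subscale means rather than pooling all items is what keeps, say, a 7-item subscale from outweighing a 3-item one in the total score.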

2.3. Step three: involving academics and coaches to gather feedback
Schutz and Park (2004) outlined that content validity is supported if individuals who are knowledgeable about the intended construct agree that items reasonably represent the construct and are assigned to the appropriate category. After the first two steps were completed, an expert panel approach was used, which involved identifying knowledgeable individuals to provide feedback on the measure (Zamanzadeh et al., 2014). Based on previous recommendations, the experts emanated from a homogenous population within the same discipline (Clayton, 1997). For this study, two expert panels were created and involved in reviewing the measure: 19 academics (six faculty members and 13 graduate students) considered experts in youth development through sport, and 34 youth sport coaches, considered applied experts in the field.

Academic experts
A three-phased review process occurred, whereby researchers and practitioners were involved based on their expertise in the field of youth development through sport. In the first phase, five research team meetings, involving six graduate students and two faculty members, were conducted between November 2015 and January 2016. The meetings included discussions around content and formatting and provided opportunities to discuss any issues with the measure. Based on these discussions, modifications were made to the PQAYS, which included moving items from one subscale to another and removing overlapping items. In the second phase, the revised version of the measure was emailed to a group of experts at another academic institution (one faculty member and six graduate students). These individuals were asked to review the measure for: (a) appropriateness and clarity of the instructions, (b) potential overlap of items and concepts across subscales, (c) appropriateness of the examples provided within each item, (d) order of items within each subscale, and (e) additional comments that would aid in refining the measure. The exact instructions sent to this group of experts are available upon request. The experts were given 1 week to complete their review. Once all of their feedback was gathered, a meeting was held between members of the research team and the group of experts to debrief the measure. Once the phase two feedback was integrated, the measure was sent out for review to a third expert panel. Eight additional academics were contacted via email and asked to follow the same instructions provided to the group of experts in phase two. Of these eight, four individuals (three faculty members and one graduate student) provided written feedback on the measure.

Youth sport coaches
Face validity was further established through an online questionnaire hosted on FluidSurveys. Youth sport coaches (N = 34) completed the questionnaire to test the relevance of the items within the measure. Coaches who had agreed to participate in the observational component of study two were asked to also complete the online questionnaire. Additionally, a convenience sample was used, whereby the first and second authors sent the questionnaire link to some of their coach contacts. Snowball sampling was also used, whereby coaches who agreed to complete the questionnaire were asked to pass the questionnaire link on to additional coaches. In total, 121 coaches completed a portion of the questionnaire; however, despite informing participants about the anticipated questionnaire length and displaying a progress bar on screen, only 34 participants completed the questionnaire in its entirety. Thus, only these 34 individuals were included in this portion of the face validity assessment.
2.3.2.1. Coach survey and data collection. Of the 34 coaches who completed the questionnaire, 16 were male, 17 were female, and one did not disclose their gender (M age = 32.01, SD = 12.05). In all, 19 individuals held a bachelor's degree, nine held a high school diploma, three held a Master's degree, and three held a college diploma or a professional degree. Years of coaching experience ranged from 1 to 40 (M = 8.08, SD = 7.51). At the beginning of the questionnaire, coaches were provided with a definition of each of the 10 subscales proposed within the PQAYS. For each item, participants were asked three questions related to: (a) the element of program quality they believed the item corresponded to from the 10 subscales (participants were given the options of "more than one subscale" and "none of the above"); (b) the relevance of the item to their current sport practice; and (c) whether they believed the item was clearly worded. The second and third questions were measured on a 10-point Likert scale from 1 (not relevant at all/not clear at all) to 10 (very relevant/very clear). Participants could also provide open-ended comments for each item.

Integrating expert feedback
After all the expert feedback was gathered, the research team held additional meetings to review the feedback and revise the measure. Feedback from academics and coaches resulted in measure improvements by enhancing clarity surrounding the instructions and procedures, ensuring the congruency of items within each subscale, minimizing overlap of elements across subscales, providing appropriate examples for each item, and outlining missing items.
Specifically, academic experts provided valuable feedback to adjust some of the questions. For example, within the Physical Safety subscale, an item was changed from "Coach(es) respond appropriately" to "Program staff respond appropriately" to not discredit the coach if another individual (e.g., trainer) was tasked with directly attending to injured youth. The academic experts also suggested further questions. For example, within the Program Demographic Form, two questions were added to address the parents' level of involvement within the program (i.e., "Are parents welcome at practices?" "What is the level of parental involvement?"). Based on academic feedback, some items were also moved from one subscale to another. For example, "Coach(es) mediate exclusive/conflict behaviour from youth appropriately" was moved from Positive Social Norms to Psychological Safety (item 3). Finally, as overlap between the eight setting features exists (Eccles & Gootman, 2002), academic experts provided suggestions on how to minimize such overlap. For example, the items "Coach(es) promote empathetic behaviours amongst youth" and "Coach(es) encourage all youth to participate in the activities" within the Opportunities to Belong subscale were removed as they were deemed by academics and coaches as being redundant.
In the questionnaire, coaches rated the items as relevant to their current sport practice (M = 8.89, SD = .52, range = 6.56-9.47) and clearly worded (M = 8.91, SD = .22, range = 8.21-9.40). Four items were rated as less relevant to coaches' sport practice, with mean scores below 8 out of 10. These items fell within the Opportunities for Skill-Building-Life Skills (e.g., "Coach(es) provide opportunities for youth to improve life skills through practice") and Integration of Family, School, and Community Efforts (e.g., "Program provides youth opportunities to work with their community and practice their learned skills") subscales.
The four items deemed less relevant by coaches were nonetheless retained for two reasons. First, the subscales in which the items fell have been recognized within the academic literature as elements of high-quality programs (e.g., Bean & Forneris, 2016a; Gould & Carson, 2008). Second, academics were further consulted to provide their judgement on whether to retain or remove these four items and all advised to retain them. Changes made based on the academic and coach feedback led to item reduction from 64 to 54. The 54 items were used in step four.

2.5. Step four: piloting the measure
To further assess face and content validity, seven researchers (two individuals on the research team and five research assistants) piloted the measure. To increase the quality of pilot testing, a preliminary meeting was held on the proper use of the PQAYS. This included outlining the purpose of the measure, how to use it, and how to score the PQAYS items. Scenario-based questions and case study examples were used during the meeting to test researcher comprehension prior to the commencement of pilot data collection.
The seven researchers piloted the measure within three community sport settings using the optimal procedure (i.e., completion of Program Demographic Form and pre-post interviews, minimum of two observers present). After each of the seven researchers had completed a minimum of three observation sessions, they met to discuss their overall experiences using the measure and the associated tools, including any concerns or difficulties related to clarity or scoring. This meeting was audio-recorded, transcribed, and reviewed. Following a specific recommendation from the academic panel of experts, the researchers documented the length of time it took them to complete the measure after their observation sessions. Completion of the measure ranged between 30 and 60 min, yet the length of time tended to decrease as researchers became more familiar with the process. For example, researchers came to recognize certain situations, behaviors, or interactions that fit within specific PQAYS items, and thus, they referenced these in their field notes (e.g., this behavior supports item 2.3), which made the completion of the measure more efficient. Minor wording changes were made to the measure at this time. Slight modifications were also made to the interview guides by adding and rearranging some questions to improve the flow of the interviews (Maxwell, Chmiel, & Rogers, 2015).

2.6. Step five: finalizing the measure
The piloting process presented in step four resulted in further modifications, with the PQAYS reduced from 54 to 51 items representative of the 10 elements of program quality (see Appendix A for the full measure). Specifically, the item "There is physical evidence of positive social norms within the environment (e.g., motto/slogan present, youth wear team clothing or have team bags)" was removed from the Positive Social Norms subscale. The researchers concluded from their pilot work that wearing team clothing or having a physical motto present within the sport context did not necessarily influence the quality of social norms that were fostered within a program. Further, piloting highlighted the importance of the Program Demographic Form in attaining a comprehensive baseline of what to expect from the program prior to observation (e.g., number of participants).

Study two: measurement testing
The purpose of the second study was to further test the reliability and validity of the PQAYS developed in study one. Descriptive statistics of the 10 PQAYS subscales, including: (a) Physical Safety (n = 8 items); (b) Psychological Safety (n = 3 items); (c) Appropriate Structure (n = 7 items); (d) Supportive Relationships (n = 5 items); (e) Opportunities to Belong (n = 3 items); (f) Positive Social Norms (n = 3 items); (g) Support for Efficacy and Mattering (n = 8 items); (h) Opportunities for Skill-Building-Sport and Physical Skills (n = 5 items); (i) Opportunities for Skill-Building-Life Skills (n = 4 items); and (j) Integration of Family, School, and Community Efforts (n = 5 items) and the total measure can be found in Table 2. Two forms of reliability were examined: internal consistency and inter-rater reliability. Preliminary testing of convergent validity was also conducted by correlating scores of the PQAYS with scores from a questionnaire that assesses youth perceptions of program quality, and predictive validity was assessed using a measure of youth-perceived developmental experiences.

Context and procedure
Following ethical approval from the research team's university Research Ethics Boards, the lead researcher contacted various youth sport programs across Southeastern Ontario in Canada. Study information, including the overall purpose and procedures, was communicated to program leaders and coaches who were interested. Coaches agreed to participate in varying capacities (e.g., solely the observational portion of the study or the completion of both the observations and the questionnaire). Coaches from 52 programs agreed to engage in the observational portion of the study; 17 sport programs run by non-profit organizations that serve youth from low-income neighborhoods and 35 community club sport programs were involved. Within these 52 programs, there was a range of developmental (n = 20), recreational (n = 12), and competitive (n = 20) sport programs across a variety of sports (e.g., football, golf, basketball, dance, baseball, soccer, ball hockey, and ice hockey). Programs involved girls only (n = 12), boys only (n = 9), and mixed (girls and boys; n = 31) teams. Program sessions ran between 60 and 240 min in length (M = 115.48 min) and were offered between one and five times per week. Youth involved in these programs ranged from 5 to 18 years of age. Enrolment within a given program ranged from 6 to 32 youth. Coaches from 24 of the 52 programs agreed to have youth (n = 322) complete self-report questionnaires in addition to the program observations. Prior to conducting observations, consent and assent forms were distributed to and completed by coaches, parents, and youth involved in the programs.
In all, 307 observation sessions were conducted across the 52 programs, with an average of 4.89 (SD = 1.53, range = 3 to 10) sessions observed per program over the course of 24 months. Steps were taken to reduce social desirability during observations by: (a) reiterating to coaches that the objective of the study was to understand program quality as a whole, not solely coaches' performance, (b) reminding coaches that participation in the study was voluntary, (c) assuring coaches that observation scores would remain confidential, and (d) ensuring that researchers sat in unobtrusive areas while observing the program sessions.

PQAYS
This measure was outlined in study one.

Youth program quality survey (YPQS)
An adapted version of the Youth Program Quality Survey (YPQS) was used to assess convergent validity to attain youth's perceptions of program quality (Bean & Forneris, 2016b; Silliman & Schumm, 2013). Such a measure was selected because it is also based on the NRCIM's eight setting features (Eccles & Gootman, 2002). It was important to understand if the observed assessment of program quality (measured by the PQAYS) was congruent with youth's perceptions of their program experiences. As the YPQS is relatively new, few studies have utilized this measure; however, past findings have revealed moderate to high instrument reliability (α = .60-.96; Silliman, 2008; Silliman & Schumm, 2013; Silliman & Shutt, 2010).
One study (Bean & Forneris, 2016b) outlined a poor model fit; as such, modifications were made based on the results of an exploratory factor analysis, which showed good model fit (CFI = .932, TLI = .920, SRMR = .0456, RMSEA = .037). The modifications included reducing the measure from a 24-item measure to a 19-item measure for youth between 10 and 18 years of age. The adapted version of the YPQS was used in the present study. The 19 items fall within four subscales: (a) Appropriate Adult Support and Structure (five items; e.g., "Rules and expectations were clear" and "Adults listened to what I had to say"), (b) Empowered Skill-building (seven items; e.g., "I was challenged to think and build new skills"), (c) Expanding Horizons (four items; e.g., "I gained a broader view of the world beyond my community"), and (d) Negative Experiences (three items; e.g., "I felt like I didn't belong") in which all eight program setting features are represented (Bean & Forneris, 2016b). The YPQS is measured on a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree). With the current sample, the YPQS showed good internal consistency (α subscale range = .71-.89; α total measure = .90).

Short-form youth experience survey for sport (YES-S)
To further test the validity of the PQAYS, the short form of the YES-S was used as a measure of youth's developmental experiences in sport. This scale is comprised of 23 items that assess youth's perceptions of their personal and interpersonal developmental experiences as well as negative experiences in youth sport (Sullivan, LaForge-MacKenzie, & Marini, 2015). Adapted from MacDonald, Côté, Eys, and Deakin (2012), four subscales of this measure were used: Personal and Social Skills (four items; e.g., "Learned about controlling my temper"), Goal Setting (four items; e.g., "Learned to find ways to reach my goals"), Initiative (four items; e.g., "Put all my energy into this activity"), and Negative Experiences (five items; e.g., "Adult leaders intimidate me"). The questionnaire was measured on a 4-point Likert scale from 1 (yes, definitely) to 4 (not at all). With the current sample, the internal consistency for this scale was good (α subscale range = .73-.90; α total measure = .87).

Internal consistency reliability
Internal consistency was tested using Cronbach's alpha. Nunnally's (1978) criterion of .7 is widely used as the acceptable standard for scale reliability, but alphas as low as .5 or .6 have also been identified as acceptable (e.g., Nunnally, 1967; Peterson, 1994). Within eight of the 10 subscales of the PQAYS, Cronbach's alpha statistics demonstrated high levels of internal consistency (i.e., >.7; see Table 2). Two subscales (Physical Safety, Opportunities to Belong) fell just below .7 (Nunnally, 1978). This can happen when subscales have a small number of items or when subscales measure a wide range of constructs (Cortina, 1993; Tavakol & Dennick, 2011). For example, Opportunities to Belong may have a lower Cronbach's alpha because it is comprised of only three items. Physical Safety measures many different constructs (i.e., whether the program space is free of hazards, accessibility of first aid supplies, whether proper sporting equipment is worn), which may have contributed to lower reliability. Most importantly, the overall PQAYS yielded good internal consistency (α = .84), reinforcing the importance of assessing program quality as a holistic construct. Eccles and Gootman (2002) argued that in order to achieve a high-quality program, programmers must incorporate all eight setting features. Many researchers have identified challenges surrounding the use of Cronbach's alpha for internal consistency, whereby a low alpha is not necessarily associated with low reliability (e.g., Dunn, Baguley, & Brunsden, 2014; Henson, 2001). Further, researchers have argued that inter-rater reliability should be considered of greater importance when assessing observational measure reliability (e.g., McHugh, 2012).
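For readers reproducing this kind of reliability check, Cronbach's alpha for a subscale can be computed directly from an observations-by-items score matrix. The sketch below uses small, hypothetical 1-5 ratings (illustrative values only, not the study's data):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (observations x items) score matrix."""
    k = items.shape[1]                          # number of items in the subscale
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical scores: 6 observation sessions x 3 items, each rated 1-5
scores = np.array([
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
    [3, 4, 3],
])
alpha = cronbach_alpha(scores)
```

As the passage notes, alpha is sensitive to the number of items, so a three-item subscale such as Opportunities to Belong can yield a lower alpha even when items covary well.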

Inter-rater reliability
Inter-rater reliability, using the Kappa statistic, was performed to determine consistency among raters. As two researchers were in attendance for every observation, their scores for each item were compared for consistency. For every pair of observations conducted, a score was calculated and then a total score of inter-rater reliability for each subscale was determined. Table 2 outlines the inter-rater reliability statistics for the total measure (κ = .75, p < .0005, 95% confidence interval [.74, .76]) and each subscale (κ range = .61-.88), outlining consistent and substantial-to-near-perfect agreement between raters (Landis & Koch, 1977).
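Cohen's kappa corrects raw percent agreement for the agreement expected by chance given each rater's marginal score distribution. A minimal sketch with hypothetical paired item scores from two observers (not the study's data):

```python
import numpy as np

def cohen_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5)):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    p_o = np.mean(a == b)  # observed proportion of exact agreement
    # expected agreement from each rater's marginal score distribution
    p_e = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 1-5 item scores from two observers of the same session
obs1 = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
obs2 = [5, 4, 3, 3, 5, 2, 4, 4, 3, 4]
kappa = cohen_kappa(obs1, obs2)
```

On Landis and Koch's (1977) benchmarks, values of .61-.80 indicate substantial agreement and .81-1.00 almost perfect agreement, which is how the subscale range reported above is interpreted.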

Convergent and predictive validity
As outlined above, youth from 24 of the 52 observed programs (N = 322, M per team = 13.42, SD = 7.41) completed the YPQS and YES-S short to assess their perceptions of program quality and developmental experiences, respectively. These 322 youth (50% boys) ranged from nine to 18 years of age (M age = 13.66, SD = 2.91) and their length of involvement in the given program ranged from 1 to 10 years (M = 3.36, SD = 2.89). Youth identified as Caucasian (59%), Black (14%), Asian (10%), multiracial (8%), Arabic (3%), and Aboriginal (1%) with 5% of youth not disclosing their ethnicity. Paper questionnaires were distributed by researchers to all youth at the end of a program session. Coaches were not present during questionnaire completion to minimize social desirability. Researchers answered youth's questions related to comprehension.
Convergent validity assesses the agreement between scores of two measures that are believed to assess similar constructs (Schutz & Park, 2004). To assess convergent validity, researcher-scored PQAYS data were compared to the YPQS data that were completed by the youth participants involved in the same sport programs. This procedure was completed to assess whether the PQAYS measured similar constructs to the YPQS. The identified importance of holistic program quality (e.g., Eccles & Gootman, 2002) resulted in two scores of total program quality being calculated (i.e., two mean scores of averaged subscales-one for the observed measure of program quality and one for the youth-perceived measure of program quality) for each of the 24 programs. A Pearson's correlation coefficient was used to assess if these two variables were correlated. Analysis showed that the PQAYS and YPQS were significantly and moderately correlated (r = .52, p = .001).
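The convergent-validity analysis reduces to a Pearson correlation between two per-program totals. A brief sketch with hypothetical program-level totals (the variable names and values are illustrative, not the study's data):

```python
import numpy as np

# Hypothetical total program-quality scores for eight programs:
# each total is the mean of that program's averaged subscale scores.
pqays_total = np.array([4.2, 3.8, 4.5, 3.1, 4.0, 3.5, 4.4, 3.9])  # observer-rated
ypqs_total = np.array([4.0, 3.9, 4.6, 3.4, 3.8, 3.6, 4.3, 4.1])   # youth-rated

# Pearson's r between observed and youth-perceived program quality
r = np.corrcoef(pqays_total, ypqs_total)[0, 1]
```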
Predictive validity measures the extent to which scores on one measure predict scores on another measure (Cronbach & Meehl, 1955). Knowing that program quality is one of the best predictors of positive developmental outcomes in youth programming (e.g., Durlak et al., 2010; Yohalem & Wilson-Ahlstrom, 2010), it is important to assess this relationship. However, to date, there has been no measure that can systematically assess program quality in youth sport. Thus, the total score of the YES-S short was regressed on the total score of program quality to determine if program quality predicted developmental experiences within the 24 programs. Both total scores were computed as the mean of averaged subscales. A regression analysis indicated that observed program quality significantly predicted youth's perceptions of psychosocial experiences, accounting for 21% of the variance (F(1, 22) = 5.73, p = .026, R² = .21).
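With a single predictor, the regression's R² equals the squared Pearson correlation between predictor and outcome, which is a useful sanity check when reproducing this kind of analysis. A sketch with hypothetical per-program totals (not the study's data):

```python
import numpy as np

# Hypothetical totals: observed program quality (predictor) and
# youth-reported developmental experiences (outcome) per program.
quality = np.array([4.2, 3.8, 4.5, 3.1, 4.0, 3.5, 4.4, 3.9])
experiences = np.array([3.6, 3.2, 3.9, 3.0, 3.3, 3.1, 3.8, 3.4])

# Ordinary least-squares fit: experiences = b0 + b1 * quality
b1, b0 = np.polyfit(quality, experiences, 1)
predicted = b0 + b1 * quality

# R^2: proportion of variance in experiences explained by quality
ss_res = np.sum((experiences - predicted) ** 2)
ss_tot = np.sum((experiences - experiences.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# For one predictor, R^2 equals the squared Pearson correlation
r = np.corrcoef(quality, experiences)[0, 1]
```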

Discussion
The purpose of this paper is to report on two studies conducted to develop a valid and reliable observational measure to assess program quality processes in youth sport. Study one was conducted to establish content and face validity for the observational measure and used academic and applied expert panels. Study one resulted in the creation of the 51-item PQAYS measured across 10 subscales based on Eccles and Gootman's (2002) eight program setting features. The second study further tested the reliability and validity of the PQAYS through internal consistency reliability, inter-rater reliability, and convergent and predictive validity. The results provide initial evidence to support the reliability and validity of this measure and demonstrated the potential for using the PQAYS as an observational measure of program quality within youth sport. It should be acknowledged that the results provide evidence for the validity and reliability of the PQAYS, but only when the aforementioned optimal procedures presented in study one are followed (i.e., completing the Program Demographic Form, having two observers present for each observation, conducting a minimum of three observations, conducting pre- and post-interviews).
The development of a valid and reliable measure of program quality is justified because program quality has been outlined as one of the best predictors of developmental outcomes within youth programming (e.g., Roth & Brooks-Gunn, 2015; Yohalem & Wilson-Ahlstrom, 2010). If we consider that sport is the most popular extracurricular activity for North American youth, assessing the quality of sport programs must be a priority.
The eight setting features work together as building blocks (Eccles & Gootman, 2002; HSERF, 2005), whereby providing a positive climate and creating an appropriate structure act as the foundation for higher-order elements of program quality to occur (i.e., opportunities for skill-building). Singer, Newman, and Moroney (2018) argued that program quality should indeed be viewed as hierarchical, where "participants should experience a safe and supportive environment, so that ultimately they can engage in positive relationships and skill building" (p. 196). Therefore, to achieve quality programs, programmers should strive to incorporate a balance of all eight setting features into their programs (Eccles & Gootman, 2002). In sport, this means that coaches must be intentional in their approaches, with deliberate and strategic decisions made to create opportunities that maximize their athletes' psychosocial development (Walker, Marczak, Blyth, & Borden, 2005). For example, the importance of coaches adopting explicit approaches to foster the development and transfer of life skills has recently been outlined in a continuum of intentionality (Bean, Kramers, Forneris, & Camiré, 2018), which reinforces the value of adopting intentional approaches in delivering quality sport programs.
The development of the PQAYS has important practical implications. Researchers have called for more work to examine how youth sport programs influence youth's developmental outcomes and program quality has been identified as an important variable to assess (e.g., Holt & Sehn, 2008;Petitpas, Cornelius, & Van Raalte, 2008). The PQAYS can be used when working with coaches to understand the components needed to deliver quality youth sport programs. As program quality is dependent on the fidelity of program implementation (Baldwin & Wilder, 2014), delivering a quality program is beneficial not only for youth psychosocial development, but also for the retention of youth within a given program. Further, using the PQAYS to understand the current state of youth sport may also be helpful in identifying gaps within coach education and developing material to help close those gaps.
As outlined in the PQAYS instructions, this tool can be used in different ways depending on the research goals. Specifically, if the measure is used as part of an intervention or single case study, conducting more than the minimum of three observations is suggested. Further, it is critical that all steps be carried out (e.g., pre-post interviews) to gain a comprehensive perspective of the program through multiple methods. In contrast, if the purpose of using the PQAYS lies in making comparisons across many sport programs, it may not be feasible to completely follow the outlined optimal procedure in terms of time and resources. Nonetheless, for ideal PQAYS usage, it is recommended to conduct pre-post interviews, observe a minimum of three program sessions, have multiple observers, and complete the Program Demographic Form.
A strength of the PQAYS is that it is explicitly structured to account for the eight setting features developed by Eccles and Gootman (2002). Previous measures (e.g., YPQA, YPQS, Youth and Program Strengths Survey) relied upon inference as it relates to which items comprised each of the eight setting features. Having the PQAYS structured explicitly around the eight features allows academics and practitioners to better assess program quality within the sport context, understanding where program strengths lie, and where improvements are needed.
Findings from study two outlined that Opportunities for Skill-Building-Life Skills was rated substantially lower than all other subscales of program quality. Much theoretical (e.g., Fraser-Thomas et al., 2005; Petitpas et al., 2005) and empirical research has emphasized the importance of intentionally teaching life skills within the sport context. Specifically, in one study, sport programs that were intentionally structured to teach life skills scored higher on program quality compared to sport programs that did not intentionally teach life skills (Bean & Forneris, 2016a). Such findings reinforce previous assertions that coach education is needed related to how coaches can intentionally teach life skills to foster the positive development of youth within sport (Vella, Oades, & Crowe, 2011). The goal of the PQAYS is to help both researchers and practitioners (e.g., coaches, youth sport directors) better assess the quality of their programs and thus be in informed positions to make choices (e.g., coaches accessing further training) to ensure that youth are afforded quality sport experiences.

Limitations and future research
This study represents a first step towards contextualizing important elements of program quality within youth sport. Having PQAYS users follow the detailed guidelines outlined in the instructions allows for a comprehensive understanding of program quality, through the use of multiple methods (interviews, observational field notes, quantitative documentation) and procedures that enhance rigor (e.g., multiple observers over multiple sessions). Despite the rigor built into the PQAYS procedures, it is inevitable that when using this tool, observers will rely on their own perceptions and lived experiences (e.g., Haerens et al., 2013), which should be acknowledged as a limitation. However, engaging in a bracketing interview prior to data collection may help minimize this bias. The goal of a bracketing interview is to enhance the researcher's reflexivity and awareness of potentially unacknowledged preconceptions that may influence the research process (Rolls & Relf, 2006). Future work with the PQAYS can include video recording program sessions, as the use of video can aid in the effective and objective use of observational measures (see Smith et al., 2016 for a review). Although the Program Demographic Form and interviews with coaches can help researchers gather key information pertaining to programmatic structure, future research is warranted to refine the gathering of such information, which is crucial in contextualizing observations. Further, few individual sport programs were included within the current study despite attempts to recruit in this context. To continue to explore this area, data have recently been collected using the PQAYS within the sport of golf. Moreover, we recognize that "sport" is not a homogenous context, but rather includes a myriad of different structures (e.g., rules, competition levels).
As such, research is ongoing to continue to test the validity and reliability of the measure across a variety of youth contexts (e.g., recreational and competitive; male and female; indoor and outdoor). However, it is important to note that the eight setting features have been assessed qualitatively and identified as prevalent in competitive, recreational, and summer sport camp contexts (e.g., Povilaitis & Tamminen, 2017; Strachan et al., 2011). This past research provides a foundation to build on and reinforces the use of multiple methods to explore program quality.
Another limitation relates to the lower reliability for two subscales. However, Nunnally (1967) outlined that in "the early stages of research on predictor tests or hypothesized measures of a construct . . . reliabilities of .60 or .50 will suffice" (p. 226). As the PQAYS constitutes a new measure, further testing is underway to examine the reliability of these two subscales within a different sample, as well as to investigate additional characteristics of relevance that do not overlap with other subscales within the measure. Moreover, when using an observational measure, inter-rater reliability is considered of greatest importance when assessing reliability (e.g., McHugh, 2012). Several researchers have outlined issues surrounding the use of Cronbach's alpha to measure internal consistency within observational measures (e.g., Sijtsma, 2009). Within the current study, all subscales achieved substantial or almost perfect agreement between raters (Landis & Koch, 1977). In addition, as noted above, the lower reliability of a subscale may be due to the wide range of constructs that fall within it.
As program quality is a relatively new area of study within sport research, few measures exist to test the validity of the PQAYS. For example, the YPQS, one of only a few self-report measures of program quality, has received little psychometric testing. Given that research on youth sport is growing, there is a need for more research to validate the PQAYS and develop additional quantitative measures (MacDonald & McIssac, 2016). Future research with the PQAYS will consist of assessing predictive and structural validity with a sample of a minimum 10:1 participant-to-item ratio to conduct a confirmatory factor analysis (Nunnally & Bernstein, 1994). How certain elements of program quality influence outcomes (e.g., Bean & Forneris, 2016c) has only recently been examined in the literature; thus, more research is needed to investigate whether certain elements of program quality have a greater influence on youth development than others.

Conclusion
To date, access to tools for assessing program quality within youth sport has been limited (Holt, Deal, & Smyth, 2016;Holt & Jones, 2008), with the majority of measures being self-report. Thus, the PQAYS addresses a gap by offering a research-based observational measure that can be used to assess the quality of youth sport programming. Although research has provided recommendations for quality improvement within the youth programming context (e.g., Baldwin, Stromwall, & Wilder, 2015), using the PQAYS can aid in capacity building for coaches and programs in ways that optimize the positive development of youth.
researchers and evaluators. Use of this measure can aid in outlining areas of strength and areas of improvement within a sport program.

Instructions
Prior to Conducting Observations with the PQAYS
• Become familiar with the measure by reading through each of the items and footnotes. This will make it easier to take notes during your observations and will support a more comprehensive assessment.
• If you are not already, become familiar with the sport context in which you will observe (e.g., rules, basic culture). To do this, complete the Program Demographic Form with the coach or have the coach complete the form. This form can be found at the end of the instructions. Information within this form will provide context to the program, including the program's regular practice and competition schedule, additional sessions besides practices and competitions that may be of value to observe, and when youth typically arrive prior to program sessions.
• In order to gain a better understanding of the team prior to starting the observations we recommend two steps be taken. First, review the vision, mission, values, and objectives of the team and/or organization, which can often be found online. Second, conduct an interview with the coach. This interview will allow you to have the most comprehensive understanding of the environment you will observe. A sample interview guide can be found at the end of this measure.
• Finally, coordinate with the coach regarding the best time to attend the program sessions.

Attending and Observing the Program
• It is strongly recommended to have two observers complete the PQAYS each time a program session is observed for inter-rater reliability.
• Make sure to have a hard copy of the PQAYS to use as a reference when conducting an observation. However, the full scoring and completion of the PQAYS should be done once the program session is over.
• Based on the arrival time outlined in the Program Demographic Form, plan to arrive at the same time as youth to begin your observation (typically between 15 and 45 min prior to regular practice/competition time). This will give you time to not only prepare yourself, but also to communicate with the coach(es), if necessary. The coach(es) often interact with youth prior to the commencement of a program session and such interactions can be valuable for observation.
• When observing a session situate yourself as unobtrusively as possible.
• Observations should also continue for up to 15 min after the scheduled program session. This is to fully capture the program experience and coach-youth interactions.
• Observe a minimum of three practice sessions at different time points in the program's duration (the more the better). Observing sessions throughout the season/program allows for the measurement of inter-rater reliability and a good understanding of what regularly occurs within the program.
• If the purpose of using this measure is to make comparisons across various sport programs, then three observations is sufficient. However, if this measure is part of an intervention or case study, we recommend more than three observations.
• The majority of observations should be done at practices, as this is when most youth-coach interaction takes place; however, it is strongly encouraged to observe other sessions that have been outlined in the Program Demographic Form as regular program components, if possible.
• During the observations make sure to refer to all of the corresponding footnotes in the PQAYS for supplementary information.
• Take detailed field notes during the program session.
○ These field notes can be taken by hand or using an electronic device and should be objective and factual, specific, and chronological (use of time markers may be helpful). For example, field notes should include, but are not limited to, sequences of events, descriptions of interactions, quotations of coach/youth interactions, and lists of materials that provide evidence for individual items within the PQAYS.
• If feasible, and approved by coach(es), use video recording to aid with corresponding field notes.
• At the end of a given session, ask coach(es) any follow-up questions you may have.

After Attending the Program
• Complete the PQAYS and supporting "Comments" sections immediately after the program session is over, thereby basing program quality scores on observational evidence and preventing problems related to recall. The "Comments" sections are there to provide justification and evidence for the scores provided within the subscale, and provide space for text from the detailed field notes taken during the program observation.

Scoring
• Score each item from 1 (never) to 5 (very often). For a few specifically outlined items, there is the option of "N/A" (not applicable).
○ A score of 5 means that the behavior or program characteristic being observed for a given item is highly evident and consistent.
○ A score of 4 means that the behavior or program characteristic is evident but perhaps not as explicit or consistent.
○ A score of 3 means that the behavior or program characteristic is evident but is fairly inconsistent.
○ A score of 2 means that the behavior or program characteristic is not very evident or explicit (e.g., one occasion or example when you would expect more consistency).
○ A score of 1 indicates no evidence of the behavior or program characteristic.
○ The "N/A" option ensures that a particular item does not lower the score for its subscale if it was not present (e.g., if there was no conflict between youth, there was no need for the coach(es) to manage conflict). However, the majority of the items within the measure should be evident; therefore, if an item is not observed during the session when it typically should be, observers should score it a 1 and not N/A (e.g., if the coach(es) did not discuss the importance of developing life skills with youth, this item should be scored a 1 and not N/A).
• Also, keep in mind that there may be multiple coaches working with the same team. The score needs to reflect all of the coaches and not just one. For example, if one coach is actively interacting with youth while two others are not, the program should not be scored a 5 but perhaps a 3 or 4, depending on the level of interaction of all three coaches combined.
• Scoring of the PQAYS is calculated by computing an average for each subscale. A total program quality score can be calculated by computing an average of all items within the measure.
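The subscale and total scoring described above can be sketched in code. This is a hypothetical illustration, assuming each item is recorded as a number from 1 to 5 or the string "N/A"; the subscale names and item counts below are invented for the example and are not the actual PQAYS items.

```python
# Hypothetical PQAYS scoring sketch: subscale scores and the total
# program quality score are averages of item scores (1-5), with
# items marked "N/A" excluded so they do not lower the subscale.

def subscale_score(item_scores):
    """Average the numeric item scores, ignoring items marked 'N/A'."""
    numeric = [s for s in item_scores if s != "N/A"]
    if not numeric:
        return None  # the whole subscale was not applicable
    return sum(numeric) / len(numeric)

# Illustrative observation data (not real PQAYS items/subscales).
observation = {
    "Positive Social Norms": [4, 5, 3],
    "Supportive Relationships": [5, 4, "N/A", 4],
}

# Per-subscale averages.
subscales = {name: subscale_score(scores) for name, scores in observation.items()}

# Total program quality score: average of all scored items across subscales.
all_items = [s for scores in observation.values() for s in scores if s != "N/A"]
total = sum(all_items) / len(all_items)
```

Excluding "N/A" items from both the subscale and total averages mirrors the instruction that a non-applicable item should not lower a subscale's score.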

After all Program Observations are Completed
• A second interview should be conducted to follow up with the coach(es) based on the observations. The purpose of this interview is to further understand the elements of program quality observed, which may clarify some aspects of the program. A sample interview guide that includes suggested questions to use in the post-observation interview can be found at the end of the measure.

Note:
The above instructions detail the process that we strongly recommend when using the PQAYS. However, we understand that not all components may be feasible (e.g., some coaches may not agree to be interviewed). In these cases, individuals using this measure are encouraged to gain as comprehensive an assessment of the program as possible.

Program Demographic Form
As mentioned above, this form should be completed prior to starting any observations. Key questions to be asked while discussing this form with the coach concern the structure of the program, including: how long the season is; how many practices occur per week; how long the practices are; when youth and coach(es) typically arrive for practices; how often competitions or games occur; and whether there are regular activities outside of practice time that are important to observe (e.g., mental training sessions, dry-land training, or social events). This ensures the whole program is being observed effectively and no component is being missed. Moreover, answers to some of these questions may help score items within the last subscale, Integration of Family, School, and Community Efforts, and shed light on any planned community events.

Bean et al., Cogent Social Sciences (2018)

4. If there is only one coach, mark as Not Applicable.
5. If the observation is not done early on in the program, the observer may miss the establishment of expectations; therefore, focus on the reinforcement of the expectations. This could be followed up within the coach interview.
6. If the sport being assessed is an individual sport, mark as Not Applicable.
7. If the sport being observed is individual, mark as Not Applicable.
8. Completion of the Program Demographic Form and pre-interview will help in answering this item. It is also important to follow up with coach(es) throughout the program, as activities and opportunities may change.
9. These activities do not have to take place outside of regularly scheduled program time, yet they are activities designed to provide opportunities to develop relationships (e.g., holiday party).

POSITIVE SOCIAL NORMS
10. Higher levels of choice result in a higher score; use the comments section and field notes to document the types and quality of choices (e.g., youth choosing a drill for warm-up would be scored higher than choosing a position or jersey colour).
11. A task-mastery climate is one where coach(es) offer positive reinforcement when youth demonstrate hard work and improvement, and elicit an equal contribution from each youth.
12. If the sport being observed is individual, mark as Not Applicable.
13. Examples of life skills include: emotional skills (e.g., focus, emotional regulation, conflict resolution, moral character); intellectual skills (e.g., decision-making, critical reasoning, goal-setting); and social skills (e.g., teamwork, communication, cooperation, connectedness to adults and teammates).
14. Completion of the Program Demographic Form will help to answer certain items within this subscale. It is also important to follow up with the coach throughout the program, as activities and opportunities may change.

OPPORTUNITIES FOR SKILL-BUILDING - SPORT AND PHYSICAL SKILLS
15. Item should be scored based on whether parents are welcome at the program, not whether they attend.