Action Inquiry Into the Use of Standardized Evaluation Tools for Music Therapy: A Real Life Journey Within a Parent-Child Community Program

Abstract
Sing & Grow is an early intervention music therapy project that provides community group music therapy programs to families with young children who encounter risk factors that may impact on parenting and optimal child development. A variety of evaluation tools were devised and used over the first 3 years of the project. Upon the subsequent funding and expansion of the project at the end of this period, it was necessary to find, test and devise more rigorous, valid and reliable measures to withstand the scrutiny of researchers, and to address the concerns and criticisms associated with the previous methods of data collection. An action inquiry project was therefore undertaken with two groups of project participants to trial the use of the Parenting Stress Index and the Depression, Anxiety and Stress Scales, both recommended by leading psychologists. Key findings that will be discussed include the friction between the deficit-focussed nature of many psychometric tools and the strengths-based approach taken in service delivery, the level of difficulty in terms of literacy and comprehension for vulnerable respondents, and the lack of one tool with the ability to comprehensively measure all aspects of a broad scoping program.

Registered [...] the sessions, with the aims of: stimulating child development; increasing positive parent-child interactions; promoting positive parenting skills; and improving social connectedness for participating families.


Evaluation of Music Therapy with Families
PRE PUBLICATION VERSION - SEE VOICES ONLINE JOURNAL 2006 VOL 6 (2) JULY 1.
[...] session conducted, observable behaviours were recorded at 5-second intervals, using 6 codes each for mother behaviour and child behaviour. It is not clear how the codes and their meanings were derived, but they "were chosen to try and record how engaged the clients were in the sessions and the amount of interaction that took place" (p. 32). Four codes related to the parents and children interacting with each other, staff or peers in the group, 1 code was for negative behaviour (not doing what was required or actively resisting) and another was for being engaged in the group (i.e. doing what was required). One investigator undertook all of the video tape analysis, and whilst a second investigator "helped her to achieve consistent results and made sure her analyses were reliable" (p. 31), it is unclear whether any inter-rater reliability tests were carried out. A total of 18 families were involved in approximately 8 hours of contact (music therapy, music and play sessions), so real-time video analysis of each dyad involved would have taken the rater at least 52 hours to complete. Video analysis showed consistently high levels of engagement for both mothers and children, "showing that the treatment met one of its major aims - to engage the mothers and children in positive activities in play and music therapy sessions" (p. 33); however, little or no change over the course of sessions was demonstrated.
Questionnaires were given to some of the parents involved, but not those who were having the most difficulties with parenting and family relationships at that time, as "it was felt that it would not be fair to put additional pressure on this particular group of parents at that time" (p. 32). All other parents were asked to complete a 4-item survey at the end of each session, using Likert scales to assess the mothers' perceptions of their children's behaviour in social and play situations over the last week, or in the session that week. It is not clear how these items were derived, but it appears they may have been organic [1]. One hundred and twenty-six surveys, or 504 items of data, would have been collected for analysis if there had been a consistent 100% attendance and response rate. Results of the questionnaires yielded between-group differences in parental perception of children's behaviours but did not yield any change over the course of the sessions provided. For one particular set of parents, each play and music therapy session was followed by a discussion between the parents and involved staff. These meetings were audio taped and "although it did not prove possible to analyse the audio tapes in a quantitative way, there were many observations that could usefully be made" (p. 32).
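The data volume quoted above follows from simple multiplication: the number of surveys times the number of items per survey, scaled by the response rate. The following is a minimal sketch of that arithmetic only. The function name and the 75% response rate are illustrative assumptions and do not come from the study being reviewed.

```python
def expected_items(surveys_offered: int, items_per_survey: int,
                   response_rate: float = 1.0) -> int:
    """Expected number of item-level data points.

    response_rate is the fraction of offered surveys actually returned
    (1.0 means full attendance and response, as assumed in the text).
    """
    if not 0.0 <= response_rate <= 1.0:
        raise ValueError("response_rate must be between 0 and 1")
    return round(surveys_offered * items_per_survey * response_rate)


# Figures from the text: 126 surveys of 4 items each.
print(expected_items(126, 4))        # 504
# Hypothetical 75% response rate, for illustration only.
print(expected_items(126, 4, 0.75))  # 378
```

Even a short 4-item survey therefore produces several hundred data points once it is repeated at every session, which is the workload concern raised in this section.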
Shoemark's (1996) music therapy program for families with children with special needs, presented in a playgroup context, was evaluated by "debriefing sessions with the professional team, a survey of families, and spontaneous comments from families" (p. 12). The survey was brief and used primarily open-ended questions to gauge the benefits of participation in the program for families and any change in their use of music in the home. It also asked for feedback that could be used to further refine the program. Positive comments were received from both staff and parents and primarily concerned the enjoyment and usefulness of the tape resource provided to families, and the flexible and immediate nature of the therapist's facilitation of sessions. Over one year, 11 families were involved, with an average of 5 families attending each week. The use of an initial intake survey to ascertain any base levels was not mentioned, and a relatively small number of families were involved, yielding manageable amounts of data for analysis.
McKenzie & Hamlett (2005) used a once-off 21-item questionnaire, which appears to have been organically devised, to explore parents' perceptions of the value of a group music therapy program (Music Together) provided to 'well' families. The survey consisted of mostly yes/no questions with some listed options and one item using a Likert scale. Respondents were typically married couples with an average of 1.4 children aged an average of 2.2 years, and were of a mid to high socio-economic status. Results showed that: 93% of respondents reported that the program enhanced interactions with their children in session; 91% said that it enhanced interactions at home; 49% indicated they had incorporated program activities into their everyday lives to develop new parenting strategies; 82% reported they had made friends, with 3 meeting with other program families outside of sessions.
These examples demonstrate some of the challenges that may arise in the evaluation of such music therapy programs, including: the generation of large amounts of data taking many hours to annotate and analyse; the appropriateness of requiring highly vulnerable and highly stressed participants to complete any kind of self-report measure; the lack of pre and post test and/or control group design; and the inability of measures to capture change over time.
Historically, the Sing & Grow project utilized organic measurement tools, grown out of the clinical experiences and collaborative efforts of the clinicians involved with the project in its initial stages. The original funding contract gave some guidance as to the outcomes the project was expected to achieve, but these were stated in broad, generic terms, and no specific data collection or evaluative frameworks were specified or suggested.
In attempting to incorporate more rigorous evaluation tools, the project team referred to published work in the area (as discussed above) and also devised a number of tools through a series of collaborative meetings and trialing of the documents with families participating in the program. These underwent several revisions but ultimately included: a commencement information sheet (primarily used as assessment and program planning information and offered to all families in their first week of attendance); an evaluation survey that was offered in week 5 (mid) and week 10 (final) of each program to attending families; a clinician observation checklist against 19 objectives; and a follow-up survey conducted by phone call with a sample of parents 6 months after completion of their program.
Tools were devised keeping in mind the project was to provide services to at least [...]. Many concerns and criticisms of these tools were fed back to project management by project clinicians and other practitioners as well as researchers in the field. These centered on the fact that the tools did not provide a baseline comparison (i.e. the same questions were not consistently asked across the initial information sheet and the evaluation survey), that the data was considered 'soft' in that many questions were open-ended, asking for comments, or gave a yes/no option only, rather than a more discrete scale option (e.g. Likert scale), and that the observations made by clinicians were not tested for validity or reliability. Indeed, the objectives to be reported on via clinical observation were so large in number that it was difficult to make accurate observations of all families within the group across all objectives each session.
On the positive side, as the surveys were strengths-based and simple in nature, they were anecdotally experienced by parents as non-intrusive and user friendly. They were also easy to process and analyze given the yes/no structure employed, and they produced positive statistics at the project level, in lay language, that contributed to the success of further funding applications. For example, the surveys indicated high levels of parent satisfaction (100% enjoyment; 94% would like to participate again); a positive perception of the program's impact on parent-child relationships (70% reported feeling closer to their child); and a translation of activities to the home setting (87% used music for behavior management purposes at home) (Williams & Abad, 2004).
The observation checklist and long hand exceptional reporting undertaken by the program facilitators, whilst time-consuming, also provided very useful information in terms of case vignettes and 'good news stories' that were used to highlight family improvement in parent-child interactions, parenting skills and child development over the course of each program. For each program, survey responses and observational notes were collated into an evaluation report that took approximately 6 hours to complete.
Following the first 3 years of funding (and the subsequent granting of a further 3 years) a process of action inquiry was undertaken over a 3 month period to explore the options for future evaluation of the project. The cycles of action, reflection, and co-inquiry processes employed in the action inquiry paradigm (Ellis & Kiely, 2000) made it particularly suited to this endeavour and the setting. Project staff, community professionals referring families to the program, expert evaluators and participating families were all used as key informants and were involved in each cycle of action (adapting and trialing of various evaluation options) and reflection (considering their impact on clients and the work itself).

Action Inquiry
Rationale
The premise under which the project team undertook this investigation was finding a balance between utilizing evaluative frameworks that would provide data that held up to scrutiny by the scientific and research professions (and funding bodies), and ensuring that participating families (often very vulnerable families) were not inappropriately burdened or ostracized from participating in the program. The fact that the project, at that time, was expected to service a minimum of 600 families over 3 years (this has since been expanded to 2000 families over 4 years), from extremely diverse socio-economic and cultural backgrounds, was also held in mind.
The project team began a consultative process with leading academics in the field of parent-child and family research, along with leading practitioners from the fields of social work and community family support. The accumulated practice wisdom and experiences of the project team over the last 3 years were also utilized, with two senior project staff coordinating the action inquiry process. Two groups of families (parents and their children aged 3 years and under) were also formed to represent two significant areas in which the project had historically worked: young parents and parents who had children with disabilities.
A search of tools used by parenting programs around the world was undertaken, along with a review of the suggestions of the professionals consulted. This yielded a list of pre-existing psychometric tools that were considered and reviewed for use, including: the Emotional Availability Scales (Biringen, Robinson, & Emde, 1998), Knowledge of Infant Development Inventory (MacPhee, 1981), Developmental Observation Checklist System (Hresko, Miguel, Sherbenou & Burton, 1994), Parenting Sense of Competence Scale (Johnston & Mash, 1989), Beck Depression Inventory (Beck & Steer, 1987), Post Natal Depression Inventory (Cox, Holden & Sagovsky, 1987), Parenting Stress Index (Abidin, 1995) and the Depression, Anxiety and Stress Scale (Lovibond & Lovibond, 1995).
None of these tools alone measured the full scope of outcomes targeted by the project, and all were found to have one or more of the following concerns associated: the level of language and comprehension required was too high for the education level of participants; the length of time required to complete the tool was too long; it was uncertain whether the tool, when used as a repeat measure, could show change over 10 weeks; applicability varied across different ages of children (the project serviced families with newborns through to children turning four); varying levels of qualifications and time/resources were required to administer, score and analyse; the deficit focus of the majority of items was in conflict with the strengths-based nature of program delivery; the reliability and validity of the measure when applied to culturally and linguistically diverse groups was in question; and access for non-English speaking participants was limited.
Ultimately, two tools were chosen for trial with strong endorsement from one of the leading psychologists and researchers consulted. These were the Parenting Stress Index (PSI) and the Depression, Anxiety and Stress Scale (DASS). These instruments have been used in both clinical and research settings and there is a substantial body of published research attesting to the validity and reliability of the tools across a range of populations. The PSI in particular has been found to maintain reliability and validity across a wide range of ethnic and cultural groups.
The long form version of the PSI includes 120 items and 13 subscales, measuring across the 4 domains of total stress, child stress, parent stress and life stress. It is stated that the long form takes 20 to 30 minutes to complete and so the short form version was chosen for trial. This consists of 36 items to be answered on a 5-point Likert scale (mostly from strongly disagree to strongly agree) and includes such items as "I feel trapped by my responsibilities as a parent", "My child is not able to do as much as I expected" and "My child makes more demands on me than most children". The tool is designed for parents with children aged 3 months to 10 years of age.
The full version of the DASS consists of 42 items, with the 21-item short version (DASS21) chosen for this inquiry. The items are answered on a 4-point Likert scale from "did not apply to me at all over the past week" to "applied to me very much, or most of the time over the last week". Items include "I couldn't seem to experience any positive feeling at all", "I felt I wasn't much worth as a person" and "I felt scared without any good reason". The tool is designed to measure current state or change in state over time on the three dimensions of depression, anxiety and stress.
The organic measures previously employed by the project were also significantly revised through the consultative process in an attempt to address some of the concerns mentioned, particularly the ability to track change through repeat measure questions. This resulted in three tools being prepared for trial: Week 1 Survey, Final Week Survey and Clinical Observations. The survey previously offered at the mid point of each program was discontinued to cut down on the amount of data collected for analysis.
In order to trial the above measures, regular Sing & Grow early intervention music therapy sessions were conducted once per week for 6 weeks with the two groups of representative families formed, with feedback invited each week. Data was collected via documented verbal dialogue with parents and community staff involved in these groups, ongoing reflective discussion amongst project staff involved in the process, reference back to and use of ongoing discussion with collaborating professionals (e.g. social workers, psychologists etc.), and written feedback from participating parents and community staff.
Parents reported that they were comfortable completing these short surveys, but some did have difficulty with the question asking them to identify their strengths as a family. This question had been suggested by a social worker, in order to better reflect the strengths-based practice model used by the project, but it confused and confronted some families.
The use of Likert scale response options was a new addition to these tools and aimed to give more finite measurements, and to assist in tracking change. A random reversing of scales was employed; however, this caused some confusion with parents who did not carefully read the scale each time and were inclined to tick the far right or far left hand option regardless of its meaning. In order to combat this issue the extreme negative responses of each scale were printed in bold.
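The reversed scales described above have two practical consequences at the analysis stage: reversed items must be rescored so that a higher number always carries the same meaning, and respondents who tick the same extreme column on every item can be flagged for a closer look. The following is a minimal sketch only. The 5-point scale, the item names and the function names are assumptions made for illustration, since the format of the project's own surveys is not specified here.

```python
from typing import Dict, Set

POINTS = 5  # assumed 5-point scale; the number of points is not stated in the text


def rescore(raw: Dict[str, int], reversed_items: Set[str],
            points: int = POINTS) -> Dict[str, int]:
    """Return scores where a higher number always means a more positive answer.

    raw maps item name -> column ticked (1 = far left, points = far right).
    Items in reversed_items were printed with the positive end on the left.
    """
    scored = {}
    for item, column in raw.items():
        if not 1 <= column <= points:
            raise ValueError(f"{item}: column {column} outside 1..{points}")
        scored[item] = (points + 1 - column) if item in reversed_items else column
    return scored


def ticked_same_extreme(raw: Dict[str, int], points: int = POINTS) -> bool:
    """True if every item was ticked in the same extreme column."""
    columns = set(raw.values())
    return len(columns) == 1 and columns <= {1, points}


# Hypothetical respondent who ticked the far-right box on every item.
answers = {"q1": 5, "q2": 5, "q3": 5, "q4": 5}
reversed_items = {"q2", "q4"}

print(rescore(answers, reversed_items))  # {'q1': 5, 'q2': 1, 'q3': 5, 'q4': 1}
print(ticked_same_extreme(answers))      # True
```

When a survey contains reversed items, a respondent flagged in this way has given contradictory answers after rescoring. This is a screening heuristic rather than proof that the scale was misread, so flagged surveys would still need to be reviewed by hand.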
The inclusion of the question asking parents what they would like to get out of the project in week 1 was a useful one in terms of planning the project and gauging the parent's understanding of the program content and what they might benefit from. Various ambiguous answers that were given to this question also led project staff to realize that information sharing and communication with parents prior to commencement of programs needed to be improved and so a brochure was designed and is now in good use across the project. This is an example of feedback gained from evaluation tools being used for immediate program improvement, whilst not necessarily contributing to quantitative evaluative data reporting.
The two psychometric instruments (the PSI and the DASS21) were trialed in Week 1 of the two action inquiry groups. General parental feedback on the tools included that they took too long to complete, and parents were disappointed that this cut into actual session time. Many parents also commented that the questions were often irrelevant to the age of their children (babies). The young parent group were particularly vocal and reflective in their comments, identifying that they thought the DASS was "for people who are depressed" and that it wasn't relevant to them. They also identified the negative and deficit focus of the tools, stating that completing them made them "feel worse than before". Staff from collaborating organizations who referred families to the program suggested that many families would have difficulty comprehending the language used, and felt that these tools were confrontational, intrusive and in conflict with the strengths-based ethos of the program provided.
One mother in the children with disabilities group, who was known to have some mental health difficulties, became visibly upset whilst completing the forms, demonstrating that the choice of evaluation tool can undermine the project's goal of promoting confidence and wellbeing in parents. One young mother reported to the collaborating organisation that she felt the program facilitator was looking for information to accuse her of being a bad parent. This then led to a situation in which the mother was resistant to therapy and was unable to develop trust and rapport with the facilitator, again undermining the therapeutic goals of the project. Logistically speaking, some parents completed the tools quickly whilst others took more time (up to 40 minutes), making it difficult to keep the group moving, and it was difficult to safely engage/mind children whilst their parents were concentrating on the instruments and asking questions of the therapist.

Results
In order to establish true baseline data, it was recommended (by a psychologist) that participating parents complete the measures immediately prior to the first music therapy session (previously the organic tools had been completed at the end of the first session). This led to a feeling of intrusion among the families, who had had no opportunity to develop rapport with the session leader before being required to answer very personal questions. Many parents, in particular the young parents, gave 'perfect' responses to both instruments, resulting in an above norm baseline dataset.
Whilst the PSI can be administered and scored by a non-psychologist, full analysis and interpretation of the data yielded requires training as a psychologist or in a related profession. It is unclear whether music therapy would qualify as a 'related profession' and what additional training would be required. It was also unknown whether these tools would indeed show change in families over the 10 week period of programs.
The number of objectives which required observation by the session leader was reduced from 19 to 13, and the objectives were written in such a way as to warrant only a cross or a tick each session, with a space provided for a general description of each family's patterns of interacting in week 1 and week 10 in order to highlight, in a narrative way, any change. Staff trialing this measure noted several difficulties, including an inability to accurately observe all 10 families in regards to each objective, which led them to use a more intuitive (and potentially unreliable) approach to responding. They also noted that the language used in objectives could be interpreted in various ways by different people, as definitions were not always clear. These issues impacted on attempts to undertake inter-rater reliability tests. A further barrier to inter-rater reliability testing is that the person who facilitates the group has a different level of awareness when observing than one who is there to observe only.
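One common way to quantify inter-rater reliability for tick/cross ratings of this kind is Cohen's kappa, which corrects the raw agreement between two observers for the agreement expected by chance. The sketch below is illustrative only. The ratings are invented, and the paper does not state which reliability statistic, if any, the project intended to use.

```python
from typing import Sequence


def cohens_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters giving tick (True) / cross (False) ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of families on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # proportion of ticks given by rater A
    p_b = sum(rater_b) / n  # proportion of ticks given by rater B
    # Agreement expected if both raters ticked at random at their own rates.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 10 families on a single objective.
facilitator = [True, True, True, False, True, False, True, True, False, True]
observer = [True, True, False, False, True, False, True, False, False, True]

print(f"{cohens_kappa(facilitator, observer):.2f}")  # 0.60
```

In this invented example the two raters agree on 8 of 10 families, but half of that agreement would be expected by chance, so kappa is 0.60 rather than 0.80. Ambiguous objective wording and the facilitator's divided attention, both noted above, would be expected to lower this figure.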
There was some concern noted by both project staff and professionals regarding the attempt to measure any change in child development over a 10 week program, when children at this age (3 years and under) are constantly developing. Any changes over 10 weeks could not in fact be attributed to the efforts of the program without the use of a matched control group, which was outside of the capacity of the project. Given the diversity of developmental stages over the population served (birth to 3 years) it was also difficult to develop a set of objectives relevant to all participating children, with some developmental objectives not relevant at some ages, and too simple at other ages. There was also some concern as to whether the observation tool would capture subtle changes in families over the course of 10 weeks, as once families had reached a 'tick' on objectives there was no capacity to show further improvement. Indeed some families began programs with a 'tick' for a particular objective, but still improved considerably over the course of the program. Given the diversity of the cultural groups participating in the project (including Indigenous Australians, Samoans and Vietnamese families), difficulties arose concerning the different parenting values present in these cultures, where overt affection towards children may not be used as expected and so may not be a valid measure of positive parent-child relations.
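The ceiling on the tick/cross checklist described above can be shown with a small worked example. This is a sketch under stated assumptions: the 0 to 4 underlying skill scale, the tick threshold of 2 and the three families are all hypothetical and are used only to show how a binary record hides improvement.

```python
def checklist_change(before: int, after: int, tick_threshold: int) -> int:
    """Change visible on a tick/cross checklist: -1, 0 or +1.

    before and after are underlying skill levels (hypothetical 0-4 scale);
    a family earns a tick when its level reaches tick_threshold.
    """
    return int(after >= tick_threshold) - int(before >= tick_threshold)


TICK_THRESHOLD = 2  # assumed level at which an observer would record a tick

# Hypothetical families: (level in week 1, level in week 10).
families = {"A": (1, 3), "B": (2, 4), "C": (0, 1)}

for name, (before, after) in families.items():
    print(name,
          "underlying change:", after - before,
          "checklist change:", checklist_change(before, after, TICK_THRESHOLD))
# A underlying change: 2 checklist change: 1
# B underlying change: 2 checklist change: 0
# C underlying change: 1 checklist change: 0
```

Family B starts with a tick and improves as much as Family A, yet the checklist records no change. Family C improves without crossing the threshold, which is also invisible. A graded response scale would capture both cases, at the cost of more demanding observation.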
It was anticipated that the trial would include the parents completing both the PSI and DASS again at the end of the project so that data could be analysed for validity. However, given the action inquiry design of the trials, the ongoing resistance to the tools and the feedback given to staff, flexibility was employed. The instruments were given to the young parents group during the final week with the instructions "please only complete what you are comfortable with". Each parent chose to complete the organic Final Week Survey and did not complete the PSI or the DASS. Given that all young parents in this group gave 'perfect' responses to both instruments in week 1, it was anticipated that had parents completed the instruments again at the end of the program, once they had developed some trust in and rapport with the facilitator, they may have given more authentic responses, resulting in the data actually reflecting an increase in parenting stress and depression and anxiety symptoms in participating families. The instruments were not given to the children with disabilities group during the final week, as the parents had been told that their feedback and opinions would be valued and that the nature of action inquiry was that things would be changed as their feedback was given. During the second last week parents also specifically requested that they not be given the instruments again.
In searching for, trialing and devising measurement tools to be used for the evaluation of the early intervention parent-child music therapy project, several concerns and important considerations arose, including:
The friction between the primarily deficit-focussed nature of many psychometric tools and the strengths-based approach taken in service delivery. This is not a new issue, with the increased emphasis on accountability and measured outcomes presenting challenges for strengths-perspective practitioners and programs for some time (Early, 2001);
The inability of measures to track significant change over a short-term program when used as repeated-measures;
The level of difficulty in terms of literacy and comprehension for respondents and the amount of time taken to complete, leaving less time for actual intervention;
The appropriateness of using any tools normed on Western populations with culturally and linguistically diverse families;
The reliability of observational data collected across a wide range of aims and objectives by the group facilitator who is also providing hands-on intervention with up to 10 families at once;
The relevance and appropriateness of measuring changes in child development over 10 weeks as a measure of program success given the rapid developmental changes occurring in children in natural settings at this age (3 years and under);
The lack of one tool to comprehensively measure all aspects of a broad scoping program; and,
The time, financial and human resources required to collect, analyse and interpret large amounts of data.

Discussion & Future Directions
Using an action inquiry approach in this instance illuminated many of the above concerns and allowed for in-depth probing into issues using a wide range of informants. It also allowed for participants' feedback to be immediately acted upon, with measures consequently adjusted and re-trialed as time allowed. Feedback regarding interventions employed and their value to families was also used for general program improvement and development purposes. This approach, however, itself presented significant limitations to data collection: because feedback was responded to immediately and cyclically, neither the PSI nor the DASS could be repeated in the final week of the programs. Therefore it could not be ascertained whether these tools would show change over the short term. Given the often 'perfect' responses of parents in the first instance, it is unlikely that any valid results would have been garnered and, in any case, project staff valued the action inquiry method, and the participants, enough to sacrifice this opportunity.
Since this inquiry was undertaken the project has received further funding to expand nationally, with 10% of the budget allocated to evaluation. An external evaluation team from a leading university has been contracted and has collaborated with the project team to devise tools, which whilst based on valid, reliable and normed psychometric measures, do attempt to alleviate many of the concerns discussed in this paper. These will be reported on in the future.
The delicate balance between focusing on the provision of a quality parent-child intervention program and the requirement to evaluate these programs remains in flux. Whilst it is of course best practice to evaluate the outcomes of any clinical work undertaken in any setting, the degree to which evaluation methods impact on the work itself (and the clients involved), and whether or not the work is seen as primarily a research project or a service-provision project, has ramifications for the day to day workings of such initiatives. When the values that underlie the practice include a strengths-based approach, family empowerment and the use of creative methods, as in many music therapy programs, the matching of evaluative frameworks that will withstand scientific scrutiny whilst upholding the main tenets of the program philosophy remains an ongoing (and challenging) journey.