Using the creativity support index to evaluate a product-service system design toolkit

Abstract: The design of product-service systems (PSS) is one of the more recent evolutions in the field of design and innovation. Designing products and services in an integrated way holds the opportunity to develop more value for the user and the entire value chain. Despite the existence of various PSS design tools and methods to optimise this creative development process, it remains unclear to what extent the full array of tools supports the design team in their creative work. In this paper, we present the results of four years of iterative evaluation of a PSS Design Toolkit deployed in a graduate education setting, using the creativity support index (CSI), a psychometrically validated instrument. Using the CSI longitudinally enabled us to iteratively improve the PSS Design Toolkit so that it better supports future generations of designers in the challenges that come with designing these product-service systems.


Introduction
Technological changes, social issues, the environment, health and well-being are examples of challenges being addressed by businesses and governments worldwide. Organisations are forced to adapt and explore new avenues for innovation to address these issues. One prominent avenue is the product-service system (PSS). PSS aim for customer utility and added value by integrating products and services (Boehm and Thomas, 2013). PSS design requires a structured process in order to integrate tangible products with intangible services as early as possible and to secure added value throughout the process and the outcome. Besides the need for new and adapted tools, it is important that these tools support designers in creatively exploring the design space of PSS. When it comes to evaluating PSS design, authors often use performance indicators to measure the efficiency of the design process and the conceptual solutions resulting from it (Mourtzis et al., 2015; Ness et al., 2016). With regard to creativity, the literature shows that companies strive for quick decisions, design ideas, and technical directions to follow (Rondini et al., 2016; Sassanelli et al., 2019). Unfortunately, this way of thinking avoids the exploration of the problem space and a detailed analysis of ideas and possible solutions (Badke-Schaub, 2007).
Although evaluations based on elements closely connected to design creativity, e.g., fluency, flexibility, originality, elaboration and problem sensitivity, are encouraging (Kim et al., 2011), they still focus on the output rather than on supporting creativity in the design process. Furthermore, creativity does not necessarily consist of novel ideas; it can result from combinations of what is already known or from an unusual juxtaposition of formerly unrelated ideas (Fulea and Brad, 2011). Fulea and Brad describe additional elements supporting creativity that are specifically relevant in the PSS design process, e.g., 'essential contributors and collective knowledge', 'understanding the problem', 'disseminating suitable information', 'knowing what success represents', and 'filtering information from conscious awareness to the subconscious'. Regrettably, they do not provide any measures for these criteria.
The present study does not intend to evaluate the usefulness of an integrated PSS design approach or the outcome of applying such an approach; rather, we want to understand to what extent the PSS Design Toolkit (Dewit et al., 2018) supports creative work during the design process. A further aim of this paper is to illustrate the use of the creativity support index (CSI) (Carroll et al., 2009; Cherry and Latulipe, 2014) as a means of process evaluation.
In this paper, we first describe the PSS Design Toolkit, the educational setting in which it was tested and the purpose of using the CSI. Second, we present our research methodology and explain the materials and methods used for data collection. Third, we present our findings and interpretation of the six individual factors of creativity support: Exploration, Expressiveness, Immersion, Enjoyment, Results Worth Effort, and Collaboration. Finally, we discuss how the PSS Design Toolkit supports creativity on the basis of these six factors and suggest avenues for future research.

Research methodology
The evaluation of the PSS Design Toolkit using the CSI was part of a larger, integrated research project on PSS, which led to the development and evaluation of the toolkit (Dewit, 2019). The research project used a design-inclusive approach that typically follows a six-stage procedure (Horváth, 2008, 2013). The general process of design-inclusive research naturally decomposes into three parts: explorative research actions (top left of each design cycle), creative design actions (right side of each design cycle), and evaluative research actions (bottom left of each design cycle). The CSI was used in the validation phase of each design cycle (Figure 1). To evaluate the PSS Design Toolkit throughout the process, we used a multifaceted evaluation. Table 1 shows how and when the PSS Design Toolkit was evaluated, using a variety of methods. We conducted two case studies of toolkit use (Dewit et al., 2016, 2017). Inquiries into the individual tools resulted in immediate usage feedback during research cycles 2 and 3 (Figure 1), and a follow-up inquiry was completed by the same students one year after each research cycle. Weekly checks with the expert panel and a GPRC (guaranteed peer reviewed content) review by a double-blind reviewing commission selected by the publishing agency UPA (Dewit et al., 2018) finalised our evaluations. In this paper we detail how we used the CSI to triangulate toolkit evaluation.
Figure 1 shows the overall research process. The grey boxes represent the parts of the process where the CSI was used and how it affected the following iteration. This iterative process ultimately resulted in the final version of the PSS Design Toolkit (see Figure 2).

The PSS design toolkit
A variety of tools and methods that support PSS design already exist (Baines et al., 2007; Costa et al., 2018; Vasantha et al., 2012). However, previous research (Haase et al., 2017) has demonstrated that delivering value to the customer through advanced user experience and interaction, and providing support for both service and product integration, are distinguishing requirements for PSS design. Iteratively growing throughout the design process stages, design tools for PSS should enable their users to convey insights through different kinds of representations, helping them to grasp the complexity and define meaning for the value proposition. The PSS Design Toolkit helps stakeholders to diverge and converge upon ideas and concepts, making it easier to define tasks, responsibilities and benefits, and to verify the users' interest in the final solution early in the design process. Figure 2 presents an overview of the PSS Design Toolkit (2017-2018) and shows how the process is set up using different tools. The inner zones of the circle show how the different tools work together. Table 2 provides more detailed descriptions of the separate tools (shown in the outer zone of the circle) used throughout the three main phases of the PSS design process: 'understand', 'explore', and 'define'. Naturally, some tools relate more to divergent thinking and others to convergent thinking. Together, they enable constant discussion and convergence, with the results of one tool used to fuel another. For divergent thinking, we introduce tools such as the 'stakeholder dimensions' (understanding the perspectives, needs and expectations that will influence your future PSS), 'paradoxical thinking' (generating unusual viewpoints of a problematic situation to achieve solutions for the whole; it is about AND thinking instead of OR thinking), and the 'lotus blossom' (a creativity technique for finding ideas by means of lateral thinking). Convergence-focused tools include the 'design challenge' (which is based on all gathered insights, i.e., a deep understanding of the system, context, interaction, and rational, emotional and (non-)technical requirements, and is formulated in one sentence: "who does what, with what/who to achieve"), the 'selection matrix' (evaluation criteria, discussing the relative importance of each idea, weighing the impact on the user, and value for the client), and the 'PSS map' (a visual representation of the future system that helps in discussing operational validity with all stakeholders).

Design research and educational setting
On a weekly basis, student teams present a short, personalised visualisation of their project using the PSS Design Toolkit templates, which allows for a comparison between the different student teams' design processes and progress.
As a result of our research approach, the methodology and the tools have evolved throughout the research project. Based on continuous evaluations, we have been able to gradually build up a collection of interlinked tools to explicate and support the process of PSS design. Figure 3 provides an overview of the evolution of the toolkit's content.

Creativity support tools
Creativity is critical for the advancement of society, both from the standpoint of economic development through innovation and in terms of individual mental well-being and personal development (Amabile, 1996; Csikszentmihalyi, 2013). Creativity support tools (CSTs) or creativity support environments (CSEs) are important because they help individuals and groups engage in scientific, engineering, humanist, and artistic endeavours (Latulipe, 2013; Shneiderman, 2007).

The creativity support index
Creativity does not have a well-defined or agreed-upon metric, so measuring how well a design tool supports creativity is very challenging. With a wide range of relevant definitions and theories, there is no single agreed-upon methodology for evaluating creativity. This makes it particularly difficult to evaluate the effectiveness of tools that support designers during a creative process. A primary goal of the research reported in this paper is to understand which part(s) of the PSS Design Toolkit worked better than others. Therefore, we chose to use the CSI (Carroll et al., 2009; Cherry and Latulipe, 2014), which is a psychometrically validated survey designed to assess the ability of any tool to support the creative process of its users. Its theoretical foundation is based on literature from the fields of psychology, business, engineering and human-computer interaction. Creativity is investigated in relation to play, notions of expression and creative flow, and CSTs, including Shneiderman's design principles for CSTs (Shneiderman, 2007). The CSI is one tool in the toolbox for researchers involved in the design and creation of CSTs or environments. No single evaluation metric is likely to capture all aspects of creativity. The CSI focuses on how well a tool supports the user(s) during their creative work process. It does not evaluate the outputs of the process, nor is it a measurement of individual creativity. The CSI is meant to be used in combination with other metrics (which may focus on those other aspects). (Frich et al., 2019). The CSI has been used to evaluate comic strip composition tools (Mencarini et al., 2015), the 'Mural' design thinking tool (Lattemann et al., 2017), and most recently support for human exploration of reinforcement learning parameters (Scurto et al., 2021). In a particularly relevant recent study, Ardito et al. (2020) used the CSI to conduct a comparative analysis of the creativity support of three different design frameworks for smart environment design.

Survey administration
The CSI generates a creativity support score that represents how well the tool supports the user in a particular creative task. This means that different scores may result from different tools, different user types, or different tasks. Typically, the CSI is best assessed while holding two of these three parameters constant. Our research addresses first-year Master's students engaged in the PSS design course across different iterations of the PSS Design Toolkit. Thus, the user type and task type remain constant and we look longitudinally at iterations of the toolkit.
The first section of the CSI survey consists of scoring statement agreements (SA) for the following six factors: Collaboration, Enjoyment, Exploration, Expressiveness, Immersion, and Results Worth Effort. For each factor, there are two SA. The inclusion of two agreement items for each factor increases statistical power by providing reliability data for each factor (researchers can calculate the similarity between the two statements). The SA are shown in Appendix 1. Participants respond to each statement on a scale from 'Highly Disagree' (0) to 'Highly Agree' (10). Research participants complete this section by responding to two different statements for each factor (though they do not see factor names or know that two statements represent the same factor). A higher factor score indicates that the tool being studied better supports that aspect of creative work. The second part of the survey, a paired-factor comparison, consists of each factor paired against every other factor for a total of 15 comparisons. When presented with each pair, a user chooses a factor description in response to the following statement: "When doing this task, it's most important that I'm able to..." (Table 3). In these comparisons, the participant is asked which factor in a pair was the most important to them for the activity that they just completed. Because participants report which aspects of creativity are most important to them in this particular task, the CSI factor priority counts (PC) allow for a CSI calculation that is weighted by the most important aspects of creativity in this task. The factor PC themselves are useful data, because they indicate which aspects of creativity are most critical in a particular task. Within the scope of this paper, these factor counts represent how participants would indicate factor importance when using either the PSS Design Toolkit or 'any other ad-hoc tools to address the same task' (Carroll et al., 2009).
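The paired-factor comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example respondent are assumptions made here, not part of the CSI application. It enumerates the 15 unordered pairs of the six factors and tallies a priority count per factor.

```python
from collections import Counter
from itertools import combinations

FACTORS = [
    "Collaboration", "Enjoyment", "Exploration",
    "Expressiveness", "Immersion", "Results Worth Effort",
]


def factor_pairs(factors=FACTORS):
    """Return every unordered pair of factors (15 pairs for six factors)."""
    return list(combinations(factors, 2))


def priority_counts(choices, factors=FACTORS):
    """Tally how often each factor was chosen across the paired comparisons.

    `choices` maps each pair (as returned by factor_pairs) to the factor
    that the respondent picked from that pair.
    """
    counts = Counter({factor: 0 for factor in factors})
    for pair, chosen in choices.items():
        if chosen not in pair:
            raise ValueError(f"{chosen!r} is not part of pair {pair!r}")
        counts[chosen] += 1
    return dict(counts)


# Hypothetical respondent who always prefers the first factor of each pair.
pairs = factor_pairs()
example_choices = {pair: pair[0] for pair in pairs}
example_counts = priority_counts(example_choices)
```

Each factor appears in five pairs, so a priority count lies between 0 and 5, and the six counts always total 15.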

Table 3
The paired-factor comparison statements
Source: Carroll et al. (2009)

Besides the individual factor scores (from the SA) and the factor counts (from the pairwise comparisons), a single CSI score out of 100 is calculated, with a higher score indicating better creativity support (see Figure 4). The CSI is scored by first summing the SA for each factor to get a factor subtotal. Each factor subtotal is then multiplied by its factor count (i.e., the number of times it was chosen in the factor comparisons). Finally, these products are summed and divided by three. Because each factor subtotal is at most 20 and the factor counts always total 15, the weighted sum is at most 300, so dividing by three gives an index score out of 100.
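The scoring rule can be written out as a short Python sketch. The function name `csi_score`, the input layout and the example team are assumptions made for illustration; the arithmetic follows the description above (two statements per factor on a 0 to 10 scale, factor counts totalling 15, weighted sum divided by three).

```python
def csi_score(statement_scores, priority_counts):
    """Compute a CSI score out of 100.

    statement_scores: factor -> (first_statement, second_statement), each 0..10
    priority_counts:  factor -> number of times the factor was chosen (0..5)
    """
    if set(statement_scores) != set(priority_counts):
        raise ValueError("both inputs must cover the same factors")
    if sum(priority_counts.values()) != 15:
        raise ValueError("priority counts must total 15 for six factors")

    weighted_sum = 0
    for factor, (first, second) in statement_scores.items():
        for score in (first, second):
            if not 0 <= score <= 10:
                raise ValueError(f"statement score out of range for {factor}")
        subtotal = first + second                  # factor subtotal, at most 20
        weighted_sum += subtotal * priority_counts[factor]
    return weighted_sum / 3                        # at most 300 / 3 = 100


# Invented responses for one hypothetical team (not data from the study).
example_statements = {
    "Exploration": (7, 6),
    "Results Worth Effort": (6, 6),
    "Expressiveness": (5, 6),
    "Collaboration": (6, 6),
    "Enjoyment": (5, 5),
    "Immersion": (3, 4),
}
example_priorities = {
    "Exploration": 5,
    "Results Worth Effort": 4,
    "Expressiveness": 3,
    "Collaboration": 2,
    "Enjoyment": 1,
    "Immersion": 0,
}
example_csi = csi_score(example_statements, example_priorities)  # 180 / 3 = 60.0
```

In this invented example, Immersion has a priority count of 0, so its statement scores do not contribute to the index at all.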

CSI usage scenarios
The CSI can be employed in a variety of ways within research studies (Latulipe, 2013).
The three scenarios that are applicable to this research are longitudinal studies, non-comparative studies, and the use of individual CSI factor ratings to drive iterative improvement. For the longitudinal aspect, we were interested in the impact of successive versions of the toolkit on the same type of participants. In non-comparative studies, a researcher may not be comparing multiple CSTs but may still calculate the CSI for a single tool and report it as a comparison metric for other researchers studying similar CSTs. We report the CSI results here so that others who develop PSS tools can compare how well their tools support creativity and use our results as a benchmark. In terms of individual factor ratings, researchers can examine differences between factor scales to better understand how a tool supports different aspects of creative processes in the domain task. The results we report here shed light on which creativity support factors are most relevant when engaging in PSS design and helped to inform the iterative design of the toolkit.

CSI administration in the design course
We administered the CSI using the executable jar file application designed for research experiments. Participants could easily complete the CSI within five minutes. The CSI is meant to be a fast metric. In all of the psychometric evaluations during the development of the CSI, users were asked to respond to the metric immediately, not after measured, lengthy consideration. The application scores each test automatically, saving the results in a comma-separated file labelled with participant ID and condition number. Because the students worked in teams in the design course, we asked them to complete the CSI together as a team (n = 66). Part of the reason for this choice was to ensure participation: if we had asked everybody individually (n = 197) to complete the survey, many probably would not have bothered. Because it was given to them as a group activity, it was perceived more as part of their course, which helped to ensure a relatively high response rate over the four years, with 58 out of 66 student teams completing the CSI, for an 87.9% response rate. Also, the decisions they made as a group during the course influenced the way they felt supported, and this should be reflected in the CSI as well. The students also reported to the instructor that discussing the CSI survey served as a good meta-learning reflection tool.

CSI interpretation
Individual CSI scores reflect individual differences and are less useful than scores that have been aggregated. In our work, we aggregated across all the student teams in each year, as they were exposed to the same version of the toolkit. We then compared across years, looking at successive versions of the toolkit, to understand how each version of the PSS Design Toolkit supported the creative design process.
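The aggregation step can be illustrated as follows. This is a sketch under stated assumptions: the record layout `(year, factor, value)` and the numbers are invented here and do not reflect the output format of the CSI application or the study data.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_year(team_results):
    """Average team-level values per (year, factor).

    team_results: iterable of (year, factor, value) records, one per team
    and factor. Returns a dict mapping (year, factor) to the mean value.
    """
    grouped = defaultdict(list)
    for year, factor, value in team_results:
        grouped[(year, factor)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Invented records for illustration only.
example_records = [
    ("year 1", "Exploration", 12),
    ("year 1", "Exploration", 14),
    ("year 2", "Exploration", 13),
    ("year 1", "Immersion", 6),
]
example_averages = aggregate_by_year(example_records)
```

Comparing the resulting yearly averages, rather than individual team scores, is what allows successive toolkit versions to be set side by side.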
The Statement Agreement (SA) score is the sum of both SA responses for each factor. Each statement is scored from 0 to 10, so the maximum score for each factor is 20. Higher scores indicate that the tool better supports that aspect of creativity. The priority counts (PC) are the paired-factor comparison counts and represent the number of times that teams chose that particular factor as important to the task. The highest possible PC for any particular factor is 5, indicating that participants chose it as more important than every other factor. We also report an overall CSI score for the PSS Design Toolkit so that this toolkit can be compared to other emerging toolkits in the future. However, the SA and the PC are the central sources of data that we were most interested in, for the purposes of potentially improving the toolkit in the future and of better understanding what users expect from such a toolkit.

CSI aggregated results
In Tables 4 and 5, the aggregated CSI results are summarised for the four iterations of the PSS Design Toolkit. Results for the four years are presented in order, left to right, with aggregation across the four years presented in the last section on the right of the tables. In Table 4, the factors are listed top to bottom, in descending order by PC, ranking the factors from the most important to the least important, as judged by the respondents. In Table 5, the overall CSI score is given. We evaluated the variability of the results between the years. Table 4 shows the averages of the SA and the PC by year. We ran an ANOVA to test the hypothesis that the scores are the same over the years. A post-hoc analysis with the Tukey HSD test was performed to identify the significant differences between the years on the Collaboration and Results Worth Effort PC.
For the Collaboration priority, only year 2014-2015 scores significantly differently (α = 0.05), and higher, than the other years. For the Results Worth Effort priority, only the scores in year 2014-2015 are significantly lower (α = 0.05) than the scores in year 2016-2017. Between the other years, all these priorities are stable, with p-values > 25%.
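For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed with the Python standard library alone, as sketched below. The group values are invented for illustration and are not the study data, and the paper does not state which software was used. Obtaining the p-value and running the Tukey HSD post-hoc comparison additionally require the F and studentized range distributions, which a statistics library such as SciPy provides (`scipy.stats.f_oneway`, `scipy.stats.tukey_hsd`).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of observations per group (here, per year).
    """
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = mean(value for group in groups for value in group)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean(group)) ** 2 for group in groups for value in group)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented priority counts for four hypothetical years (three teams each).
example_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [2, 3, 4]]
f_value, df_b, df_w = one_way_anova_f(example_groups)  # F = 2.0 with (3, 8) df
```

A large F relative to the F distribution with these degrees of freedom would reject the hypothesis that all years share the same mean; a small F is consistent with stability over the years.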
Table 8 shows the ANOVA table for the overall CSI score. This shows the stability of the CSI score over all four years, with a p-value > 50%. With respect to the reported results on SA, PC and the overall CSI score, we conclude that the values reported in Tables 4 and 5 are stable over the years. The overall average CSI score of 57 (reported in Table 5) is expected to be a stable benchmark for this tool. It can serve as a reference when the CSI metric is used to evaluate the applicability of the PSS Design Toolkit in other cases and circumstances, or for comparison with other PSS design process supports. The 95% confidence interval for the average CSI is [54, 60]. It should be noted that the variability of a single measurement, as indicated by the standard deviation, is considerably larger than the confidence interval we established for the average CSI measurement.
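The relation between the standard deviation of single measurements and the confidence interval of their average can be made concrete with a small sketch. It uses a normal approximation, which is an assumption made here (the paper does not state how its interval was computed, and a t quantile would give a slightly wider interval for small samples); the scores are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(scores, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `scores`."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    centre = mean(scores)
    standard_error = stdev(scores) / sqrt(n)   # shrinks as more teams respond
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Invented CSI scores for six hypothetical teams (not data from the study).
example_scores = [50, 55, 60, 65, 55, 57]
low, high = mean_confidence_interval(example_scores)
```

Because the standard error divides the standard deviation by the square root of the number of teams, the interval around the average is much narrower than the spread of individual team scores.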

Factor support vs. factor priority
As the results are stable over the years, we analysed the average scores over the four years and compared how well the actual SA score of each factor matches the PC given by the teams.
Figure 5 shows the ranked position of the six factors with respect to the average PC (expected) given by the 58 student teams on the horizontal axis and with respect to the actual performance score (delivered) given by the 58 student teams on the vertical axis. On the horizontal axis, the factors are ranked from left to right by increasing PC, that is, by what student teams find more or less important in accordance with what they expect from any given PSS design toolkit. The vertical axis ranks the factors from bottom to top by increasing SA scores, indicating to what extent the PSS Design Toolkit being studied delivers support for that specific aspect of creative work. This shows the degree of fit between the expected and the delivered support given by the PSS Design Toolkit. Factors in the upper left corner of the graph ('zone of exceeding expectation') exceed expectations by 2 ranks or more, whereas factors in the lower right corner of the graph ('zone of underperformance') perform 2 ranks or more lower than the priority assigned to that factor by the users of the tool. The graph shows a rising profile within the 'zone of expectation', which indicates that the tool fits the demands of the given design task, as higher PC match higher SA about tool support for each factor. The dots represent the intersections of the expected and the delivered factors; the grey area around each dot shows the range of the confidence interval for that factor. A graph using factor values (Appendix 2) or the tables below would provide the same information but be less explanatory. The ranks make it easier to compare the factors and how they relate to the PSS Design Toolkit.
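The rank comparison behind this kind of graph can be sketched as follows. The function names, the tie-breaking rule and the numbers are assumptions made for illustration; they are deliberately artificial and are not the values plotted in Figure 5.

```python
def rank_factors(values):
    """Rank factors from 1 (lowest value) upward; ties are broken by name."""
    ordered = sorted(values, key=lambda factor: (values[factor], factor))
    return {factor: position for position, factor in enumerate(ordered, start=1)}


def classify_fit(priority, support, margin=2):
    """Assign each factor to a zone by comparing delivered and expected ranks."""
    expected = rank_factors(priority)    # horizontal axis: rank by PC
    delivered = rank_factors(support)    # vertical axis: rank by SA
    zones = {}
    for factor in priority:
        gap = delivered[factor] - expected[factor]
        if gap >= margin:
            zones[factor] = "exceeding expectation"
        elif gap <= -margin:
            zones[factor] = "underperformance"
        else:
            zones[factor] = "expectation"
    return zones


# Artificial values chosen so that each zone occurs at least once.
example_priority = {
    "Enjoyment": 1, "Immersion": 2, "Collaboration": 3,
    "Expressiveness": 4, "Results Worth Effort": 5, "Exploration": 6,
}
example_support = {
    "Immersion": 6, "Expressiveness": 8, "Enjoyment": 10,
    "Collaboration": 12, "Results Worth Effort": 14, "Exploration": 16,
}
example_zones = classify_fit(example_priority, example_support)
```

In this artificial example, Enjoyment is delivered two ranks above its priority, Expressiveness two ranks below, and the remaining factors fall within the zone of expectation.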

CSI individual factor results
In this section we interpret the results for each individual factor and how results from the CSI prompted improvements in the toolkit over the four years.

Exploration
Exploration is critical and is seen as the most important factor by student teams engaging in PSS design. In the pairwise comparisons, the average PC for Exploration ranges from 4.00 to 4.50. This means that participants strongly agree that tools for PSS design should support creativity by making it easy to explore different possibilities and to try out new ideas and outcomes. Thus, in each phase of the design process, we made an effort to increase the number of possibilities, ideas and outcomes that students would consider. In 2014-2015, we added tools like framing, system mapping, solution areas, free brainstorming, and the touchpoint matrix. In 2015-2016, we added tools like leverage points (which later became the intervention strategy), factors and themes, and paradoxical thinking. The business ideation canvas and value proposition tools were added in 2016-2017. These tools were added to increase the possible variety of ideas and the search for alternatives.
The average SA for Exploration increased from 12.06 to 13.50 over the four years. This is a reasonably high level of agreement, and it indicates that the users could explore many different ideas, options, designs, or outcomes using the PSS Design Toolkit. This suggests that the PSS Design Toolkit does a good job of supporting exploration, which is important because the PSS Design Toolkit was created to engage designers in deep exploration during the front end of innovation.

Results worth effort
Results Worth Effort captures the tradeoff between the complexity of the tool, how much work the tool requires, and the quality and variety of things that can be produced using the tool. There are tools that support specific parts of a design process, but they do not allow a person to do very much. The CSI is one of the few creativity metrics that can capture the fact that a tool such as the PSS Design Toolkit takes a lot of effort to use. The toolkit is very complex but it also allows users to accomplish a lot. Student teams have many exercises and tools they have to go through. This may be overwhelming and frustrating during the process but, in the end, it leads to a comprehensive PSS solution that has been thoroughly designed and implemented. Therefore, it is interesting to see that Results Worth Effort comes out as an important factor. The average PC for the Results Worth Effort factor started at around 2.25 in the first year and gradually built up to around 3.69. It was the second most important factor in the first, third and fourth years, demonstrating its importance to the students. In 2014-2015, we noticed a substantial decrease in this factor score, and we therefore made three choices for upgrading the toolkit in 2015-2016. First, we added a number of tools to help make the results more valuable: rich pictures, meta examples, interaction mood boards, narratives, multiple types of prototyping, and the final stakeholder test. Second, we formalised all the tools into a coherent package so that all students work similarly throughout the entire process, giving the output a more professional look and feel. Third, we moved from paper templates of their final designs to exhibition-like printed boards and also created a final event for stakeholders to participate in, including a more externalised, formal and consistent way of presenting all the material. All of this was designed to make the results more valuable, and thus make the effort required feel more in line with the results produced.
Even though the PSS design process takes a significant amount of time, and using the PSS Design Toolkit requires many steps, the SA scores (going up from 8.75 to 12.63) show that the student teams appreciated the plug-and-play approach of the newer iterations of the PSS Design Toolkit. The groups were increasingly satisfied with the results they obtained using the PSS Design Toolkit.

Expressiveness
The PC for Expressiveness ranged from 2.6 to 2.92, which indicates that this factor is of moderate importance to student teams engaged in PSS design. Based on a comparison of the factor scores and the PC in the first year, we felt the need to increase support for expressiveness; we therefore added tools such as mood boards, (LEGO) serious play, and storytelling techniques before the 2014-2015 academic year. In 2016-2017, we added multiple types of prototyping (appropriate-, provocative-, low-, medium-, and high-fidelity prototyping, make-believe and the use of metaphors), again to enhance the level of expressiveness.
The relatively high PC for this factor indicates that the teams expect to be able to express ideas clearly while doing PSS design. Over the four iterations, including the changes detailed above, the SA score for the PSS Design Toolkit grew correspondingly from 10.92 to 11.46, indicating that the groups became more satisfied with the way the tools in the PSS Design Toolkit enabled them to be expressive.

Collaboration
Our PSS Design Toolkit was designed to support collaborative work. The toolkit engages users in a long, complex process in which collaboration is absolutely essential. The average PC for Collaboration increased from 1.8 in year 1 to 2.3 in year 4, indicating that as the toolkit expanded, collaboration in using the toolkit was seen as increasingly important. Since the earliest development of the PSS Design Toolkit, the integrative, multidisciplinary approach has supported collaborative effort and stakeholder involvement. Group dynamics and co-creation between future users and potential service providers are considered important when using the PSS Design Toolkit. The groups' average SA ratings for Collaboration are fairly high, with scores from 11.25 up to 12.65. This shows that the PSS Design Toolkit enables the group to share ideas and designs and to work easily in teams. However, after Collaboration received a slightly lower (factor) score in 2015-2016, we decided to add tools like concept mapping to promote earlier collaboration. We also note that the Collaboration factor had the highest variance in pairwise counts, with a particular change in year 2. We delve into the anomalous year 2 data in the discussion section.

Enjoyment
The average PC for Enjoyment is low, ranging from 0.50 to 0.99 over the four years. This indicates that enjoyment is not particularly important to student teams when engaged in a PSS design course. The teams do not expect to be happy to use this system or tool on a regular basis. This makes sense, as this is a creativity task that is also, and mainly, a work task.
Although following a PSS design course is not something people do just for fun, we felt it was still important that the task be as enjoyable as possible. In the design of the PSS Design Toolkit, we iteratively investigated ways to enhance the visual aspects of the tool design and make the interaction more fun. To make the students more satisfied with the way they were engaged in the use of the toolkit, we switched from a digital document to a tangible book and toolkit. The final, downloadable version of the PSS Design Toolkit is in vector format and can now be manipulated more easily by the groups. The student teams scored the SA for the Enjoyment factor at a moderate level in year 1 (10.24), and that score rose over the four years to 12.50 in year 3 and 11.56 in year 4. Again, we note that the rating in year 2 was anomalously low at 7.67, indicating that year 2 had some peculiarities in the course project process. Because of the lower scores early on, we were able to make iterations. Later scores indicate that the teams enjoyed using the toolkit more after the iterations.

Immersion
Immersion appears to be the least relevant factor in the CSI for PSS design. The student teams were not particularly 'immersed' when using the PSS Design Toolkit. The average PC for Immersion was quite low (increasing from 1.56 to 2.52), which suggests it is not important for groups engaged in the PSS design course. The SA scores were also not particularly high across the iterations (ranging from 5.62 to 8.02). The teams want PSS design tools to support creative design work, but do not appear to expect a fully immersive experience during this work. This makes sense, as there is no immediate feedback loop in the PSS design process of the kind one might expect when mixing music, sketching, or engaging in other creative endeavours. In those endeavours, losing track of time because of deep immersion in the creative process would be more expected. However, when going through the PSS Design Toolkit, the teams have to keep track of their time in order to consciously plan and execute the PSS design process steps. Their attention is fully tuned to the activity, but they are never immersed in such a way that they forget about the PSS Design Toolkit they are using.

Discussion
We have shown that the CSI research results help frame a consistent story of how the PSS Design Toolkit works to support students engaged in a PSS design course. The CSI is powerful because of the way it captures individual factors of creativity instead of treating creativity as one integrated parameter. This allows us to collect and analyse data about the particular factors that are most relevant to supporting the creative work processes in PSS design. The six factors addressed in the CSI survey are drawn from the creativity literature and they are often considered relevant in creative processes across domains. However, not every factor is relevant for every domain, which is why the CSI survey includes a weighting system. The factor pairing questions generate a factor weighting, which allows the factor ratings to be weighted appropriately for the task or domain, in this case PSS design. This means that if, in a particular creative process, one of the factors is not really relevant, the ratings for that factor do not affect the overall CSI score. We noticed that in PSS design, immersion and enjoyment were not considered highly important factors, and so the ratings on those scales contribute less to the overall CSI score. There may be other factors that are highly specific to a particular domain and that are not captured by the CSI. That is why the CSI is meant to be one tool in a toolbox of creativity evaluation instruments.
The PSS toolkit evolved during the study period and one can imagine that as features are added to a tool (or tools added to a toolkit), expressiveness might increase. However, satisfaction could decrease if that expressiveness comes at the expense of a more complicated and less easy-to-use tool. These are precisely the interesting tradeoffs that the categories of the CSI are meant to illustrate. What we saw in our study was that expressiveness and results worth effort both increased, suggesting that the addition of tools to the toolkit had a positive effect.
Exploration, Results Worth Effort and Expressiveness were ranked as the most relevant factors to this PSS design approach. Students engaging in this PSS course have an elaborate task that requires a lot of creativity, thinking and experimentation, so Exploration is really important. Due to the comprehensive approach, Results Worth Effort is a really important parameter as well. Another essential part of the PSS design process is to visualise the outcome during the different steps and communicate with multiple stakeholders, which is represented by the factor Expressiveness. The fact that the results for Expressiveness were consistent over the four years of study shows that the PSS design process the students are engaging in remains similar and that they are understanding it in a similar way, even as the tool support was iteratively changing over the four years.
The results around Collaboration were interesting. While the student teams agreed that the toolkit supported collaboration well (and this is not surprising, given that the toolkit was designed to be used by teams), the student teams did not rank Collaboration as a particularly important factor. As noted by Howe (1999), there is a strong tendency for people to believe in the myth of the creative 'lone wolf', and it is possible that these students also felt that the collaboration support was less important and that creative inspiration was more likely to come from individual genius than from collaborative effort.
The CSI results show that the student teams perceived the tool to be helpful in the PSS design process. Given that this data is based on self-reports of student perceptions, we acknowledge that it is possible the tool did not help them at all, and they only thought it did. However, an analysis of our previous two case studies (Dewit et al., 2016, 2017), the inquiries into the individual tools (Dewit et al., 2014) and the follow-up inquiry [online results page] provide convincing evidence that the PSS Design Toolkit is helpful in the process. Some of the most pertinent statements come from a student team in 2015-2016 (see Figure 3), demonstrating the effectiveness of the 'matured' toolkit. The team stated: "The PSS toolkit cannot be seen independent of the design process in order to achieve meaningful innovation. It enables us to comprehend the entire context and expectations of all possible stakeholders". The PSS Design Toolkit also forced them to do extensive research in the 'understand' phase, which resulted in a clear vision of goals. In hindsight they realised that this prolongation stimulated a more thorough comprehension of the system, meaningfully adding to the 'explore' phase (Dewit et al., 2017).

Conclusions
The CSI helped us to understand how well the PSS toolkit supported the creative processes. We analysed the data over the years and the differences between the years are barely, if at all, significant. More striking is the coherence between the PC and the SA: what users find valuable seems to align with the actual agreement scores. We have corroborated the CSI results with other evaluation results in order to create a more detailed and nuanced picture of creativity support provided by the PSS Design Toolkit.
Together with the CSI results, our evaluation methods focused on what parts of the PSS Design Toolkit perform better than others and whether or not process steps or even specific tools were missing. The CSI was helpful in the iterative development of the tool and could be useful to others developing creativity support toolkits in other domains.

Study limitations
The design teams took the survey over four different semesters and the CSI shows consistent aggregated results. However, we acknowledge that these results might be biased for the following reasons:
1 The CSI survey was completed by teams as opposed to individually (n = 197). This felt appropriate given the fact that the students used the toolkit together as a team, but it is unclear if the results would have been different if the students had completed the survey individually.
2 In the second year, the scores for collaboration and results worth effort seem to have switched places and show a different weighting compared with the other three very similar years/iterations. This might indicate that there was higher variance across groups on this particular scale, or that the standard deviation (SD) tells a different story. However, neither the variance across groups nor the standard deviations across groups showed significant differences. To ensure that collaboration was successful, we had each team participate in two peer reviews throughout the process, and proactively intervened when groups were not working well. It is possible that dysfunctional teams may have rated the tools less positively, resulting in a lower CSI score or a lower factor score for results worth effort.
3 The CSI was typically completed before students received final grades. Thus, students might have felt pressured to rate the PSS Design Toolkit highly since the creator of the toolkit was their teacher. However, the students were assured that the CSI scores would not be looked at before final grades were submitted.
4 This particular study is focused on evaluating the PSS Design Toolkit with novice users; a different study with a different population would be needed to study the impact on PSS design creativity for expert PSS designers.
5 The CSI is a general instrument for measuring how well design tools support the creative process. Thus, the CSI does not evaluate specific features that are particular to PSS (such as enabling user-centeredness or early interdependence between the product and service).

Future research
Our evaluation of the toolkit's efficacy using the CSI helps to demonstrate the ways in which the toolkit supports creative PSS design. There are a number of ways the CSI can support study of the PSS toolkit in the future. CSI metrics could continuously inform refinements for the PSS toolkit, focusing on the capacity to support an immersive process and by tweaking some of the individual tools to make them more playful. We could learn considerably from others implementing the PSS Design Toolkit in teaching and using the CSI to validate the results. Likewise, it would be helpful to see how this PSS Design Toolkit compares with other sets of tools for PSS design, and the CSI would be a good metric for undertaking such a comparative evaluation. It would also be interesting to see how the toolkit supports the creative work of professionals (rather than students) engaging in PSS design. Obviously, there is a difference between using and evaluating a design toolkit in an educational setting and an industrial setting. Although we have to be careful with generalisation of the research results, the PSS Design Toolkit is being applied in projects beyond the scope of this paper and its educational setting, and so we aim to assess the applicability in design agencies, industry and governmental institutions.

Figure 1 Research process

Figure 4 CSI formula

Figure 5 Factors plotted by ranking agreement of support (y-axis) against priority (x-axis)

Table 1
PSS design toolkit evaluation methods used in each academic year

Table 2
PSS design tools (2017-2018) and descriptions throughout the PSS design process

Results Worth Effort, and Collaboration. The overall CSI score indicates how well a tool supports creative work overall and the individual factor agreement scores indicate which aspects of creativity support are well supported or need to be improved. This is particularly useful for designers and developers who are tasked with the redesign or improvement of a design tool. The CSI is becoming a widely used metric across a variety of domains to measure creativity support, and is called 'one of the most notable attempts' to evaluate CSTs

Table 4
Factor statement agreements, priority counts per year and aggregated over four years in the final column

Table 6 shows the ANOVA table for the SA. For Enjoyment and Results Worth Effort, the ANOVA table reveals scores that differ across the years. For the other factors the ANOVA table shows a high level of stability in the measurements, with p-values well above 30%. A post hoc analysis with the Tukey HSD test was performed to identify the significant differences between the years on the Enjoyment and Results Worth Effort scores. For both scores only year 2014-2015 is significantly different (α = 0.05), scoring lower than the other years. Between the other years the Enjoyment and Results Worth Effort scores are stable, with p-values > 30%.
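As an illustration of the per-factor comparison across years, the following minimal sketch computes the one-way ANOVA F statistic for one factor from per-team scores grouped by academic year. It is not the analysis script used in this study; the function name, the year labels and all scores are hypothetical placeholders. Converting F to a p-value and running the Tukey HSD post hoc test would normally be left to a statistics package (for example `scipy.stats.f_oneway` and, in recent SciPy versions, `scipy.stats.tukey_hsd`).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists.

    Each inner list holds the scores of one group (here: one academic year).
    Raises ValueError when there is no within-group variation, because the
    F ratio is undefined in that case.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-team scores for one factor (placeholder values only).
scores_by_year = {
    "year_1": [6.0, 5.5, 6.5, 5.0],
    "year_2": [7.5, 8.0, 7.0, 7.5],
    "year_3": [7.0, 7.5, 8.0, 7.5],
    "year_4": [8.0, 7.5, 7.0, 8.5],
}

f_stat, df_b, df_w = one_way_anova_f(list(scores_by_year.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the F distribution with the reported degrees of freedom indicates that at least one year differs from the others; the post hoc test then identifies which pairs of years differ.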

Table 6
The ANOVA table for the statement agreements over the years

Table 7 shows the ANOVA table for the PC. For Collaboration and Results Worth Effort the ANOVA table reveals PCs that differ across the years. For the other priorities the ANOVA table shows a high level of stability in the measurements, with p-values well above 30%.

Table 7
The ANOVA table for the priority counts over the years

Table 8
The ANOVA table for the CSI overall score over the years