Introduction

Background literature

Both in educational environments at various levels and in the workplace, traditional models of skill development have been changing; one area still ripe for reform is the development of skills related to teamwork and problem solving. In this regard, collaborative problem solving (CPS) is a more promising area of investigation than individual problem solving, since the former offers, among others, the following potential advantages: effective division of labor; the ability to draw on multiple people’s knowledge, viewpoints, and experiences; and the possibility of improving solutions through mutual feedback (Organisation for Economic Co-operation and Development [OECD] 2013). Further, with recent developments in web technologies for information sharing and communication, collaborative problem solving online is likely to become an increasingly important mode of working in many fields. Davis et al. (2011) identified virtual collaboration as one of the ten key skills for future workplaces, and 94 % of 921 industries in North America and Europe use or plan to use web-based technologies, including e-mail, videoconferencing, and instant messaging, to facilitate collaborative problem solving (OECD 2013). Therefore, facility with web-based CPS skills will be an important advantage for job applicants in future workplaces.

More and more countries, including, for example, Singapore and Israel, are becoming concerned about fostering key skills for the twenty-first century and are conducting curriculum reform to address this issue; collaborative problem-solving skills have been one of their major concerns (Darling-Hammond 2011). Greene (2011) believes that the emphasis over the past 40 or 50 years on teaching students how to solve questions rooted in a particular subject matter domain is mistaken, and that the focus should not be on reaching a solution per se, but on the process of problem solving, using a collaborative approach. This, Greene says, will lead to various changes in the education system; for example, the traditional role of subject matter will change, and as far as pedagogy is concerned, the most important thing will be not how teachers “teach” their students, but how they “communicate with” their students and try to help them solve problems collaboratively.

Since collaborative problem-solving skills are important for students (Darling-Hammond 2011; Greene 2011), how to assess students’ performance in these skills is an important issue. With respect to the development of a system for assessing collaborative problem-solving skills, two issues are worthy of further exploration: (1) What are the merits of using a computer agent (rather than a human agent)? Many scholars (e.g., Johnson and Valente 2008; Rosen and Tager 2013; VanLehn et al. 2007) have used a computer agent in training or evaluation, but further exploration is needed of how best to use a computer agent to assess students’ collaborative problem-solving skills. (2) There is a need for a sound theoretical basis for the design of the collaborative tasks. The OECD developed a matrix of collaborative problem solving (OECD 2013) that is promising, and we apply it here, but recent evidence suggests that researchers have found it difficult to design collaborative tasks and assess students’ performance using this matrix. For example, Kuo (2014) developed a collaborative problem-solving system, but the tasks were simple, and there were difficulties in assessing students’ performance based on skills derived from the matrix of collaborative problem solving. That is, a more effective design of the collaborative tasks requires further exploration. In addition to these challenges in assessing collaborative problem-solving skills, Kuo and Wu (2013) found that only 19 out of 66 computer-based assessments took advantage of dynamic and interactive media for item presentation. Therefore, in light of these issues and challenges, our goal was to explore more fully how to design and assess a good collaborative task, using the computer as the agent during collaboration within a dynamic and interactive environment, in order to assess students’ collaborative problem-solving skills.

Rationale for the study

In order to construct a dynamic and interactive environment, this study included the design of a series of collaborative problem-solving tasks related to junior high school students’ daily life experiences, instead of adopting traditional subject matter-based problems. By incorporating tasks grounded in daily life experiences, we hope to increase students’ motivation and engagement during the CPS tasks. In order to solve these real-life problems collaboratively, students have to use science, technology, engineering, and mathematics (STEM) knowledge, as recommended for STEM education in educational systems for the twenty-first century (Bybee 2013). To sum up, this study develops and applies an assessment system for evaluating students’ collaborative problem-solving skills in STEM education using a web-based delivery system and the computer as the collaborative agent. The study employs the collaborative problem-solving framework proposed by the Organisation for Economic Co-operation and Development (2013) as a theoretical basis for the design of the CPS tasks and the CPS skills included in the assessment system, and we have analyzed its effectiveness when applied with 222 junior high students in Taiwan. The assessment system has eight modules, each based on a different situation in daily life, and a computer agent (or two agents) that interacts with students (simulating human verbal interactions) in order to assess their collaborative problem-solving skills.

Research questions

Two research questions guided the study: (1) Is there evidence that the eight modules have good validity in assessing junior high school students’ collaborative problem-solving skills? (2) What are some of the difficulties in developing an assessment system for collaborative problem solving in STEM education?

Related work

Collaborative problem solving

Scifres et al. (1998) explored the effectiveness of different web-based technologies for developing students’ collaborative problem-solving skills; the results showed that although participants achieved greater skill gains than with traditional learning, they were not satisfied with their team members, and most felt that they made more effort than the other team members. Findings such as this led other studies to conclude that supporting collaborative learning activities and procedures requires specialized tools rather than just standard, web-based technologies (Harasim 1999; Isenhour et al. 2000).

In recent years, more and more studies have tried to use computer-supported approaches to help students’ learning, in particular their attention and their attitudes toward collaboration (Roschelle et al. 2010; Warwick et al. 2010). For example, some studies have focused on developing systems to solve open assignments by supporting collaboration (Looi et al. 2010; Nussbaum et al. 2009), while others have adopted interactive whiteboard technology to facilitate student interaction and collaboration in order to help them solve problems actively (Warwick et al. 2010; Wood and Ashfield 2007). Although these approaches have had some good results, Alvarez et al. (2013) argue that further experiments and assessments are still needed. In a similar spirit, Ding (2009) tried to visualize the sequential process of knowledge elaboration in computer-supported collaborative problem solving, and the results showed that three different collaboration patterns emerged in terms of joint and individual knowledge elaboration. Because the number of participants in that study was limited (only six), at least two issues are likely to be important for further exploration: (1) more patterns or mechanisms might have been revealed if more participants had been included, and (2) with a stronger data set, the correlation of the elaboration patterns with learning performance could have been explored. Kuo et al. (2012) utilized a cognitive apprenticeship approach to facilitate students’ collaborative problem solving, and Witkin’s field dependence theory to explore their performance. Kuo et al.’s (2012) results showed that learners with a field-independent cognitive style had better learning performance in a collaborative cognitive apprenticeship. Kuo (2014) also developed a collaborative problem-solving system for assessing junior high students’ collaborative problem-solving skills, but the tasks in that system were focused on students’ school life; e.g., students had to make an exercise plan for their physical education class. However, students’ performance could not be explored in relation to the matrix of collaborative problem-solving skills because the design of the tasks in Kuo’s (2014) system was too simple.

Based on the insights from these previous studies, and some of the needs they revealed for improved designs, our study aimed to develop a web-based collaborative problem-solving system that is more clearly situated in theory based on the OECD matrix, and includes a sufficient range of tasks, with a much larger number of participants, to potentially provide a more robust assessment of the web-based CPS learning experience. The use of the computer as agent adds an additional novel component that we hope will provide further insights into the potential for this kind of computer-based affordance in helping students learn and reflect upon their CPS skills.

Computer agent

A computer agent is a computer-simulated participant with the ability to propose goals, execute assignments, communicate messages, respond to other participants’ messages, detect and adapt to environments, and learn (Franklin and Graesser 1996). Many systems use computer agents for training or evaluation, for purposes such as finding solutions collaboratively (VanLehn et al. 2007); developing reading, writing, and communication systems (McNamara et al. 2007; Johnson and Valente 2008); developing inferential problem-solving systems (Azevedo et al. 2010; Biswas et al. 2010; Cai et al. 2011); and so on.

However, how to apply computer agents in the design of an inferential problem-solving system to assess and develop students’ collaborative problem-solving skills is a topic worthy of further investigation, especially given the limited foci of some current agent-based approaches. For example, Azevedo et al. (2010) measured learners’ cognition and meta-cognition during multimedia-based learning, and Biswas et al. (2010) used a social interaction approach to measure skill at self-regulation of learning. Cai et al. (2011) focused on integrating the assessment of users’ input into an intelligent teaching system. None of these previous studies applied computer agents in web-based collaborative problem solving per se, and the design of a dynamic and interactive environment for evaluating students’ collaborative problem-solving skills by applying computer agents that meet Franklin and Graesser’s (1996) criteria is a crucial matter.

In addition, the effects of different collaborative approaches (e.g., learner–learner or learner–computer agent) on students’ performance in collaborative problem solving are also worthy of exploration. Rosen and Tager (2013) compared the effects of different collaborative approaches on students’ collaborative problem-solving skills and found that students performed somewhat better when collaborating with computer agents than when collaborating with other learners, although the difference was not significant. To sum up, collaboration between learners and computer agents has been shown to be useful for developing students’ collaborative problem-solving skills, but more in-depth studies are needed. Consequently, this study utilizes a computer-agent method instead of a learner–learner collaborative method. Moreover, this study incorporated a new assessment system for collaborative problem solving in STEM education that encouraged students to engage with practical problems, relevant to their daily life, during the CPS; this is an additional difference compared with other collaborative problem-solving systems.

Theoretical framework

Web-based collaborative problem solving

Collaborative problem-solving skill is the capacity of an individual to engage effectively in a problem-solving process wherein two or more agents work together by sharing the understanding and effort required to come to a solution, and integrating their knowledge, skills, and efforts to reach that solution (OECD 2013). As for web-based collaborative problem solving, the core concept is the effective use of web-based technologies in collaborative problem solving. However, it has been demonstrated that ordinary web-based tools, such as e-mail, videoconferencing, and newsgroups, are inappropriate for use in learning, since these tools were not originally designed for educational purposes (Harasim 1999).

Therefore, more and more researchers have come to believe that a successful web-based, collaborative problem-solving assessment system must consist of at least two elements: normal web-based tools, such as the planning of group space and personal space on the Web, and tools that specifically support learning activities and procedures, such as a multimedia interface that integrates learners’ ideas into solutions (Harasim 1999; Isenhour et al. 2000).

According to the definition of collaborative problem solving proposed by the OECD (2013), the most important issue is the design of the computer agent; this is also reflected in the principles set forth by Franklin and Graesser (1996). The computer agent should be usable in different situations, including tutoring, collaborative learning, knowledge construction, and others, and should be included in the collaborative problem-solving skill assessment system (OECD 2013). This study included a computer agent in eight assessment modules in STEM education. Students must collaborate with the computer agent in establishing and maintaining shared understanding, taking appropriate action to solve the problem, and establishing and maintaining team organization. There were no human–human collaborative interactions. In order to more effectively ensure that the collaboration of students and computer agents constituted an authentic STEM learning experience, STEM specialists were invited to participate in developing the scripts for the eight assessment modules used in this study. That is, within this design for a web-based CPS system, the computer agents communicate with students using the scripts composed by the STEM specialists, and the way students respond to the scripted prompts from the computer agent is how we assess the effectiveness of the CPS system in promoting students’ learning of collaborative problem-solving skills.

Matrix of collaborative problem-solving skills

In order to assess students’ collaborative problem-solving skills, three major collaborative problem-solving skills are adopted here, and cross-related with four major problem-solving processes to form a matrix of collaborative problem-solving skills (see OECD 2013). The three skills and four processes are listed in Table 1.

Table 1 Matrix of collaborative problem solving including the specific CPS skills (A1–D3) that were assessed

To possess the skill of establishing and maintaining shared understanding, students must have the ability to identify mutual knowledge, to identify the perspectives of other agents, and to establish a shared vision of the problem state (Dillenbourg and Traum 2006; OECD 2013). For the skill of taking appropriate action to solve the problem, students must be able to understand the problem constraints, create team goals toward a solution, take action on the tasks, and monitor the results in relation to the group and to the problem goals (OECD 2013). For establishing and maintaining group organization, students must be able to understand their own role and the roles of the other agents, follow the rules of engagement for their role, monitor group organization, and facilitate changes needed to handle obstacles to the problem (OECD 2013).

Adopting these definitions of the three skills, this study designed a collaborative problem-solving system for use in assessing students’ collaborative problem-solving skills. As for the approach used in assessing students’ collaborative problem-solving skills, see the detailed explanation in the Evaluation section introduction.

CPS modules for applying STEM content

Because some previous collaborative problem-solving studies were limited to tasks drawn from students’ school life (e.g., Kuo 2014), and it is becoming increasingly clear that real-life situations help to promote engagement and motivation, we developed a series of eight CPS modules to assess students’ collaborative problem-solving skills in realistic situations. In addition, we included STEM themes as a major part of the eight modules. Raju and Clayson (2010) analyzed two national reports and proposed that the development of STEM talent is a major trend in the United States. However, the traditional models of education in these fields are inadequate, because they focus on knowledge transmission instead of offering learning opportunities that engage students in solving daily problems by applying what they have learned in school (Johnson 1989). Consequently, many researchers recommend that learning scenarios be focused on solving practical problems in students’ daily life in order to more effectively assess authentic STEM collaborative problem-solving skills (Blackwell and Henkin 1989; Daugherty and Wicklein 1993; Martin-Kniep et al. 1995). Therefore, when considering the best ways to develop and assess students’ web-based collaborative problem-solving skills, the inclusion of content relevant to students’ daily life should be a matter of first priority.

In order to explore and improve students’ problem-solving skills, many researchers have utilized a variety of instructional methods (Barak and Mesika 2007; Hong et al. 2013; Kirschner et al. 2011). However, most of these studies have focused on developing students’ skill at solving structured, subject-based problems with designated solutions drawn from the theoretical content of scientific fields such as physics, biology, and chemistry. As a result, students are not given an opportunity to collaboratively develop their problem-solving skills while engaged with practical, daily life problems that typically have no predefined solutions (Dixon and Brown 2012; Sternberg 2001). It is self-evident that the problem-solving skills students need in order to arrive at solutions appropriate to daily life are different from those they generally need in solving less practically situated school-based problems, and teachers need to adopt different strategies to instill these skills (Johnson et al. 2011).

Implementation

System framework

The design of the assessment system is centered on eight assessment modules for applying STEM content relevant to the daily life of the participating Taiwanese students (see Fig. 1). The modules comprise eight practical problem-solving tasks: building shelves, using a microwave oven, defusing a bomb, interior design, two days’ travel in Kaohsiung, buying a cell phone, building a house, and designing a desk tidy. With respect to the computer agents, students were required to work with the agent(s) to solve problems collaboratively; they communicate with the computer agents by typing their responses on the keyboard or by selecting an item onscreen (see Fig. 2). Once the students input what they want to say, the computer agent searches the item bank and responds to the students (see Fig. 3). In order to evaluate students’ collaborative problem-solving skills, many possible reactions to the student’s input during the problem-solving process are included in this system; these include cases where wrong guidance is offered by the computer agent, in order to evaluate students’ CPS performance in such challenging situations.

Fig. 1

Assessment modules including eight STEM tasks largely centered on practical type problems: Building Shelves, Using a Microwave Oven, Defusing a Bomb, Interior Design, Two days’ Travel in Kaohsiung, Buying a Cell Phone, Building a House, and Designing a Desk Tidy

Fig. 2

Assessment system interface that the student uses to collaboratively interact with computer agents

Fig. 3

Assessment system diagram showing the information flow and interactions of student and computer agents, including the role of the assessment module and its embedded agent where student input data are analyzed, and appropriate items are retrieved from the Item bank to provide a reaction to the students’ input
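To make the retrieval flow shown in Fig. 3 more concrete, the following Python sketch illustrates one way a module’s agent could select a scripted reply from the item bank in response to a student’s keyed-in text. All names (Item, ItemBank, respond) are hypothetical, and the actual system’s handling of Chinese input and scripted turns is more elaborate than this keyword-based illustration.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the agent's item-bank lookup (not the actual system code).
# Each item pairs trigger keywords with a scripted reply; some items deliberately
# give wrong guidance, mirroring the challenging situations described above.

@dataclass
class Item:
    keywords: set              # words expected in the student's input
    response: str              # the agent's scripted reply
    misleading: bool = False   # True for items that offer wrong guidance

@dataclass
class ItemBank:
    items: list = field(default_factory=list)

    def respond(self, student_input: str) -> str:
        """Return the reply whose keywords best overlap the student's input."""
        tokens = set(student_input.lower().split())
        best = max(self.items, key=lambda it: len(it.keywords & tokens), default=None)
        if best is None or not (best.keywords & tokens):
            return "Could you explain your idea in more detail?"  # fallback prompt
        return best.response

# Usage sketch for a "building shelves" exchange
bank = ItemBank([
    Item({"plastic", "shelves"},
         "A small plastic shelf set could work, but check the weight it must hold."),
    Item({"glue"},
         "Glue alone should be strong enough for the shelves.", misleading=True),
])
print(bank.respond("I think plastic shelves are enough"))
```

In this sketch, the misleading flag marks the items that deliberately offer wrong guidance, so a student’s reaction to such a reply can be scored separately from reactions to correct guidance.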

Web-based CPS system planning

The design elements of task characteristics, problem scenario, medium, and team composition were considered in the process of developing the CPS system (OECD 2013). With regard to task characteristics, all the tasks in this study were related to students’ daily life, and the students and the computer agents had to play different roles and take on different responsibilities to solve the problems. With regard to the problem scenario, as mentioned above, tasks that could plausibly appear in the daily lives of these Taiwanese students were adopted. As for the medium, the computer agents provided students with sufficient information to make decisions as they worked together, but students had to determine the correctness of the information offered by the agents. Finally, for team composition, students worked with either one or two computer agents and were required to play different roles in the different scenarios.

Evaluation

As mentioned above, in order to assess students’ collaborative problem-solving skills, this study employed the collaborative problem-solving matrix proposed by the OECD (2013), composed of three skills. In the eight assessment modules, this study included a series of questions for assessing students’ performance against the matrix of collaborative problem solving. Taking the task of building shelves as an example, there are 13 questions for evaluating students’ performance in establishing and maintaining shared understanding, 19 questions for evaluating their performance in taking appropriate action to solve the problem, and 7 questions for evaluating their performance in establishing and maintaining team organization during the collaborative problem-solving process (see sample questions and score table in Fig. 4). Students must answer these questions according to their interaction with the computer agent as they progress through the collaborative problem-solving tasks included in the eight assessment modules. For each of the 12 elements in the matrix of collaborative problem solving (A1–D3), the score assigned was the student’s percentage of correct responses to the questions related to that element. If a student answered all questions related to an element correctly, the responses were assigned a value of 1.0 (100 % correct, see Fig. 4).

Fig. 4

Table of students’ original performance scores on questions A1–D3 derived from the OECD collaborative problem-solving matrix. For example, the original scores for the shelf construction task (line 2) include the scores for sample questions C1 and C2 (1.00 and 0.60, respectively). These two questions are: (C1) Do you think that a small and simple set of plastic shelves is a good choice for solving this problem? (C2) Do you know the installation process of building shelves?

To obtain a performance assessment of the three skills (columns 1–3, Table 1), the mean values of A1–D1, A2–D2, and A3–D3 were calculated, respectively. For a composite assessment of the collaborative problem-solving skills, the OECD (2013) suggests the following weightings of percent correct responses: (1) establishing and maintaining shared understanding should account for between 40 and 50 % of overall skill performance, (2) taking appropriate action to solve the problem, between 20 and 30 %, and (3) establishing and maintaining team organization, between 30 and 35 %. For simplicity, therefore, we adopted weightings of 40, 30, and 30 % for these three skills, respectively.
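As an illustration of the scoring procedure just described, the sketch below computes each element score (A1–D3) as the proportion of correct responses, averages the four elements in each column into a skill score, and combines the three skill scores with the 40/30/30 weighting. The data structures and function names are hypothetical; this is not the system’s actual scoring code.

```python
# Illustrative scoring sketch (hypothetical data structures).
# responses[element] is a list of 0/1 outcomes for the questions mapped to that element.

SKILL_ELEMENTS = {
    "shared_understanding": ["A1", "B1", "C1", "D1"],   # skill 1
    "appropriate_action":   ["A2", "B2", "C2", "D2"],   # skill 2
    "team_organization":    ["A3", "B3", "C3", "D3"],   # skill 3
}
WEIGHTS = {"shared_understanding": 0.4,   # 40 %
           "appropriate_action":   0.3,   # 30 %
           "team_organization":    0.3}   # 30 %

def element_scores(responses):
    """Proportion of correct answers for each of the 12 matrix elements."""
    return {el: sum(ans) / len(ans) for el, ans in responses.items()}

def skill_scores(el_scores):
    """Mean of the four element scores in each column of the matrix."""
    return {skill: sum(el_scores[el] for el in els) / len(els)
            for skill, els in SKILL_ELEMENTS.items()}

def composite(sk_scores):
    """Weighted overall CPS score using the adopted 40/30/30 criterion."""
    return sum(WEIGHTS[s] * sk_scores[s] for s in WEIGHTS)

# Toy example: every question answered correctly gives element and skill scores
# of 1.0 and therefore a composite score of 1.0.
responses = {el: [1, 1, 1] for els in SKILL_ELEMENTS.values() for el in els}
print(composite(skill_scores(element_scores(responses))))
```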

With regard to validity, three specialists in STEM education or collaborative problem solving reviewed the content validity of the eight assessment modules against the OECD criteria for the 12 skills listed in Table 1 and concurred that the modules were adequate. Furthermore, to examine the criterion-related validity of the eight modules, Pearson product-moment intercorrelations of the students’ performance on the eight modules were obtained for overall performance and for each of the three skill areas, as follows. For overall CPS skill, the assessment values for the eight CPS module scenarios were intercorrelated to determine whether they consistently rendered the same result (see Table 4 in Results for this example). Likewise, intercorrelations were obtained for the eight CPS modules in relation to skill 1 (Table 1), “Establishing and maintaining shared understanding” (Table 5, Results); for skill 2, “Taking appropriate action to solve the problem” (Table 6, Results); and for skill 3, “Establishing and maintaining team organization” (Table 7, Results). The higher the intercorrelations in each table, the more likely the eight scenarios were consistent, yielding the same results for the particular skill, hence supporting the validity of the design.
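A minimal sketch of this intercorrelation analysis is shown below, assuming each student’s module-level scores are arranged in a 222 × 8 array. The variable names and the random stand-in data are illustrative only; the study’s own analysis may have been carried out in a statistics package.

```python
import numpy as np
from scipy import stats

# scores: one row per student, one column per module; values in [0, 1].
# Random data stand in here for the real assessment results.
rng = np.random.default_rng(0)
scores = rng.uniform(0.4, 1.0, size=(222, 8))

modules = ["shelves", "microwave oven", "defuse a bomb", "interior design",
           "travel in Kaohsiung", "buy a cell phone", "construct a house",
           "design a desk tidy"]

# Pairwise Pearson product-moment correlations with p-values, mirroring the
# kind of entries reported in Tables 4-7.
for i in range(len(modules)):
    for j in range(i + 1, len(modules)):
        r, p = stats.pearsonr(scores[:, i], scores[:, j])
        print(f"{modules[i]:>20} vs {modules[j]:<20} r = {r:5.2f}  p = {p:.3f}")
```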

Participants and procedures

A total of 222 Taiwanese junior high students aged 13–15 years participated in this study. They came from nine classes in two different schools and spent three weeks completing the eight tasks in the web-based CPS system. In the first week, the teachers explained the operation of the assessment system, and the students were expected to complete two tasks. In the second and third weeks, students had to finish three tasks per week. With respect to the format of the web-based CPS system, the number of computer agents that the student interacted with varied: each student worked with one agent in the first four modules and with two agents in the remaining four modules.

Findings for each research question

Research Question 1

Is there evidence that the eight modules have good validity in assessing junior high school students’ collaborative problem-solving skills?

Students’ performance on the assessment modules

Average student performance using the assessment system was about 0.7 (out of a full mark of 1.0, equivalent to 70 % correct on the test; see Table 2). Students tended to have lower CPS performance when the problem scenario was related to hands-on experiences (that is, building shelves or designing a desk tidy). It is hard to find related studies against which to compare this finding, because many prior studies focused on dimensions different from the ones we assessed, such as using computer agents to design reading, writing, and communication systems (McNamara et al. 2007; Johnson and Valente 2008) or to develop inferential problem-solving systems (Azevedo et al. 2010; Biswas et al. 2010; Cai et al. 2011). In our assessment system, the computer agent played an important role as a technology specialist and was designed to offer as much valuable information as possible to the students, but the results were not as expected. Among the possible reasons for this discrepancy, the students’ prior experience with the particular kinds of hands-on tasks included in the CPS system may have been too limited. If the students had very little prior experience with some of the practical tasks, it may have been difficult for them to work effectively with the kind of information that the computer agent was providing. That is, students’ prior knowledge in the specific domain could be an important factor influencing their effective interaction with the information provided by the computer agent in the collaborative STEM problem-solving scenarios. Further limitations that may have arisen from the kind of information, and its organization, used for the items the agent drew on in responding to the student are examined in the Discussion.

Table 2 Means (M) and standard deviations (SD) for students’ performance on the assessment modules

Discrimination and difficulty of the assessment modules

In order to explore the discrimination and difficulty of the eight assessment modules, the discrimination and difficulty indexes are shown in Table 3. To simplify the entries in the table, the average results for the three dimensions are shown instead of the results for all items in the eight modules. The discrimination indexes of the CPS assessment modules in this study were between 0.34 and 0.70, with an average of 0.49, and the difficulty indexes were between 0.57 and 0.92, with an average of 0.80 (details shown in Table 3). According to Ebel and Frisbie (1986), an item should be removed if its discrimination index (D) is below 0.19, and is acceptable but needs to be modified if D is between 0.20 and 0.29. All the discrimination indexes of the CPS assessment modules are 0.34 or higher, so the discrimination of the eight assessment modules was deemed to be good. As for the difficulty of the eight assessment modules, Kuo (1996) holds that the optimal difficulty index (P) is between 0.40 and 0.80. However, the average difficulty index is 0.80, and some items exceed 0.80. Overall, the difficulty of the eight assessment modules is acceptable, but some items require revision to bring their difficulty into a more acceptable range.

Table 3 Discrimination and difficulty indexes of the assessment modules
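For reference, the sketch below shows how discrimination (D) and difficulty (P) indexes of this kind are conventionally computed for dichotomously scored items using upper and lower 27 % groups. The exact grouping procedure used in this study is not specified here, so this is an assumption-laden illustration of classical item analysis rather than the study’s actual analysis code.

```python
import numpy as np

def item_indexes(item_responses, total_scores, group_frac=0.27):
    """Classical item analysis for one dichotomously scored (0/1) item.

    item_responses : array of 0/1 answers to this item, one entry per student
    total_scores   : each student's total score, used to form upper/lower groups
    Returns (difficulty P, discrimination D)."""
    item_responses = np.asarray(item_responses)
    order = np.argsort(total_scores)
    k = max(1, int(round(group_frac * len(order))))
    low, high = item_responses[order[:k]], item_responses[order[-k:]]
    p_low, p_high = low.mean(), high.mean()
    difficulty = (p_high + p_low) / 2      # P: mean proportion correct of the two groups
    discrimination = p_high - p_low        # D: upper-group minus lower-group proportion
    return difficulty, discrimination

# Toy example with simulated data for one item and 222 students
rng = np.random.default_rng(1)
total = rng.normal(size=222)
item = (total + rng.normal(scale=1.0, size=222) > -0.5).astype(int)
print(item_indexes(item, total))
```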

The correlations of students’ CPS performance across eight different problem scenarios

In addition to the content validity that was reported by the panel of expert reviewers, the criterion-related validity was also examined through Pearson intercorrelations, as explained in the methods section on Evaluation. The use of this kind of evaluation criterion is difficult to find in publications from the past few years. The only system of a similar kind was developed by Kuo (2014). The core ideas of Kuo’s system are similar to those of this study, but Kuo’s system is much simpler, and students can finish one test in a few steps (e.g., five steps). The eight assessment modules used in our study were developed by four small groups (each including one professor, one graduate student, and one information technology engineer); thus, a question can be raised about how consistent the eight modules were in yielding similar evaluation results. Therefore, our rationale was to examine how consistently the eight modules yielded similar results for overall performance and for each of the three CPS skills. If the intercorrelations of the results for the eight modules were large within any one CPS skill, this would support the conclusion that they were consistently producing similar results. Accordingly, the intercorrelations across the eight task modules in relation to overall CPS performance are presented in Table 4. The correlation values of the students’ collaborative problem-solving skills across the eight problem scenarios all reached a statistically significant threshold (p ≤ 0.05), although the correlation values ranged considerably, from 0.33 to 0.60. That is, the eight tasks yielded sufficiently similar results for the assessment of the students’ CPS performance to be considered adequately consistent.

Table 4 Correlations of students’ overall CPS performance across eight different problem scenarios

The correlations of establishing and maintaining shared understanding across eight different problem scenarios

According to the intercorrelation results in Table 5, the correlations of students’ performance in establishing and maintaining shared understanding across the eight problem scenarios all reached a significant level, indicating an adequate level of consistency within this task dimension. However, the correlation values for “microwave oven and interior design” and “interior design and construct a house,” though statistically significant, are below 0.2 and suggest less confidence that the eight tasks are yielding substantially similar results. These correlations account for less than 4 % of the variance, even though they are statistically significant given the large N of 222 students. That is, the assessment of students’ performance in the skill of “Establishing and maintaining shared understanding” may differ depending on which of the eight tasks they are performing. It is not clear why these differences exist. For tasks 1–4, students worked with one computer agent, and for tasks 5–8, students worked with two computer agents. The lack of consistency in results (0.19 for task 2 in relation to task 4, and 0.18 for task 4 in relation to task 7) is therefore not likely due to the number of computer agents, because the low values occur both within the one-agent set of tasks and between the one- and two-agent sets.

Table 5 Correlations of “Establishing and maintaining shared understanding” across eight different problem scenarios

The correlations of students’ taking appropriate action to solve the problem across eight different problem scenarios

According to the results in Table 6, the correlations of students’ performance on the skill of “Taking appropriate action to solve the problem” across the eight different problem scenarios all reached a significant level. That is, the eight tasks yielded broadly similar outcome patterns. However, three comparisons are low: (1) task 2 “Microwave oven” and task 6 “Buy a cell phone;” (2) task 1 “Shelves” and task 7 “Construct a house;” and (3) task 2 “Microwave oven” and task 7 “Construct a house;” all of these values are less than or equal to 0.2. Among the four tasks involved in these three comparisons, the number of computer agents varied. There are two computer agents in the tasks “Buy a cell phone” and “Construct a house.” In these two task modules, the two computer agents offer different choices to the students, and some of the choices are designed to be wrong, as explained in the foregoing methods section on System framework. The students may therefore face more difficulty in taking appropriate action to solve the problems in these two task modules, and this could be one of the possible reasons for the low correlations of the outcomes involving these two tasks.

Table 6 Correlations of students’ “Taking appropriate action to solve the problem” across eight different problem scenarios

The correlations of students’ establishing and maintaining team organization across eight different problem scenarios

According to the analysis results in Table 7, the correlations of students’ performance in establishing and maintaining team organization across the eight problem scenarios all reached a significant level, with the exception of the shelves and construct-a-house pair. That is, at least six tasks can assess students’ establishment and maintenance of team organization effectively. However, some correlations were low, and these findings should therefore be treated cautiously and re-checked by different studies and from different angles. The correlations for “shelves and construct a house,” “shelves and travel in Kaohsiung,” “shelves and buy a cell phone,” “defuse a bomb and construct a house,” and “construct a house and design a desk tidy” are all below 0.2. A possible reason for the low correlations involving the shelves task in assessing students’ establishing and maintaining team organization is that the shelves task had a different design. Students had to communicate with the computer agent by typing what they wanted to say on the keyboard, and the computer agent searched the database for the keyed-in input and responded to the students. In addition, students could also indicate their response by selecting from among a set of choices. According to the analysis of students’ online records, some students did not want to communicate with the computer agent by keying in responses and preferred to select from the choices. Therefore, if the task is largely designed to be most effective in the keyboard communication mode, but students preferentially elect to use the simpler choice mode, this may be a source of differential performance in establishing and maintaining team organization, hence contributing to lower intercorrelations across tasks.

Table 7 Correlations of students’ “Establishing and maintaining team organization” across eight different problem task scenarios

Research Question 2

What are some of the difficulties in developing an assessment system for collaborative problem solving in STEM education?

This study was admittedly complex. Our goal was to provide some of the first assessment-based evidence for a web-based CPS system that was theoretically grounded (in the OECD matrix), that included computer agents (rather than human-to-human interaction during CPS), and that incorporated problem tasks situated in everyday issues that junior high school students may encounter. Weaving all three of these components into a novel system of this kind, which also used broad STEM content, was challenging, especially given the lack of a similarly ambitious study in the literature to guide us. One issue concerns the challenge of creating a computer agent system that is sufficiently responsive to varied student inputs to be reasonably natural, and sufficiently contingent to encourage and reinforce student engagement, in a way that leads to consistent gains across the eight different task modules. In this exploratory study, a team of experts designed a limited number of potential items that the “agent” could retrieve in responding to student input. Consequently, some lack of coherence is likely to occur, and it is difficult in every case to develop an algorithm that allows the “agent” to respond in a way that is both logical and naturally human-like. More sophisticated agent systems are clearly needed, and this study offers some insights into the prospects for success as well as the challenges.

Another challenge emerged for the skill of “Establishing and maintaining team organization.” This also was likely influenced partially by the lack of a more “human-like” response of the computer agent as well as the limited number of options programmed in the agent submodule in the server system (Fig. 3). Unless the student has a sense that the collaborator is responding in a contingent and logical way, it is difficult to see how a truly coordinated collaborative relationship can develop.

Discussion and conclusions

According to the data and analysis presented above, the assessment system developed in this study consisted of items with acceptable difficulty and satisfactory discrimination, and its content validity and criterion-related validity were also judged to be good. Overall, the assessment system for collaborative problem solving in STEM education presented in this study was effective in assessing students’ collaborative problem-solving skills. Although the majority of the intercorrelations used as evidence of consistency of outcomes across the eight tasks within a given CPS skill (Tables 4, 5, 6, and 7) were highly statistically significant, and in many cases high enough to be considered strong, a few were low. These were highlighted in the Findings section, and the reasons why these consistencies were so low remain somewhat enigmatic. For the pairs of tasks with low correlations (task 2 “Microwave oven” and task 6 “Buy a cell phone;” task 1 “Shelves” and task 7 “Construct a house;” and task 2 “Microwave oven” and task 7 “Construct a house”), there are no immediate clues as to why these tasks with different demands were so discrepant. These findings may offer an opportunity for further research to determine which task attributes, in relation to CPS skills, are most likely to be challenging for students at different grade levels and with different levels of prior knowledge.

As explained in the results for Research question 2, one skill that requires further attention, “Establishing and maintaining team organization,” is especially challenging with respect to the design of a naturalistic computer agent. It proved hard to design a problem scenario to assess this skill, mostly because it is difficult to always design a response format for the computer agents that is sufficiently “human-like;” rather, they act according to a more limited set of item bank options developed by our research team members. The complexities that emerge when the mode of interaction of the learner with the agent varies (such as keying in versus choosing options) may also contribute to differences in performance assessment outcomes across varied tasks. In Ding’s (2009) study, three different collaboration patterns in terms of joint and individual knowledge elaboration were shown to be effective, but only one collaboration pattern was employed in the assessment system for collaborative problem solving, because the students had to follow the provided script. In general, the limitations of a script-based approach are a major problem for Ding’s study, and likely for others, given the current state of development in the field. It was also a major issue in our study, in which, in addition to the complexities of the OECD framework, we included natural everyday problems to assess students’ performance of collaborative problem-solving skills in STEM education. That is, if students have different collaboration patterns (Ding 2009), and the computer agent contributes yet another variation to the complexity of the patterns, then the students may not achieve the level of sophistication that they might achieve in a more natural human-to-human experience. Moreover, if the task is designed with a keyboard response mode, where the text may be complex and not easily parsed by the software, it may be hard for the system to capture the students’ real thoughts; this is especially difficult given the limitations of our techniques for Chinese latent semantic analysis. To improve the system and reduce these mistakes, one option would be to offer the students multiple choices in responding, rather than a single option provided by the computer agent (e.g., Kuo 2014), but it is then difficult to prevent students from merely choosing an answer instead of keying in an answer that more clearly expresses what they want to say, in addition to the problem of lower consistency across different tasks, as discussed above.

In addition, another important difficulty needs to be addressed. In this study, we tried to assess 12 collaborative problem-solving skills in each problem scenario; this large number of assessments became a drawback given the complexity of the problem scenarios. In some cases, only one or two questions were used to assess a given skill (see Table 1), and this may also be the reason why the average difficulty of the eight assessment modules was deemed only “acceptable” according to Kuo’s (1996) criteria. If, in future work, we include fewer of the 12 skills in one problem scenario in order to obtain a more robust item analysis of the CPS responses, we face another limitation: it may be difficult to get a comprehensive sense of students’ real collaborative problem-solving skills.

Moreover, this study used a criterion-related validity method, examining the intercorrelations of the results for the eight tasks in relation to overall performance and each of the three CPS dimensions, as a way of gaining evidence of consistency and a form of validity for the eight modules in the CPS assessment system. Even when a correlation value reached the significance level (p ≤ 0.05), we found in some instances that the amount of variance accounted for was low; thus there remains a limitation that one or more of the eight assessment modules could, by chance, have missed the core objective of the collaborative problem-solving task that we intended to achieve.

To sum up, this study proposed a novel, and somewhat complex, assessment system for collaborative problem solving in STEM education that can be used to assess junior high students’ collaborative problem-solving skills. Although we used a computer agent, it is likely that, with appropriate online data capture systems, the system could be adapted for assessing collaborative learning involving human-to-human collaboration. Even with some of the limitations of our current capacity to create more human-like computer agents, the results for many of the assessments in this study were encouraging. We hope the reflections and suggestions provided can help future researchers develop more effective collaborative problem-solving systems or inspire others to conduct related studies.