Strategies for Building Computing Skills To Support Microbiome Analysis: a Five-Year Perspective from the EDAMAME Workshop

High-throughput sequencing and related statistical and bioinformatic analyses have become routine in microbiology in the past decade, but there are few formal training opportunities to develop these skills. A weeklong workshop can offer sufficient time for novices to become introduced to best computing practices and common workflows in sequence analysis. We report our experiences in executing such a workshop targeted to professional learners (graduate students, postdoctoral scientists, faculty, and research staff).

I t is now recognized that microbial communities ("microbiomes") play essential roles in the health of the environments and the hosts that they inhabit. In addition, advances in high-throughput sequencing technologies allow observations of the diversity and functional potential of microbiomes in their habitats (1), captured with spatially and temporally ambitious study designs (2). Together, these advances in knowledge and methodology deepen and broaden our understanding of the centrality of microbiomes for host and environmental health. Because of the economy and accessibility of high-throughput sequencing, researchers can now investigate the diversity of interesting microbiomes and can begin to untangle how this diversity contributes to host or ecosystem health. Efforts to capitalize on the promise of microbiome sequencing data have resulted in information-rich genomic data sets that must be analyzed to gain knowledge of their intricate relationships.
We realized that there was a need for broad computational training in microbiome analysis. In 2014, we were encouraged by C. Titus Brown (now at the University of California, Davis) to offer a microbiome analysis workshop. At the time, he led the Analyzing Next-Gen Sequencing (ANGUS; https://angus.readthedocs.io/en/2018/index .html) Workshop at Michigan State University's Kellogg Biological Station (KBS). He noted that some ANGUS learners were particularly interested in microbiome analysis and that there were limited offerings for this training. At the time, there were several science tools for microbiome analyses, ecological statistics, and computing best practices. We provided specific data sets to accompany each of the hands-on tutorials and encouraged learners to use these data sets in the classroom. Some data sets were used for more than one tutorial to provide continuity through the analysis workflow.
Target learners and admissions. We targeted our applicant pool at the learners who would benefit most from the training and who we expected would share their developed expertise with others to maximize the reach of the workshop's training. We accepted applicants who were novices in their analysis skillset and who did not have apparent access to other resources to support their skill development. We also aimed to promote diversity in scientific discipline (e.g., human, agricultural, environmental microbiomes), learner gender and background, research institution (e.g., undergraduate-serving, research university, agencies), geography (with special advertising directed to learners in the Midwest), and academic level ( Fig. 2; see also Fig. 3). We also strove to provide opportunities to international learners and learners from underrepresented backgrounds. To advertise the course, we used social media (Twitter), our website, and professional networks. We also attempted to reach broader audiences by advertising with international scientific networks, especially Ciencia Puerto Rico in 2014 to 2016.
In each workshop, we could accommodate 23 to 26 learners in the classroom, and applications were oversubscribed every year ( Table 1). As admissions became increasingly competitive, we began to require (rather than encourage) applicants to generate a microbiome data set prior to the workshop. We found that students who had struggled in an analysis attempt were highly incentivized to maximize their time at the workshop. Also, they could work on their data during office hours and ask specific questions of the instructors and teaching assistants (TAs). Instructional team. A large instructional team was necessary to support EDAMAME's learning goals. There were one to three lead instructors per year ( Table 1). The instructors led the courses, oversaw admissions, provided lectures and course content, determined guest lectures, and mentored TAs in tutorial development. In the final 2 years of the workshop, there was also a course coordinator who oversaw conference logistics; fielded learner and applicant questions; and coordinated transportation for learners, guest lecturers, and instructors.
The hands-on nature of the workshop necessitated the presence of several dedicated TAs. Multiple instructors and supportive TAs in the classroom allowed us to be immediately responsive to the needs of the learners. TAs led tutorials based on interest and expertise. Having multiple TAs broadened instructional expertise and allowed unscheduled time for each TA to rest when they were not supporting instruction. Most often, new learners struggled with basic syntax and with interpreting error messages. Novice TAs (e.g., early graduate students) helped learners troubleshoot common errors,  while the more senior TAs and instructors assisted with more complicated problems (e.g., software and operating system incompatibilities, experimental design power for data analysis). In addition to instruction, TAs supported the logistical aspects of the course, such as providing local transportation for learners, purchasing supplies, and assisting learners with unexpected personal needs (e.g., trip to the medical center, forgotten toothbrush). TAs included volunteers (graduate students and postdocs) and graduate assistants who were partially supported by EDAMAME external funding. Participation in the workshop also offered TAs teaching opportunities that served diverse audiences.
There were also numerous invited guest instructors who offered tutorials, technical lectures, and research talks ( Table 2). These guest instructors were selected to represent diverse career stages and levels of expertise across microbiome science ( Table 2). Guest instructors varied according to guest availability, learner interests, and workshop duration, but some guest instructors generously provided content every year. Stuart Jones (University of Notre Dame) taught statistical analysis in R; Patrick Schloss and members of his lab (University of Michigan) taught amplicon analysis with mothur; Jim Tiedje (Michigan State University) provided a lecture and discussion on the future of microbial ecology. Instructors interacted with the learners during dinner and social time, and this provided an opportunity for learner networking and discussions.
Learning environment and daily schedule. EDAMAME was held at the Kellogg Biological Station (KBS), which offered a remote location and an immersive experience for learners and instructors. KBS was also chosen for economy-the room and board rates at KBS were affordable to many (e.g., ϳ$370 per week in 2018). Teaching assistants and volunteers provided transportation from the Kalamazoo and Lansing airports to KBS. KBS also provided conference services, dining, Wi-Fi, and bonfires. Finally, the natural setting and outdoor activities at KBS provided a respite from the time spent in front of the computer.
The durations of the workshops ranged from 7 to 11 days (Table 1; see also Fig. 4), including travel days. The morning schedule included an overview lecture followed by hands-on tutorials and group learning activities. After lunch, we had an afternoon lecture and additional tutorials. We held optional office hours with "choose your own adventure" tutorials and/or lectures on learner-chosen topics during the afternoon break. For example, in 2018 we discussed exact sequence variant analysis because it was a new approach being used in the field for defining operational taxonomic units. Learners could also ask specific questions about their own data during office hours. After dinner, we held an evening guest lecture in microbiome research. Evenings provided free time for networking and relaxation. EDAMAME educational strategy and assessment. EDAMAME's educational strategy addressed two training needs. First, we offered general training in the fundamentals of introductory computing (e.g., command line, scripting, cloud computing, bioinformatic workflows [8]). This equipped participants with the basic skills needed to independently execute their analyses. We also offered specific training to overcome hurdles particular to microbial metagenomic data analysis and advised on best practices for microbiome analysis. To iteratively assess these strategies, we used a combination of summative and formative assessments to determine participant learning gains.
For the summative assessments, we worked with educational consultants to develop online, anonymous surveys and to perform pre-and postworkshop assessments. These assessments evaluated student-reported learning gains and confidence in areas aligned with our learning objectives. Each learner created a password to preserve anonymity while allowing for linking the pre-and postsurvey responses. To maximize response rates, we provided dedicated time in the classroom to complete the surveys. The preassessment survey was completed on the first full day, and the postassessment survey was completed on the final day of the workshop. We updated the survey annually to reflect any new or changed learning objectives but maintained the structure to facilitate interannual comparisons. Results of the annual surveys guided the continued development of course materials and topics covered. In the early years of the workshop, we had consultants perform in-classroom observations and provide feedback to the instructors. Ultimately, we compiled the 5 years of pre-and postsurvey data and performed a longitudinal analysis.
In the pre-and postsurveys, learners were asked to indicate the extent to which they understood specific learning outcomes or skills covered in the course, with ratings (e.g., Strongly Disagree, Disagree, Agree, and Strongly Agree [ Fig. 5]).
We also used "real-time" assessment during the workshop by replicating formative assessment strategies found to be effective in Software Carpentry workshops (5-7). Formative assessment is an approach where teachers use informal practices to assess student learning during the workshop to evaluate understanding and modify teaching if needed. We used red and green sticky notes and "minute cards" to get this real-time feedback from students. Each participant was given a green ("I'm doing okay") and a red ("I have a question") sticky note to stick onto their open laptop during tutorials. This visual cue allowed instructors to quickly survey the classroom and determine learners' comfort level and to attend to any student who was struggling during tutorials. Furthermore, it allowed students to continue working through tutorials or troubleshooting without the need of raising their hand. We also employed minute cards. After each tutorial, students wrote what went well on the green sticky note and what could be improved on the red sticky note. Instructors and TAs read through notes during breaks to quickly identify gaps in understanding. This allowed us to identify gaps and make adjustments (e.g., in speed) in the subsequent instruction period. Building community resources and peer networks. We were dedicated to promoting a welcoming and supportive learning environment. We presented a Code of Conduct in the welcome lecture, which outlines expectations for student and faculty behavior during the workshop and reporting procedures if those behavior expectations are not met, so that it was clear that any questionable conduct was grounds for dismissal. We created an online shared document Etherpad for collaborative note taking to maximize engagement and inclusivity. All materials were regularly updated and available online through our course website. We did our best to accommodate learners with families, providing private housing to families and learners with special requirements.
We aimed to build a peer learning community and to provide resources to support learners beyond the workshop. We offered an informal meet-and-greet on the arrival travel day and get-to-know-you short talks as lighting presentations after the first full day. These interactions allowed learners to identify peers with common research interests early in the workshop. We created a workshop website and a public repository on GitHub so that learners (and outside parties) could access EDAMAME learning materials. Linked content included lectures, hands-on tutorials, and reference lists. These materials have been shared openly, with most content licensed as Creative Commons generic (e.g., CC-BY), so that all course registrants and anyone else interested could have access. We also shared group email lists and encouraged social media outreach via Twitter and blogging. An EDAMAME meet-up was also held at the International Society for Microbial Ecology 2016 meeting in Montreal, Canada.
Pre-and postsurvey comparisons and qualitative interviews. Ninety-seven percent of EDAMAME learners from 2014 to 2018 rated the workshop overall in the top evaluative categories, "good" to "very good" (Fig. 6). A comparison of pre-and postassessment learner-reported learning gains and/or confidence with the major learning objectives of EDAMAME shows gains in all subcategories of learning reported  (3), mothur (4), the .biom format (9), and R (10). (Fig. 7). The largest gains between the pre-and postassessments were seen with Computational Understanding (Fig. 7B) and Perception in Ability (Fig. 7C).
We also asked short-answer questions at the end of the survey, in which learners were asked to design an experiment and report how they would process and analyze microbial community high-throughput sequencing data. We observed increased sophistication in the responses to the short-answer questions from the pre-to postsurvey  periods, with some learners leaving the questions blank in the presurvey and then providing thorough answers in the postsurvey (data not shown, but anonymized annual assessment reports are available upon request). This suggests large gains in particular for those learners who were new to high-throughput sequence analysis. Qualitative interviews with 9 learners who attended EDAMAME from 2014 to 2016 (each spending 25 to 40 min with the interviewer; Table 3) suggested that the members of this group of learners were largely satisfied with the workshop and appreciated the attentiveness of the TAs and instructors as well as the red/green sticky note mechanism for soliciting help in real time. However, some of these learners also felt that there was too much material covered in the workshop and reported that they struggled to keep up with the pace of the course ("Content overwhelm"). Finally, we had many interviewed learners state that the workshop and materials covered made a positive impact on their career and research.

DISCUSSION
Lessons learned. We offer suggestions from our experiences for running an effective microbiome analysis workshop (Fig. 8). EDAMAME's content changed from 2014 to

Comment category
Comment content Positive overall "EDAMAME was an inspiring introduction into microbiology. I thought the kind of analyses you could do with microbiology was really interesting. I really got pulled in on the data science part." (2014); "It was definitely one of the most effective workshops I've been to." (2016); "Very comprehensive, reached a lot of people from different backgrounds who were interested in analyzing microbial communities. I thought it provided a good survey of the tools that were available and it brought in some experts." 1. Regularly evaluate and change content to meet changing learner needs. 2. Maintain a high instructor to learner ratio. 3. Provide consistent workshop timing and fill the "middle-ground" duration needs of learners. 4. Understand the pros and cons of cloud computing for a workshop, and plan use of these resources well in advance. 5. Reach the broadest applicant pool of learners who have the potential to have the most gains from the training. 6. Consider the trade-off in workshop value (including instructor time) and maintaining economical costs to learners. 7. Plan well in advance to achieve best outcomes for applicants who require a US visa and international travel plans to attend the workshop. 8. Almost all learning engagement needs to happen on-site; efforts to engage learners pre-workshop were ineffective. 9. Scheduled classes should teach to the majority of learners to accomplish all our learning objectives. Office hours can help struggling learners catch up. 10. A welcoming and inclusive environment creates a positive workshop experience and is essential for effective learning. 2018 to meet changing learner needs. These changes were guided in part by the applicants' responses to questions about their data sets and their expectations for the workshop. For example, amplicon analysis (e.g., 16S rRNA gene sequencing) was favored in the early years whereas untargeted metagenome analysis was favored in later years. Similarly, proportionally fewer students in 2018 were novices with respect to the command line or R, but the majority of the class appreciated the refresher. Some of the learners with self-taught experience embraced the opportunity to relearn the "correct" approaches and to gain missing foundational knowledge. Several tutorials were popular every year. For example, there was a consistent demand for ecological statistics and "supporting" skills such as GitHub/version control and cloud computing.
High instructor-to-learner ratios were essential for the success of the hands-on EDAMAME workshop. In the years in which we had the lowest instructor-to-learner ratios (e.g., in 2014 and 2015; Table 1), the TAs and instructors anecdotally reported exhaustion whereas the learners craved more attention. In addition to assistance from the formal instructors, learners could assist one another. To facilitate peer learning, we arranged the classroom in tables with groups of two or four. We also encouraged learners to support one another with troubleshooting while waiting for an instructor to become available to provide assistance.
Regardless of the length of the course, several learners indicated in their postassessments that more time at the workshop was needed each year. However, learners who were faculty or staff researchers shared (in informal conversations) that they would have been unable to commit to a longer workshop due to other professional responsibilities. We noted that there were other offerings for multiweek workshops (e.g., STAMPS), as well as several 1-day or 2-day workshops at professional society meetings and pipeline-specific training (e.g., in mothur and QIIME).
The timing of the workshop had several challenges. EDAMAME was held in the summer, and we tried to avoid scheduling it for the same week as major microbiology conferences, such as the American Society for Microbiology Microbe meeting and the International Symposium on Microbial Ecology (ISME) and Ecological Society of America meetings. Because microbiome analysis spans multiple disciplines, it was hard to avoid all of the large conferences that microbiome researchers may attend. We also had to change the timing of the workshop every year to accommodate the KBS event schedule. As EDAMAME grew in popularity, some learners applied for fellowships or travel awards to support their training, but the annual changes in timing made it difficult for students to plan. Moving the workshop to a dedicated conference site (e.g., a hotel) might help with consistent timing, but it would also increase the cost to learners.
We found that using cloud computing streamlined the course content and democratized access. We used the Amazon Elastic Compute Cloud (EC2), which was costeffective and available to students who do not have access to high-performance computers at their home institutions. In the early years of the workshops, we guided learners through software installation on the EC2, but in the later years, we installed the software on the EC2 for the learners so that they could focus on moving data to and from the EC2. Using the EC2 presented a challenge for learners who were affiliated with government agencies or research laboratories (e.g., the U.S. Environmental Protection Agency, U.S. Geological Survey) because of their need for additional security and management approval prior to installing new software or moving data. While we did not have a perfect solution for these learners, we began to anticipate their needs and prompted them to apply for the required permissions in advance. Another hurdle with using the EC2 was the changing ways in which Amazon provided student or educational computing resources over the years. Amazon provided individual credits to learners in some years and in others required the instructors to apply for an educational grant. Cloud computing logistics needed to be anticipated about 9 months in advance, but in years where individual email addresses were needed, it was impossible to prepare until after the admissions were finalized, which typically occurred 4 to 6 months in advance of the workshop. We also noted an issue for some international learners who did not have credit cards compatible with Amazon requirements to enroll for an EC2 account, and for these learners we had to share our own accounts or create accounts for them.
While our applicant pool and learner demographics reflected balance in sex, discipline, and academic level, EDAMAME fell short of its racial diversity goals. We could have benefitted from improvement in advertising the course to reach a broader pool to attract more applicants of color. We largely advertised on social media and through word of mouth. We attempted to specifically advertise to key target learner groups, such as those underrepresented in the sciences who may be expected to have less access to the training. On a positive note, we have evidence that EDAMAME was reaching socioeconomic diversity goals, as two interview respondents were clear that they would not have had the same opportunity for training and advancement given their lower-income backgrounds if it had not been for EDAMAME.
A final lesson to share is the balance between course value and learner costs. In its first years, EDAMAME was funded piecemeal by generous sponsors. We experimented with a mixed enrollment model of offering EDAMAME for university credit to local students and for a fee to outside students, but many of the local students could not afford the summer tuition required for the credit hours. Subsequently, EDAMAME was funded by external federal grants (the NIH during 2015 to 2018 and the USDA during 2017 to 2018), and we stopped offering it for credit for logistical simplicity. We began to charge modest workshop fees ($325) to support items that could not be covered by the grant (e.g., coffee, snacks). As soon as we began to charge workshop fees, the majority of applicants began to request financial aid. We realized that many of the learners, mostly graduate students and postdocs, were paying for the workshop personally, so we then worked to waive fees for eligible students in need and offer scholarships for students with international travel. The instructional team did not have enough funds to fully pay the TAs and instructors, who largely volunteered their time because they believed in the mission of the training. Guest instructors and lecturers generously volunteered their time as part of their broader impact for federal grants, and the workshop covered their travel expenses along with room and board at KBS. Thus, there is inevitable tension balancing instructor compensation and course affordability.
How much does it actually cost to run a workshop like EDAMAME? The first year, we ran the workshop for less than $14,000; students paid their own expenses of room and board, and no workshop fees were charged. This face amount does not include the substantial additional support that was provided via shared logistics with the ANGUS workshop, which was occurring at the same time at Kellogg Biological Station. It also does not include any support for personnel, which was the largest expense. Ideally, there would have been an annual budget for instructor and TA summer salaries, a logistics coordinator salary, and an hourly salary for undergraduate labor during the course. We also realized that unless we could procure funds to support personnel, the training might not be valued as highly by institutions and peers and might instead be perceived as a representing a cost diverting funds from other scholarly activities. The second biggest expenses were those of financial aid to offset costs of room and board and workshop fees to learners who needed it, which we provided in 2017 and 2018 to qualified learners, with USDA support. The third biggest expense was represented by the educational consultant employed to evaluate the course as a neutral third party and totaled $5,500 to $6,000 per evaluation. The remaining expenses were those represented by conference services at Kellogg Biological Station and lodging and travel expenses for the instructional team and guest speakers. In summary, there is a trade-off between the course cost, inclusive of the real value of instructor/TA time, and workshop affordability for the learners.
Future directions. While the data indicate that EDAMAME workshop was effective, only a limited number of learners can be accommodated per year, and there is high effort from the instructional team to support them. This is a low-throughput model of skill development. We are eager to reach a larger learner pool than what we could accommodate in the classroom. In 2016, we experimented with live engagement of three to five remote learners (the number varied by tutorial) using free conference calling and screen sharing resources. The remote learners participated as a group at the same location. They engaged with the lectures and tutorials as fully as possible (but missed out on the guest lectures and other events). This added a mild distraction for the on-site learners, but the workshop proceeded relatively smoothly. The biggest hurdle was engaging with the remote learners during tutorials, as they had no classroom support. It is possible that a remote learning workshop could be successful, given an appropriate investment into conference technology, an on-site coordinator dedicated to its logistics, and an enhanced instructional team with traveling TAs dedicated to the remote classrooms.
The content of EDAMAME remains freely available online (https://github.com/ edamame-course), but parts of the content are also being transitioned to local offerings. Many universities desire more offerings of online or digitized curriculum, and there is a question of how to balance the university's need to provide quality instruction for tuition with the open-science philosophy of providing free, democratic access to information. At Michigan State University, we are developing a graduate-level learning module on microbial metagenomics that includes amplicon and untargeted metagenome analysis pipelines. The 1-credit metagenomics module includes hands-on tutorials, is offered twice a week for 1 month, and is accompanied by prerecorded lectures. Postdoctoral trainees or faculty can enroll for a modest fee. Though based on EDAMAME materials, the modular content at Michigan State covers less content because there are prerequisite modules required for enrollment. Learners already have familiarity with the command line, with submitting jobs to the high-performance computing cluster, and with fundamentals of microbial genome analysis. EDAMAME materials have also been extended to international workshops, including a metagenomics 1-day crash course in Rio, Brazil, and a 1-week microbiome analysis workshop at Centro de Investigaciones Biológicas del Noroeste in La Paz, Mexico. In addition, more general tutorials (e.g., shell, GitHub, etc.) remain available from other data science training efforts, including Software and Data Carpentry, and short-format 2-day workshops on these skills are available through The Carpentries (http://carpentries.org).
Finally, we seek to maximize the impact of EDAMAME by offering this kind of training to those who need it most. We hope that the impact of our trainees training others is a lasting legacy of EDAMAME. We have found that our international learners have benefited immensely from this course, as they are challenged by limitations of access to computer resources or training. Going forward, we hope to continue to identify target audiences who could both benefit from our training and extend its impact broadly. Additionally, sequence analysis will continue to evolve with technologies, impacting the depth and breadth of scientific questions and experiments that are imaginable. We hope that our course content can continue to remove obstacles for scientists who wish to engage in these technologies.