Staffing and Workflow of a Maturing Institutional Repository

Institutional repositories (IRs) have become established components of many academic libraries. As an IR matures it will face the challenge of how to scale up its operations to increase the amount and types of content archived. These challenges involve staffing, systems, workflows, and promotion. In the past eight years, Kansas State University’s IR (K-REx) has grown from a platform for student theses, dissertations, and reports to also include faculty works. The initial workforce of a single faculty member was expanded as a part of a library-wide reorganization, resulting in a cross-departmental team that is better able to accommodate the expansion of the IR. The resultant need to define staff responsibilities and develop resources to manage the workflows has led to the innovations described here, which may prove useful to the greater library community as other IRs mature.


INTRODUCTION
In 2013, institutional repositories are well-established components of many academic libraries.
As an institutional repository (IR) matures it will face the challenge of how to scale up its operations to increase the amount and types of content archived. These challenges involve staffing, systems, workflows, and promotion. Although there is a growing body of literature describing the content, implementation, and marketing of a newly established IR, there are fewer articles describing the evolution of processes as an IR matures and grows in size.
In his article from 2003, Clifford Lynch defines a mature institutional repository as follows:
…a mature and fully realized institutional repository will contain the intellectual works of faculty and students - both research and teaching materials - and also documentation of the activities of the institution itself in the form of records of events and performance and of the ongoing intellectual life of the institution. It will also house experimental and observational data captured by members of the institution that support their scholarly activities. (p. 328)
By Dr. Lynch's definition, Kansas State University's IR, K-REx, has not yet achieved full maturity. K-REx, which has been in operation for eight years, was originally developed as a platform for student theses, dissertations, and reports. Over the years it has evolved to include scholarly works of faculty and students, conference papers, and selected departmental publications. Initially a single staff member was responsible for all aspects of the repository, but this was not a model that could accommodate growth. The desire to scale up the operation, expanding the number of faculty participants and content, was addressed as part of a library-wide reorganization that provided more staff working as a cross-departmental team. This staff expansion, in turn, created the need to redefine staff responsibilities, develop resources to manage workflows, and provide greater efficiencies. These challenges have been met with some innovations that may be useful to the greater library community as they manage their own maturing IRs.

LITERATURE REVIEW
To provide context for a discussion of how K-REx operations have evolved from a single staff member handling the deposit of articles to a cross-departmental team approach, a search of library literature was conducted to investigate processes and workflows from other institutions. Searches included keyword combinations such as "technical services and institutional repositories" and "institutional repositories and workflows." While a number of articles exist in the literature regarding the implementation of IRs at various institutions, in many cases they deal with the creation of a new IR by describing the cost, technology, services, and policies. For example, Baudoin and Branschofsky (2003) describe using DSpace for their IR, the importance of creating policies and advocacy, and the application for funding, along with a short discussion of the impact on the library's organization. More recently, Oguz and Davis (2011) discuss the creation of an IR at a medium, four-year university on a limited budget, using a survey of faculty to discover their familiarity with IRs, their self-archiving habits, and to see how that may translate to use of an IR at their institution.
Strategies for organization of content are also often stressed in these case studies, such as by document type within a collection, as well as decisions regarding the collection scope (see Cohen and Schmidle, 2007). These studies also frequently discuss the need for cross-training of technical services staff to provide support for the IR.

Workflow
However, while there is recognition of the need to train technical services staff, there is a dearth of articles discussing the specific role of technical services in IRs, and few describing workflows (Connell and Cetwinski, 2010). Recent explorations of workflow needs and common practices are provided by Morrow and Mower (2009) and Hanlon and Ramirez (2011). In their introduction of the University Scholarly Knowledge Inventory System (U-SKIES), Morrow and Mower (2009) address the need for a workflow manager as a way to coordinate multiple persons in the deposit of numerous articles, track what has been done where, and what policies and communications apply. A survey of institutional repository managers by Hanlon and Ramirez (2011) indicated that a majority of IRs follow a mediated deposit process, with librarians and library staff holding the role of copyright clearance. In many cases, using SHERPA/RoMEO or similar tools, checking publisher policies and author license agreements, and contacting the publisher have been built into the deposit workflow. In addition to these articles, which focus on workflow issues related to article deposits, Boock and Kunda (2009) compare the workflows for depositing electronic theses and dissertations in the Oregon State University IR versus processing the print equivalent in the OSU Libraries, with an eye to demonstrate both efficiency and savings of cost and time.
The workflow challenges of "non-traditional" IR deposits (i.e. non-article/ETD content), particularly those related to data curation, have also been the focus of recent literature. Data are often deposited either in an IR or in discipline- or domain-specific repositories (e.g., Yoon & Tibbo, 2011). Delserone (2008), at the University of Minnesota, describes the preparation for the curation of subject-specific data managed at UMN, including the importance of having an IR in place. Additionally, Witt (2008) discusses the importance of data curation, the challenges therein, and the necessary resources for such a project.

Promotion
Another aspect of the literature that is directly related to the 'back-end' processes of IR management is the discussion of advocacy, marketing, and recruitment of content for IRs. Successful promotional efforts lead to the need for efficient workflows, while the efficiency of the workflows can itself become a promotional asset for the IR program. Multiple authors stress the importance of librarians in reference, liaison, and subject specialist roles in marketing the IR and communicating with faculty about its features and advantages (Bell, Fried Foster, & Gibbons, 2005; see additional articles in the special issue of Reference Services Review 33(3), 2005). Aggressive, but not overly aggressive (Troll Covey, 2011), marketing and value-added services are necessary to increase faculty participation. Removing barriers by offering to check publisher policies, ensuring compliance, and depositing the work on behalf of the faculty member can benefit the IR.
Providing services that remove barriers to participation can help ameliorate the difficulty of recruiting faculty content according to Bankier and Perciali (2008), who point out that for many universities the "core mission is to advance research and scholarship," while making that research publicly available is secondary. The 'build it, and they will come' model (Giesecke, 2011) is not enough; additional incentives must be built in as well. Giesecke (2011) describes three other models for faculty content recruitment that build on this idea: making the deposit of articles appear fun and attractive; self-archiving mandates; and providing services. With this final model, Giesecke builds upon Lynch's definition of an IR as a set of services to include metadata, preservation, and technical assistance (Giesecke, 2011; Lynch, 2003). Indeed, the services approach can be turned into a marketing tool, as shown by Utah State's approach of using copyright clearance services to market their IR (Leary, Lundstrom, & Martin, 2012).
As Leary et al. (2012) point out, continued marketing leads to continued growth of the IR, making it all the more necessary that the IR runs smoothly. This is going to impact the workflow for the IR, whether it is by bringing in subject librarians for copyright clearance, as illustrated by Utah State, or using catalogers for metadata processing, as illustrated by University of St. Andrews (Aucock, 2012).

Development
Kansas State University's institutional repository, K-REx, was launched in 2004 as a platform for students to electronically deposit their theses, dissertations and reports (ETDs). The primary partners in the repository's development were the Graduate School, the Libraries, and the Office of Mediated Education which provided the technical support. DSpace was chosen as the repository software, which was hosted on campus servers, and a staff member was hired to serve as repository manager.
K-REx remained strictly a repository for student ETDs for the first four years of its existence. Library cataloging staff developed procedures to review the students' submissions prior to entering the bibliographic information into the local catalog and OCLC. Submission processes were refined, and the result was a successful repository of student graduate work. But change was already being envisioned.
During 2007, technical support for K-REx was relocated to the Libraries after a gradual transition. This action was key to establishing the Libraries as the home of the institutional repository.
With the Libraries' support, interest turned to capturing faculty research and publications in the repository. Libraries' staff began to define the services necessary to attract faculty participation, define Dublin Core metadata, and develop K-REx input screens to archive faculty's scholarly works.
Whereas students were mandated by the Graduate School to deposit their theses and dissertations into K-REx, there was no similar mandate for the faculty. The first overture to faculty was to the university's Food Science Institute in 2008 which resulted in a strong endorsement by that faculty and the first faculty article in K-REx. The next exploration was to the Department of Animal Science to bring their annual conference proceedings into K-REx. These early successes led to two major discoveries: it was too much to ask faculty to submit their own work, and it was time-consuming for librarians to create the metadata if they didn't do it enough to develop expertise. The conclusion was that the Libraries would have to assume the submission work for faculty publications and devote staff resources to do it.
The primary staff resource continued to be the repository manager, with some assistance from librarian subject specialists in promoting K-REx to faculty. The repository manager did most of the promotion and virtually all of the actual submissions into K-REx. He developed the basic workflow for ingesting faculty content into the repository that is still followed today:
• Step 1: Contact faculty member(s) to describe the benefits of depositing scholarly works in K-REx. Interested faculty respond with citations, vitae, or by providing actual documents to be archived in the repository.
• Step 2: With a specific citation in hand, check SHERPA/RoMEO or the publisher's website to identify the policy for an author's right to self-archive. The repository manager began a wiki to document the policies for each publisher, copying actual text from publishers' websites and adding his own comments as needed.
• Step 3: If the publisher permits archiving, obtain the text either online or, more frequently, from the author in manuscript form. All content files are stored on the Libraries' local area network (LAN).
• Step 4: Create the metadata in the repository and attach the content. The repository manager created a template for a cover page that contains citation and other relevant information and is combined with the text in a single PDF document for the repository.
• Step 5: Communicate again with the faculty member, providing the repository handle for the archived content.
The repository manager was able to archive faculty material single-handedly for two years, archiving an average of 80 items per year. This one-man operation worked well for a low-volume repository, but was not sustainable if K-REx was ever to expand.
In 2010, the Libraries went through a major reorganization which had a significant impact on K-REx. In addition to the repository manager, two librarians, one from Metadata/Preservation (MP) and one from Scholarly Communications/Publishing (SCP), were assigned part-time to K-REx as well as two paraprofessional catalogers. The next challenge for K-REx became the creation of processes and mechanisms which would spread operational assignments among several people in different departments and enable handoffs from one person to another throughout the process.
The first task was to define roles for the new staff and redefine the role of the repository manager. This process took several meetings among the librarians and their supervisors. Ultimately, the roles were divided into four primary areas:
• Collection development: Determining what content is appropriate for K-REx.
• Promotion: The continuing effort of contacting faculty both individually and at the department level.
• Pre-processing: Checking publishers' policies, obtaining the necessary files and manuscripts, and handing off work to the metadata-creation staff.
• Metadata creation: Creating the cover page and final PDF file, entering metadata in DSpace to create item records, and attaching the associated file.

Figure 1. Basic Workflow
This original workflow is still the basis for the enhanced workflow used today.

While the repository manager continues to provide general oversight to the project, with this new structure he and the SCP librarian took responsibility for collection development and promotion. The MP librarian took direct responsibility for pre-processing and supervised the work of the two paraprofessional catalogers who create the metadata.
With the roles assigned, the next task was to revise and create the mechanisms whereby five people could seamlessly perform the work that was formerly done by one person. There were three primary areas of development:
• Creating folders and subfolders on the LAN to store manuscripts, procedures, sample letters, and work product related to metadata creation.
• Refining a local wiki for publishers' policies so that information was clearly formatted for easy interpretation.
• Creating a workflow management system (WMS) to allow easy sharing of responsibilities as an article moved through the processes from identifying an item to the final deposit in K-REx.
A later, transformative process was developed in 2011 to download citations from external databases into the WMS using RefWorks. These four areas are significant to the success of the operation and each will be described in greater detail below.

LAN files
In the earliest stages of ingesting faculty works into K-REx, a folder for K-REx files was created on the Libraries' LAN. Initially the K-REx folder simply contained a sub-folder for each faculty member who submitted a manuscript. With the expansion of the K-REx operation and the creation of written procedures and other documentation, the LAN files were expanded to include folders for procedures, permissions received from publishers, and resources such as forms, sample letters, and an APA style guide. From its humble beginning of four faculty folders, the LAN now has folders for 334 faculty and the list grows every week. Many of these faculty folders contain multiple sub-folders, each representing a separate article, book chapter, or presentation. These sub-folders generally contain the author's manuscript, the published version to assist the metadata creation staff, and the cover page containing both the K-REx and the published citations, plus the URL, copyright statement, and digital object identifier (DOI) for the published article (Figure 2).
Because the number of faculty participants and folders has increased significantly, we now add processing status to the file name, such as "requesting MS" [i.e. manuscript], "requesting permission" [from the publisher], "ready," or "finished." This added information helps in managing all of the folders.
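The status labels in the file names make it possible to tally where items stand without opening each folder. The following Python sketch is illustrative only: the article does not specify how the status is attached to a name, so the " - " separator, the function names, and the example folder names are assumptions.

```python
from collections import Counter

# Status labels quoted in the text above.
KNOWN_STATUSES = ("requesting MS", "requesting permission", "ready", "finished")

# Assumption: the status follows the descriptive name after " - ".
SEPARATOR = " - "


def status_from_name(folder_name):
    """Return the status label at the end of a folder name, or None if absent."""
    _, sep, tail = folder_name.rpartition(SEPARATOR)
    if not sep:
        return None  # no separator, so no status label
    tail = tail.strip().lower()
    for status in KNOWN_STATUSES:
        if tail == status.lower():
            return status
    return None


def tally_statuses(folder_names):
    """Count folders per status; names without a known label count as 'unlabeled'."""
    return Counter(status_from_name(name) or "unlabeled" for name in folder_names)


# Hypothetical folder names, for illustration only.
example_folders = [
    "Soil moisture article - requesting MS",
    "Wheat genetics chapter - ready",
    "Beef cattle presentation - finished",
    "Grain storage article",
]

for status, count in sorted(tally_statuses(example_folders).items()):
    print(status, count)
```

A tally like this could show at a glance how many items are waiting on manuscripts or permissions, though the WMS described later serves the same tracking purpose more fully.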

Wiki
Another early K-REx resource was a local wiki which was created to record information about publishers' policies. Although SHERPA/RoMEO was used to a great extent initially, it soon became apparent that something more was needed to record information that wasn't available in that source as well as local notes. As publishers' policies were identified, either from websites or actual correspondence, this information was stored in the wiki. There was no standardized formatting of the information on the wiki, however, so it was sometimes difficult to interpret.
With the addition of staff, particularly the paraprofessional catalogers who would create the metadata, it became imperative that the publishers' policies on the wiki be both clear and consistent. We identified a consistent format with six labels:
• Link to publisher's policy online
• Text of publisher's policy for self-archiving
• What we can put up
• What we need to add
• Embargo
• Notes
By using these six fields consistently (Figure 3), it is easy to identify the critical information needed to request and set up files and create the actual metadata in K-REx.
Journal of Librarianship and Scholarly Communication | jlsc-pub.org JL SC

Figure 2. Cover Page
This page is combined with the author's manuscript into a single PDF file for the repository.

Workflow management system (WMS)
This locally developed system is the centerpiece around which all of the processes revolve. It is the means by which different staff members and tasks are assigned to each item. Since its development in 2011, the WMS has proven to be highly flexible, providing basic operational functionality plus added features for downloading citations from external databases and creation of management statistics.
The system was developed using Apache, PHP, and MySQL software. The developers' intent was to create a tracking system with the capability of simplifying the multi-faceted operation involving multiple tasks and workers yet capable of expanding to accommodate future needs. Initially the system contained only task (item) data. The record for each item (Figure 4) includes the title, local author, publisher, and a note field. In addition, the status can be assigned to each item from a drop down menu (e.g. contacted author, assigned to metadata team, ready for approval, closed, etc.) and a staff member assigned. The status and staff member are changed as the item moves through the various processing steps. Display of the task list is very flexible, allowing sorting by author, title, status, user, date created, and last update.

Figure 3. Publisher Policy on the Wiki
Although the amount of text varies by publisher, the format is consistent.

Figure 4. WMS Item Screen
Note down arrows to select from available options. Citation information is downloaded from RefWorks.

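To make the item record concrete, the following is a minimal Python sketch of the data the WMS is described as tracking. The actual system was built with PHP and MySQL and its schema is not given here, so the class name, field names, and the `hand_off` and `sort_items` helpers are hypothetical; only the field list, the example statuses, and the sort options come from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Example statuses named in the text; the real drop-down menu has more ("etc.").
STATUSES = (
    "contacted author",
    "assigned to metadata team",
    "ready for approval",
    "closed",
)

# Sort options named in the text, mapped to hypothetical field names.
SORT_KEYS = ("local_author", "title", "status", "assigned_to", "created", "last_update")


@dataclass
class Item:
    """One tracked work (hypothetical layout of a WMS item record)."""
    title: str
    local_author: str
    publisher: str
    note: str = ""
    status: str = STATUSES[0]
    assigned_to: str = ""
    created: datetime = field(default_factory=datetime.now)
    last_update: datetime = field(default_factory=datetime.now)

    def hand_off(self, status, staff_member):
        """Change status and assignee together as the item moves to the next step."""
        if status not in STATUSES:
            raise ValueError("unknown status: %r" % status)
        self.status = status
        self.assigned_to = staff_member
        self.last_update = datetime.now()


def sort_items(items, key):
    """Return the task list sorted by one of the supported columns."""
    if key not in SORT_KEYS:
        raise ValueError("unsupported sort key: %r" % key)
    return sorted(items, key=lambda item: getattr(item, key))


# Hypothetical items, for illustration only.
items = [
    Item(title="Wheat genetics", local_author="Roe, R.", publisher="Press Y"),
    Item(title="Soil moisture", local_author="Doe, J.", publisher="Press X"),
]
items[0].hand_off("assigned to metadata team", "cataloger 1")
print([item.title for item in sort_items(items, "title")])
```

Changing the status and the assigned staff member in one step mirrors the handoff between people that the system is meant to support.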
In 2012, the WMS was further refined to add publisher data with links to the item records, so the publisher for each item could be selected from a drop down menu. The publisher data also includes fields for the same data elements used in the wiki, but unfortunately the data could not be imported from the wiki into the WMS so those fields have not been populated.
One of the features designed into the initial WMS was an interface with RefWorks to import citations into the system and create item records automatically. By the end of 2011 we had moved beyond the basics and were ready to implement this feature. This process, described in greater detail below, has more than doubled the amount of material identified for K-REx and has almost unlimited potential for growth.

RefWorks
In early 2011, the SCP department started collecting data pertaining to article publication by Kansas State University authors using the Web of Science database and the RefWorks citation manager. By that fall, the SCP librarian developed a promotional project using that data to contact faculty and add new records to the WMS. That project provided the basis for the current RefWorks workflow.
Weekly searches are run in Web of Science for articles with Kansas State University in the address field. The resultant records are exported from Web of Science into RefWorks, where the data are reviewed for content development criteria. In order to reduce the number of non-productive records, stop lists have been created of journal titles and publishers that do not permit self-archiving, and of faculty authors who do not wish to participate. Articles with an author, title, or publisher on the stop list or with innumerable authors, where it is difficult to identify the local author, are removed from the pool. The remaining articles are left in the pool with the understanding that publishers' policies related to self-archiving in institutional repositories will be checked later in the K-REx workflow.
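The stop-list screening can be sketched as a simple filter. This Python example is an illustration under stated assumptions, not the actual RefWorks procedure: the record layout, the function names, and the cutoff of 20 authors standing in for "innumerable authors" are all hypothetical.

```python
# Hypothetical cutoff; the article gives no number for "innumerable authors".
MAX_AUTHORS = 20


def normalize(text):
    """Lowercase and trim so that list lookups ignore case and stray spaces."""
    return text.strip().lower()


def screen_citations(records, stop_journals, stop_publishers, stop_authors,
                     max_authors=MAX_AUTHORS):
    """Drop records that match a stop list or have too many authors."""
    stop_journals = {normalize(j) for j in stop_journals}
    stop_publishers = {normalize(p) for p in stop_publishers}
    stop_authors = {normalize(a) for a in stop_authors}

    kept = []
    for record in records:
        authors = [normalize(a) for a in record["authors"]]
        if normalize(record["journal"]) in stop_journals:
            continue  # journal does not permit self-archiving
        if normalize(record["publisher"]) in stop_publishers:
            continue  # publisher does not permit self-archiving
        if any(a in stop_authors for a in authors):
            continue  # an author has asked not to participate
        if len(authors) > max_authors:
            continue  # too many authors to identify the local one
        kept.append(record)
    return kept


# Hypothetical records, for illustration only.
records = [
    {"title": "Article A", "authors": ["Doe, J."],
     "journal": "Journal One", "publisher": "Press X"},
    {"title": "Article B", "authors": ["Roe, R."],
     "journal": "Journal Two", "publisher": "Press Y"},
    {"title": "Article C", "authors": ["Poe, E."],
     "journal": "Journal Three", "publisher": "Press X"},
    {"title": "Article D", "authors": ["Author %d" % i for i in range(50)],
     "journal": "Journal One", "publisher": "Press X"},
]

kept = screen_citations(
    records,
    stop_journals=["Journal Two"],
    stop_publishers=[],
    stop_authors=["Poe, E."],
)
print([record["title"] for record in kept])
```

Records that pass this screen are not yet cleared for deposit; as the text notes, publisher policies are still checked later in the workflow.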
The university directory is used to identify contact information for those authors associated with Kansas State University. This information is then used to craft invitations to faculty authors to archive their recently published works in K-REx. These invitations introduce faculty authors to K-REx, explain the benefits of archiving their works in an open access repository, and request their permission to archive the work.
In 2012, the RefWorks operation was expanded to include two other staff members (the repository manager and an SCP paraprofessional), with the workload divided among the three. The scope of searching was expanded to include Scopus, and other databases have also been tested to determine how their coverage compares to those currently used. Additionally, a standing permission list has been created for authors who wish to give us their permission to archive all of their works going forward without waiting for their permission for each individual work.

Assessment
At this point in its development, K-REx is a well-established repository. It is staffed by librarians and paraprofessionals with expertise in their tasks, and the repository is growing at a rate of approximately 6000 items per year. Some aspects of the operation are working well and others provide challenges that have yet to be met. The workflow management system has been a great success, allowing easy handoffs between several individuals in two different departments and expanding to provide greater efficiency as processes develop. The system was designed with tools to simplify operations, such as the RefWorks interface, and the ability to add new features as needed, such as the publisher data. This flexibility provides the means of increasing both the capacity and the efficiency of the overall operation. Most recently, fields were added to store the email addresses of all university faculty and students involved in a publication so they could all be notified when the item was archived in K-REx. The hope is that this expanded notification will serve as an effective promotion device and ultimately result in greater faculty participation.
There is one feature of the WMS that remains to be implemented: adding publisher policies to the publisher database. As noted earlier, the publisher database was constructed with fields for each of the data elements in the wiki publisher policies. Because the wiki functions well, the incentive to transfer the data from the wiki to the WMS has not been a high priority. With almost 200 publishers listed on the wiki, manually transferring the data would be laborious. When time permits, development staff plan to make this transfer automatically.
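One way the automatic transfer might be approached is to parse each wiki entry into the six labeled fields. The sketch below is in Python and assumes each label appears at the start of a line followed by a colon; the wiki markup is not documented here, so that layout, the function name, and the sample entry are hypothetical.

```python
# The six labels used consistently on the wiki, as listed earlier.
LABELS = (
    "Link to publisher's policy online",
    "Text of publisher's policy for self-archiving",
    "What we can put up",
    "What we need to add",
    "Embargo",
    "Notes",
)


def parse_policy(wiki_text):
    """Split one wiki entry into a dict keyed by the six labels.

    Assumption: a field starts on a line beginning with "<label>:" and
    continues on following lines until the next label appears.
    """
    fields = {label: "" for label in LABELS}
    current = None
    for raw_line in wiki_text.splitlines():
        line = raw_line.strip()
        matched = next(
            (label for label in LABELS if line.startswith(label + ":")), None
        )
        if matched is not None:
            current = matched
            fields[current] = line[len(matched) + 1:].strip()
        elif current is not None and line:
            # Continuation line: append to the field being read.
            fields[current] = (fields[current] + " " + line).strip()
    return fields


# Hypothetical wiki entry, for illustration only.
sample_entry = """Link to publisher's policy online: https://example.org/policy
Text of publisher's policy for self-archiving: Authors may post the accepted
manuscript in an institutional repository.
What we can put up: Accepted manuscript
What we need to add: Citation and link to the published version
Embargo: 12 months
Notes: Hypothetical example entry"""

policy = parse_policy(sample_entry)
for label in LABELS:
    print(label, "->", policy[label])
```

Each parsed dictionary could then be written to the matching publisher record; entries that do not follow the consistent format would still need manual review.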
The ability to download citations from commercial databases into the WMS via RefWorks has also been a major success, providing a steady stream of journal citations by university authors who can be invited to archive in K-REx. The two databases currently used for the process, Web of Science and Scopus, are providing a wealth of science and technology publications. However, the current practice is overlooking social science and humanities publications which must still be discovered through one-on-one contacts. Finding a means of developing suitable notification mechanisms for social science and humanities publications remains a major challenge.
Because the archiving of faculty works requires input from both faculty and, in many cases, publishers, the workflow is not steady. The one consistent bottleneck in the K-REx operation is the failure of some faculty and publishers to respond to requests for manuscripts or permission to archive. Faculty frequently ignore email requests, particularly at busy times during the semester, or may be unable to locate the manuscript version that can be archived. Publishers, too, are often slow to respond to requests for permission and some never respond at all. There are also, of course, some publishers with policies that prohibit archiving in institutional repositories. The result is that many of the new items identified for K-REx are never successfully archived. The use of external databases to identify articles by university faculty has increased the number of items available, but that doesn't necessarily translate into a steady flow of material for those creating the metadata. One way to mitigate this unevenness is to obtain curriculum vitae from eager faculty who readily respond to requests for manuscripts. But even this eagerness can be squelched by publishers who refuse permission to archive.
Participation by multiple staff from different departments has worked very well, largely due to the WMS and the RefWorks processes which have been designed to facilitate handoffs. However, another significant factor in this success is regular communication among the staff involved. The development of close working relationships has fostered a strong team spirit that transcends departmental lines.

NEXT STEPS
As noted above, there are three areas of improvement that have currently been identified:
• Identify more social science and humanities content.
• Address the uneven workflow.
• Move the publishers' policies from the wiki to the WMS.
Of these three, the most pressing is finding a source or sources to identify faculty work in the social sciences and humanities. Working with faculty one-on-one is not keeping pace with the influx of material from Web of Science and Scopus. We are investigating other databases that might serve as resources for those disciplines, but have yet to identify anything suitable.
Uneven workflow is likely just the nature of this type of operation, reliant as it is on outside factors. One solution may simply be continuing to add as many items as possible into the WMS to provide a regular stream of work to be ingested into K-REx. However, we have also begun to consider a campus-wide open access policy for faculty scholarly works. An open access policy would not only increase the amount of content available to us, but would greatly streamline our workflow by eliminating the effort involved in obtaining permission for each item. This type of policy shift would not happen quickly, however, due to the number of campus constituencies that would have to sign off on the change.
Finally, movement of publishers' policies from the wiki to the WMS would increase the efficiency for those who set up files and create metadata. This is the least difficult of the current challenges, requiring only the time and talent of the technical support staff.
Looking beyond these immediate challenges, the role of K-REx in the university's digital content management plan is evolving. DSpace has been a very suitable platform for the faculty and student scholarly works in K-REx, but it is not as well suited for other types of content. We have turned to more suitable platforms for archives, images, and datasets. CONTENTdm and Omeka, for example, have been used to create image collections, course catalogs are deposited in archive.org, and Archon is used for university archives. The university has recently developed a data management plan which provides for archiving small/inactive datasets in K-REx (and which will necessitate development of a unique workflow), while large/active datasets would be stored on central university computers. This raises the question whether K-REx is the institutional repository or just one of several. A solution to this multi-platform dilemma would be the development of a web portal which would provide access to all of our digital collections, archives, data, K-REx, and digital services, but we have not yet reached that level of integration.

CONCLUSION
The development of K-REx from a simple beginning to a fully functioning repository has been marked by both challenges and opportunities. Scaling up a one-person operation to a cross-departmental team has provided the opportunity to significantly increase the volume of content deposited. The challenge has been to adapt a simple operation to one in which multiple staff members are involved in different parts of the process. This challenge was met by creating a workflow management system that would define the tasks and coordinate the handoffs. Further refinement provided the capability of identifying and downloading citations from external sources and systemizing faculty contacts which significantly increased the volume of available content.
The fundamental issue in scaling up an IR operation is finding the right balance between available staff and available content: having staff without the volume creates frustration, and having volume without the staff leads to overload. To succeed, both staff capability and content availability need to increase in tandem. This case study shows that it is possible to achieve balance as a repository grows by identifying suitable and sufficient content, assigning staff appropriately, and developing efficient systems and workflows that are both flexible and expandable.
Ten years after Lynch wrote his definition of a mature IR, it may be that repositories have developed differently than envisioned. Today we're seeing the development of specialized platforms to manage different types of digital products. Platforms other than the IR may prove more appropriate for image, sound, and data collections, and consequently the IR may be just one of an array of digital content resources. But whether the IR stands alone or within a suite of other resources, it is a product that requires effort and resources to grow and maintain.