What do we know about grant peer review in the health sciences?

Background: Peer review decisions award an estimated >95% of academic medical research funding, so it is crucial to understand how well they work and if they could be improved. Methods: This paper summarises evidence from 105 papers identified through a literature search on the effectiveness and burden of peer review for grant funding. Results: There is a remarkable paucity of evidence about the efficiency of peer review for funding allocation, given its centrality to the modern system of science. From the available evidence, we can identify some conclusions around the effectiveness and burden of peer review. The strongest evidence around effectiveness indicates a bias against innovative research. There is also fairly clear evidence that peer review is, at best, a weak predictor of future research performance, and that ratings vary considerably between reviewers. There is some evidence of age bias and cronyism. Good evidence shows that the burden of peer review is high and that around 75% of it falls on applicants. By contrast, many of the efforts to reduce burden are focused on funders and reviewers/panel members. Conclusions: We suggest funders should acknowledge, assess and analyse the uncertainty around peer review, even using reviewers’ uncertainty as an input to funding decisions. Funders could consider a lottery element in some parts of their funding allocation process, to reduce both burden and bias, and allow better evaluation of decision processes. Alternatively, the distribution of scores from different reviewers could be better utilised as a possible way to identify novel, innovative research. Above all, there is a need for open, transparent experimentation and evaluation of different ways to fund research. This also requires more openness across the wider scientific community to support such investigations, acknowledging the lack of evidence about the primacy of the current system and the impossibility of achieving perfection.


Introduction
Health research has contributed enormously to society, but it is also expensive. This has led to increasing demands to understand and improve how research is supported. Most effort has focused on evaluating the impacts of research on society and the economy. Funders are attempting to gather evidence of impact using online survey platforms such as Researchfish in the UK, and national assessment frameworks including Excellence in Research for Australia (ERA).
Much less work has focused on understanding how research is selected for support. Peer review is used to allocate the vast majority of competitive research funding internationally (Ismail et al. (2009) estimated that >95% of UK medical research funding was allocated by peer review). Therefore it is crucial to understand whether peer review is effective and efficient: whether it can fairly and reliably allocate research funding without bias. In this study, we carried out a rapid evidence assessment which asked whether the peer review process lives up to these aspirations.
The research was commissioned by the Canadian Institutes of Health Research (CIHR) to support an ongoing review of CIHR's peer review system, particularly the Peer Review Expert Panel which was convened to review the design and adjudication processes of CIHR's investigator-initiated research programmes.

Search strategy
We identified relevant literature through five routes, using 2009 as our cut-off date because this was the date of our previous review (Ismail et al., 2009): 1. Google Scholar search using the search terms below, for publications from 2009 onwards. We reviewed the top 500 search results for each query.
GRADE is an internationally accepted system for the assessment of evidence quality. GRADE offers four levels of evidence quality: high, moderate, low, and very low. Randomised trials begin as high-quality evidence and observational studies as low-quality evidence, and studies may be downgraded as a result of limitations in study design or implementation, imprecision of estimates, variability in results, indirectness of evidence, or publication bias. Equally, quality may be upgraded based on a very large magnitude of effect or if all plausible biases would reduce an apparent effect (Guyatt et al., 2008).
effectiveness and/or burden of grant review processes. Studies were excluded on the basis of being:
-Purely descriptive, describing a specific peer review process.
-Focused on wider concerns around the funding process, with no (or only tangential) reference to the peer review process in particular.
-Focused on manuscript peer review rather than peer review for funding purposes.
-From 2008 or earlier.
-Reviews, with no additional synthesis or analysis, summarising work from before 2008, or studies already identified and included individually.
If studies were relevant, the full text was retrieved and an Excel spreadsheet was used to capture key information on each study and its conclusions.
We identified 105 studies for inclusion (Table 2). When synthesising our findings, we also drew on our previous review of the topic (Ismail et al., 2009).

Results
We summarise our findings in Table 3 with each discussed in detail below.
Is peer review an effective system for awarding grants?
The meaning of 'best' science is not fixed. What constitutes the 'best' science will vary; however, it may include research that is innovative, interdisciplinary and applied. This section considers biases against any particular type of research and whether peer review is a good predictor of future success.
Peer review is probably anti-innovation. Braben (2004) has suggested that supporting highly innovative research is important because it drives technological change and economic growth, an idea increasingly embraced by research funders. NIH has expressed concern at falling numbers of innovative or risky applications, suggesting 'competitive pressures have pushed researchers to submit more conservative applications' (Kaplan, 2005; Scarpa, 2006). There has been limited further work in this area since 2009. Increasing the size of the review panel and broadening the range of expertise and disciplines present has been suggested as a way to address these problems. However, this increases burden and can only work if the role of the initial in-depth reviewer(s) is diminished (Gluckman, 2012).

It is not clear if peer review fairly assesses applied research. The Cooksey Report on health research funding in the UK noted that peer review 'can in some instances inhibit programmes in translational and applied health research' (Cooksey, 2006b). The report suggested that one reason for this inhibition was that peer review prevented the iterative development of research projects in which funder and researcher worked together. Cooksey also suggested that because applied researchers publish in specialist (i.e. lower-impact) journals, they received less credit for publications than basic researchers. Including research users and considering the likely impact of research as part of the funding process may address these concerns. In our 2009 review, we noted the Canadian Health Services Research Foundation's pioneering use of 'merit review panels' to evaluate proposals, combining members from both academic and wider user/policy communities. This approach has now spread to other major funders, notably NIHR. Considering impact at the application stage, an approach criticised for disadvantaging innovative research, is likely to be beneficial when reviewing research which is closer to being applied.
review, namely individual reviews and overall consistency of decision-making, and how they might be addressed.

It is clear that ratings vary considerably between reviewers. Single-rater reliabilities are not encouraging, although their estimation has been hampered by the methodological difficulties of modelling the complex interactions between reviewers in multi-stage peer review processes.
In particular, the work of Jayasinghe et al. In contrast, two studies have found a higher level of agreement between reviewers. The first study, which built in some of the complexities of the peer review process, found a dependent reliability rating for individual peer reviewers of 0.80. The second study, on the review process for Marie Curie Actions (a major EU funding stream), measured inter-rater reliability based on the average deviation in scores between raters, and found a high level of agreement (Pina et al., 2015).
Strikingly, the chance of an application's rating improving during panel discussion (e.g. from 'no award' or 'possible award' to 'award') is virtually nil. This suggests that initial triage of applications may be preferable to re-rating rounds (Bornmann et al.).
Increasing diversity of background and discipline of peer reviewers also reduces rating consistency. Lobb et al. (2013) identified a low intra-class correlation coefficient (0.12) when comparing reviewers from a research, practice or policy background. They also noted that the level of agreement among experts from different disciplines was considerably lower than that among adjudicators of the same discipline, meaning that the presence of several practitioners from the same discipline area could have the potential to skew funding outcomes, depending on the wider makeup of the panel. This suggests that peer review processes may not work well for transdisciplinary teams integrating both academic and non-academic experts. Taking a different perspective, Reinhart found that although the global intra-class correlation coefficient was 0.41, there were considerable differences between fields, for example, biology
Two funders have experimented with, and evaluated, virtual peer review both by teleconference and through the use of Second Life, a virtual world. NIH estimated that using Second Life telepresence, peer review could cut panel costs by one third (Bohannon, 2011). Pier et al. (2015) compared videoconference and face-to-face panels. They set up one videoconference and three face-to-face panels modelled on NIH review procedures, concluding that scoring was similar between face-to-face and videoconference panels. Both the Bohannon and Pier studies of virtual panels noted that participants valued the social aspects of meeting in person and preferred the face-to-face arrangements.
Gallo et al. (2013) examined four years of peer review discussions, two years face-to-face and two years teleconferencing. They found minimal differences in merit score distribution, inter-rater reliability or reviewer demographics. They also noted that panel discussion, of any type, affects the funding decision for around 10 per cent of applications relative to original scores.
Approaches to improve reliability have been tried. The NIH peer review self-study suggested some possible improvements to the peer review process to combat low reliability, focusing principally on better training for reviewers (NIH, 2008). Training should focus on: (1) emphasising the strengths (rather than weaknesses) of research proposals; (2) focusing on the potential impact of research; (3) reviewing the merit of the proposal and not re-writing it; (4) recognising the problem of implicit bias in study sections; (5) using benchmark applications during panel meetings to provide review guidelines; and (6) pointing out potential bias towards lesser known applicant organisations.
Note on dependent reliability: In a multi-stage review process, the assessor at each evaluation stage will know the score given to a particular research proposal at the previous stage. This particular study assessed the reliability of grant peer review processes by determining the proportion of those applications for which the dependent ratings on the same proposal did not change from the first to the second and third stage.
Recent work by Sattler et al. (2015) has evaluated the effect of this type of brief training programme. The study found inter-rater reliability increased from 0.61 to 0.89, and the amount of time spent reviewing also increased, for both new and experienced reviewers.
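Reliability figures of the kind quoted in this section are often reported as intra-class correlation coefficients. As an illustration only, the sketch below computes a one-way random-effects ICC for a complete table of scores. The function name `icc_oneway` and the choice of this particular ICC variant are our assumptions; the studies cited above may have used different statistics.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings (illustrative sketch).

    ratings: list of proposals, each a list of k reviewer scores.
    Every proposal must have the same number of scores (k >= 2).
    Raises ZeroDivisionError if every score in the table is identical.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 proposals, each with the same k >= 2 scores")

    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Between-proposal mean square: how much proposals differ from each other.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-proposal mean square: how much reviewers disagree on one proposal.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

A value near 1 means that most of the variation in scores is between proposals; a value near 0 (or below) means that reviewers disagree about a proposal roughly as much as proposals differ from one another.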
If inconsistency stems from discrepancies in review quality (which is by no means clear), it might be feasible to evaluate the quality of reviews, although this approach has its own challenges. For example, what is a 'good' review? If a review is not consistent with other reviews, does that intrinsically make it 'bad'? It could be the outlier picking up on the true potential of an innovative application. However, this approach is used by many funders, as shown in a report by the European Science Foundation (2011), which found in a survey of European research funders that more than half (60 per cent) evaluate the quality of all reviews as standard practice using a range of criteria (e.g. completeness, level of substantiation, appropriateness, comprehensibility, timeliness and usefulness), and may return the review to a reviewer or reject the review. Organisations felt that review quality was higher where these checks were made, but noted little difference in quality between cases where all reviews are evaluated versus just a sample. However, no data were available to assess these suggestions, and no empirical analysis had been carried out. Adding such an evaluation process clearly adds to the burden of the process.

Is peer review fair?
There is evidence that peer review suffers from cronyism. Cronyism is a concern for many major funders, who have detailed conflict of interest processes in place to counter the presence or perception of such biases. However, Wenneras & Wold (1997) showed that prior affiliation with a reviewer considerably increased a researcher's chances of funding. Similarly, a large-scale study of applications to the National Science Foundation of Korea found that applications reviewed by previous or current affiliates were more likely to be successful (Jang et al., 2016). A review of NSF proposals reported by Bhattacharjee (2012) is harder to interpret: when full proposals and shorter, anonymised versions of the same proposals were compared, there were only weak correlations. Panelists and applicants suggested anonymisation made a difference, but the shorter length of proposals was also seen as important.
Luukkonen (2012) notes that panel debate may fail to counter crude forms of cronyism since panels often cover a wide area of research, and each specific area is only represented by a few experts, so the other members may defer to the experts' knowledge. Members of funding panels may also benefit directly from their membership. One study noted that panel members submit more applications, and have more grant awards (van den Besselaar, 2012). The challenge in this area is separating nepotism from other factors, such as good researchers who submit more applications being selected to join panels, or panel members having a better sense of what makes a good application. There is also dispute about how to resolve this potential problem. Alberts et al. (2014) suggest that such effects could be countered by broadening 'the range of scientific problems judged by each group and include[ing] a diversity of fields on each panel', suggesting that 'senior scientists with a wide appreciation for different fields can play important roles by counteracting the tendency of specialists to overvalue work in their own field' (p.5777). However, Li (2015) advises caution, noting that though evaluators may be biased in favour of projects in their own area, they are also likely to be better able to assess the quality of those projects, and the benefits of this expertise may well outweigh any possible biases.

Is peer review timely?
There is suggestive evidence that the peer review process slows, and hinders, the progress of research. In some cases, such as an emerging epidemic, the time taken by peer review could reduce the number of people benefiting from the research; such slowing of the research process could also reduce the economic viability of a new product (e.g. Agres, 2005; Cures, 2005; Daniels, 2004; Roy, 1985). The many stages of grant peer review can take from 9 to 18 months from submission to funding. It is less clear how often this time significantly hinders the progress of science. In the health sciences, research is one of many steps in developing new treatments and practices (Hanney et al., 2015). Research suggests that the time required for translation of research from initial idea to adopted practice is around 17 years, so peer review may be a relatively small contributor; however, any one translation pathway may have multiple stages of peer review (Morris et al., 2011).

There is good evidence that peer review has the support of most major scientific stakeholders.
Though criticism of the peer review process abounds, the limited empirical evidence indicates that support for peer review amongst the academic community remains strong (Bornmann, 2011; Wooding & Grant, 2003). The dominance of peer review across funding systems internationally suggests it has the confidence of institutional stakeholders. A recent review of literature about the NIH peer review processes found a firm belief in the transparency and objectivity of peer review amongst grant reviewers (Miner, 2011). There is a striking disconnect between the institutional and community support for the peer review system and the empirical evidence of its effectiveness. Unfortunately, the scope of our review excluded the types of research that might explain this divergence.
In finding that increased effort did not translate into increased success rates.
A few qualitative studies have examined the burden of the system on particular groups of researchers and the wider implications on researchers' quality of life. A survey of 215 NHMRC applicants concluded that the 'impact of preparing grant proposals for a single annual deadline is stressful, time consuming and conflicts with family responsibilities' (p.1), although it did not quantify the effects or time taken (Herbert et al., 2014). A study of early career investigators applying for funding at CIHR identified the application process as burdensome.
The institutional costs of application preparation were examined by the US Government Accountability Office (GAO) in 2016, which concluded that pre-award requirements for applicants to develop and submit detailed documentation for grant proposals, and increased prescriptiveness of certain requirements, had increased universities' workload and costs, but the study (GAO, 2016) did not quantify these increases.

Burden on reviewers and panel members.
Time invested by reviewers and panel members is consistently identified as the second-highest monetised cost of peer review, making up about 15 per cent of the burden. Two types of studies carried out in this area have both aimed at optimising the process, balancing the trade-off between burden and quality to achieve efficiency.
The first study approach trialled simplified processes for grant review to test how much time they save and whether they affected funding decisions (Herbert et al., 2015), in particular the use of a shortened application form and smaller review panels. They found the simplified processes achieved agreement with the current award system of close to 75 per cent (which they suggested was the 'acceptable' threshold based on a review of previous surveys), at estimated savings of 33-78 per cent of review costs.
The second study used statistical techniques to estimate the optimum number of reviewers (Snell, 2015) trading off improved reproducibility with additional reviewer burden. They found that five reviewers were optimal; similar work by Graves et al. (2011) on a different funding scheme found 11 reviewers was the most effective number.
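The trade-off between reproducibility and reviewer burden can be illustrated with the Spearman-Brown formula, a standard psychometric result that projects the reliability of an average of k ratings from the reliability r of a single rating: R_k = k r / (1 + (k - 1) r). This is a generic sketch under the assumption that reviewers are interchangeable; it is not the method used by Snell (2015) or Graves et al. (2011), and the function names are hypothetical.

```python
def projected_reliability(single_rater_reliability, n_reviewers):
    """Spearman-Brown projection for the mean of n_reviewers ratings."""
    r, k = single_rater_reliability, n_reviewers
    if not 0 < r < 1 or k < 1:
        raise ValueError("need 0 < r < 1 and at least one reviewer")
    return k * r / (1 + (k - 1) * r)


def reviewers_needed(single_rater_reliability, target, max_reviewers=50):
    """Smallest panel size whose projected reliability reaches target.

    Returns None if the target is not reached within max_reviewers.
    """
    for k in range(1, max_reviewers + 1):
        if projected_reliability(single_rater_reliability, k) >= target:
            return k
    return None
```

Because each extra reviewer adds less reliability than the one before, while the cost of a review stays roughly constant, any such calculation has a point beyond which additional reviewers are hard to justify. Where that point falls depends on the single-rater reliability of the scheme in question.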
In addition to experimental changes, there are examples of funding agency policy changes that have been examined. The NSF changed its review procedures in 2012 to reduce burden by introducing triage on short preliminary applications with a 75 per cent cull rate, with annual rather than six-monthly applications. The Government Accountability Office has praised the system, and it reduces administrative burden on programme officers. However, because several changes happened simultaneously, it is not clear whether this is because of the triaging. It also resulted in reduced success rates, partly because of more applications (perhaps because they were easier to write) but also because of funding reductions (Mervis, 2016).
One of the drivers of the burden on funders is identifying appropriate reviewers for each proposal. Mervis (2014) reports on a radical experiment at NSF where applicants reviewed each other's grants (each applicant completing seven reviews), consequently reducing this burden to zero. To guard against applicants marking their competitors down, they were rewarded for scores that aligned with the other reviewers. The pilot allowed the number of reviews per proposal to be increased from three or four to seven and the reviews provided were more detailed. Because of the additional reviews, NSF was able to dispense with panel discussion, thus saving administrative costs.

Discussion
In this section we summarise our findings: firstly, on the availability of evidence, considering the scope and coverage of the existing literature; secondly, on what the evidence shows; and finally, highlighting the implications for health research funders.

Availability of evidence
Questions about the effectiveness and burden of peer review can be addressed at two levels. At a high level, does peer review support valuable science? And at a lower level, can the design of peer review systems be improved to increase effectiveness and reduce burden?
It is clear that the current system of funding has produced significant benefits for society, suggesting that peer review supports valuable science. However, whether peer review is demonstrably better than any other system is impossible to judge with certainty because of the lack of comparators: no funding agencies have made significant use of alternative systems.
Moving to the lower level, considering comparisons between or research on peer review systems, there is only a very small number of robust, well-conducted studies. Much of the literature identified is anecdotal in nature and we found no systematic reviews, underlining the fragility of the evidence base. However, we did identify a series of robust, high-quality studies that have been carried out since our last review in 2009. Despite this new work it is still true that most studies examine the peer review process of one particular funder in one particular context, rather than looking across funders or contexts, and few go beyond process measures to judge effectiveness.
This persistent lack of evidence about the allocation of the 'inputs' to research is all the more striking given the advances in understanding the outputs and outcomes of research through research impact assessment over the last decade.

Findings from the available evidence
The central problem when assessing peer review is the lack of an absolute standard or 'ground truth' to judge against. There will be uncertainty in all peer review decisions -it is, after all, predicting the future. And there is evidence suggesting it is not a particularly good predictor, at least for bibliometric performance. At present most funders do not capture, use, or even acknowledge this uncertainty, despite clear evidence of inconsistency in peer review ratings and mixed evidence on the reproducibility of panel decisions.
There is good evidence that peer review suffers from biases. The strongest evidence is of a bias against innovation, and although a range of improvements have been suggested, none have been robustly evaluated. There is some evidence that peer review is influenced by cognitive distance and suffers from cronyism, and suggestive evidence that there are age biases. Considerable work has been done on gender bias, with conflicting results, which illustrates the challenges of accounting for biases outside the scope of the peer review process, for example through eligibility or the culture of the wider scientific system.
Though the problem of burden is widely recognised, funders' considerations often focus on their own and reviewers' burden as these are more immediately visible (and costly) to them. However, it is clear that the burden largely falls on applicants (rather than reviewers or panel members).
Falling success rates across many funders compound the burden on applicants. One way to address these challenges could be to reduce the complexity of the application process, with evidence suggesting similar decisions can be made with much shorter applications and less information. However, small decreases in application length do not seem to translate into reduced application preparation time, so such changes would need to be carefully evaluated.
Despite the plethora of comment pieces criticising the peer review system, there is no empirical evidence suggesting whether peer review has more or less support among key stakeholders than it did in 2009.

Potential improvements
Improving effectiveness. This section outlines our reflections on ideas for improving peer review processes. We concentrate on ideas that augment or refine peer review, as those approaches were most comprehensively covered by our search approach. Other approaches that are more complete alternatives to peer review, for example peer-to-peer allocation, were beyond the scope of this review (Bollen et al., 2017).
We feel the uncertainty in peer review (clear in the inconsistency of ratings and weak predictive power in terms of future academic performance) should be acknowledged, captured and used to improve decision making and for analysis. Reviewers should be asked both for their rating of the proposal and a measure of their confidence in this rating; some smaller funders, such as the Villum and Velux Foundations in Denmark, are starting to implement such systems. Funders could also analyse levels of disagreement between reviewers, which may be an indicator of innovative research (Linton, 2016), or take a portfolio approach, selecting projects scoring highly across different criteria, including innovation (Lee, 2015).
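As a minimal sketch of how a funder might capture this information, the example below summarises one proposal's reviews as a plain mean, a confidence-weighted mean and a measure of disagreement. The data layout (score and confidence pairs, with confidence between 0 and 1), the weighting scheme and the function name are all our assumptions for illustration; none of the funders or authors cited above is known to use this exact calculation.

```python
from statistics import mean, pstdev


def summarise_reviews(reviews):
    """Summarise one proposal's reviews.

    reviews: list of (score, confidence) pairs, with confidence in (0, 1].
    Returns the plain mean, a confidence-weighted mean, and the spread
    (population standard deviation) of the scores across reviewers.
    """
    if not reviews:
        raise ValueError("at least one review is required")
    if any(not 0 < conf <= 1 for _, conf in reviews):
        raise ValueError("confidence must be in (0, 1]")

    scores = [score for score, _ in reviews]
    total_confidence = sum(conf for _, conf in reviews)
    weighted = sum(score * conf for score, conf in reviews) / total_confidence

    return {
        "mean": mean(scores),
        "weighted_mean": weighted,
        # A large spread flags proposals on which reviewers disagree.
        "spread": pstdev(scores),
    }
```

A funder could then look separately at proposals with a high spread rather than letting a single low score sink them, on the hypothesis discussed above that disagreement may signal innovative work.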
A second approach is to acknowledge the difficulty of predicting the future and introduce an explicit element of randomness into the allocation system. This could be done to differing extents, from completely random allocation of funding to the use of a lottery system within set groups of applicants. Fang & Casadevall (2016) propose a two-stage system, in which the best applications are identified and then a smaller percentage are funded using a lottery. Avin (2015) proposes using two thresholds: above the higher threshold all applications are funded, below the lower threshold all applications are rejected, and applications between the two thresholds are funded at random, effectively blurring the funding line.
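The two-threshold idea can be sketched in a few lines. The example below is an illustration under our own assumptions, not a reproduction of Avin's (2015) model: the function name, the fixed number of awards, and the rule that the middle band is sampled uniformly until the awards run out are all hypothetical choices.

```python
import random


def two_threshold_lottery(scores, lower, upper, n_awards, seed=None):
    """Allocate awards using two score thresholds and a lottery.

    scores: dict mapping application id -> panel score.
    Applications scoring >= upper are funded outright; those scoring
    < lower are rejected; the remaining awards are drawn uniformly at
    random from applications in between.
    Returns (funded_outright, funded_by_lottery) as two lists of ids.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")

    rng = random.Random(seed)  # a seed makes the draw reproducible for audit
    funded_outright = sorted(a for a, s in scores.items() if s >= upper)
    lottery_pool = sorted(a for a, s in scores.items() if lower <= s < upper)

    if len(funded_outright) > n_awards:
        raise ValueError("more applications above the upper threshold than awards")

    n_draws = min(n_awards - len(funded_outright), len(lottery_pool))
    funded_by_lottery = rng.sample(lottery_pool, n_draws)
    return funded_outright, funded_by_lottery
```

Setting the two thresholds equal reduces this to a conventional funding line, while widening the gap between them moves the scheme towards a pure lottery among eligible applications.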
A lottery approach should reduce biases in decision making since the selection from the fundable pool is random; however, applicant eligibility restrictions/selection for the lottery could reintroduce bias. Selecting into a fundable pool requires less fine-grained decisions, addressing concerns about the reliability of peer review. The use of lottery systems is a promising but politically challenging idea. So far it has only been used in very limited cases, such as the Explorer Grants offered by the Health Research Council of New Zealand, the Seed Projects offered by Science for Technological Innovation, also in New Zealand, and the Experiment! Grants from the Volkswagen Foundation, and as such we think using elements of lottery allocation merits further empirical research (Barnett, 2016). Complex approaches combining assessment and lottery, although theoretically attractive, suffer from the disadvantage of sacrificing understandability (Kurokawa et al., 2015).
Other approaches to address bias include blinding of reviewers (e.g. Lee et al., 2012), though the feasibility of this is debated (Bhattacharjee, 2012). More practically, funders have also used training approaches to address bias (e.g. CIHR) and to improve the quality of reviews (e.g. NIH, 2008), and there is limited evidence that the approach could reduce the discrepancies between reviewers (Sattler et al., 2015).
Reducing Burden. Applicant burden should be considered as a priority compared to reviewer and administrative burden as it represents around 75% of the system burden. This can be addressed by reducing the level of burden or increasing the value unsuccessful applicants receive by applying. Changes to reduce burden need to be carefully evaluated as there is evidence that even significant reductions in application length/complexity may not reduce applicant burden as much as expected. An alternative approach is to make the process more valuable for the applicants. Reviewer and panel feedback may be one way to do this (although one of the reviewers of this paper noted the concern that providing feedback may open a funder to appeals from rejected applicants).
Technology provides ways to reduce the time burden of the peer review process for panel members and funders, for example by eliminating travel, and does not appear to significantly affect the outcomes. However, face-to-face discussion of applications brings other side-benefits, including social interaction and network formation. Other research suggests these side-benefits may be important to the progress of science and hence may need to be supported in other ways if peer review is done remotely.
Altering the format of research proposals to incorporate multimedia or video has been suggested as a way to improve information transmission and reduce burden, but the effects of doing so have not been tested (Doran et al., 2014).
Improving the evidence base. It remains striking how little robust evidence is available about peer review as a method for grant allocation. Given the centrality of the peer review process in the current science funding system, there is a need for better evidence, not only on the overall effectiveness of peer review but also to help improve the design of peer review processes. We suggest three fruitful areas for investigation: the links between the peer review process and the wider context of science funding; the social processes of peer review and panel meetings.
System changes (such as the overall amount of funding) affect the peer review process, and peer review changes affect the system, so both need to be considered together to understand the dynamic behaviour of the overall research process. Nearly all of the studies we identified considered aspects of the peer review system in isolation, for example tracking success rates or reviewer burden. However, system changes such as decreased funding, or changes in researcher demographics, often happen alongside, and interact with, changes to the peer review system. Addressing these questions may require developing modelling and simulation approaches such as those in Avin (2015).
Even in the fairly barren landscape of evidence we explored, it was startling that we could find no studies examining the social processes that occur during panel discussions, a central part of the peer review process. Such studies will clearly be challenging and require the cooperation of funders working in concert, but we feel they are essential to understand how to optimise one of the fundamental processes of science.
At a more mundane level, funders should be more willing to experiment with, evaluate and publish results from evaluations of alternative approaches. Through our conversations with funders it appears that where analysis is carried out it is often not published, partly because of the extreme sensitivity around funding allocation procedures. Funders are not the only ones who need to take a more reflective approach: they will need the support of the wider scientific community to support such investigations, and acknowledge the lack of evidence about the primacy of the current system and the impossibility of achieving perfection.

Conclusions
Many criticisms of the peer review system reflect conflicts between the needs of stakeholders. Researchers look to peer review to uphold research standards and promote the 'best' science, while politicians and funders use it to provide accountability for spending (Viner et al., 2004). This tension requires peer review to protect the identities of reviewers yet appear transparent to applicants; to be innovative yet assure quality; to be based on human judgement yet free of human biases (Hackett & Chubin, 2003).
We think that current dissatisfaction with the peer review process is amplified by falling success rates, so it is important to remember that the concerns around peer review are heavily influenced by funding policy and the size of research budgets.
As a society, if we are to improve how we use our research funds, we need a better understanding of the peer review process. When making changes, funders should: build in before-and-after comparisons; strive to make data available for analysis; openly publish studies of their processes; and work together on comparative analysis.
We need to overcome the reluctance of funders and scientists to acknowledge the uncertainties intrinsic to allocating research funding, and encourage them to experiment with peer review and other allocation processes.

Data availability
All data underlying the results are available as part of the article and no additional source data are required.

This is a nice and comprehensive review of the studies that assess the strengths and weaknesses of the peer review process.
Although there are no new conclusions, it does reveal or highlight the three major issues that plague our mechanism to dole out research funding and the use of peer review.
The first is fundamental: without a clear and agreed upon definition of what constitutes the "best" science, or the "best" outcomes, it will be impossible to know whether the grant adjudication process is meeting, or can ever meet, its objectives.
Second, the paper highlights what is often overlooked: the real cost in time of writing funding proposals. Although scientists often manufacture a narrative that writing grants is good for them (for example, making you catch up on the literature), the real costs to the system are rarely factored in. I think this paper did a nice job in highlighting this issue.
Finally, I would have liked a greater discussion on what is an amazing disconnect. All the empirical evidence highlights the deficiencies of the process to allocate grant funding. It is clear that it is neither scientifically founded, nor evidence-based. Yet one of the only strongly supported aspects of the peer review process is that it has the strong support of the community. I find this fascinating.

In terms of using technology in the review process, some researchers have suggested that videos may produce more reliable peer reviewer ratings and take less time to prepare: Doran MR, Lott WB, Doran SE. Multimedia: a necessary step in the evolution of research funding applications. Trends Biochem Sci. 2014 Apr;39(4):151-3. doi: 10.1016/j.tibs.2014.01.004.

Minor comments
Introduction, 1st paragraph. As an Australian researcher I would argue that the ERA has not really measured research quality, rather it has simply measured research output. Maybe you could say "Funders have attempted to gather evidence…"
Finally, your conclusion is spot on. Data is very scarce in this field and without cooperation and data sharing from the research funding community, progress in this area will be very slow.

Competing Interests: No competing interests