The choice of words used by women may have disadvantaged them during the evaluation of their grant applications despite a blinded process, a recent analysis suggests. Whether this applies to other contexts remains to be seen, but perhaps women need not err on the side of caution in their choice of words.
Across the world, women are underrepresented in science, have lower salaries than men and are subjected to harassment in educational institutions or workplaces. These disparities can be attributed to structural factors that put women at a disadvantage, as well as to subtle or not-so-subtle gender bias. Gender bias might also affect the evaluation of grant applications submitted by women (and other underrepresented groups), but here the evidence appears to be far more mixed and context specific.
A randomized controlled experiment published this year found little to no gender or race bias in the initial evaluations of applications for the R01 grants of the US National Institutes of Health (NIH). By contrast, another study published this year, which took advantage of a natural experiment made possible by a change in policies at the Canadian Institutes of Health Research, reported a gender gap in grant funding that resulted from less favourable assessments of women as principal investigators.
Common to the evaluations at both these institutions is the fact that the gender of the applicants was known to the reviewers. Given that there is at least circumstantial evidence of bias, one strategy adopted by some agencies and foundations is to withhold the identities (and thus the gender) of the applicants from the reviewers. One would expect such a blinded review process to minimize, if not eliminate, biases, but this expectation had not been rigorously tested.
Grant review under the scanner
To explore whether blinded review does indeed eliminate potential bias, a new study1 from the National Bureau of Economic Research (NBER) analysed thousands of applications submitted to the Bill and Melinda Gates Foundation (hereafter referred to as the Gates Foundation). It found that blinded review did not lead to more gender-equal outcomes: instead, reviewers gave significantly lower scores to women applicants. Interestingly, it was a certain conservatism in the choice of words that seems to have disadvantaged the women applicants.
The study looked at a sample of almost 7,000 applications submitted by US-based researchers between 2008 and 2017 to the Gates Foundation’s Global Challenges: Exploration (GCE) programme. The applications pertained to subtopics within the broader topic of infectious diseases, but the reviewer pool was broad and diverse; that is, each application was evaluated not just by specialists but also by academics from different fields and by non-academics. Further, evaluations were individual and not consensus based, unlike at many funding agencies. Taken together, these characteristics were well suited to the study’s aim of exploring whether and how gender might have indirectly influenced decisions and outcomes.
The study was able to rule out potential factors other than gender, such as the choice of topic, that could have influenced the scores. The analysis revealed that applicants with superior publication histories received higher scores from reviewers. The men applicants in the data set generally had a superior record in this respect, so they were conceivably at an advantage. However, the difference in the scores of men and women applicants persisted even after controlling for publication history, which suggests that the disparity has its origin elsewhere. If the applicants’ gender was indeed the key determinant, how exactly did it contribute to the scores? It turns out that it did so through differences in the types of words used by the women and men applicants.
Words used by men applicants – “broad” words (e.g., bacteria) that occur at similar rates in all topic areas and are thus not very specific to a given topic – were associated more often with high-scoring proposals. Conversely, “narrow” words (e.g., community), which occurred far more often in some topics than in others and were favoured by women applicants, were associated with low-scoring proposals. Intriguingly, word choice did not seem to affect the decisions of women reviewers. Unfortunately, the study could not explore this aspect further: as is typical, only 15% of the reviewers in the sample were women, and women also made up a small number of the applicants, which prevented more in-depth analysis. The study further found that although the use of “broad” words did enhance the chances of success, such successful proposals did not necessarily have greater impact after being funded. Interestingly, among the applicants who did receive funding, the subsequent output of women in terms of publications or additional prestigious grants (such as from the NIH) was as good as or better than that of men.
These findings are the first to reveal the potential impact of gendered language in grant proposals on the review scores and eventual success. They are consistent with the evidence from other types of texts – for example, recommendation letters or performance evaluations – that points to implicit gender bias. However, a closer look at other funding agencies, applicants and reviewers is warranted to explore the potential generality of the NBER study’s findings.
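The distinction between “broad” and “narrow” words described above can be made concrete with a small sketch. It is only an illustration and not the study's actual method: the dispersion measure (coefficient of variation of a word's rate across topics), the threshold of 0.5, the topic names and all the counts are assumptions invented for this example.

```python
from statistics import mean, pstdev

def topic_rates(counts, totals):
    """Occurrences of a word per 1,000 words in each topic."""
    return {t: 1000 * counts.get(t, 0) / totals[t] for t in totals}

def dispersion(counts, totals):
    """Coefficient of variation of a word's rate across topics.

    Near 0: the word appears at similar rates everywhere ("broad").
    Large:  the word is concentrated in a few topics ("narrow").
    """
    rates = list(topic_rates(counts, totals).values())
    avg = mean(rates)
    return pstdev(rates) / avg if avg > 0 else 0.0

def classify(counts, totals, threshold=0.5):
    # The threshold is an arbitrary, illustrative cut-off.
    return "narrow" if dispersion(counts, totals) > threshold else "broad"

# Hypothetical toy data: total words per topic, and per-topic word counts.
totals = {"topic_a": 10000, "topic_b": 10000, "topic_c": 10000}
word_counts = {
    "bacteria": {"topic_a": 20, "topic_b": 18, "topic_c": 22},
    "community": {"topic_a": 2, "topic_b": 5, "topic_c": 40},
}

labels = {word: classify(c, totals) for word, c in word_counts.items()}
print(labels)
```

With these made-up counts, "bacteria" appears at nearly the same rate in every topic and is labelled broad, whereas "community" is concentrated in one topic and is labelled narrow. A real analysis would need far larger samples and, as the authors themselves note, a more sophisticated treatment of language.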
Breaking the linguistic shackles
Meanwhile, though, the findings can certainly inform three possible actions with respect to the GCE programme. The first action is for the Gates Foundation to train its reviewers, especially men, to avoid the seduction of the sorts of “broad” words that were identified in this study. Indeed, the study’s lead author Julian Kolev is quoted in a recent Nature news article as pointing to the need for training reviewers to de-emphasize different writing styles. The diversity of the GCE reviewer pool may pose a challenge in this regard; further, it might also take a while for the effects of any such training to be reflected in the decisions.
The second action that Kolev suggests is to take advantage of the finding that word choice does not seem to affect the decisions of women reviewers. The paucity of women reviewers is a pervasive affliction: for example, women made up less than 30% of the panellists during the evaluation of European Research Council (ERC) grant applications from 2007 to 2016. Increasing the proportion of women in the reviewer pool of the GCE programme (and, indeed, in all grant review processes) would be beneficial for a variety of reasons. Funders around the world need to take immediate and concrete action on this front.
The third action could focus on the women applicants themselves. Prospective women applicants to the GCE programme could be trained to use language that would enhance their chances of success or, at minimum, not put them at a potential disadvantage. Such training could draw on the NBER study (and any follow-ups) to get an idea of the sorts of “narrow” words that could be minimized and, conversely, the types of “broad” words that could be included. The effectiveness of any such training is difficult to estimate in advance, not least because the study’s text analysis focused on a relatively simple measure in terms of word choice. The authors acknowledge that a more sophisticated analysis is needed to reveal more fully how language might contribute to indirect gender bias. Nevertheless, as the study states, “there is significant scope for female applicants to improve their scores by altering the words they use to describe their proposals”.
Women have done much to crack the glass ceiling. Perhaps it is time to break the linguistic shackles too!
- The study was published as a working paper that is intended for comment and discussion; it has not been peer reviewed.