
How does scientific progress work – distributions of impact

Is science advanced by a small group of scientists, or is it more of a large-scale collaboration? The answer to this question is pretty crucial, because it largely influences what kinds of interventions are best for improving science and knowledge creation. For example, if we found that science is advanced by a small group of scientists rather than by every scientist in a large-scale collaboration, then we might think further about how to grow this small group or how to empower them, and perhaps let the others go if growing it is not possible.

Small group vs large scale collaboration

One of the good treatments of this question I’ve read so far is this post by José Luis Ricón. My aim here is to quickly summarize its findings and then generate some more unexplored questions that might fill in the blank spaces on the map or might help to make the answers more actionable.

This debate could be framed as the Newton vs. Ortega hypothesis. The Newton hypothesis is the view that top researchers build on the work of other top researchers and most of the other (average) science doesn’t really matter, whereas the Ortega hypothesis is the view that large numbers of average scientists contribute substantially to the advance of science through their research.

This post by José Luis Ricón looks at the evidence for both of these hypotheses and concludes that the Newton hypothesis is likely correct and the Ortega hypothesis false. The strongest evidence in favor of this conclusion seems to be a study by Cole & Cole (1972), later followed by Bornmann et al. (2010), which shows that the most cited papers cite mainly other highly cited papers and only rarely cite less cited ones. At the same time, less cited papers also cite mainly the most cited papers, and only to a lesser extent (though still more than the most cited papers do) cite other less cited papers. See the figure from Bornmann et al. (2010):

[Figure from Bornmann et al. (2010): proportions of citations flowing between highly cited and less cited papers]

The distribution of impact across researchers also seems to be highly skewed, which rules out another potential version of the Ortega hypothesis: that everyone tries to get to the frontier, and over the course of their careers many people briefly succeed and push the frontier a little bit forward, but then fall back and stay behind for the rest of their careers. If that were the case, the distribution of citations across researchers would be less skewed and more uniform than it actually seems to be.

Why is that?

We might ask: Why is that? Why do some researchers produce so much more impact than others? Here are a couple of potential answers:

1) It’s genetic – people at the frontier have some particular combination of genetic traits that is impossible to learn (e.g. a combination of very high intelligence, conscientiousness, need for cognition, …).
But if that were true, we could argue as follows: raw population numbers increased a lot during the second half of the 20th century, so more people with this superior mix of traits have been born, and we should therefore see a proportionately large increase in the total number of positions at the frontier – did we see that?

2) It’s behavioural – people at the frontier do something differently from all the others, and this can be taught/learnt. Maybe it is the choice of which papers to read, how to distribute their attention, how to ask questions that yield more fruitful answers, how to design research so it is robust, etc. Evidence in this direction could be that top scientists tend to be trained by other top scientists, as observed among Nobel laureates, National Academy members, and other eminent scientists (Cole & Cole, 1972). Similarly, early co-authorship with top scientists predicts success in academic careers (Li, Aste, Caccioli & Livan, 2019). Maybe there is something hidden in the training process which causes researchers to become excellent. This would be good news, because if we can figure out what it is, we can scale it up, train many more researchers this way, and add many more excellent researchers to the pool.

3) It’s systemic – the number of top positions is limited by some feature of the system itself. For example, attention (expressed via citations) is a limited resource, and perhaps only around 5 people can ever receive full attention, regardless of how many people there are in total, whether 5 or 500. Knowledge or quality of work doesn’t seem to be limited in this way (e.g. it is perfectly possible to imagine 500 scientists doing the highest quality work at the same time instead of just 5), but attention is, and that may create the dynamics described above. Moreover, in deciding where to allocate our attention, we might use heuristics like “follow the work and students of those who have already proved to be impactful”, which would be in line with the observations that top scientists tend to be trained by other top scientists and that early co-authorship with top scientists predicts success in academic careers – only this time the relationship would appear not because of the “inner quality” of these apprentices, but rather because of the Matthew effect (see more discussion below).

Another version of this is that there is some unique combination of resources (e.g. a concentration of talent) that makes it much more likely for scientists to rise to the top. This would explain why top scientists often come from the same few universities. The good news is that this might be easier to replicate, and thus a way to increase the total number of top scientists.

Alternative explanation: Matthew effect

An alternative explanation of the Bornmann et al. (2010) data above could be the Matthew effect, i.e. self-reinforcing mechanisms that make the distribution of citations more extreme, so that the resulting skewed distribution no longer accurately tracks differences in paper quality (via Turner & Chubin, 1976). Here Ricón mentions several studies examining the Matthew effect through the effects of prestige on the number of citations a given author gets. One way to study this is to look at how the number of citations of an author’s previous work changes after they receive a prestigious award/grant, relative to a control group (see Azoulay et al., 2013; and relatedly Li et al., 2020). Another, perhaps somewhat weaker, method is to explicitly control for prestige as a mediator in the relationship between higher scores given by the funder in grant proposal assessment and higher citation performance later (see Li & Agha, 2015). These studies conclude that the effects of prestige are either very weak or nonexistent, and thus that the main differences in citation counts are likely caused by differences in the actual quality of the papers/authors.

However, prestige might not be the only, or even the most important, way the Matthew effect plays out and causes the skewed distribution of impact. Consider this scenario: As a scientist, you have limited time to spend reading papers. Which papers will you choose to read in order to get the best understanding of a given issue? Perhaps the most cited ones, since these seem to have been most useful to other scientists. Then your time for reading runs out and you write your own paper – which studies are you going to cite? The already top-cited studies, because these are the ones you managed to read. And thus you contribute to the self-reinforcing mechanism of the Matthew effect.

Now, already highly cited papers are by definition a bit old, because it took time for them to collect those citations, and you want to be ahead. Ideally, you would like to read papers which are not yet highly cited but will become highly cited in the future – what mechanisms are there to find such papers? Perhaps the most established one is the journal impact factor ranking. Journals are a kind of prediction machine for future citations (indeed, that seems to be their main role now: they are not necessary for the dissemination of knowledge in the internet age, when you can upload your paper just about anywhere, and they don’t seem to safeguard against less replicable research either), so reading papers from high impact factor journals first makes it more likely you will spot future highly cited papers. And again, as in the previous case, you are more likely to cite papers from these journals since this is the literature you have read, and thus you help fulfil the prophecy of papers from those journals being more cited. Note that this mechanism seems to operate regardless of the true value of the papers involved. The gatekeepers of true value are in this case the editors of high impact journals, who effectively decide which papers will get cited in the future. This scenario could be tested empirically by surveying which papers (e.g. in terms of journal impact factor or number of citations) researchers (ideally broken down into top vs. average scientists) tend to read – e.g. via reference managers like Mendeley, looking at which papers they spend significant time reading rather than which papers they just save for later.
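A minimal sketch of this dynamic – pure preferential attachment, the classic “rich get richer” process. All papers here are identical in quality and all parameters are illustrative, yet the citation distribution still comes out heavily skewed:

```python
import random

random.seed(42)

N_PAPERS = 3000      # papers published one after another
REFS_PER_PAPER = 10  # references each new paper makes
EPSILON = 1.0        # baseline attractiveness; every paper has equal "quality"

citations = [0] * N_PAPERS

for new in range(1, N_PAPERS):
    earlier = range(new)
    # Probability of being cited is proportional to (citations + EPSILON):
    # preferential attachment with no quality differences at all.
    weights = [citations[p] + EPSILON for p in earlier]
    cited = random.choices(earlier, weights=weights, k=min(REFS_PER_PAPER, new))
    for p in set(cited):  # a paper cites each reference at most once
        citations[p] += 1

ranked = sorted(citations, reverse=True)
top_1_percent = sum(ranked[: N_PAPERS // 100])
print(f"Top 1% of papers receive {top_1_percent / sum(ranked):.0%} of all citations")
```

Skewness alone therefore cannot distinguish the Newton hypothesis from the Matthew effect; what would distinguish them is evidence on whether citations track quality, which is what the prestige studies above try to get at.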

What does the Newton hypothesis suggest if true?

Anyway, let’s imagine the Matthew effect does not play a large role and the differences in contributions to scientific advancement (differences in citations) really are that skewed, with science being advanced by a small group of top researchers building on each other’s work. What ramifications does that have in real life?

Firstly, we might want to stop focusing on average metrics when assessing science and move to best-case analysis. Metrics like the average number of citations across funded projects, or dollars spent per citation, are pretty meaningless when the totals are driven by outliers who are actually the only ones pushing science forward.
In the same vein, we might want to take a stance on the question of whether to fund a smaller number of exceptional researchers and give them extra resources, or to distribute the funding among a greater number of average scientists – which often promises a better dollars-per-citation ratio, but those average citations are perhaps meaningless since they don’t add up to a few pieces of great work – especially when we are able to predict to some extent who will become a top scientist. The toy example below illustrates the point.
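A toy comparison with made-up numbers: a uniform portfolio can beat a skewed one on average citations while producing no breakthrough at all.

```python
# Hypothetical portfolios with equal budgets (illustrative numbers only).
portfolio_a = [900] + [1] * 99  # one breakthrough plus many barely cited papers
portfolio_b = [12] * 100        # uniformly "decent" papers, no breakthrough

for name, p in (("A (skewed)", portfolio_a), ("B (uniform)", portfolio_b)):
    print(f"{name}: mean citations = {sum(p) / len(p):.1f}, max = {max(p)}")

# B wins on mean citations (12.0 vs ~10.0) and hence on dollars per citation,
# yet A contains the only paper that plausibly pushed the field forward.
```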

Further, the Newton hypothesis also diminishes the importance of many meta-science topics – for example, we might not need to push so hard to establish open science behaviours among all researchers. It might be enough if, say, the top 10% most cited papers were all open access to grab most of the benefits that open access promises. We don’t need the remaining 90% of papers to be accessible, because they matter far less. Perhaps this would also save a lot of money that would otherwise be paid for enabling open access to papers that are not read or cited anyway (that is, if we chose to provide open access via the gold route, which is in my opinion the wrong strategy anyway). Maybe we should measure progress in open access and open data not via the total share of open access/data papers, but rather by looking at how large a share of the top 10% most cited papers is open access/data – see the sketch below. Further, we might not care that much about pre-registration, reporting standards, etc. for all scientists – maybe it’s enough if the top 10% of papers adhere to these standards. Targeting interventions this way might also be more manageable than trying to cause a cultural change among all scientists.
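A sketch of the proposed metric, assuming only a list of papers with a citation count and an open access flag (the function name and data shape are mine, purely for illustration):

```python
def oa_share_of_top_papers(papers, top_share=0.10):
    """Share of the top `top_share` most cited papers that are open access."""
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    top = ranked[: max(1, int(len(ranked) * top_share))]
    return sum(1 for _, is_oa in top if is_oa) / len(top)

# Toy usage: (citation_count, is_open_access) pairs.
sample = [(350, True), (120, False), (40, True), (12, False), (3, False),
          (2, True), (1, False), (0, False), (0, True), (0, False)]
print(oa_share_of_top_papers(sample))  # top 10% here is one paper -> 1.0
```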

We could also think bigger picture and ask whether it would be sensible to lay off, say, the less cited half of scientists and either invest the money elsewhere or redistribute it to the more cited half, creating better conditions for them and making science jobs more attractive. I will not go deeper into this discussion here, but you can find some argumentation in Ricón’s post.

Can we grow it?


Instead of (or simultaneously with) getting rid of less cited scientists, we might focus on how to grow the number of top scientists. Is that even possible? The answer will perhaps depend on the underlying reasons why top scientists are top, as discussed above. We can also look at some pieces of direct empirical evidence:

One piece of evidence pointing towards “not possible to grow much” is the observation that despite large increases in funding and in the number of researchers, the rate of economic growth has not increased. However, this tracks only the economic impact of science, and is perhaps not a good metric of it anyway.

Another experiment cited by Ricón is Cole & Meyer (1985), who looked at the number of assistant professors hired in physics between 1963 and 1975 and compared it to the proportion of physicists who ended up being cited [figures omitted], showing that the proportion of physicists with one or more citations stayed the same even though their absolute number first increased and then declined over this period. This suggests that growing the absolute number of cited researchers is possible – however, we might want a stricter criterion for “top scientist” than “having one or more citations”.


Conclusion

It seems that the distribution of scientific impact is highly skewed both across papers and across researchers, and that top papers/scientists are the main contributors to future top papers/scientists, suggesting that the Newton hypothesis is true. However, the extent to which this reflects true value is not entirely clear, because the same dynamic could potentially be created (or at least supported) by the Matthew effect and the distribution of attention. Further, this finding might be disputed if some value driving scientific progress is not tracked by citations (which I plan to cover in a future post).

If the Newton hypothesis about the true distribution of scientific impact is correct, then we might want to change the way we measure scientific success (moving to best-case analysis rather than aggregated average metrics) as well as what kinds of programmes we support (perhaps ceasing to support average science). We might consider laying off some part (e.g. half) of the less cited scientists and redistributing the funding to support more cited scientists, or spending it elsewhere. At the same time, we might want to think more about, and experiment with, growing the number of top scientists.

In some future post, I would also like to look at another question that could help us understand how scientific progress works – whether progress happens via big jumps (similar to Kuhnian revolutions), via small incremental steps, or via some other model, comparing the empirical evidence across different fields. This could give a better picture of whether progress has the same dynamics and shape for science as a whole or whether different fields work differently.

What is the trend of scientific progress?

Is there a problem?

Bloom, Jones, Van Reenen & Webb (2020) found in their multiple case studies – spanning Moore’s law, agricultural crop yields, mortality and life expectancy, and research productivity in firm-level data – that each domain suffers from diminishing returns to investment. That is, a constant number of scientists would not keep coming up with a constant number of new ideas in a given domain over time; rather, coming up with new ideas gets harder and harder, so we should expect fewer and fewer new ideas and less progress.
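In Bloom et al.’s accounting (written here in simplified notation), the claim can be put in one line:

\[
\text{research productivity}_t \;=\; \frac{\dot{A}_t / A_t}{S_t},
\]

where \(\dot{A}_t/A_t\) is the growth rate of “ideas” in a domain (e.g. TFP growth, or the Moore’s-law doubling rate of transistor density) and \(S_t\) is the effective number of researchers. Their finding across the case studies is that idea output stays roughly constant while \(S_t\) rises steeply, so research productivity – the ratio – falls.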


Ideas in some domains might be getting more and more depleted, but new domains keep arising and offer opportunities for fresh ideas to be harvested. That is perhaps how scientific progress happens. The more interesting question arises when we try to generalize this to science as a whole, as Bloom et al. seem to be suggesting. Is it the case that it is getting harder and harder to find whole new domains of research? That our ability to find new ideas not in a specific domain, but anywhere, is hitting diminishing returns? Maybe our very mechanism of knowledge creation and ability to understand the world is hitting diminishing returns as well?

I would say that evidence for this latter claim is much weaker than for diminishing returns in any given specific domain.
First, we can look at how economists measure the economic benefits of scientific progress. Bloom et al. model this by comparing inputs (investment in the National Income and Product Accounts’ “intellectual property products”, a number primarily made up of research and development spending but also including expenditures on creating other nonrival goods like computer software, music, books, and movies) and outputs (total factor productivity, TFP, the portion of growth in output not explained by growth in the traditionally measured inputs of labour and capital used in production).

However, TFP does not seem to be an especially good measure of scientific progress – it does not track scientific progress per se, but rather all unmeasured factors contributing to productivity. This includes not only the economic benefits of new discoveries but also the diffusion of existing knowledge, which are qualitatively very different things, and we have no way to tell how big a proportion of TFP is driven by diffusion versus creation, or how this has varied over time. Further, some scientific progress works by enabling a greater supply of factors measured outside TFP (e.g. labour, capital, and land) or is embodied in concrete capital goods (which is perhaps how the benefits of most privately funded research are realised), and this will not be counted in TFP. The counterargument goes that if these biases are roughly constant over time, changes in TFP would still reflect changes in the rate of progress of science and technology (for more details and discussion see Cowen and Southwood, 2019, p. 18). However, I don’t think we have a strong reason to assume these biases are constant over time.

TFP thus has the advantage of aggregating science as a whole, but it does not seem to capture scientific progress accurately enough to support strong conclusions about whether science is slowing down. If we wanted to make an argument using TFP anyway, we could still say (though pretty vaguely) that since the 1930s TFP growth has been declining while investments in research have been increasing, which suggests that at least the part of scientific discovery captured by TFP has been slowing down (or hitting diminishing returns).
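For reference, TFP is the residual of standard growth accounting; assuming a Cobb–Douglas production function with capital share \(\alpha\), its growth is measured as

\[
\Delta \ln \mathrm{TFP} \;=\; \Delta \ln Y \;-\; \alpha \,\Delta \ln K \;-\; (1-\alpha)\,\Delta \ln L,
\]

where \(Y\) is output, \(K\) capital, and \(L\) labour. Anything that raises output without showing up as measured capital or labour – new discoveries, but equally the diffusion of existing knowledge or better management – lands in this residual, which is exactly why it is such a coarse proxy for scientific progress.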


Further, we could understand scientific progress as how much it contributes to wellbeing measured in ways other than economic benefit (e.g. life satisfaction; the chance of surviving for the next thousand years…). Is there any such measure for scientific progress as a whole? Unfortunately, I’m not aware of any.

Further yet, we may look at scientific progress from the perspective of how much it increases our understanding of the world. For example, measures that do not concern the economic benefits of science, but rather things like “how much does a discovery change our understanding of the world” and “how much does it enable us to make further discoveries”? Each of these interpretations would likely require its own measurement method.
One venture in this direction was made by Patrick Collison and Michael Nielsen (2018), who asked top scientists to compare the importance of discoveries awarded Nobel prizes in each decade. They found that scientists rate older discoveries as more important than more recent ones. However, they again focused only on the few specific domains in which Nobel prizes are awarded (chemistry, physics, medical sciences), not on science as a whole, let alone newly developed domains of research. Further, the notion of “importance” might mean different things to different scientists – some might weight the economic applications of the knowledge more, while others might weight how much it changed and shaped our current understanding, etc. Finally, scientists might be biased towards safe answers: it is safer to bet on older discoveries whose applications and importance have already been well demonstrated than on more recent discoveries whose applications are yet to be found. To conclude, while an interesting venture, this piece of research does not seem to tell us much about whether science as a whole is slowing down.

Diminishing returns to investment seem like a pretty natural way things work when you want to exploit some resource (like gold mining). It also makes sense based on the heuristic that it’s easier to grow rapidly from a small base than from a large one (i.e. 10% returns on investment in 1,000 scientists are perhaps easier to achieve than 10% returns on 1,000,000 scientists, because the latter benefit is so much larger in absolute terms). However, not everything in the world works this way – there are also things that exhibit constant or even increasing returns. An argument for increasing returns could go like this: the number of pieces of knowledge keeps increasing, and by combining different pieces of knowledge we can create yet more useful pieces of knowledge; therefore, the space of possible discoveries is getting larger with each new discovery made, rather than smaller (as the “ideas are getting harder to find” hypothesis seems to suggest) – see the arithmetic below. However, this says nothing about the (economic) returns to these discoveries – maybe ideas are increasingly easy to find, but their value is lower and lower (or perhaps ideas with high economic value are getting harder and harder to find). One piece of evidence that could potentially point towards increasing returns is the correlational finding that firms and countries that invest more in R&D have higher returns on those investments (Hervas & Amoroso, 2016). However, this could also reflect a comparative advantage over other countries/firms, or there might be reverse causality (i.e. firms and countries that are doing better economically use their additional resources to fund R&D, but R&D is not the causal mechanism of how they got there).
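The combinatorial version of the increasing-returns argument can be made precise. With \(n\) existing pieces of knowledge, the number of possible pairwise combinations is

\[
\binom{n}{2} \;=\; \frac{n(n-1)}{2},
\]

so adding the \((n+1)\)-th piece creates \(n\) new possible pairs: the space of candidate recombinations grows quadratically (and far faster if combinations of three or more pieces count) while the stock of knowledge grows only linearly. As noted above, though, nothing guarantees that any given combination is valuable.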


Anyway, let’s imagine science as a whole really is generating less and less benefit per dollar invested. True or not, this exercise is helpful for spotting things that might be suboptimal or might become problems in the future – something like writing a pre-mortem. So, why could that be?

Causes

1) New ideas are harder to find

As suggested by Bloom et al. and many others, it could be that, just as in many other aspects of our lives, engaging in a given activity repeatedly brings less and less benefit. A good metaphor is low-hanging fruit: we have been gradually picking the low-hanging fruit, and each remaining fruit hangs higher and is harder to pick. In that case, which ability specifically is being depleted? It might be that our ability to understand the world is hitting diminishing returns. Or it might be the method of knowledge creation that is getting exhausted. It might also mean that we simply need more and more investment to get up to speed with the current level of knowledge. We can see this in several domains: PhD degrees in economics are taking longer to complete, co-authorship is increasing in mathematics (Odlyzko, 2000) and economics (Brendel & Schweitzer, 2017), and research teams are growing larger in mathematics (Agrawal, Goldfarb & Teodoridis, 2016), suggesting one head can no longer absorb all the required knowledge. The age at which scientists made a discovery for which they later received a Nobel prize also increased by about 10 years over the 20th century (Jones & Weinberg, 2011). Note, however, that all these pieces of evidence come from specific, very traditional disciplines, where diminishing returns are most likely to be observed. They do not assess science as a whole.

Picking the low-hanging fruit first seems like a natural principle – is it inevitable, then? Can we do anything to reverse it, or at least slow it down? I will propose some potential solutions in another post.

2) Science as an institution is getting significantly less effective in generating new benefits

This hypothesis suggests that, due to growing bureaucracy, researchers spend less time and attention on research itself. Alternatively, with increasing numbers of researchers doing science, coordination gets harder and harder, and coordination costs cause large decreases in the benefits generated.

One possible proxy for measuring the extent of bureaucratisation is the proportion of time researchers spend actually doing research. One older study by Milem, Berger & Dey (2000) found that in the US between 1972 and 1992 the proportion of time devoted to research among academics at research universities actually increased – however, time spent doing research was operationalized in a way that likely also includes grant writing. Several more recent studies point to the large amount of time researchers waste writing grant proposals and competing harshly for funding, with some researchers claiming to spend as much as 60% of their time seeking funding (Fang & Casadevall, 2009), and the overall costs of the peer-review grant selection process consuming as much as 20–35% of the allocated research budget (Gluckman, 2012) (for more discussion see e.g. this Nintil post). Moreover, this waste seems to be growing, as success rates in grant competitions are getting lower over time (see e.g. this data from NIH or this data from CIHR; I will update with more data in the future to confirm whether this trend holds across domains and countries) – see the back-of-envelope sketch below. Note, though, that this trend would significantly contribute to decreasing returns only if the distribution of impact were more uniform, which might not be the case.
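A back-of-envelope sketch of why falling success rates matter (the numbers are illustrative assumptions, not figures from the studies cited above): if funded and unfunded applicants spend similar effort per proposal, the system-wide writing cost per funded grant scales with the inverse of the success rate.

```python
# Back-of-envelope: proposal-writing cost per funded grant vs. success rate.
# DAYS_PER_PROPOSAL is an assumed illustrative figure.
DAYS_PER_PROPOSAL = 30  # researcher-days to prepare one proposal

for success_rate in (0.30, 0.20, 0.10):
    proposals_per_grant = 1 / success_rate  # expected submissions per funded grant
    days = proposals_per_grant * DAYS_PER_PROPOSAL
    print(f"success rate {success_rate:.0%}: ~{days:.0f} researcher-days "
          f"of writing per funded grant")
```

Halving the success rate roughly doubles the system-wide writing cost per funded grant, even if the time spent per proposal stays constant.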
Another way to see whether administrative costs are increasing over time is to look at whether there is an increase in administrative positions (e.g. academic project managers) paid from R&D budgets (I will need to find more data on this).

However, as my friend Aleš suggested, the time researchers lose to paperwork might not actually be the most important form of bureaucratisation. A more important form might express itself through hiring processes: if these become too stringent and inflexible, they might miss out on some of the most extraordinary talents, who won’t get the job, and instead select for more average careerists – which could decrease the amount of innovation produced by academia by far in the long term.

In a similar category of science as an institution getting less effective, the problem could also be insufficient research infrastructure. In the EU, the number of researchers grows by approximately 1.5% per year, which means doubling roughly every 46 years (Baumberg, 2018, cited via Cowen and Southwood, 2019; other data since the 1980s can be found e.g. via OECD, 2021). It is quite possible that the institutions of science find it hard to keep up with such increases and that, because of the lack of coordination and infrastructure, the process of knowledge creation slows down.
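The doubling time follows from the usual compound-growth arithmetic:

\[
t_{\text{double}} \;=\; \frac{\ln 2}{\ln(1.015)} \;\approx\; 46.6 \text{ years.}
\]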
   

3) Privatization of research funding and benefits

Benefits from research are getting more and more privatized, oriented towards improving specific private goods rather than generating global public goods. This might explain why the TFP metric fails to capture these benefits, but it also suggests a plausible hypothesis for how the overall value of research decreases. Indeed, the majority of research funding across G20 countries currently comes from business and private sources rather than government spending (1.26% vs. 0.65% of GDP on average) – but was it always like this, or has it changed over the past decades?

During the second half of the 20th century, the profile of research funding changed greatly. In the United States in the 1960s, for example, public investment accounted for more than two thirds of all research investment, while today it accounts for only about a quarter. This privatization of research funding also applies to basic research – the proportion of publicly funded basic research has been declining since the 1960s, and in 2013 more than half of all basic research spending in the US came from private sources (Mervis, 2017). The same trend can be seen in data from other OECD countries and Europe: the ratio of researchers paid from state/public sources vs. private sources keeps declining (OECD, 2021).

If privately funded research brings smaller benefits to the general public and greater profits to the private investors in question, then it is possible that this privatization of research funding ultimately reduces the overall economic benefits of investing in research. On the other hand, data from the US NSF show that the total amount of funding devoted to basic research increased 33-fold between 1960 and 2000, and the ratio of basic research expenditure to total government research expenditure (i.e. compared to applied research and development) increased from around 8% to around 27% over the same period (NSF, 2002). So while overall research spending is being privatized, spending on basic research is still rising – and if (publicly funded) basic research is indeed more cost-effective in the long run, because it allows knowledge to be more widely distributed and forms the basis for applications, then growth in investment in this type of research should ensure that the overall economic benefits of investing in research grow rather than fall.

Another pointer towards the privatisation of research could be the Bayh-Dole Act, which might have encouraged universities to invest their resources in support of monetizable research, and perhaps also to reduce the disclosure and sharing of their research findings in order to monetize them later. If “monetizable” = “more applied”, this should show up in some statistics (though perhaps not the NSF data, because those cover strictly governmental funding, not university funding; it would also be useful to compare how large a proportion of research funding actually comes from universities vs. governments, to see whether that fraction is even relevant).

Conclusion

To sum up, I think Bloom et al. bring some evidence for decreasing economic returns to investment in any given domain; however, this does not necessarily generalize to science as a whole. Because we don’t have an especially good metric of the economic benefits stemming from science as a whole, the argument for diminishing returns across the whole of science is perhaps weak. If it is true, however, there are a number of hypothesized reasons – ranging from ideas getting harder to find, to science as an institution getting less effective, to the privatisation of research funding and benefits. Perhaps all these effects are real at the same time and jointly contribute to the observed decrease in benefits, though they may vary in how much they contribute.