Science infrastructure | A step closer

Maybe it’s just me, but when I hear discussions about Open Science, it’s often reduced to “is the gold open access model better than subscription based model”? I think this focus is largely ignoring most of the value that open science could bring. Here I will try to sketch out what I think this value is and perhaps also some suggestions how to achieve it. First, let’s look at what all can hide under the Open Science label (figure by OpenAIRE)

Open data

I think that most value of open science lies in open data. The vision behind this is that we would create something like a “data web”, an interconnected repository of all data that is being collected, that we can use to analyze a wide range of questions about the world and come up with new insights that the original authors/data collectors missed or that were not possible to find in each dataset alone. There have already been some success stories in this direction of large scale projects that decided to share their data openly. One of the most striking examples could be the Human Genome Project, which is also considered one the the best investments in terms of economic returns from research, from a large part because they decided to share their data and results openly (read more about this decision here). Another example could be Sloan Digital Sky Survey or Ocean Observatories Initiative (for more details on these stories read Michael Nielsen’s Reinventing Discovery, chapter 6). Anyway persuading individual researchers to share their data seems a bit harder, because the data is the thing that often gives them comparative advantage over other researchers. It seems the incentives are not especially aligned for researchers to share their data. What could change these incentives?

Perhaps if the effort behind collecting data got some more recognition and was not treated only as a way to write good paper. One simple way to increase this recognition would be to plug it in the existing citation-based system, creating citations specifically for the data. Thus when someone else would use your online available data, you would get recognition too.
The critique that I sometimes hear about sharing the data is that this would “enable some researchers to build their careers only on the data they find online, without ever collecting any data themselves”. I see how frustrating this could be for a researcher who spent a lot of time collecting data with little recognition for their effort, but from a societal perspective it doesn’t seem bad – the end result would be that new insights would be generated form the data which otherwise would not be.

A half-way solution that I sometimes hear scientists being more open to is delayed release of the data – keeping data private for e.g. 2 years after their collection and releasing them then. This gives the researcher who collected the data some time to extract new insights from it but if they fail to do so, they give others the chance to do it. I think this policy might not be a bad idea, but haven’t yet made my mind about how much value gets lost via this delay (I guess for urgent issues like Covid or some policy relevant questions the delay would mean a large proportion of the value lost, but most research is not so urgent?).

Open access

The main vision of benefits from open access is that it will make it easier to disseminate knowledge and thus get everyone to the frontier of the current knowledge base faster. There is also a moral argument that publicly funded research should be accessible by the public and not be kept behind paywalls. The evidence for these benefits actually realising could be seen e.g. in open access papers being more cited (with paper quality/prestige being controlled for).

I think these benefits are substantial, but perhaps a bit lower than benefits from open data, since most researchers can find their way around if they want to read a given paper (you can always message the authors asking them for a copy if you don’t want to use pirate services like Sci-Hub). Further, reading someone else’s findings might stimulate your thinking and lead you to new insights, but I guess that new insights drawn from data might be even more important.

Gold is not the solution

In the gold open access model, the publisher makes all articles and related content available for free immediately on the journal’s website in exchange for a fee paid by the author (or other subjects). Publishers thus switch from being charging readers for access to content to charging authors for the publication service.

Perhaps the strongest line of critique that I heard is that this system incentivises journals to be less strict in their quality checks (i.e. peer review) – in the subscription based system the earnings of the journal depends on how high quality their content is so that many readers want to access it so they are incentivized to pick the highest quality papers. Not so in the gold open access model, where rejecting an author actually means lower earnings for the journal (in case that the author is not replaced by another author willing to publish in a given journal, which is often the case). Additionally this system could also make it harder to recognise predatory journals (which operate the same way but dont do any real quality checks and don’t provide any other value the standard journals do). I think this might be true and should cause some serious worries, but I’m unsure how much to believe the current quality checks of subscription based journals anyway given that no correlation between replicability and journal status was found and journals with higher impact factor have more papers with erroneous p-values.

Another strong line of critique is that with the current journals’ fees, researchers (or bodies that pay for them like funding agencies or university libraries) would have to increase their budgets (perhaps even several times) to publish all research in this format (see e.g. Green, 2019). Therefore it is not financially sustainable to switch to open science via the gold model. I think this might be true, but it would be wrong to abandon the notion of open access altogether just because it is not possible via the gold model. There are other models which are significantly less expensive and have some additional benefits on the top of the gold model (see below).

Finally, another line of critique is that although one of the goals of open access was to make science results accessible to developing countries which can’t afford paying large subscription fees, high publishing fees in the gold open access model now make it impossible for researchers from those countries to participate by sharing their own findings with the global community. I think that even if this is true, the tradeoff here is pretty skewed – it is much more beneficial for developing countries to be able to consume research of the developed countries (which is perhaps of higher quality) and thus stimulate their economies, than sharing their own findings (which are perhaps of lower quality because fewer resources goes into creating them).

Overall I think there are strong and sensible arguments against gold open access models and perhaps we should search for alternative models that have fewer downsides and more benefits.

Preprints are

Perhaps the easiest way to make all knowledge open access is to establish norm to share all papers as preprints, ideally in some (centralized) databases that are searchable by Google Scholar, WoS and other search engines. This has not only the advantage of making all papers open access by default and for free, but it also has one additional important advantage – it speeds up the dissemination of knowledge a lot. No more waiting for several months (or sometimes even years!) from the time the paper is finished to the time it is actually published and shared with the world. Findings are shared instantly. This already proved crucial in urgent matters such as Covid research, but I guess the potential of that increased speed is high even for less urgent matters.

Perhaps the only critique of preprints I heard so far is that there is no quality check and thus you can’t know whether a given paper you end up reading doesn’t have some major flaw in it. I agree that there is no quality assurance and perhaps the credit that researchers get for publishing should not be the same for pre-prints and for papers that have gone through some quality checks. Anyway, every researcher can judge for themselves whether a given paper has or doesn’t have any major flaw in it when they are reading it, after all that is how the peer review works, right? And maybe we can nudge the readers to give a rating to a given pre-print they read so others can get a signal on how good the paper is even before they start reading it. The standard quality checks of journals (i.e. peer review) might still be in place – authors could submit the preprints to the public databases first, but then continue submitting their papers to journals as usual. Journals then can publish revised versions of the paper themselves, or give some “badge of quality” to the preprint version in the database. Funders can value papers with this “badge of quality” higher than standard preprints and thus create incentives for applying for these badges. Or alternatively the initiative might rest on the shoulders of journals – they can search through the preprint databases and decide themselves which papers will they invite for the quality check or give the badge of quality right away. But let’s open our minds still a bit further. Maybe in the future we will find better mechanisms to assess quality than the peer review process. Maybe it will be making use of collective intelligence via the rating scales suggested above or replication prediction polls/markets which would be open to everyone who has read the paper. That sounds like something more scalable than peer review. Maybe it will be some automated AI algorithm assessment which would be even more scalable. In any case, publishing via pre-prints would make these possible in the future and would have some direct benefits even today.

A step closer

A blog about science and how to improve it

Category Archives: Science infrastructure

Open Science

Open data

Open access

Gold is not the solution

Preprints are