Assessing science: apples, oranges and the necessary evil of comparing them

April 18, 2016

The dominant paradigm of assessing (natural) sciences, and in particular physics, is simple and seemingly adequate. It is based upon the successful dissemination of scientific contributions in the relevant scientific community. The more other experts care about a result (by reviewing and accepting it, citing it, etc.), the more value is assigned to it. The “scientific value” of a given scientist is then calculated by taking all of his or her results into account.

This approach makes a lot of sense. In particular, it eliminates sources of prejudice by judging the results first and the author second. However, a number of issues with the process have appeared and become increasingly acute over the last decades.

What is wrong in physics: a consensus

The problems are perfectly summarised in a recent blog post by Reinhard F. Werner and a slightly older article by Carlton M. Caves. Their three main points are:

Using the average impact factor of the journal to assess a paper (and its authors) makes no sense.
The time and money spent on publishing in “high impact journals”, and on bibliometric optimisation in general, do nothing for science, yet this effort is rewarded with funding and status.
Although the cost of publishing is entirely borne by public funds (through subscriptions, article fees and free peer review), the large profits are often privatised (by giant companies such as Elsevier and Springer Nature).

These ideas are now commonplace throughout the echelons of the scientific community and I find myself wholeheartedly agreeing with them.

I believe that the last point has a special status in that it is relatively easily addressed. New publishers (owned by (semi)governmental or non-profit organisations), preprint repositories and mandatory open-access policies for publicly funded science can realistically contribute to a breakdown of the publishing oligopoly and its indecent profit margins. I am quite optimistic that this will happen within the next decades.

However, why is no change in sight regarding the first two points? I will contend that they are the consequences of a problem that runs much deeper than is usually recognised and that will inevitably persist, in some form or other.

The underlying problem: allocating resources

The very nature of science makes it unintelligible to an overwhelming majority of people—even scientists are knowledgeable about only a single field. Yet, it is usually provided for by public institutions.¹ This leads to the basic dilemma of science politics: why, and more importantly, how should subjects that nobody understands be funded?

Let us assume that a more or less fixed global budget for science has already been decided upon (which is a problem in itself). Then the question breaks down into two components:

How can quality control be enforced? How to make sure that money is not wasted on outdated science and pseudoscience?
How should resources be allocated? Which fields should be awarded more money, and which ones less?

Let me begin with quality control—the assessment of the quality of scientists within a field. For this task, one can use those scientists within the field who are not being assessed (and who ideally have no interest in the outcome of the evaluation other than the advancement of science). This is usually done by internationalising local and national quality control, for instance through an internationalisation of publishing and some form or other of intra-field bibliometry (with respect to these international journals) to assess scientific quality. Another policy associated with intra-field quality control is forced mobility, which ensures that the local scientific level is in tune with the global standard. There would be a number of problems to discuss in this context, but they do not directly relate to the issues Werner addresses.

This brings me to the issue of resource allocation between fields, which I believe is even more important for understanding the status of bibliometry in evaluating contemporary science. Funding bodies are forced to compare different fields that share neither a common language nor common standards. Crucially, they can no longer trust experts, because any expert would inevitably be biased towards his or her own field.² Therefore, there has to be some sort of global benchmark, which is currently provided by multidisciplinary journals (Nature, Science, PNAS, but on another level also PRL or Nature Physics), opening the door to interdisciplinary bibliometry including rating criteria such as the dreaded journal impact factor.

How does one decide whether to invest money in developing, say, quantum computers or drugs against cancer? Any tentatively objective benchmark to answer this question can only be highly imperfect, and will tend to result in a waste of scientific resources.

Werner clearly shows why, from the point of view of each scientific field, assessing scientists using these external benchmarks appears ridiculous. However, this is the case for any exogenous benchmark! Apples simply cannot properly be compared to oranges; neither can the prospect of a functioning quantum computer be compared to the prospect of a cure for cancer. These are value judgements.

Multidisciplinary journals embody some form of value judgement that is highly imperfect and objective only in appearance. Yet, since resources allocated to science are limited, such judgements have to be made. They inevitably come at the cost of misallocating scientific resources, since they impose the same uniform, ill-adapted standards on each field.³

Is there really no way out?

I would personally advocate simply accepting the fact that the organisation of any scientific field necessarily depends on exogenous factors, which in turn leads to irrational behaviour and a waste of resources.

This being said, I also believe that the deleterious effects of resource allocation can at least be contained. One possibility would be to reduce the level of competition in and between fields, by increasing science funding as a whole. For instance, the more physicists have a stable position and funding, the more of them will probably spend their time on doing actual physics rather than on marketing for high impact journals. If there has to be some sort of “science lobbying”, I would therefore support lobbying not for one’s own field but rather for science funding in general.⁴

Conversely, reducing the number of scientists in a field, while keeping its budget constant, would have a similar effect: less competition for funds and prestige, and hence less pressure. Maybe, to conclude on a slightly controversial note, such a downsizing would even result in an overall increase of scientific output. More scientists do not always make for better science…

Addendum 1 (April 19, 2016): I would like to thank Jacques Pienaar for reading and commenting on a draft of this post. He suggested that resources should be allocated by a government think tank including experts in philosophy of science, politics, economics, etc. I agree that this would constitute an improvement over the current situation, in particular because it would highlight that science funding is in essence a matter of policy choices. However, the criteria used by such a body would still be exogenous to each field and induce wasteful behaviour akin to what we presently witness.⁵

Addendum 2 (April 21, 2016): I might not have been very clear about two points. First, I use “field” to refer to an area of scientific research that is (reasonably well) independent from other areas. As such, I would consider physics as a collection of fields such as quantum information, mathematical physics, computational physics, aerosol physics, and so on.⁶ Second, I do not argue that multidisciplinary journals are the only tool used to allocate funds, nor that they are particularly good at informing funding choices. I merely pointed out that they are essential to understand how science funding (to name just a few examples: the ERC; Horizon 2020; the way faculty positions are created and awarded) presently works and that replacing them with something else will not get rid of the underlying competition for resources.

Addendum 3 (August 31, 2016): I recently stumbled upon an interesting example of how important interdisciplinary journals are in the media and politics: the “Shanghai ranking”, which is one of the most ridiculous assessments of universities that can possibly be conceived, and yet receives massive, and uncritical, coverage. It relies, for 20% of the score, on the number (!) of papers published in Nature or Science. Sadly, this is not even the most inane criterion used for the ranking.

¹ Which makes sense, since scientific knowledge is a public good, which is all but impossible to produce and allocate using a market mechanism.
² A recent example of this phenomenon is the “Quantum Manifesto”. This shameless lobbying operation to promote funding for quantum information shows how little objectivity can be expected from scientists when their own funding is at stake.
³ Werner summarised the influence of these criteria on science forcefully: “When we believe that we will be judged by silly criteria we will adapt and behave in silly ways.” Unfortunately, I see no way to eliminate the criteria’s “silliness” altogether.
⁴ I am well aware that there is no general and sharp criterion to determine what “science” is, but this is another issue altogether.
⁵ For example, suppose that a funding body would like to fund scientific contributions according to their potential, long-term, “benefits for society” (which is really the case in Australia, as Miguel Navascués pointed out to me). It is easy to imagine how such a constraint would undermine the quality of research in foundational physics.
⁶ This is also why I would consider PRL to be a multidisciplinary journal.
