Men and the “Philosophical Society”

Men are more likely to make misogynistic arguments; so are philosophers. Could this explain the gender disparity within philosophy?

Today, Daily Nous published a guest post by Christina Easton titled Women and the “Philosophical Personality”, with the provocative hypothesis:

Research suggests that there is a cognitive task on which philosophers tend to perform better than non-philosophers and men tend to perform better than women.

…a natural (perhaps ‘intuitive’!) conclusion might present itself: Perhaps women are less likely to possess the aspect of the ideal philosophical personality tracked by the CRT, and this contributes to the gender imbalance in Philosophy. Call this the ‘Quick Conclusion’.

In my view, this is a perfect example of a sneaky kind of argument: one strings together a collection of scientifically dubious but entertainable claims into a shaky scaffold holding up an eye-catching sign. You know the sign is about to fall off and hit a particular group of people on the head; you can see it coming. But somewhere hidden in the scaffold is a tiny line of “disclaimer: sign may fall”, so that the builder can shake off the responsibility of having intentionally harmed someone.

The small-font disclaimer does make it tricky for opponents to respond in a way that doesn’t appear too uncharitable. Indeed, if you read the published version of the paper, you’ll realize that Easton’s stated conclusion is extremely cautious: we do not know everything, and many things are possible. It’s the kind of truism that makes it hard to disagree with.

I am going to try. This post consists of 3 parts. Part 1 points to the various shaky joints of the scaffold, some of which are entertained in a non-committal manner in Easton’s paper. Part 2 tries to explain why this kind of “quick stab for hype” approach is problematic. Part 3 provides my theory for how something like this could happen.

1. the scaffold

If we disregard the systematic hedging that permeates the article (in fact, the conclusion section of the published paper has so much hedging it’s hard to pinpoint a thesis), the central argument seems to be this.

  1. The Cognitive Reflection Test (CRT) measures System 2 processing: the ability or tendency to rely on rational deliberation, rather than emotional/intuitive reflection, when making judgments.
  2. There is evidence that philosophers do better on the CRT than non-philosophers. This is perhaps because System 2 processing is very important to philosophy.
  3. There is evidence that women do worse on the CRT than men.
  4. Conclusion: it is possible that women’s innate* deficiency in System 2 processing makes them less competitive in the field of philosophy.

(*Whether System 2 processing is taken to be innate in the born-with-it, can’t-change sense is difficult to tell; the paper’s language on this point is inconsistent.)

Here is a (very much incomplete) list of places to pause throughout this reasoning.

First, consider the CRT itself, which consists of 3 questions:

  1. A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
  2. If it takes 5 machines 5 min to make 5 widgets, how long would it take 100 machines to make 100 widgets?
  3. In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake?

They are supposed to indicate your philosophical ability to search rationally for truth instead of giving in to quick-and-easy intuitions.
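For readers who want to test themselves first: here is a quick sketch (my illustration, not part of the original post or paper) of the arithmetic behind the intended answers, alongside the intuitive-but-wrong ones that the test is designed to tempt you into.

```python
# The "intuitive" answers to the three CRT questions are 10 cents,
# 100 minutes, and 24 days; the correct ones follow from simple arithmetic.

# Q1: ball costs x, bat costs x + 1.00, together 1.10  =>  2x + 1.00 = 1.10
ball = (1.10 - 1.00) / 2
assert abs(ball - 0.05) < 1e-9  # 5 cents, not the intuitive 10 cents

# Q2: each machine makes 1 widget in 5 minutes, regardless of fleet size,
# so 100 machines make 100 widgets in the same 5 minutes
minutes = 5
assert minutes == 5  # not the intuitive 100 minutes

# Q3: the patch doubles daily, so it was half-covered one day before full
half_covered_day = 48 - 1
assert half_covered_day == 47  # not the intuitive 24 days
```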

Now, I happen to have seen these questions in one of those “brain-twister” books I used to get as a kid. Unfortunately, I don’t remember whether I got them right the first time.

In what possible world a 3-question test of this sort is capable of detecting anything coherent at all is a question I shall not dig into too deeply, except to remind readers that even if it’s true that people who are “smarter” do better on these tests, all else being equal, it does not mean that 1) there aren’t a million additional reasons why someone might do better, and 2) those reasons wouldn’t collectively render the test useless as an indicator of “smartness”.

Even if this test does measure something coherent, the thought that that something is an “ability” with some predictive power is extremely contentious. Let me share a secret with you: I was convinced that 85+25=100 all the way through 2nd grade. Does this mean that I forever lack the necessary mathematical ability and will never be successful at math?

To take a different example: I know a fair number of scholars who are dyslexic. They often had trouble learning to read as children, and they still make spelling and grammatical mistakes as adults. Now, some people might think that the mastery of words and sentences is an indicator of one’s ability to find success in writing and scholarship. Those people would be wrong.

What this tells us is that we are not very good at intuiting what behaviours are “indicators” of what “abilities”, and what constraints those “abilities” may or may not put on our success.

This is also not to mention that the whole narrative relies on a reason/emotion divide, which is problematic, as well as a dual systems theory of cognition, which is pop-culturally misunderstood and scientifically controversial.

Second, even if we grant that dual systems theory is true and the CRT measures System 2 processing — a huge leap of scientifically unsupported faith — one still needs to argue that System 2 is crucial in philosophy. Easton’s claim on this front seems to come from a paper by Livengood et al., which reports an online study of people’s self-reported exposure to philosophy and their accuracy in answering the 3 CRT questions listed above. They found that people who report greater philosophical exposure answer more of these questions correctly, and concluded that 1) the CRT must track reflectivity, and 2) reflectivity must be very important to philosophy.

I do not wish to dwell on the tenuity of this other line of argument too much — social-scientific research is hard, and one must be satisfied with small steps. However, I do want readers to mentally note the many leaps of faith needed for an argument of that type to go through. Let me be clear: three logic puzzles are taken to provide non-negligible information on why some people succeed in the academic field of philosophy and others do not. Just think about it. (Compare this to the still-debated claim that intelligence testing measures something innate; I have discussed those subtleties here and here. Think about how much research has gone into intelligence and how far that’s traveled, compared with how much research has gone into the CRT and the level of claim being made here.)

Third, even if we grant all of the above — that the CRT meaningfully tracks this System 2 thing, which totally exists and totally plays an important role in philosophy — the claim that women’s lower CRT scores explain their lesser success in philosophy still takes an unjustified stance on one particular causal direction. Recall that whenever two variables (in this case, CRT score and philosophical ability) correlate, there are 4 possibilities: A causes B; B causes A; some third C causes both A and B; or the correlation is coincidental. To say that A explains B is to take the first option.
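The third option is worth seeing in action. Here is a toy simulation (my illustration; the variable names and the choice of a Gaussian common cause are stand-ins, not anything from Easton’s paper) in which a common cause C drives both A and B, so A and B correlate substantially even though neither causes the other.

```python
import random

random.seed(0)
n = 10_000
a_vals, b_vals = [], []
for _ in range(n):
    c = random.gauss(0, 1)      # common cause, e.g. schooling or test familiarity
    a = c + random.gauss(0, 1)  # "CRT score": driven by C plus noise
    b = c + random.gauss(0, 1)  # "philosophical success": driven by C plus noise
    a_vals.append(a)
    b_vals.append(b)

# Pearson correlation of A and B
ma, mb = sum(a_vals) / n, sum(b_vals) / n
cov = sum((x - ma) * (y - mb) for x, y in zip(a_vals, b_vals)) / n
sa = (sum((x - ma) ** 2 for x in a_vals) / n) ** 0.5
sb = (sum((y - mb) ** 2 for y in b_vals) / n) ** 0.5
r = cov / (sa * sb)
print(f"r = {r:.2f}")  # close to 0.5: a substantial correlation, zero direct causation
```

The theoretical correlation here is exactly 0.5 (A and B each have variance 2, and share only C’s variance of 1), yet nothing about A causes B. Observing the correlation alone cannot distinguish this scenario from “A explains B”.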

In fact, Easton takes an even stronger view. She writes,

Perhaps women are less likely to possess the aspect of the ideal philosophical personality tracked by the CRT, and this contributes to the gender imbalance in Philosophy.

Notice that, here, whatever the CRT is taken to track is conceptualized as a personality. Now, calling something a personality carries a baggage of assumptions: that it causes behaviours rather than being caused by them, that it doesn’t change much across time and context, and so on. It is now something about the person herself, rather than the societal treatment and expectations around her, that is held responsible. There is very little evidence for this claim.

2. the falling sign

Note how many leaps we have had to take to get to that conclusion: despite all of psychometric validity theory saying that dreaming up a test and observing differences in scores doesn’t mean the test is valid at measuring what you thought it measured, we are to believe that the CRT measures “reflective thinking” or something similar. Despite scientific debate around dual systems theory, we are to take “reflective thinking” to be System 2. Despite no real argument being put forward, we are to believe that System 2 is super important in philosophy. Despite no argument at all being put forward, we are to believe that women’s scoring lower on the CRT reflects something about women themselves, rather than the test or society.

Stringing all these together, we conclude: women are underrepresented in philosophy because women are naturally bad at philosophy. What a revelation!

But wait! The author didn’t actually say this. In fact, the author said she wasn’t sure whether this was the correct conclusion to draw, only that it’s “possible”!

And this is the sneaky part I alluded to earlier. Think about this in terms of information. Each time a leap of faith is required, the level of noise in the argument increases, thinning the link between the starting point and the endpoint. With the level of noise involved in this scaffold, so little information is left that if you allow this to be informative, you’d have to allow almost anything to be informative. If this level of tenuity is what we are happy to work with, then we can work with almost any theory!
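A back-of-the-envelope way to see how fast a chain of leaps degrades (my illustration; the 70% figure is an arbitrary stand-in, not a measurement of anything): even if each step is individually more likely than not, the conjunction collapses quickly.

```python
# If an argument needs several independent leaps of faith, each holding
# with probability p_step, the chance the whole chain holds is p_step ** n.
p_step = 0.7
for n_steps in (1, 2, 4, 6):
    p_all = p_step ** n_steps
    print(f"{n_steps} leap(s): P(all hold) = {p_all:.2f}")

# Even at 70% per leap, four leaps leave under a 1-in-4 chance
# that the whole scaffold stands (0.7 ** 4 ~= 0.24).
```

Real argumentative steps are rarely independent, of course, but the direction of the effect is the point: each added leap multiplies the doubt, it does not merely add to it.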

Why is this a problem, though? No information is strictly false and the conclusion is certainly possible, where is the problem?

The problem is that, by devoting scholarly attention to this thesis, we raise it to equal footing with a variety of more plausible theses. Doing so diverts public attention from more important and complex (hence difficult) issues to what is essentially the easy fix of just blaming the victims. It blurs our attention and contributes to the general hostility of the conversational context.

Think about the people who say “have you considered the possibility that girls are less educated because they’re just less smart?” or “African Americans score lower on the SAT, which tests for intelligence. One could certainly draw an obvious conclusion…” What do you say to them? “You’re just wrong”? In a strict sense, they are not, because the things they say are certainly (logically? metaphysically?) possible. In another sense, we have piles of evidence suggesting that other interpretations of the same data are more plausible, and so these interpretations should be disregarded. Perhaps the best response is to say nothing. But saying nothing does not usually serve the group that is repeatedly hit by falling signs.

3. the view from afar

How might something like this happen? I actually do have a theory. Like all theories, this one is possible. I’ll let readers decide whether it’s also likely.

Quite a number of scientific communities have begun to recognize the problem of scientists needing to “hype up” their own findings in order to get published. It is well known that science journalism can harm science by misrepresenting findings in a way that is more attention-grabbing. It is also well known that the pressure to publish has led to quite a bit of scientifically dishonest behaviour.

More recently, some have noticed that scientists are, collectively and on a greater scale, moving towards more extreme descriptions of their findings and their impacts. A paper published in the BMJ found that scientists nowadays are more likely to use positive words like “robust,” “novel,” “innovative,” and “unprecedented” to describe their findings, compared with the 1970s. (See a secondary discussion here.) While most scientists would not (knowingly) distort data, there certainly is pressure on them to stretch the implications and potential impacts of their results.

While this phenomenon seems particularly problematic in medicine, it is, of course, well documented in psychology. Psychology is hard because people are noisy. It takes a long time and a lot of hard work to be able to say anything concrete about people. However, just saying “I only did one study and so can’t really say anything about anything” is not going to get a paper published. So, instead, researchers have to make unsubstantiated grandiose claims that make it sound like their study can single-handedly change the entire scientific landscape.

Could the CRT be of scientific value? Sure. Could the CRT tell us something meaningful about individual differences? Very possibly. Does the CRT pick out a personality trait that systematically differs between men and women? This is a much more contentious claim, though the inventor of the CRT seems to endorse it. As far as I can tell (after trying to skim the 96-page paper), most (all?) of the studies are cross-sectional and done on English-speaking university undergrads.

The problem of over-stretching results to grab attention, among others, has already led to the replication crisis, which we are still very much in the middle of. As scientists struggle to find ways to move forward and “fix the system”, science-consumers such as journalists and us philosophers need to be more cautious, since we are not always best equipped to judge how well-evidenced the information in a piece of research is. Perhaps this is a place where extra-scientific considerations should take on more weight.

Finally, let me share this wonderful video by Tara Brabazon of Flinders University in Australia on socially conscientious methodology of science. Think about what question you are asking; think about what question you are not asking; think about why that is.

Kino