Genetics of education (it’s not what you think! … not quite, at least)

I have been taking a somewhat long break from blogging. A lot has happened recently. Some good: my partner and I just had a wedding, yay! Some not so good, which I won’t talk about today. Maybe I will at a later time.

It turns out that, surprise surprise, stress does kill both creativity and productivity. But I need to slowly get back to it. So I’m going to finish writing something I started before this frenzy.

I came across an Atlantic article on a genome-wide association study (GWAS) of genetic factors in educational attainment (i.e. years of schooling) in adults of European descent (a common restriction, since that’s where most existing data lie). In accordance with disciplinary norms, the study paper is extremely short, dense, and dry, making sense to probably 100 people at most, half of whom are authors on this paper. (Seriously, though, the humanities are in the minority in being allowed to make their papers self-explanatory.) Fortunately, the research institution hosts an FAQ page for the papers its authors publish (the education paper is the second one, accessible from the left-hand bar). Here, the authors reflect on the limitations of their methodology and the implications (or lack thereof!) of their results. It’s an interesting read.

For those who are unfamiliar, GWAS is a new and rising technique in genetic research made possible by big data (not in the sense of machine learning, but in the sense of cheap storage and computation on massive data). Previously, the popular method in genetic research was the “candidate gene study”, where researchers identify a few genes based on theories about their roles in relation to some research target and perform small-scale association studies with them. The quality of these studies is hard to control, because: 1) existing theories already support the candidate genes, and only a few genes are studied, so a null result isn’t interesting, which together motivate researchers to (consciously or unconsciously) p-hack, and 2) the small samples usually have very low power. GWAS gets around both of these problems by employing huge samples. There’s a podcast episode from EverythingHertz on genetics that talks about some interesting history and reservations about the recent enthusiasm over GWAS.
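To make the low-power point concrete, here is a toy simulation in Python. It is my own sketch, not anything from the paper: it assumes a single variant with a tiny made-up effect (`beta = 0.07`) on a standardized trait, and uses a rough z-test on the correlation. The sample sizes, the effect size, and all function names are illustrative assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def detection_rate(n, beta, reps, rng):
    """Fraction of simulated studies of size n that flag the variant."""
    hits = 0
    for _ in range(reps):
        # Genotype = number of copies of the variant (0, 1, or 2).
        genotype = [int(rng.random() < 0.5) + int(rng.random() < 0.5)
                    for _ in range(n)]
        # Trait = tiny genetic effect plus a lot of noise (assumed model).
        trait = [beta * g + rng.gauss(0, 1) for g in genotype]
        # Approximate z-test: r * sqrt(n) is roughly standard normal under no effect.
        z = pearson_r(genotype, trait) * math.sqrt(n)
        if abs(z) > 1.96:
            hits += 1
    return hits / reps


rng = random.Random(0)
small_rate = detection_rate(n=50, beta=0.07, reps=100, rng=rng)
large_rate = detection_rate(n=4000, beta=0.07, reps=100, rng=rng)
print("detected with n=50:  ", small_rate)
print("detected with n=4000:", large_rate)
```

The effect is real in both cases, but under these assumptions the small studies should almost never find it, while the large ones should find it most of the time. A real GWAS tests a huge number of variants at once and uses a far stricter threshold than 1.96, which this sketch ignores.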

Educational attainment is, of course, a topic as old as education itself. It is interesting that the paper identifies years of schooling as the response variable, rather than intelligence, which is sometimes seen as measured by educational attainment and is much more plausibly genetic. This choice might reveal some of the authors’ attitudes towards education and nature/nurture.

Association studies, like GWAS and many others, start by identifying and separating two groups of variables. Sometimes a causal direction can be inferred, such as in experimental studies or when there is a clear temporal order. But whether or not there is a well-grounded causal direction, the nature of the relationship between one variable and another is not revealed by their association, other than that there is one. To give some examples from the FAQ:

Genetic variation may improve sleep quality (making it easier to subsequently stay awake in boring lectures). Genetic variation can affect personality traits, such as the willingness to listen politely to and follow the instructions of teachers (who aren’t always right but nevertheless dictate grades and other outcomes).

Factors, like (in this case) sleep quality, that are unaccounted for by an interpretation of an association and that in fact explain the association are called confounders. Much of the job of study designers and statisticians is to fight off confounders, but they are, by definition, always elusive. The public, however, or anyone without much training or experience in study design (which sometimes includes scientists too), very often overlooks the possible existence of confounders when reading about an association finding.
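The sleep example can be played out in a few lines of Python. This is a sketch under made-up assumptions, not the authors’ analysis: a hypothetical variant nudges sleep quality, sleep quality nudges years of schooling, and the variant has no other route to schooling. The effect sizes (0.5 and 1.0) and all names are invented for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after adjusting both for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1)
n = 5000

# Hypothetical variant: number of copies carried (0, 1, or 2).
variant = [int(rng.random() < 0.5) + int(rng.random() < 0.5) for _ in range(n)]
# Assumed model: the variant improves sleep; sleep alone drives schooling.
sleep = [0.5 * g + rng.gauss(0, 1) for g in variant]
schooling = [1.0 * s + rng.gauss(0, 1) for s in sleep]

raw = pearson_r(variant, schooling)
adjusted = partial_r(raw,
                     pearson_r(variant, sleep),
                     pearson_r(schooling, sleep))
print("variant vs schooling, raw:               ", round(raw, 3))
print("variant vs schooling, adjusted for sleep:", round(adjusted, 3))
```

The raw association is clearly there, yet once sleep is accounted for there should be essentially nothing left. The catch in real studies is that nobody hands you the `sleep` column.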

When you read “scientists have linked genes A, B, C to higher educational attainment”, it is really hard not to see this as a claim to the effect that “something about these genes makes it so that people are more likely to go to college”. This temptation holds even when you rephrase the first statement in more “factual”, “deflationary” terms, like “it is more probable for people (in our sample) with variants A, B, C to go to college”. In many cases, this last statement is factually accurate, but it can be miles away from the seemingly immediate implication that people with these genes are smarter somehow. Here’s another example given by an author of the GWAS paper, reported in the Atlantic piece linked above, that highlights this dynamic in the context of educational attainment:

“If you did a study like ours 100 years ago, the strongest genetic predictor of education would be how many X chromosomes you had, because society was set up in a way that it was much harder for women to get educated than men,” says Benjamin. Likewise, many of the genes that are associated with education today are likely important “because of how today’s educational system is set up. It requires people to sit at desks for hours, and listen to instructions from a teacher. People who get restless, or are less obedient to authority, will fare less well in that environment.”

As noted before, one key difference between GWAS and candidate gene study is that GWAS is, in a loose sense, theory-free. In a candidate gene study, the genes are selected in accordance with some theory, and the presence or absence of an effect is essentially evidence for or against the theory. In something like GWAS, one can detect effects without supporting theory. Scientifically, this means that GWAS is less susceptible to theory-motivated p-hacking. But in the world of interpretations, this means that there is often less of a story to tell when an effect is detected.

As association studies with greater scope become more common, it will become increasingly hard to decide what lessons we are or aren’t supposed to draw from our findings. In one sense, associations are incredibly important scientific phenomena that require great care to detect and isolate. In another sense, associations tell us pretty much nothing at all. At least nothing concrete. Just as the authors answer the question “3.6. What policy lessons do you draw from this study?” in the FAQ: “None whatsoever.”

Kino