Quick note on OpenAI and Erdős’s unit distance problem

Top-line reaction: It’s so incredibly important to see all of the actual details here to make an honest assessment of the result’s import. E.g., How many problems were they working on? In what areas? What prompts/inputs were used, and how many? What models with what settings? What was left out of the “abridged” (70k word, likely 100k+ token) CoT provided? How many other attempts were made? Why did they pick this problem? Did they already know what the counterexample would look like? Did they even have the answer already? Why didn’t they do any of this work out in the open? Why is access to this model restricted? How much time was spent working through the outputs? Who worked through them and how? And the always-crucial: What’s in the training data?

Since showing your work is not necessary in this community, I forgo provision of evidence below.

The following seems consistent with what they’ve provided so far:

  1. the model just pumped out some interesting ideas here and there, demonstrating no appreciable recognition of how those pieces could fit together, and all of the intellectual work was done by (domain expert!) mathematicians, including the crucial step of setting it on the path of disproving the conjecture (which is pretty clearly what they did, judging by the first page of the abridged CoT),
  2. the same thing could have been achieved using a rudimentary (and deterministic and far more efficient) search algorithm geared to the same counterexample strategy (which, again, was almost certainly provided by domain-expert mathematicians),
  3. it’s almost certainly not a coincidence that their first big “discovery” is a counterexample because this strategy is (almost always) easier to brute force, both conceptually and computationally,
  4. likewise, it’s not a coincidence that four well-connected mathematicians were the points of contact for the proof and asked for comments (cough astroturfing cough), and
  5. they’ve pumped a shit ton of mathematical expertise into the environment surrounding the CoT, whether as a harness, hidden large-scale injections of very specific CoT training, hidden RLHF, many repeated human interactions via prompting, etc. This is bad news because it means the result probably won’t generalize even to nearby areas of math, and because they likely spent a fuck ton of time, money, and energy just to get this one, way more than it would have cost to hire some postdocs to work on it for a while.

If any one of these is even approximately true, we should be far less impressed than OpenAI wants us to be.

It should also be noted how cynical their marketing is here. First they targeted programmers, who are very well known for their evidence-proof enthusiasm for shiny new technology. Now they are coming for well-known problems in a publicly legible area of mathematics (notably not in, say, homotopy theory, representation theory, inner model theory, etc.). On the surface, these seem like unrelated targets. But they share two important features. First, the public–and especially the financial and Business Idiot world–sees both as areas requiring exceptional intelligence, much like chess two decades ago and Go after that. Second, neither community has (all of) the necessary expertise, let alone information, to call bullshit. They’re not targeting event planning, woodworking, care work, HVAC repair, IT technicians, etc., because (a) there are too many people who could see through the charade and (b) correctly or not, the public doesn’t view these domains as intellectual.

Until they release the details that prove otherwise, we should put “an LLM solved an Erdős problem” in the same category as “Koko spoke sign language”, “dire wolves were brought back from extinction”, and “Majorana particles were detected”: it’s a load of shit. And really, is it that surprising that a private, for-profit company would manipulate us in this way, when even professional computer scientists are so routinely bad at designing assessments that rocks pass the most famous (supposed–not what Turing was actually doing in that paper) test of intelligence and our tests of quantum factorization can be passed by a (non-compliant!) dog?

And even if it’s fair to say the LLM solved it…so fucking what? That problem was noteworthy because its resolution was valuable–not in some fancy sense, but in the plain ol’ sense that actual people valued it and had earned enough trust from adjacent communities that they were given material and social support for pursuing it. Indeed, its resolution was so valuable to a community that a company was (seemingly) willing to light millions of dollars on fire in the hopes that it would distract credulous investors from the likely-atrocious numbers they’ll give us in their S-1 in the coming months. As elsewhere in the academy, and indeed all expertise, exploring, discussing, reevaluating, and interpreting what is valuable–the stuff that doesn’t show up in products like math papers, the stuff that comes before and after the proof per se–is often the most intellectually demanding work that goes into any result. OpenAI has done almost none of this work, much less their LLM, and expects us all to just ignore the gap.

Call me when an LLM is so committed to being a peer within an esoteric intellectual community that it’s willing to work as an adjunct instructor for four years without health insurance, or so committed to the ideal of an informed electorate that it’s willing to suffer through decades of attacks from Silicon Valley, just to keep the hope of a culture that values expertise alive.

Chris Mitsch

About Chris Mitsch

Chris studies the history and philosophy of science and mathematics. He is currently translating several works by Hilbert, Nordheim, and von Neumann as part of a project on the philosophy of mathematics that informed early quantum mechanics formalisms. He is also interested in: historical method and how this should inform general philosophy of science; the cognitive foundations of mathematics; and the construction of identity in (especially American) politics. Chris posts under the banner "Method Matters".
