Scott Alexandriad III: Driving up the Cost of Critique
In which I witness a living, breathing case study of revolutions becoming the establishment
There is a failure mode I’ve noticed in particularly smart people I know. I’ve seen a cluster of cases recently “from all sides of the debate,” and it may very well be happening to me, for all I know.
The failure mode is driving the cost of being proven wrong so high that being proven wrong becomes impossible. There’s always a “yes, but” to every fair point, and a sufficiently smart person can figure out what it is. Eventually, the conversation goes on long enough that you can declare a draw and move on with your pride intact. What’s not to like? This is a particular subcase of the isolated demands for rigor that Scott Alexander has written about so eloquently.
Let’s go over a case study that will be familiar to readers of this Substack, and explore some methods used to drive up those costs of critique.
Method 1: Demand Your Critics Meet a Standard You Do Not
Case in point: Scott Alexander has responded to my previous post. He’s not happy, but that’s not a surprise. What was a bit of a surprise is that he misrepresents me and my positions throughout his response in easily demonstrable ways. That is quite interesting for a response that begins with, “I object to the way I'm portrayed in this post.”
I tried to set as much straight as I could in my response to Scott, but let’s face it, that was to correct the record for readers. The odds that Scott reads my response, has an epiphany, and changes his mind are pretty infinitesimal at this point.
What I find fascinating in Scott’s response is that he faults me for all sorts of things, like the original title I used. He also claims about me, “He continues to trust people with a history of being totally crazy and credulous for anything that supports their opinion.” He doesn’t state who these people are, so at this point this is an unsubstantiated allegation I cannot really defend against.
He concludes in what is more or less the rationalist equivalent of a fatwa:
I don't think Alexandros is engaging in good faith, and I urge people not to take anything he says about me, my opinions, or my actions at face value.
Now, it should probably be haram in rationalist circles to tell others what to think, never mind to unilaterally and immediately declare your interlocutor to be arguing in “bad faith” in response to criticism.
The definitions of “bad faith” I found online equate it with “dishonesty,” and since he hasn’t demonstrated any such dishonesty on my part, here is where I must draw the line:
Public pronouncements of dishonesty, made without even checking whether one has the facts straight, are arguably dishonest themselves, and Scott does not have special permission to violate all the rules he demands I follow. Consider the chilling effect on others who might consider pushing back in the future. Are they more or less likely to do so after witnessing what amounts to an edict that someone is to be shunned? His community is sadly not holding him to account on this, though that is understandable. After all, who wants to be next?
Stepping back, though, this is a most useful case study in what happened to the once-vibrant rationalist ecosystem. Treat it as a demonstration of what I described in my previous post: the loyal opposition has been eliminated. Dissent, even civil and reasoned, is not tolerated, in what is ultimately an act of self-harm. Scott uses these characterological accusations to explain why he doesn’t really see a reason to follow up on my points:
He does have a separate good point that after a certain number of hours responding to ivermectin complaints I want to move on and do something else, and this has made me less willing to do 100% due diligence on all his points - but I think even if not for this I would be particularly unwilling to work with him on this.
In short, I read this as saying that even though he knows I have made good points in the past, he does not care to confirm further points I am making, because he does not trust me.
Let’s take everything he says at his word. Let’s assume that I am the worst person in the world, a liar, a cheat, and a drunkard. What does this have to do with not addressing my criticism when I have laid out all my evidence? Rationalists should want to be less wrong. Being shown errors you make is supposed to be a gift. Why look a gift horse in the mouth? If he had strong evidence that many of my critiques were false or spurious, that could be a reason to stop paying attention, but that is not what I’m reading, and he does not list any such critiques anywhere.
Method 2: Make the Critics Doubt Their Own Eyes
In the response to my previous article, Scott says that his original post in fact concluded the effect was real: “I go on to say that although I believe the effect is real it's probably due to parasitic worms.”
I have detailed the reasons why I do not believe this is an accurate representation of his post in the email I embedded in my previous article, and readers are welcome to read Scott’s corrected original piece and make up their own minds about whether the post, with the correction, reads smoothly. (If you’re pressed for time, search for the word “update” and read from there.)
Instead of relitigating what the post says or does not say, or going over the numerous reactions to the post to see what they understood at the time, let’s do one better. Let’s look at Scott’s follow-up post on fluvoxamine, posted soon after the ivermectin article, to see how he describes ivermectin in light of his then-recent piece. Here are some choice quotes:
…ivermectin and hydroxychloroquine crashed and burned.
Many of the epidemiologists and statisticians most instrumental in debunking the hype around ivermectin have spoken out in favor of fluvoxamine.
Maybe they’re remembering the ivermectin debacle and wondering if you secretly prescribe horse dewormer and vote Trump.
Medicine is too big and complicated and scary to stray from the herd most of the time, and the sort of person who never fails at this problem is probably crazy, and constantly gives his patients snake oil or ivermectin or whatever.
Do these seem like the comments of someone who has concluded ivermectin works, but only in some places? Do they sound like the pronouncements from someone who found a strong signal, but gave a 50-50 chance the effect was due to worms?
To me, they sound like he's calling ivermectin snake oil.
Method 3: Use Arguments That Prove Too Much
In the Reddit comment thread, many people agreed with me, and some asked very good questions. However, a fascinating critique emerged: that the meta-analysis results I showed combine heterogeneous endpoints and therefore cannot be taken at face value, because doing so violates the rules of meta-analysis. I’m sure there’s a lively statistical debate to be had on this, but first, let’s remember how I described the argument I was demonstrating:
This isn’t the argument I like. This isn’t the argument I chose. I am simply correcting Scott’s argument and showing you the results of his analysis.
The use of heterogeneous endpoints is not a choice I made. It is a choice I inherited from Scott. And while, yes, he inherited that choice from ivmmeta, he also did not state any objections to it, and it was his chosen starting point. He could have tried to start from homogeneous endpoints, like ivmmeta does here, or like Tess Lawrie does in her meta-analysis, but that’s not what he did, so I followed in his footsteps.
But let’s consider what that would mean: if heterogeneous endpoints cannot be combined, the signal found in the original post was far, far weaker than originally thought. So if Scott’s justification in his response is correct, and he did find, and still believes, that ivermectin shows a statistically significant benefit, with the disagreement being only over the cause and geographic distribution, then the article does not actually support the confidence it places in that signal.
In other words, to say my objection is invalid because of heterogeneous endpoints but not that the original article is invalid because of the same reason is quite an interesting intellectual exercise.
Method 4: Maybe You Were Right for the Wrong Reasons
The above points were combined into a franken-argument by the community in the Reddit comments (though not by Scott himself, please note): perhaps the t-test was wrong, and perhaps combining heterogeneous endpoints was wrong, but since these two errors work in opposite directions, we can assume they cancel each other out. There are a number of problems with this:
First, claiming that the errors cancel out exactly is going quite far out on a limb, statistically speaking. Perhaps a rewritten post could make that case, but I don’t think the original post would have been nearly as convincing if it were clear that numerous adventurous statistical assumptions were being made.
Second, these are not the only things wrong with the original article. For instance, I believe that the trust in Gideon Meyerowitz-Katz, a self-declared biased actor, is disqualifying. For just a small sample of what I’m talking about, here is something from December 2020, long before we had most of this data:
I also object to the argument’s heavy reliance on frequentist statistics: for instance, treating positive but not statistically significant results as “negative,” and focusing on whether an imaginary line of statistical significance was or was not crossed.
Moreover, the “worms” hypothesis is a special case of an argument that could be used against any repurposed drug, yet it is only posited for ivermectin: checking whether the drug’s original mechanism is what is doing the work in the repurposed use. Why is Scott not asking whether fluvoxamine is only effective in depressed patients? In the TOGETHER trial, patients taking fluvoxamine seem to go to the ER or hospital earlier and stay in the hospital longer, and there is a big difference in the drug’s effectiveness between men and women (women are 60-70% more likely to suffer from depression). Depression is a major COVID risk factor, with one study finding a 7x increase in risk. Could it be that SSRIs simply help lift the depression, helping some patients seek care earlier and generally improving their outlook?
I’m not saying this is the case. I’m saying that positing worms explain ivermectin and not even considering if depression explains fluvoxamine is yet another isolated demand for rigor.
Finally, I object to the balkanization of evidence the argument engages in. Essentially, Scott’s original argument boils down to:
Ignore any other evidence than clinical trials.
Of those, focus only on early treatment.
Of those, focus only on a subset of studies that Scott determines to be useful.
Of those, focus only on the subset that GidMK also approves.
Assemble two groups of heterogeneous endpoints from those studies.
Perform a t-test on those endpoint sets.
Declare that while there is a weak signal, the answer is worms because of some tweets that don’t meet even the most basic scientific standard applied to all the studies above.
On the basis of that, declare that people who said there was a signal were wrong, and the people who set out to debunk it were right.
No matter how strong an evidence base is, when most of the evidence is thrown out the window, and we allow self-declared “debunkers” to filter what tiny subset will be selected, then anything can be made to look tenuous.
Instead of only discussing the limitations of the argument, I chose to show that, even accepting those limitations and fixing only the never-before-seen use of a t-test by substituting the default method, the results are very different. For people to then pick on another part of Scott’s argument and treat its weakness as if it nullifies my critique is motivated reasoning of the highest order. For one, that weakness is just another reason why Scott’s original argument doesn’t work; some people in the comment thread did agree with this, to their credit. For another, why not challenge the trust in GidMK? The focus on early treatment only? The reliance on the incredibly weakly supported worms hypothesis? Would those presuppositions also be challenged only if it were useful?
And most importantly, why was this concern not brought up by anyone originally, but only discovered now as a counterbalance to the issue with the statistical test used?
This argument is like saying that your ship taking in water from one side is not a problem, because hey, it is taking in water from the other side too, and if we’re lucky that means it won’t keel over as it’s sinking.
Method 5: Nuke the Chessboard
Someone else in the community quite aggressively demands I issue a full and proper correction because they found a paper that indicates the DerSimonian-Laird method is technically inappropriate to use here. Except “here” includes most meta-analyses ever done. While I disagree with the thrust of the critique, and the conclusion that I should broadcast a correction to anyone, you can follow the link and make up your own mind. Here’s the meat of the critique:
“It is well known that [DerSimonian-Laird] is suboptimal and may lead to too many statistically significant results when the number of studies is small and there is moderate or substantial heterogeneity. If a treatment is inefficacious and testing is done at a significance level of 0.05, the error rate should be 5%, i.e. only one in 20 tests should result in a statistically significant result. For the DL method, the error rate can be substantially higher, unless the number of studies is large (≫ 20) and there is no or only minimal heterogeneity,” per this paper. Scott's meta-analysis had 11 studies, which is not ≫ 20. Later in the paper, they do a simulation which comes out to 30% false positives at p=0.05, and they report that "25.1% of the significant findings [in one sample] for the DL method were non-significant with [a different, better] method.”
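For readers who haven’t met it, the DerSimonian-Laird (DL) random-effects method at the center of this critique is simple enough to sketch, and so is a check of the false-positive concern. The Python below is a minimal illustration under stated assumptions, not the code behind any analysis discussed here. It pools effect estimates (say, log risk ratios) with DL and computes a normal-approximation p-value, then simulates studies whose true average effect is exactly zero and counts how often DL still reports p < 0.05. The function names and every simulation parameter (11 studies, a between-study spread of 0.2, a within-study variance of 0.05 treated as known exactly) are hypothetical choices for illustration only.

```python
import math
import random


def dersimonian_laird(effects, variances):
    """Pool per-study effect estimates (e.g. log risk ratios) with the
    DerSimonian-Laird random-effects method.

    Returns (pooled_effect, standard_error, tau2, two_sided_p).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: how much the studies disagree beyond what chance explains.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Method-of-moments estimate of between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each study's own variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal-approximation p-value
    return pooled, se, tau2, p


def null_false_positive_rate(n_studies=11, tau=0.2, within_var=0.05,
                             n_sims=2000, alpha=0.05, seed=0):
    """Estimate how often DL declares significance when the true average
    effect is exactly zero but studies genuinely differ (tau > 0).
    All defaults are illustrative assumptions, not values from any real study."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Each study's true effect scatters around zero with spread tau.
        true_effects = [rng.gauss(0.0, tau) for _ in range(n_studies)]
        # Observed estimates add sampling noise (variance assumed known).
        observed = [rng.gauss(t, math.sqrt(within_var)) for t in true_effects]
        _, _, _, p = dersimonian_laird(observed, [within_var] * n_studies)
        if p < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Nominal rate is alpha (5%); anything well above it is inflation.
    print(null_false_positive_rate())
```

Whatever rate this prints depends entirely on the assumed parameters. It does not reproduce the quoted paper’s simulation, and it says nothing about the actual ivermectin data; a value well above 0.05 would simply be the kind of inflation the commenter is pointing to.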
Remember that in my previous post, I documented that this is the method used by everyone in the ivermectin debate from Tess Lawrie’s team to Gideon Meyerowitz-Katz (and again) to Cochrane to Andrew Hill to Ivmmeta. Many of these analyses combine fewer studies than I did. I assume this person did not have an issue with this until it became useful to point out that DL had a false-positives problem. They probably aren’t concerned that clinical trials have a false-negatives problem, either. This person also did not apply their own proposed method to see what the revised results would be. They simply assert that it is just as big an issue, which I strongly doubt, but that doesn’t really matter.
Had I used some other test, people would have complained that I had not used the default option. And in all this hubbub, this person seems to ignore that Scott used a method nobody has ever used for meta-analysis, which is my actual complaint. The method I used to demonstrate the magnitude of the issue, which was additional effort on my part, is tangential to the actual problem.
However, this person is willing to blow up the entire field of meta-analysis to get rid of one uncomfortable result. They might be right on the narrow point. Hell, they probably are. Meta-analyses are a hard problem, and the statistics involved are more art than science.
But changing the rules of the game midstream because you don’t like where it’s going is very similar to people questioning Robert Malone’s role as a key inventor in the history of mRNA vaccines “because very many people contributed.” That argument is not used to question any other invention in the world; we’re willing to throw out the very concept of anyone inventing anything just to get rid of the troublesome Malone.
Truly an isolated demand for rigor if I’ve ever seen one.
Method 6: Focus on a Subset
Over the course of my engagement with Scott’s work, I’ve pointed out perhaps a dozen big problems, and the meta-analysis statistical methods issue is the only one I’ve been able to get him to really engage with. It’s almost as if anything else doesn’t register or doesn’t merit addressing.
While I’m all about resolving matters in excruciating detail, it should not be done at the expense of giving an impression that all critiques have been addressed. Scott seems to think that the issues I’ve raised on GidMK being obviously biased, his ignoring of timing and dosing issues, the multitude of problems with the TOGETHER trial, the flimsiness of the worms hypothesis, and so much more, are not even worth engaging with, or even mentioning. While that helps with appearances, it does not help with finding out the truth of the object-level question.
I understand if Scott doesn’t have time to deal with this, but sadly, he has waded into a genuinely complicated problem that did not yield to his schedule. If we’re complaining about the titles people use for articles, Scott can’t title something “Ivermectin: Much More Than You Wanted To Know” and not make sure it’s accurate, never mind complete. I understand he may not have planned to spend this much time on this (welcome to my world!), but intellectual honesty demands that he either make a best effort to ensure his claims are accurate or take the article down.
Since many people also seem confused about where I stand with regard to the key topics in question, allow me to sum up:
I do not know if ivermectin works for Covid or not, but I believe it looks promising. From what I’ve seen, proclamations that it does not work are mostly based on poor arguments that often devolve into authority bias.
I do not know if combining heterogeneous endpoints, as ivmmeta does, is statistically valid. To my Bayesian mind, pooling evidence does not seem like the worst idea in the world, but I’m not a statistician, and I can see the counter-arguments. Other meta-analyses without this feature seem to find very similar results.
My position on ivermectin administration is that it is so safe that even a mild suspicion that it works should have led to massive trials and worldwide administration, just in case. In fact, I believe the fact that this did not happen is an obvious demonstration of our failure to approach the pandemic seriously, independent of the drug’s ultimate efficacy.
I may be able to firm up my belief on whether it works or not with substantial effort, but given the prior argument, it’s not worth the investment in time. It’s not like an analysis coming from me would convince anyone, anyway.
I’m sure some will accuse me of dissembling when I say I make no claim that it works, so here’s a tweet stating the same position back in August 2021:
The only big change in my subjective sense of evidence since then has been my interaction with the TOGETHER trial, and my sense that the data, when undistorted, show a real signal of efficacy. But that is an argument that uses many of my own subjective priors, and I would be suspicious if others made it, so I don’t press that case.
In defense of Scott Alexander, we have people claiming…
That my critiques are not worth looking into because I’m dishonest (or arguing in “bad faith,” as the kids say these days).
That the strong signal doesn’t change the conclusion.
That the weak signal was the right answer all along.
That my critique is invalid because heterogeneous endpoints cannot be combined in a meta-analysis, so there’s no signal to speak of.
That I should not have used the meta-analysis method the vast majority of meta-analyses use because it’s technically inappropriate.
And somehow all these arguers converge on the point that the two-thirds of the article spent figuring out whether there was a strong, weak, or no signal are irrelevant to the conclusion, which they all agree is fine as it is. Importantly, they’re not correcting each other on their mutually contradictory arguments, but mostly focusing on why their particular argument means my own correction of Scott is invalid.
In other words, unless I can demonstrate greater than PhD-level mastery of every scientific field and every particular issue in Scott’s article, state my issues dispassionately yet attract enough attention to matter, and have the wit of Churchill and the charm of Carl Sagan, my critiques are pointless and I am a dishonest idiot nobody should ever listen to. </sarcasm>
Rationalists, come to your senses and remember your Art. This is not decision theory or General AI. If we can’t figure out something this simple, what hope have we when facing the future? Millions of lives, and the future of our species, are at stake.