This is a public peer review of Scott Alexander’s essay on ivermectin, of which this is the second part. You can find an index containing all the articles in this series here.
Those of you following this series may have noted that I jumped over the introductory content of Scott’s article. If you are reading this series for the first time, this is a good place to start this public, in-depth peer review.
Scott starts the essay like so:
I know I’m two months late here. Everyone’s already made up their mind and moved on to other things.
But here’s my pitch: this is one of the most carefully-pored-over scientific issues of our time. Dozens of teams published studies saying ivermectin definitely worked. Then most scientists concluded it didn’t. What a great opportunity to exercise our study-analyzing muscles! To learn stuff about how science works which we can then apply to less well-traveled terrain! Sure, you read the articles saying that experts had concluded the studies were wrong. But did you really develop a gears-level understanding of what was going on? That’s what we have a chance to get here!
This is a fascinating introduction, because of the ambiguity of interpretation it allows. Was it that the studies were right and the experts somehow subverted the obvious conclusion to derive an alternative message? Or were the experts right and something was going systematically wrong with the studies? These are the questions that need to be answered: the answers have implications that touch the foundation of scientific investigation into politically charged topics, under the glare of a media frenzy. In other words, ivermectin is the perfect microcosm through which we can evaluate how science and collective sense-making interact, and to what end.
The Devil’s Advocate
Any deep dive into ivermectin has to start here:
This is from ivmmeta.com, part of a sprawling empire of big professional-looking sites promoting unorthodox coronavirus treatments.
It must be noted here that c19early.com covers treatments that are approved in the USA (Paxlovid, Molnupiravir, monoclonal antibodies, Remdesivir, others) as well as treatments that are approved in countries other than the USA (ivermectin, budesonide, probiotics, others), alongside treatments that are not (yet?) approved anywhere, such as Nigella sativa, vitamin A, colchicine, and famotidine. The characterization “promoting unorthodox treatments” can only be understood through the lens of treatments the USA has approved, and even then, those are covered by the c19early network as well:
I have no idea who runs it - they’ve very reasonably kept their identity secret - but my hat is off to them. Each of these study names links to a discussion page which extracts key outcomes and offers links to html and pdf versions of the full text. These same people have another 35 ivermectin studies with different inclusion criteria, subanalyses by every variable under the sun, responses and counterresponses to everyone who disagrees with them about every study, and they’ve done this for twenty-nine other controversial COVID treatments.
Again—important to note—not all treatments covered by c19early would be characterized as controversial by any single person.
Putting aside the question of accuracy and grading only on presentation and scale, this is the most impressive act of science communication I have ever seen. The WHO and CDC get billions of dollars in funding and neither of them has been able to communicate their perspective anywhere near as effectively. Even an atheist can appreciate a cathedral, and even an ivermectin skeptic should be able to appreciate this website.
What stands out most in this image (their studies on early treatment only; there are more on other things) is all the green boxes on the left side of the table. A green box means that the ivermectin group did better than placebo (a red box means the opposite). This isn’t adjusted for statistical significance - indeed, many of these studies don’t reach it.
I have no idea what Scott means by “adjusted for statistical significance” here. What Scott is commenting on is ivmmeta’s version of a forest plot, which meta-analyses commonly use. It’s hard to see what adjustment for “statistical significance” could be done, other than filtering out non-statistically-significant studies, which, as Scott clearly understands, flies in the face of the essence of a meta-analysis:
The point of a meta-analysis is that things that aren’t statistically significant on their own can become so after you pool them with other things. If you see one green box, it could mean the ivermectin group just got a little luckier than the placebo group. When you see 26 boxes compared to only 4 red ones, you know that nobody gets that lucky.
No objection, your honor.
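To make the “nobody gets that lucky” point concrete, here is a rough back-of-the-envelope sign test. This is my own illustration, not ivmmeta’s actual pooling method: if ivermectin did nothing at all, each study’s treatment arm would have roughly a coin-flip chance of coming out ahead of its control arm, so 26 green boxes out of 30 would be an extraordinary run of luck.

```python
from math import comb

# Under the null hypothesis that ivermectin does nothing, each study is
# (roughly) a coin flip: a ~50% chance that the treatment arm happens to
# do better than the control arm. Probability of seeing 26 or more
# "green" studies out of 30 by luck alone:
n_studies, n_green = 30, 26
p_value = sum(comb(n_studies, k) for k in range(n_green, n_studies + 1)) / 2**n_studies
print(f"P(>= {n_green} green out of {n_studies} by chance) = {p_value:.1e}")
# ~3e-05 -- this is the "nobody gets that lucky" intuition. It deliberately
# ignores publication bias, study quality, and correlated flaws across
# studies, which is what the rest of the discussion is about.
```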
Is The Ivmmeta Methodology Fair?
Having acknowledged that this is interesting, let’s now pick at it a little.
Let’s see what issues Scott has with ivmmeta:
First, this presentation can exaggerate the effect size (represented by how far the green boxes are to the left of the gray line in the middle representing no effect). It focuses on the most dire outcome in every study - death if anybody died, hospitalization if anyone was hospitalized, etc.
This is an interesting critique to work through. To start, meta-analyses often focus on dire scenarios, as they tend to be both the ones people actually care about avoiding, and the ones harder to game. It’s much more difficult to trust a result that relies on subjective outcomes that include self-reporting or physician judgement. Endpoints such as hospitalization or death tend to be the kind of things people trust more, and for good reason. After all, you can game the cycle threshold on a PCR test, but, in the immortal words of The Wire, how do you make a body disappear?
In fact, the Strongyloides meta-analysis that Scott will later rely on quite heavily tracks mortality outcomes exclusively. Does this critique apply to it too? In other words, is Scott critiquing ivmmeta, or the concept of a meta-analysis more generally?
Most studies are small, and most COVID cases do fine, so most of these only have one or two people die or get hospitalized. So the score is often something like “ivermectin, 0 deaths; placebo, 1 death”, which is an infinitely large relative risk, […]
It should be noted that zero events appear six times in the treatment arm and twice in the control arm, so this critique affects at most 8 of the 29 studies on the ivmmeta list Scott is commenting on. Moreover, these tend to be the disproportionately smaller studies (and therefore the ones that shift the conclusion of the meta-analysis the least).
[…] and then the site rounds it down to some very high finite number.
Please keep in mind that ivmmeta is not making some arbitrary or random decision here. In their methods section, they go into quite a bit of detail about their approach:
If continuity correction for zero values is required, we use the reciprocal of the opposite arm with the sum of the correction factors equal to 1 [Sweeting]. Results are expressed with RR < 1.0 favoring treatment, and using the risk of a negative outcome when applicable (for example, the risk of death rather than the risk of survival). If studies only report relative continuous values such as relative times, the ratio of the time for the treatment group versus the time for the control group is used
Further, their reference goes to an article published in the journal Statistics in Medicine called “What to add to nothing? Use and avoidance of continuity corrections in meta‐analysis of sparse data.” A mathematician friend, whom I trust, reviewed it and described it as follows:
By the way, the reference they give (Sweeting) seems to be a serious article that digs deeply into the subject of continuity corrections for zero events, so apparently the ivmmeta people did a good job here.
This is good enough for me, but if others have any more substantial feedback on this issue, I’d love to hear it.
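For readers who want to see what the quoted correction does in practice, here is a small sketch of my reading of that description: when a zero cell would make the risk ratio undefined, each arm gets a correction proportional to the reciprocal of the opposite arm’s size, with the two corrections summing to 1. The function name and the trial numbers below are mine, purely for illustration.

```python
def zero_event_rr(events_t, n_t, events_c, n_c):
    """Risk ratio with a continuity correction applied only when a zero
    cell makes the ratio zero or undefined.

    This follows my reading of the quoted ivmmeta methods (after Sweeting):
    the correction added to each arm is proportional to the reciprocal of
    the *opposite* arm's size, scaled so the two corrections sum to 1.
    """
    if events_t == 0 or events_c == 0:
        k_t = (1 / n_c) / (1 / n_t + 1 / n_c)   # = n_t / (n_t + n_c)
        k_c = (1 / n_t) / (1 / n_t + 1 / n_c)   # = n_c / (n_t + n_c)
        events_t, events_c = events_t + k_t, events_c + k_c
        n_t, n_c = n_t + 2 * k_t, n_c + 2 * k_c  # corrections applied to events and non-events
    return (events_t / n_t) / (events_c / n_c)

# Hypothetical small trial: 0/50 deaths on ivermectin vs 1/50 on placebo.
# Without a correction the relative risk is 0 (infinitely favorable);
# with the correction it becomes a finite value that still favors treatment.
print(round(zero_event_rr(0, 50, 1, 50), 3))
```

Under this reading, a balanced trial reduces to the familiar “add 0.5 to each arm” correction; the arms only get different corrections when their sizes differ.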
Back to Scott:
This methodology naturally produces very big apparent effects, and the rare studies where ivermectin does worse than placebo are equally exaggerated (one says that ivermectin patients are 600% more likely to end up hospitalized).
You won’t be surprised to hear that Scott chose the most extreme outlier as an example here, four times more extreme than any of the other 28 studies.
But this doesn’t change the basic fact that ivermectin beats placebo in 26/30 of these studies.
It’s actually 25/29; perhaps Scott was counting Elgazzar in this group?
Second, this presents a pretty different picture than you would get reading the studies themselves. Most of these studies are looking at outcomes like viral load, how long until the patient tests negative, how long until the patient’s symptoms go away, etc.
As we covered above, meta-analyses commonly do not follow the primary outcome of the studies they combine. Doing so would make it much harder to find studies to combine, defeating the purpose of a meta-analysis to begin with. What’s more, sticking with the most serious endpoint prevents the kind of gaming we see in some of the studies, where authors pick some pretty exotic endpoints. Often, it’s not clear whether those endpoints were chosen before, during, or after the trial was completed.
Many of these results are statistically insignificant or of low effect size.
It’s worth digging into the precise claim here. Scott seems to say that the way ivmmeta selects endpoints leads to more statistically significant results, and to a larger effect size. Thankfully, ivmmeta offers both analyses, so we can test the hypothesis. Below, you can find the early treatment section of the two analyses. One with ivmmeta-method selected outcomes (left), and one with results using study-selected primary outcomes (right). There were more studies when I took these screenshots than when Scott took his snapshot, but given that his critique is general, the time of the snapshot shouldn’t really matter.
Using the ivmmeta method of outcome selection, 12/36 outcomes are “statistically significant,” with a combined effect size of 63%. Using the primary outcomes, 14/36 outcomes are “statistically significant,” with a combined effect size of 57%.
It seems Scott’s critique here is off the mark in a verifiable way. There doesn’t appear to be any systematic difference in how many of the outcomes are “statistically significant,” nor is the combined effect size drastically exaggerated. That is, unless you think the distance between 57% and 63% would make a real difference in the debate around ivermectin.
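As a sanity check of my own (not ivmmeta’s), you can ask how surprising a 12-out-of-36 versus 14-out-of-36 split would be if outcome selection made no difference at all. The comparison below is deliberately crude, since both analyses draw on the same underlying studies, so treat it as an illustration rather than a proper paired analysis.

```python
from scipy.stats import fisher_exact

# Counts taken from the two ivmmeta analyses discussed above:
# (significant, not significant) under each outcome-selection method.
ivmmeta_selected = (12, 36 - 12)
primary_outcomes = (14, 36 - 14)

odds_ratio, p_value = fisher_exact([ivmmeta_selected, primary_outcomes])
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2f}")
# p comes out far above any conventional threshold: on this crude measure,
# the two outcome-selection methods are statistically indistinguishable.
```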
I went through these studies and tried to get some more information for my own reference:
Of studies that included any of the endpoints I recorded, ivermectin had a statistically significant effect on the endpoint 13 times, and failed to reach significance 8 times. Of studies that named a specific primary endpoint, 9 found ivermectin affected it significantly, and 12 found it didn’t.
But that’s still pretty good. And “doesn’t affect to a statistically significant degree” doesn’t mean it doesn’t work. It might just mean your study is too small for a real and important effect to achieve statistical significance.
This might sound like a minor issue, but I actually consider it to be fundamental. Scott uses the shorthand of “ivermectin failed to reach significance” and “ivermectin affected it significantly,” though that conclusion cannot really be inferred from the studies themselves. Clinical trials are complex beasts, each with a voluminous protocol, describing many experimental parameters—such as dosing, patient inclusion/exclusion criteria, diagnostic criteria, selected endpoints, etc. What’s more, local conditions—such as patient availability, quality of healthcare infrastructure, etc.—and trial execution issues—such as timing of administration, quality of followup, etc.—all affect whether the trial will be successful in “reaching statistical significance.” To collapse all these factors into a univariate analysis titled “ivermectin” leaves way too much out of the picture to be a reasonable way to describe what we’re looking at.
To use an analogy, if you’re trying to catch a station’s signal on your FM car radio, whether the station exists and is broadcasting is not the only factor that affects your ability to hear it clearly. Your car’s distance from the closest transmitter, whether there are mountains or other obstacles in the way, the quality of your receiver, the frequency you set your radio to—as well as your hearing working properly—all affect the listening experience. The inability to hear the broadcast clearly cannot be automatically considered evidence that the station does not exist.
Back to Scott:
That’s why people do meta-analyses to combine studies. And the ivmmeta people say they did that and it was really impressive. All of this is still basically what things would look like if ivermectin worked.
But of course we can’t give every study one vote. We’ve got to actually look at these and see which ones are good and which ones are bad.
Meta-analyses also don’t give “every study one vote.” They use algorithms that weight each study according to the effect it found, the number of patients in each group, and other data, and then combine the studies accordingly. Further, the most serious meta-analyses perform what they call “risk of bias” analyses. Here’s what it looks like for the Bryant et al. systematic review and meta-analysis:
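For readers unfamiliar with how that weighting works, here is a minimal sketch of the textbook fixed-effect, inverse-variance approach, with made-up numbers. Bryant et al. and ivmmeta use random-effects variants rather than exactly this, but the core idea is the same: larger and more precise studies pull the pooled estimate harder than small ones.

```python
import math

# Minimal fixed-effect, inverse-variance pooling of log risk ratios.
# Each study: (events_treatment, n_treatment, events_control, n_control).
# The numbers below are invented purely to show the mechanics.
studies = [
    (2, 100, 6, 100),
    (1, 60, 3, 55),
    (5, 200, 9, 210),
]

weighted_sum, weight_total = 0.0, 0.0
for a, n1, c, n2 in studies:
    log_rr = math.log((a / n1) / (c / n2))
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2   # approximate variance of log RR
    weight = 1 / var                        # bigger, more precise studies count more
    weighted_sum += weight * log_rr
    weight_total += weight

pooled_rr = math.exp(weighted_sum / weight_total)
print(f"pooled RR = {pooled_rr:.2f}")
```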
So, God help us, let’s go over all thirty of the ivermectin studies in this top panel of ivmmeta.com.
God help us indeed, because I’m going to go over all of Scott’s reviews on the ivermectin studies. Fingers crossed, and I’ll see you on the other side.