The Potemkin Argument, Part 13: Consequentialism for Me, Deonticism for Thee
Since starting work on this series, we’ve come a long way. We’ve gone through 17 of the 30 studies Scott covered. Let’s address the rest:
First, let’s work through four studies that I think are worth discussing in a little bit of detail:
Chowdhury et al.
Chowdhury et al: Bangladeshi RCT. 60 patients in Group A got low-dose ivermectin plus the antibiotic doxycycline, 56 in Group B got hydroxychloroquine (another weird COVID treatment which most scientists think doesn’t work) plus the antibiotic azithromycin.
I’d like to take issue with the parenthetical here. Scott calls HCQ "another weird COVID treatment which most scientists think doesn’t work," implying that ivermectin is also in that group. I have quite a bit to say about that:
There is no such poll, and Scott doesn’t cite one. We cannot know what "most scientists think" (especially given human tendencies for preference falsification on highly politicized questions).
Scott himself reinforces this dynamic by calling HCQ “weird.” This is exactly the sort of social signal that leads to falsified preferences.
What’s the first thought that comes to mind when you think of HCQ? (Take a minute to answer this question before reading on.) Is it the most polarizing politician of the last few decades? Do you think this might affect people’s opinions?
It’s pretty obvious that “most scientists” didn’t “science” their way to their conclusions. Even if they genuinely hold such an opinion, they arrived at it by social triangulation. So it’s unclear what we should learn from this, even if it is, indeed, a fact.
…and this is why science is not a democracy. We do not vote on what is true: we prove it by experiment.
Let’s see what people do with this recent meta-analysis by Harvard-affiliated scientists showing that HCQ is effective as a prophylactic. (Who am I kidding, they’ll probably ignore it.)
To imply that a scientific question—one for which the evidence we have is decidedly mixed—can be properly settled into a lasting consensus within a year or two is sheer madness.
No declared primary outcome. Ivermectin group got to negative PCR a little faster than the other (5.9 vs. 7 days) but it wasn’t statistically significant (p = 0.2). A couple of other non-statistically-significant things happened too. 2 controls were hospitalized, 0 ivermectin patients were.
This is a boring study that got boring results, so nobody has felt the need to assassinate it, but if they did, it would probably focus on both groups getting various medications besides ivermectin. None of these other medications are believed to work, so I don’t really care about this, but you could tell a story where actually doxycycline works great at addressing associated bacterial pneumonias, or where HCQ causes lots of side effects and that makes the ivermectin group look good in comparison, or whatever.
Let’s unpack what Scott is saying here: “It’s possible that HCQ harmed the patients such that ivermectin looks better than it otherwise would.” Here’s the thing: HCQ is a remarkably well-understood drug, having been in clinical use for over 65 years. As such, it has dosing regimens that are known to be safe. The “lots of side effects” stories arise when patients are overdosed, as with the dosing that the RECOVERY and SOLIDARITY trials used. No need for stories, we can check the dose given: lo and behold, Chowdhury used only 42% of that dose, so we should not expect anything like those side effects.
I’m no HCQ defender—and I’ve not looked into the data deeply enough to tell you what is going on—but this is what makes Scott’s take here even more egregious. He’s recycling memes that are trivially shown to be questionable, even by a bystander, and using them as a way to support his critique of a study.
What’s more, this is one of the earliest studies done on ivermectin for COVID-19, running from May to early June 2020. Its function was to give us some early data and kick-start the process of looking for useful treatments, not to be the definitive be-all and end-all study. As a result—being tiny—it doesn’t move the needle much. The reason I make a point of commenting on it is that we should be praising the researchers who didn’t sit on their hands in those early days, not “trying to assassinate” their studies.
Espitia-Hernandez et al: Mexican trial which is probably not an RCT - all it says is that “patients were voluntarily allocated”.
There’s that obsession with RCTs again. Why not list out all the other things this study is not while we’re at it?
Before we continue with this study, we should make it clear that this was another of the earliest studies on ivermectin, conducted in Mexico City between April 1 and May 20, 2020. Much like Chowdhury, it must be understood as an extremely early pilot to see if there was something to the idea that ivermectin (and other generics) could help, not as a be-all and end-all.
28 ended up taking a cocktail of low-dose ivermectin, vitamin D, and azithromycin; 7 were controls. On day ten, everyone (!) in the experimental group was PCR negative; everyone (!) in the control group was still positive. Also, symptoms in the experimental group lasted an average of three days; in the control group, more like 10. These results make ivermectin look amazingly super-good, probably better than any other drug for any other disease, except maybe stuff like vitamins for treatment of vitamin deficiency.
Any issues? We don’t know how patients were allocated, but they discuss patient characteristics and they don’t look different enough to produce this big an effect size. The experimental group got a lot of things other than ivermectin, but I would be equally surprised if vitamin D or azithromycin cured COVID this effectively.
Scott is probably not aware that azithromycin and ivermectin are known to be synergistic, at least in vitro. Also, it is very well known that viruses tend to require drug combinations to be brought under control. Here’s TOGETHER trial principal investigator Ed Mills in the Halifax Examiner:
That’s correct and almost always in infectious diseases, we require a group of drugs. It’s very rare that one drug is the drug that ends a condition. So if you look at HIV, for example, it’s usually a three-drug cocktail that’s required. Individually, those drugs are almost useless. But you put them together, they’re magic. Hepatitis C, it’s a two-drug combination. Individually, it was really painful for people, but you put them together and you get a cure. That typically is the way that it is with infectious diseases. We’re going to be looking at combinations of drugs.
Back to Scott:
It deviated from its preregistration in basically every way possible,
Scott is being vague here. Looking at the registration, I don’t see where the massive deviation is. The dates may not match up exactly, but this is a retrospective trial. Retrospective trials are not usually pre-registered, and this one isn’t either: its registration was first posted after the trial was completed.
but you shouldn’t be able to get “every experimental patient tested negative when zero control patients did” by garden-of-forking-paths alone!
Again, this is a retrospective trial.
But this has to be false, right? Even the other pro-ivermectin studies don’t show effects nearly this big. In all other studies combined, ivermectin patients took an average of 8 days to recover; in Espitia-Hernandez, they took 3.
Well, it really depends on the cycle threshold used for PCR.
Also, it’s pretty weird that the entire control group had positive PCRs on day 10 - in most other studies, a majority of people had negative PCRs by day 7 or so, regardless of whether they were control or placebo.
Well, it really depends on the cycle threshold used for PCR.
Everything about this is so shoddy that I can easily believe something went wrong here.
Wait till you see the results in the Paxlovid trials…
I don’t have a great understanding of this one but I don’t trust it at all. Luckily it is small and non-randomized so it will be easy to ignore going forward.
As usual, Scott’s way of thanking researchers for doing research in the earliest stages of the pandemic is to mock them and besmirch their reputations—mostly based on his own misunderstandings. If I’ve missed anything please let me know, but Scott really doesn’t help himself here by being vague and essentially relying on an argument by incredulity.
Ravikirti et al.
Ravikirti et al: Here we’re in Eastern India - not exactly Bangladesh again, but a stone’s throw away from it. In this RCT patients were randomized into an ivermectin group (57) and a placebo group (58). Primary outcome was negative PCR on day 6, because doing it on day 7 like everyone else would be too easy. As with several other groups, this was a bad move; too few people had it to make a good comparison; it was 13% of intervention vs. 18% of placebo, p = 0.3. Secondary outcomes were also pretty boring, except for the most important: 4 people in the placebo group died, compared to 0 in ivermectin (p = 0.045).
On the one hand, this is one outcome of many, reaching the barest significance threshold. Another fluke? Still, there are no real problems with this study, and nobody has anything to say against it. Let’s add this one to the scale as another very small and noisy piece of real evidence in ivermectin’s favor.
Actually, there are some pretty real concerns with the PCR testing data in this trial, given that the treatment arm is missing 28/55 (51%) results, and the control arm is missing 18/57 (32%) results. The lopsidedness seems to be related to treatment arm patients being discharged since they were cured early—and therefore not having data for the day 6 test—but still, the viral negativity results cannot be trusted (and this will become relevant down the line).
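As an aside on just how fragile that 0-versus-4 mortality split is, here is a quick sanity check using Fisher’s exact test on the raw death counts. (The paper’s p = 0.045 comes from its own choice of test; this sketch is only meant to illustrate how sensitive the result is to that choice.)

```python
from scipy.stats import fisher_exact

# Deaths in Ravikirti et al.: rows are arms, columns are (died, survived)
table = [[0, 57],   # ivermectin: 0/57 deaths
         [4, 54]]   # placebo:    4/58 deaths
odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(round(p, 3))  # two-sided p ≈ 0.12 on this test
```

On Fisher’s exact test the very same counts land well outside the 0.05 threshold. That is the sense in which this result sits at “the barest significance threshold”: it appears or disappears depending on which test you run.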
Vallejos et al.
Vallejos et al: Another Argentine study. It’s big (250 people in each arm). It’s an RCT. It tries to define a primary outcome (“Primary outcome: the trial ended when the last patient who was included achieved the end of study visit”), but that’s not what “primary outcome” means, and they don’t offer an alternative.
The paper is actually quite clear on this point: “The efficacy of ivermectin to prevent hospitalizations was evaluated as primary outcome.” Checking clinicaltrials.gov confirms that this was indeed the pre-registered primary outcome.
Other outcomes: no difference in PCR on days 3 or 12.
Cue the usual “no statistically significant difference does not mean no difference” etc. etc. Actually, in this case the difference is against ivermectin, but the point stands.
Hospitalization is nonsignificantly better in the ivermectin group (14 vs. 21, p = 0.2), but death is nonsignificantly better in the placebo group (3 vs. 4, p = 0.7). This isn’t even the kind of nonsignificant that might contribute to an exciting meta-analysis later. This is just a pure null result.
Given that hospitalization was their primary endpoint, what it does tell us is that the study was not sufficiently powered to identify a 33% reduction with statistical significance.
In fact, the paper itself says as much:
This study has several limitations. Firstly, the percentage of events in relation to the primary outcome was below the estimate, so this trial was under powered.
Secondly, the mean dose of ivermectin was 192.37 μg/kg/day (SD ± 24.56), which is below the doses proposed as probably effective [20, 33]. Thirdly, a middle-aged population was included which, in accordance with the first point raised in this section, had hospitalization events below the 10% set at the time of calculating the sample size.
Not only did they undersize the study—given the low hospitalization percentage observed in the placebo population—they actually admit to underdosing the patients as well.
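The underpowering is easy to confirm from the reported numbers alone. Here is a minimal sketch using the standard normal-approximation sample-size formula for comparing two proportions (dedicated power software would give slightly different figures), plugging in the hospitalization rates the trial actually observed:

```python
import math

# Sample size per arm for a two-sided two-proportion z-test
# (normal approximation), alpha = 0.05 (z = 1.96), power = 0.80 (z = 0.8416)
def n_per_arm(p1, p2, z_alpha=1.959964, z_beta=0.841621):
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

p_control = 21 / 250  # observed placebo hospitalization rate (8.4%)
p_treated = 14 / 250  # observed ivermectin rate, a ~33% relative reduction
required = n_per_arm(p_control, p_treated)
print(required)  # on the order of 1,300 per arm, vs. the 250 actually enrolled
```

To detect the observed 33% relative reduction with 80% power, the trial would have needed roughly five times the enrollment it had.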
I cannot find any problem with this study, and neither can anyone else I checked.
I guess he didn’t check the limitations section of the paper itself, but he also didn’t check ivmmeta, which is odd, given that he talks about them throughout his article. Ivmmeta lists a number of serious issues with this study, even in the archived version of the page as it was at the time. Some of those issues:
74 patients had symptoms for >= 7 days
The companion prophylaxis trial [Vallejos], which reported more positive results, has not yet been formally published
Authors pre-specify multivariate analysis but do not present it, however multivariate analysis could significantly change the results
An extremely large percentage of patients (55%) were excluded based on ivermectin use in the last 7 days, suggesting the usual South American background use issues were at play here too.
As an interesting aside, you will note that the archived page mentions “more than 25% of patients were hospitalized within 1 day (Figure S2).” When I tried to confirm this claim, I noticed that they had misinterpreted the figure. So I let them know, and that claim is no longer there.
This is such a critical point. While Scott criticizes ivmmeta for bias, they accept feedback that goes against their perceived bias. In contrast, Scott—at least so far—has not been receptive to the dozens of errors pointed out in this series.
Going back to the list of issues: on that last point, about the large percentage of patients (55%) excluded for prior ivermectin use, the Google Trends patterns are identical to those seen in the TOGETHER and Lopez-Medina trials.
Again, the researchers confirm the issue. Here’s a Google-translated pull quote from an article commenting on the trial:
This is the biggest RCT we’ve seen so far, so we should take it seriously.
I’m not sure why size matters here. In fact, as we saw, the trial was not large enough to support its chosen endpoint, by the researchers’ own admission. Size may be indicative of the budget available for a trial, but it says little about the quality of the results.
The Last Mile
So now we’ve come up to 21 of the 30 studies Scott has reviewed.
I also have no huge objection to Scott’s evaluation of Mahmoud et al., which we both agree is of high quality. We’re both somewhat neutral on Bukhari et al., as well as Roy et al., and we both agree that Loue et al. is pretty bad. Mourya et al. is a mess. I don't really know what we're supposed to get from this trial for a meta-analysis.
On Szente-Fonseca et al., I agree with Scott that the results are unusable—though keep this in mind when we get to the worms hypothesis later, since that hypothesis relies on this paper quite substantially.
When it comes to Faisal et al., I mostly agree with Scott's criticisms, though the study doesn't seem to be as bad as he makes it sound. The one criticism that doesn't make sense to me is the "no primary endpoint" one, but without a pre-registration, there's not much we can say.
When it comes to Aref et al., I can’t find what dose the patients were given, which is a pretty big omission. This is the first paper Scott likes and I don’t, but ultimately he ended up dropping it on a recommendation from Gideon Meyerowitz-Katz.
Elgazzar et al. was already removed from ivmmeta at the time Scott wrote his piece, so it’s more of a (dis)honorable mention. Some people have mentioned a number of issues with the take-down of that study, but in my estimation it’s not a fight worth fighting. Given the lack of substantive response by the author, we must assume the accusations are true.
This brings us to 30 of the 30 studies in the literature review section. WE DID IT!
Scott’s Criteria, Distilled
What have we learned so far? It looks like Scott has a number of red lines when it comes to studies:
Not explicitly declaring a primary outcome in the text of the paper
Not being an RCT
Not being published and peer-reviewed in a high-impact journal
Not having a plausible result
Being caught in Carlisle-style filtering (or looking like you might be)
The authors’ reputation in relation to other studies they’ve published
Gideon Meyerowitz-Katz not being a fan
As you can see, it’s a bit of a mixed bag. Some of these criteria are somewhat reasonable, others are context-dependent, while some cross the line into the anti-scientific.
However, what stands out to me is the near-zero tolerance Scott shows towards studies. When two or three of the above triggers are present—or Scott believes them to be present—the study goes out the window, regardless of whether the alleged issues can explain the magnitude of the results seen.
This seems like quite a non-Bayesian approach to the problem, shredding considerable amounts of usable signal. I’d also say that I found it remarkable that factors such as dosing, treatment delay, variants, and background use didn’t come into the picture. However, so long as it’s followed consistently, we can at least recognize it as a somewhat-legible, coherent methodology, the value of which we can debate.
But therein lies the rub: when Scott comes to the “big, professional” RCTs—particularly Vallejos, TOGETHER, and Lopez-Medina—the ones that have gotten all the media headlines and been hailed as definitive—he switches from strict rule-following to consequentialist mode.
These aren’t real quotes, but they totally could have been:
“Maybe TOGETHER shifted the control group in time, but how bad could that be, really?”
“Perhaps Vallejos didn’t state a primary endpoint, but it’s big, so we should take it seriously.”
“OK, fine, Lopez-Medina altered the placebo mid-trial, but that’s because they’re conscientious!”
My sense is that it is this double standard that leads readers to the impression that the researchers producing the positive studies are unscrupulous and perhaps unserious, while the researchers producing the critical studies are simply struggling with the fundamental complexity of running a clinical trial. It’s like a real-life version of the fundamental attribution error. When applied broadly to groups, this is also known as “myside bias”:
Myside bias occurs when people evaluate evidence, generate evidence, and test hypotheses in a manner biased toward their own prior opinions and attitudes. Research across a wide variety of myside bias paradigms has revealed a somewhat surprising finding regarding individual differences. The magnitude of the myside bias shows very little relation to intelligence. Avoiding myside bias is thus one rational thinking skill that is not assessed by intelligence tests or even indirectly indexed through its correlation with cognitive ability measures.
Do I claim that Scott is uniquely subject to this bias, while I alone am free of it? Far from it. What I am claiming is that the knowledge that we’re all subject to such a bias—regardless of the intrinsic strength of whatever intellectual machinery we’re working with—means that we need to be extremely rigorous in applying guardrails to our analyses. The fact that Scott is quick to personalize the issue when I offer corrections to his essay makes me trust his work less. The fact that ivmmeta accepts corrections even when they go against their claimed bias makes me trust their work more.