The Potemkin Argument, Part 4: How Scott Alexander and GidMK Caricatured the Work of Honest Scientists
In the quest for a snappy narrative, corners were cut and a game of telephone made bad claims worse, while broadcasting them to the world.
This is a public peer review of Scott Alexander’s essay on ivermectin, of which this is the fourth part. You can find an index containing all the articles in this series here.
Biber et al. in a Nutshell
An Israeli medical team from the Sheba Medical Center, one of the world’s top-ranked hospitals, gave ivermectin or placebo to residents of two of Israel’s quarantine hotels for SARS-CoV-2-positive patients. The dosing was reasonable for the early variants, the patients were treated reasonably soon after infection (average four days), and the researchers did a lot of work to measure whether ivermectin accelerated viral clearance. They found that it did. They also cultured the samples collected for the PCR tests and found that when you focus on cases where there was viable virus (not just fragments), the improvement in the ivermectin group stands out even more.
The study was led by Prof. Eli Schwartz, who is, among other things:
Full Professor of Medicine at Tel Aviv University
President of the Israeli Society of Parasitology and Tropical Diseases
Founder of the Center for Geographic Medicine and Tropical Diseases at Sheba Medical Center
Head of the Israeli Ministry of Defense task-team on evaluating therapeutic options for COVID-19
This looks like a high-quality study from a well-respected scientist and his team, at a top-ranked medical system, in a country with an extremely low prevalence of Strongyloides. The kind of study that could settle the debate, right? Not quite.
How GidMK Got Biber Wrong
Scott Alexander based his view of Biber et al. on a thread by epidemiologist Gideon Meyerowitz-Katz (henceforth GidMK). If you don’t know who that is, you can find my take here.
Gideon, apparently, used the study as a “teaching tool” in “how little effort it takes to be critical about research findings”:
After summarizing the basics of the trial—as well as his hypothesis for why it’s attracted attention—he dives into what he claims is a huge discrepancy:
So, what is it? Well, certain patients who had been randomized to either treatment or control tested negative for COVID-19 right at the start of the study, and then tested negative again on the next test administered.
Negative, in this context, means a PCR test that could not find any evidence of the virus after 35 or more cycles of amplification. Technically, at the time in Israel, just as in the US, a negative test required 40 rounds of amplification. But by the time of the study, research had made it clear that anything over 30 cycles was overkill. Still, the researchers didn’t disqualify patients unless they got two consecutive negatives at 35 cycles or more. This seems to have been compatible with mainstream thinking. As the New York Times wrote at the time:
Any test with a cycle threshold above 35 is too sensitive, agreed Juliet Morrison, a virologist at the University of California, Riverside. “I’m shocked that people would think that 40 could represent a positive,” she said.
It’s also interesting to read the level of care the researchers invested in getting accurate and comparable data from their PCR tests:
Since results of the test could have been influenced by the examiner who performed the swab and with differences between labs, (Basso et al., 2020; Carroll and McNamara, 2021) a small number of trained practitioners were allocated to obtain the swab during the entire trial and were instructed to use a uniform technique. In addition, all RT-PCR tests, including verification that patients were positive on day zero, were conducted by the same lab, at the Israel Central Virology Laboratory of the Ministry Of Health (located at Sheba Medical Center).
Getting two such negative tests in a row is solid evidence that the patients did not have any relevant presence of the virus. Given that the study focused on whether the intervention would affect when a positive patient would become negative (and even included asymptomatic patients), not being positive to begin with meant that those patients were not going to add any information to the study, so they were excluded.
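The exclusion logic described above is simple enough to sketch in a few lines of code. The Ct values below are hypothetical, purely for illustration; the cutoff of 35 cycles is the one the paper describes:

```python
# Sketch of the exclusion rule described above: a patient is excluded
# only if their first two consecutive RT-PCR tests both come back with
# Ct > 35 (i.e., the virus is undetectable without extreme amplification).
# All Ct values here are hypothetical, for illustration only.

CT_NEGATIVE_CUTOFF = 35  # cycles; above this, the test is treated as negative

def is_negative(ct: float) -> bool:
    """A test is 'negative' when it needs more than 35 amplification cycles."""
    return ct > CT_NEGATIVE_CUTOFF

def should_exclude(cts: list[float]) -> bool:
    """Exclude only when BOTH of the first two tests are negative."""
    return len(cts) >= 2 and all(is_negative(ct) for ct in cts[:2])

# Two borderline tests in a row: no meaningful viral presence, excluded and replaced.
print(should_exclude([38.0, 39.0]))  # True

# A single borderline test is not enough; it could be a very early infection.
print(should_exclude([38.0, 22.0]))  # False
```

The second case is exactly why the researchers required two consecutive negatives rather than one: a single high-Ct result is ambiguous between a cured patient and a very early infection.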
In the paper, the researchers describe their thought process:
Unexpectedly, some patients who were isolated in the hotels as verified positive patients were found to be borderline or negative upon our RT-PCR test (Figure 1). Therefore, patients who had RT-PCR results with Ct (cycle threshold) value >35 in first two consecutive RT-PCR tests were excluded (two consecutive tests were done in order to be sure that a borderline test was not at very early stage of the disease, but rather they already were cured or were sent to hotel by mistaken results), and equivalent number of patients were further recruited. This was amended by IRB in November 2020.
Note two important elements here:
The twice-negative patients were replaced with further recruitment
The protocol change was approved by the Institutional Review Board (IRB) of the Sheba Medical Center while the trial was ongoing
How does Gideon process all this? Badly.
First, he claims that the “first two consecutive tests” were on days 2 and 4. Given that the patients were tested on day 0, the most reasonable interpretation is that the first test was on day 0, not on day 2.
Then, he claims that changing the protocol (remember, the change was approved by IRB) is somehow a bad thing, which he doesn’t seem to mind in so many other studies (TOGETHER, Lopez-Medina, ACTIV-6, etc. etc.).
He proceeds to claim that this is somehow nefarious because the patients were “meeting the primary outcome,” even though they were clearly meeting it from day 0, making any improvement impossible. Is his contention that antivirals should be tested on already-recovered or never-infected patients? Hard to know.
He then goes in for the kill:
Had Gideon actually paid close attention, he would have noticed that the pre-registered inclusion criteria he linked to are pretty straightforward:
Participants eligible for inclusion will include non-pregnant adult (>18 years old) with molecular confirmation of COVID-19. [Participants will be eligible in a period of no longer than 72 hours after exposure].
If the patient tested negative in the first test of the trial, then the “molecular confirmation” (i.e. a positive PCR test at 30 cycles or less) required to qualify for the study comes into question. What happened? Well, the researchers recruited patients from two quarantine hotels, and in doing so they had to assume that the people there had been correctly verified as positive. Had they re-confirmed each patient before randomization, that would have added several days from symptom onset before a patient could be randomized, a well-known failure mode for early treatment trials. However, when tested, some patients came out negative. And since that negative came out twice in a row, the researchers knew it wasn’t a case of an early infection flying under the radar. The most logical thing to do was to exclude those patients, and I would hope that any scientist faced with the same dilemma would make the exact same choice.
Another difference we can see from the pre-registration is that they expanded their criteria to include patients up to seven days from symptom onset (instead of just three days, as originally intended). It stands to reason that they’d want to make sure the patients they got were actually PCR-positive, when the objective of the study is to see whether there’s a difference in time to viral clearance.
Gideon, not realizing any of this, keeps going:
Why is he choosing to modify the day 10 results when the primary endpoint of the study is day six? Who knows. What we do know is that it is improper to add back patients who had been replaced (because they arguably didn’t meet the inclusion criteria) and then conclude that the study is invalid. At least he makes a basic effort to maintain some standard of academic conduct by saying he is not accusing the authors (though the implication of what he wrote is pretty clear).
In the replies to the various tweets in the thread, many people try to explain to Gideon the problems with his thesis. He doesn’t seem to have any interest in understanding what is being said to him, and as a result the thread is still there, to this day, uncorrected.
I suppose when Gideon said he’d demonstrate “how little effort it takes to be critical about research findings,” he didn’t actually promise that the criticisms would be accurate. Just that with relatively low effort he could come up with criticism, any criticism. Mission accomplished.
How Scott Alexander Got GidMK Wrong
Things are bad enough at this point, with Gideon’s low-effort statistical mangling of the study’s results on the basis of confused arguments that misunderstand what took place.
However, the situation gets much, much worse when Scott attempts to interpret what Gideon wrote for his readers. Let’s walk through what Scott wrote line-by-line:
Biber et al: This is an RCT from Israel. 47 patients got ivermectin and 42 placebo. Primary endpoint was viral load on day 6. I am having trouble finding out what happened with this; as far as I can tell it was a negative result and they buried it in favor of more interesting things.
Scott is… “having trouble” finding the main result of the paper, which is right there in the abstract: “On day 6, OR was 2.62 (95% CI: 1.09-6.31) in ivermectin arm reaching the endpoint.”
The result is given as an odds ratio (OR), and so if your lower boundary is above 1 (in this case it’s 1.09) then you’ve got a “statistically significant” result. Throughout his essay, Scott has a terrible habit of describing positive but not statistically significant studies as negative. Here, he’s actually describing a positive and statistically significant study as negative. His description is literally the opposite of the underlying reality, even within the overly-restrictive frequentist paradigm.
False claims so far: 1
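For readers who want to see mechanically why a lower CI bound above 1 means significance, here is a minimal sketch of how an odds ratio and its Wald confidence interval come out of a 2×2 table. The cell counts are hypothetical, chosen only to land near the paper’s reported OR of 2.62; they are not the trial’s actual numbers:

```python
import math

# Hypothetical 2x2 table: rows = arm, columns = reached endpoint or not.
# These counts are illustrative only, NOT the actual trial data.
a, b = 30, 17  # ivermectin: cleared, not cleared
c, d = 17, 25  # placebo:    cleared, not cleared

odds_ratio = (a * d) / (b * c)

# Wald CI on the log scale: log(OR) +/- 1.96 * SE, where
# SE = sqrt(1/a + 1/b + 1/c + 1/d)
se = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se)
hi = math.exp(math.log(odds_ratio) + 1.96 * se)

print(f"OR = {odds_ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")

# The result is "statistically significant" in the frequentist sense
# precisely when the lower bound of the CI is above 1:
print("significant:", lo > 1)
```

With these illustrative counts the lower bound comes out just above 1, which is the same qualitative situation as the paper’s reported 95% CI of 1.09 to 6.31: positive and statistically significant, not negative.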
In a "multivariable logistic regression model, the adjusted odds ratio of negative SARS-CoV-2 RT-PCR negative test" favored ivermectin over placebo (p = 0.03 for day 6, p = 0.01 for day 8), but this seems like the kind of thing you do when your primary outcome is boring and you’re angry.
This description of the secondary outcome as “the kind of thing you do when your primary outcome is boring and you’re angry” is based on his prior misunderstanding, but still, the implication of a hasty analysis done for the sake of making the study “pop” borders on mind reading. Being confused is one thing. Being confused and insulting others’ integrity is completely different, and much harder to let go of. But he doesn’t stop there. He highlights the thread from Gideon that we analyzed in the previous section, writing:
[Gideon] notes that the study excluded people with high viral load, but the preregistration didn’t say they would do that.
Read these words closely. Scott seems to think that requiring many amplification cycles on a PCR test means you have “high viral load.” It actually means the opposite: the traces of the virus’s presence, if any, are so faint as to require extreme amplification to potentially identify.
Now, Scott is a medical doctor, so I’m sure on some level he knows how PCR tests work. What I am guessing happened here is that he absorbed this connotation from Gideon’s thread because the opposite would not really be as big a problem as Gideon pretends it is (and really, it isn’t). So as his recall filled in the gaps, he converted the actual complaint that Gideon makes into a similar but much more damning one that would be a lot more justified… if it were true. I may be completely wrong here, it’s just the most innocent explanation I can come up with for such a basic error.
False claims so far: 2
Looking more closely, he finds they did that because, if you included these people, the study got no positive results.
Now, Gideon was careful not to say what he was claiming was done on purpose. Scott doesn’t seem to give a damn, falsely attributing intent (note the words “they did that because”).
False claims so far: 3
So probably they did the study, found no positive results, re-ran it with various subsets of patients until they did get a positive result, and then claimed to have “excluded” patients who weren’t in the subset that worked.
And, having gone that far, why stop there? He accuses the authors of intentional academic fraud, of retroactively filtering patients and running many different analyses to get the result they wanted. If this isn’t libel, I have no idea what would be.
I think I’m going to call this one False claim #4. It only took Scott a couple of paragraphs to pack all of those in. Keep in mind, all these claims would be false even if Gideon’s thread were 100% correct (which it isn’t).
Just to be clear about why Scott’s explanation is obviously false: the study’s power calculation specifies that they needed 48 patients in each arm. They recruited more than that, which makes perfect sense because, as they write in their paper, they were replacing patients who were never meaningfully PCR-positive to begin with. Scott is alleging post-hoc manipulation, when the authors clearly state that they submitted this change in protocol to their IRB while the trial was ongoing, had it approved, and replaced patients while recruitment was still underway. In fact, when they decided to replace the patients, they had not yet seen the results of the study (as the paper clearly states, the envelope with the randomization key was opened only after the study ended), and thus could not have known whether the replacement would produce a stronger or weaker result.
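The point of replacing excluded patients is to keep the trial at the sample size its power calculation requires. As a sketch of how such a per-arm number is derived, here is the standard two-proportion sample-size formula; the proportions, alpha, and power below are hypothetical placeholders, not the parameters the authors used, so it will not reproduce their exact figure of 48 per arm:

```python
import math
from statistics import NormalDist

def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Standard sample-size formula for comparing two proportions.

    p1, p2: expected outcome proportions in the two arms.
    Returns the required number of patients per arm, rounded up.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_b = NormalDist().inv_cdf(power)           # desired power
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Hypothetical: 50% clearance under placebo vs 80% under treatment.
print(n_per_arm(0.5, 0.8))  # 39 per arm under these illustrative assumptions
```

Whatever the exact inputs, the logic is the same: the required per-arm count is fixed before the trial, and a patient who was never positive contributes nothing toward it, which is exactly why twice-negative patients were replaced rather than silently dropped.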
I’m incredibly saddened by the treatment that Professor Eli Schwartz and his team received at the hands of the Scott/Gideon duo. This is a mockery of science. I just can’t understand the downright hostile treatment Scott gives them, and on such flimsy grounds. Between this and several other errors of comparable magnitude that I will describe in upcoming parts of this series, it really doesn’t seem that he wrote his article with the level of sacred terror required when taking the reputations of honest scientists into your hands.
Much as he did in the case of Flavio Cadegiani (now corrected), he seems satisfied to turn people into caricatures that serve a narrative of sloppy research from faraway countries, rather than display basic curiosity about the claims of GidMK. And yet, the sloppy research appears to be his own.
Let’s Correct the Record
In a Jerusalem Post interview, Professor Schwartz himself—the motivating force behind the study—told his story:
“Since ivermectin was on my shelf, since we are using it for tropical diseases, and there were hints it might work, I decided to go for it,” he said.
Researchers in other places worldwide began looking into the drug at around the same time. But when they started to see positive results, no one wanted to publish them, Schwartz said.
“There is a lot of opposition,” he said. “We tried to publish it, and it was kicked away by three journals. No one even wanted to hear about it. You have to ask how come when the world is suffering.”
This is one of many stories of journals rejecting high-quality positive-for-ivermectin publications, often outside of the peer-review process. If Scott wants to claim publication bias, as he does in another part of his article, he should consider what is actually happening on the ground. A year and a half later, the study is finally published. Think about the implications for the world on the off-chance the drug actually works.
It’s also important to note that in the PubPeer record where Gideon’s confused thread is linked, we also see another comment from Gideon’s frequent collaborator, Kyle Sheldrick:
Our group have been examining raw data of RCTs of ivermectin for Covid-19 to look for signs of fraud and fabrication. The anonymized raw data for this trial was provided by Prof Schwartz on October 9 2021.
The data appear genuine and there are no signs of fraud or fabrication.
We examined the data for terminal digit bias and none was found.
We examined the data for unexpected baseline differences between groups and none were found.
We examined the distribution of dichotomous variables for unexpected homogeneity and none was found.
We examined the data for signs of patient duplication and none was found.
We examined the recruitment rate by date for evidence of non-random allocation and none was found.
In short, the data are entirely consistent with what is expected from genuine data measured in a real experiment. There were no red flags and I have no concerns about the genuineness of this data.
That’s right: the data were requested, provided, checked, and raised no concerns from Sheldrick, who is not known for letting positive ivermectin studies off the hook. Elsewhere, Scott seems to praise the transparency of the TOGETHER trial, a trial that, to this day, has still not released its data, despite promising to do so upon publication. And yet, in the case of Biber et al., Scott either didn’t know its data had been shared and verified, or he didn’t care to let his audience know.
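For the curious, “terminal digit bias” is one of the simplest of these fraud checks to run: in genuine measured data, the last digits of raw values tend to be roughly uniform, while fabricated numbers often over-use certain digits. A minimal sketch of the check, using a chi-square goodness-of-fit test on made-up digit counts (not the trial’s data), might look like this:

```python
from scipy.stats import chisquare

# Hypothetical counts of the last digits 0..9 across a column of raw
# measurements. Illustrative only, NOT the Biber et al. data.
last_digit_counts = [12, 9, 11, 10, 8, 10, 11, 9, 10, 10]

# chisquare with no expected frequencies tests against a uniform distribution.
stat, p_value = chisquare(last_digit_counts)

print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")

# A large p-value means no evidence of terminal digit bias, which is the
# kind of null result Sheldrick reports for this trial's raw data.
print("bias detected:", p_value < 0.05)
```

The other checks in Sheldrick’s list (baseline differences, patient duplication, recruitment patterns) follow the same template: specify what genuine data should look like, then test the raw data against it.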
Later in his article, Scott writes that:
Here this question is especially tough, because, uh, if you say anything in favor of ivermectin you will be cast out of civilization and thrown into the circle of social hell reserved for Klan members and 1/6 insurrectionists. All the health officials in the world will shout “horse dewormer!” at you and compare you to Josef Mengele. But good doctors aren’t supposed to care about such things. Your only goal is to save your patient. Nothing else matters.
Scott seems blissfully unaware that his article would form a core part of the memeplex that is doing exactly this kind of reputational assassination. The high-brow profile he has built over the years allows intellectuals to enjoy the guilty pleasure of laughing at the horse dewormer fanatics without feeling dirty, while throwing the reputations of researchers, who have dedicated their lives to medical research, under the bus.
I can’t think of a better way to end but with this video of Professor Eli Schwartz discussing the results of his study without intermediaries. He’s a real human being, not a caricature as he’s been painted, and the video is definitely worth a watch if you want to understand what the study did and what it found.
As mentioned, this is just one of several grave misrepresentations of the literature on ivermectin that can be found in Scott’s article. This article is part of a series discussing them.
The next paper I’ll focus on is Elalfy et al., and Scott’s treatment of it. If you want to flex your “do your own research” muscle, feel free to do your own analysis ahead of time, and see how it compares with mine when I release the next installment.
P.S. I just realized that Don Ryan left a comment in my original article responding to Scott Alexander that basically reflected the material in this article, almost to a tee. While it is very disappointing that I did not see the comment earlier, the similarity of our analysis provides me with a very helpful sanity check, as we essentially came to the same conclusions, without prior communication.
Thank you, Don, for sharing the information back in December, and thank you for letting me know about it again now. It would have been a shame for me to never have seen your original comment.