The Potemkin Argument, Part 11: TOGETHER vs. Carvallo, A Tale of Two Studies
The second of the two studies Scott highlights as “bigger and more professional” that “don’t show good results for ivermectin” is the TOGETHER trial. Now, if you’ve been reading this Substack for a while, you know I’ve been doing a lot of research on that trial, unearthing a whole host of critical issues. The Cato Institute has also just published a good summary of the findings.
Let’s work through Scott’s take:
TOGETHER Trial, Ivermectin
TOGETHER Trial: Speaking of big RCTs…
This one hasn’t been published yet. There’s a video of a talk about it, but I am not going to watch it, because it is a video, so I am getting information secondhand from eg here.
Scott’s information on the TOGETHER trial is coming via Wired? Good god, things are starting to make sense.
Apparently, it compares 677 people (!) randomized to ivermectin to 678 people randomized to placebo. 86 ivermectin patients ended up in the hospital compared to 95 placebo patients, p-value not significant.
This was a really big professional trial done by bigshot researchers from a major Canadian university, and the medical establishment is taking it much more seriously than any of these others.
I’ve made no secret of my preference for Bayesian statistics over the evidence-eviscerating exercise that is frequentism (don’t @ me). If only we could have a trial done by “bigshot researchers” that “the medical establishment” is taking “much more seriously” that gave us Bayesian probabilities instead.
Turns out, that would be the TOGETHER trial. Not a single p-value in sight. Here’s figure S6, strangely stuffed into the appendix:
Let me break this down: the researchers found that in their trial, ivermectin had a 79.4% probability of superiority over placebo. When removing patients who didn’t really have a chance to be treated, that increases to 81.4%. Now, one might say, “but it doesn’t rise over the predetermined threshold of significance.” That person would be someone who doesn’t know that Bayesian statistics has no such concept of significance. An 80% chance it works means just that. In fact, the paper has no mention of the concept of statistical significance anywhere. Refreshing.
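The trial’s exact figure isn’t reproducible without its full adaptive model, but a minimal conjugate-prior sketch shows how a “probability of superiority” falls out of raw counts like these. Everything here—flat Beta(1,1) priors, independent arms, simple event rates—is my simplifying assumption, not the trial’s actual method:

```python
import random

# Hospitalization counts reported for the TOGETHER ivermectin arm vs. placebo
ivm_events, ivm_n = 86, 677
plc_events, plc_n = 95, 678

random.seed(0)
draws = 100_000

# Sample each arm's event rate from its Beta posterior (flat Beta(1,1) prior),
# then count how often the ivermectin rate comes out lower than placebo's.
superior = 0
for _ in range(draws):
    p_ivm = random.betavariate(1 + ivm_events, 1 + ivm_n - ivm_events)
    p_plc = random.betavariate(1 + plc_events, 1 + plc_n - plc_events)
    if p_ivm < p_plc:
        superior += 1

print(f"P(ivermectin rate < placebo rate) ≈ {superior / draws:.3f}")
```

With these counts, the posterior probability lands in the mid-to-high 70s percent—the same ballpark as the trial’s reported 79.4%, with the gap attributable to the trial’s more elaborate model. The point stands either way: the output is a probability the drug works, not a pass/fail verdict against a significance threshold.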
I hear what you’re saying. What if it is because it killed worms? Isn’t Brazil drowning in the stuff anyway? What if it accidentally treated the subjects’ infections? Well, it turns out that the paper published on that hypothesis classified Brazil as “low prevalence.” What if we classify Brazil as “high prevalence” instead? Does the whole paper collapse?
Grasshopper, you’re running too far ahead.
For reasons I will go over soon enough, I don’t believe the strongyloides/ivermectin hypothesis as stated has significant merit, nor that the TOGETHER trial should be taken as strong evidence of anything. But if one holds both of those beliefs—as Scott claims to—then one must admit that the TOGETHER results—such as they are—are not explained by the strongyloides hypothesis.
When it comes out, it will probably get published in a top journal. When discussing Lopez-Medina, I wrote:
When people say things like “sure, a lot of small studies show good results for ivermectin, but the bigger and more professional trials don’t”, this is one of the two big professional trials they’re talking about.
This is the other one.
I’d like to register an objection: what the medical establishment takes seriously or not should not affect our judgement. If the exercise we’re engaged in is figuring out whether the establishment is actually correct, then we should make every attempt to keep what the establishment says about the evidence from coloring our thought process—and that includes the relative status of the journals that publish the studies.
Otherwise, the case we’re constructing is a tautology: the experts are correct because the studies that the experts take seriously tell us that the experts are correct.
After all, a heuristic that would miss the invention of the PCR test is not a heuristic that is of any use at all. From Joomi Kim’s excellent piece on Kary Mullis:
After Mullis came up with PCR, by then he knew what he had discovered. He knew the impact it would have, and that it would spread across the world like wildfire.
This time there was no doubt in my mind: Nature would publish it.
They rejected it. So did Science, the second most prestigious journal in the world.
He eventually published the results in Methods in Enzymology.
This experience taught me a thing or two, and I grew up some more.
No wise men sit up there, watching the world from the vantage point of their last twenty years of life, making sure the wisdom they have accumulated is being used.
We have to make it on the basis of our own wit.
Now, Kary Mullis—despite holding a Nobel prize—is someone who is often misunderstood and misrepresented. In the words of none other than Scott Alexander, “He is a global warming denialist, HIV/AIDS denialist, and ozone hole denialist; on the other hand, he does believe in the efficacy of astrology.” If these strawmemes happen to be circulating in your brain, do yourself a favor and read Joomi’s piece from top to bottom:
Not coincidentally, it’s also the other trial that ivmmeta.com has a warning letter underneath telling you to disregard. Their main concern is that instead of truly randomizing patients to ivermectin vs. placebo, they did a time-dependent randomization that meant during some weeks more patients were getting one or the other.
As far as I can tell, there is no such thing as a “time-dependent randomization” scheme in clinical trial literature. It appears to be a newly minted term for Scott’s essay. If a reader sees it and walks away with the impression that this is something legitimate and well-understood, well, that would be a false impression.
What ivmmeta alleges is that the trial was confounded by time. This kind of issue, specifically in platform trials, is quite serious and has also been discussed as a concern in the New England Journal of Medicine.
This is a problem because the trial takes place in Brazil, where different variants were more common at different times. Here’s their image:
There are many reasons why this is a problem. Besides violating the randomization properties of the trial, the way TOGETHER implemented the control group meant that this issue also voided the trial’s blinding. To complete the picture, recruiting patients from two different periods—with different inclusion/exclusion criteria in place—violates any notion of control as well. In other words, this issue is bad enough to raise very serious questions about each of the key properties of a double-blind RCT. The Gamma variant shift—which Scott highlights as the reason the randomization anomaly is a problem—is just one of the many confounders that enter the picture once treatment and placebo are out of sync.
On the one hand, I have immense contempt for ivmmeta for letting all those other awful studies pass and then pulling out all the stops to try to nitpick this one.
As discussed previously, the “awful studies” Scott refers to are far better than he lets on, or at least the critiques that he has leveled against them do not rise to the level of the issues seen here. Charitably, Scott’s “immense contempt” is based on his misunderstanding of the material, not on reality.
I have no idea if their proposed randomization failure really happened. And no doubt the reason they’re even able to investigate this is that this study is really careful and transparent - most of them don’t tell you anything about their randomization method.
The reason we’ve been able to investigate the randomization failure has nothing to do with the authors being transparent about their randomization method. In fact, I’ve dug into the randomization scheme the trial used, and—it turns out—it was changed twice during the trial, which is weird. When the changes went into effect is not always clear, and the published paper describes only the last version of the randomization method, which was used for fewer than 20% of the patients in the trial—making the description essentially false. So much for “careful and transparent.”
All evidence points to the randomization “failure” having happened, and it relates to the fact that this is a platform trial. So it doesn’t really have anything to do with the “randomization method,” as Scott says here. In order to prove it, a group of online sleuths assembled the data from all the different publications that came out of the TOGETHER trial, as well as materials shared in presentations and elsewhere, to reconstruct a picture of what happened.
The problem ultimately relates to the time period from which the placebo patients were taken. You see, in a platform trial, there is a single, shared placebo arm serving many interventions. This means the investigators have some freedom in choosing exactly which patients to allocate to the control group by picking its start and end points. In the case of the ivermectin study, they seem to have started the placebo group such that more than 10% of its patients came from an earlier period, when the lethal Gamma variant was not as prevalent—which, in turn, required catch-up enrollment within the treatment arm.
What’s more, this forced the treatment arm to be disproportionately filled with patients recruited during the peak of the Gamma wave. Overloading of the health system during the same period may also have led to worse outcomes, as could a whole host of other factors. After all, THIS IS THE SORT OF THING WE DO RANDOMIZED TRIALS TO AVOID, SUPPOSEDLY. To make matters worse, the inclusion/exclusion criteria changed between the two periods: for those earlier patients, vaccination was a reason for inclusion, while for the patients recruited during the peak of the Gamma wave, vaccination was an exclusion criterion. I wish I were kidding.
Unfortunately, in the published paper, the authors explicitly say they did not use patients enrolled before March 23, 2021, which contradicts their own previously released materials. If this sounds confusing, I have written it up in a lot more detail here.
I would be shocked if other studies don’t have all these problems and worse.
Lo and behold, the absurdity heuristic fails again! Other ivermectin studies in the ivmmeta set Scott examines don’t have this issue. How do I know? Because they’re not platform trials. The trial investigators of TOGETHER may well be a lot more sophisticated—they are seasoned Big Pharma trial designers after all—but they took on a trial design that is easily an order of magnitude more complex, and far less mature, opening it up to a lot more opportunities to mess up. Problems such as this one are simply not possible in good ol’ two-arm RCTs.
On the other hand, the point isn’t to be fair, it’s to be right. And this is a potential confounder. Not a huge one. But a potential one.
Actually, we can’t know if it’s a huge one or not. Depending on the mortality rate in the various periods, I’ve shown that this confounder very well could have shifted four or five extra deaths onto the treatment arm, reducing the apparent mortality improvement from something like 28% down to 12.5%, which probably would have been enough to push the result below the threshold of statistical significance. In other words, this issue alone could have flipped the result of the trial. I’d call that pretty huge. Naturally, we don’t really know how big the impact was, since the investigators have not shared any data.
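To see how a handful of shifted deaths can move the relative risk that much, here is a quick arithmetic sketch. The death counts below are illustrative numbers consistent with the percentages above—my assumption for the sake of the arithmetic, not figures taken from the paper:

```python
def improvement(deaths_trt, deaths_ctl, n_trt=677, n_ctl=678):
    """Relative-risk reduction: 1 - (treatment death rate / control death rate)."""
    rr = (deaths_trt / n_trt) / (deaths_ctl / n_ctl)
    return 1 - rr

# Illustrative observed-style counts: 21 vs. 24 deaths.
observed = improvement(21, 24)      # ≈ 12.4% apparent improvement

# The same trial with 4 confounder-driven deaths removed from the
# treatment arm (i.e., what the result might have looked like had the
# arms been drawn from the same period).
unconfounded = improvement(17, 24)  # ≈ 29.1% improvement

print(f"observed: {observed:.1%}, without shifted deaths: {unconfounded:.1%}")
```

With rates this low, each arm’s death tally is a small two-digit number, so moving just four or five events between arms more than doubles (or halves) the headline effect size. That is why “not a huge one” is not something anyone can assert without the patient-level data.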
I guess all we can do is try to bound the damage. Even if the confounding is 100% real and bad, there’s no way to make this study consistent with the crazy super-pro-ivermectin results of studies like Espitia-Hernandez and Aref.
I really wish Scott wouldn’t make such strong claims of impossibility. The results are what one would expect given the parameters of the trial. Ivmmeta has an excellent regression analysis of all its early treatment studies, which I’ve helpfully modified:
Given that the median treatment in the TOGETHER trial was provided at around 4 days since symptom onset,* we’d expect to see something like a 50% improvement. We already discussed how the randomization failure alone could have obscured something like 15% of this benefit. If we also consider that…
…the investigators capped the dose for the most obese (and most at risk) patients in violation of the medical justification in their own protocol.
…there was heavy background use of ivermectin in the population (as acknowledged in the paper) and we have contradictory accounts on whether patients using it were excluded, and how.
…the patients were infected with the Gamma variant, the most aggressive variant we’ve seen to date. It is indicative that vaccines performed 15% worse when faced with the Gamma variant in protecting against severe disease.
The result we see from the TOGETHER trial is in the 15-20% range we would expect. A clinical trial is a fragile instrument that is easy to miscalibrate.
* Funnily enough, the trial seems to be missing time-since-symptom-onset data for about 23% of its patients. (Remember when Babalola was raked over the coals for missing a single datapoint? How quaint.) This is a big deal not only because we can’t really figure out what the average time since symptom onset was, but also because that same number is an inclusion criterion. If the investigators didn’t know it, how were they sure those patients could be included?
And even if we deny any confounding, we see the same slight pro-ivermectin trend - 86 hospitalizations vs. 95 - that we’ve seen in so many other studies.
Once again, studies aren’t soldiers, nor are they votes. Objections should be examined and understood, as well as their meaning to the interpretation of the results.
Nothing is going to make me believe that this isn’t in the top 33% of studies we’ve been looking at, so let’s add it as grist for the meta-analysis (though maybe not quite as much grist as its vast size indicates) and move on, angrily.
Did I mention I really wish Scott wouldn’t make such strong claims of impossibility?
Carvallo et al.
In order to put the TOGETHER trial on ivermectin in context, let’s compare it to what Scott has to say about the Carvallo trial:
Carvallo et al: This one has all the disadvantages of Espitia-Hernandez, plus it’s completely unreadable. It’s hard to figure out how many patients there were, whether it was an RCT or not, etc.
I’ll grant that the paper isn’t written well at all, though it does make clear that it was not a randomized trial.
It looks like maybe there were 42 experimentals and 14 controls, and the controls were about 10x more likely to die than the experimentals. Seems pretty bad.
That was the outpatient portion of the trial. In the inpatient portion, they report treating 135 patients with ivermectin and having zero hospitalization. Again, not a well-written paper, so I understand the frustration.
On the other hand, another Carvallo paper was retracted because of fraud:
There’s that “F” word again…
Also, that other paper isn’t actually retracted. Here, see for yourself.
apparently the hospital where the study supposedly took place said it never happened there.
So, this isn’t true, either. Scott’s source is almost certainly this Buzzfeed expose which is written as sensationally as possible. Even so, it says that Carvallo did provide Buzzfeed with the written ethics approval from the main hospital, Eurnekian, which they did not challenge.
So what is Scott talking about? Well, one of the three other hospitals, Cuenca Alta—from which a quarter of the participants were sourced—said it had “no record of participating in the study,” though a member of its ethics committee “acknowledged that staff members may have individually participated and noted that it is common in Argentina for employees to work at different hospitals.” Carvallo, speaking to a far more sympathetic journalist, said that after their initial encouraging results, “the union representing our health care workers demanded the prophylaxis be given to everyone [on staff] who wanted it.” Since the trial was approved and running at a major hospital, it’s not clear why—if the workers were participating in personal capacity through their union—additional ethics approvals were required. After all, healthcare workers—as far as the law is concerned—aren’t cattle, and can presumably make their own choices without the blessing of their employer. If there’s something amiss here, Buzzfeed hasn’t demonstrated it.
What’s more, the article continues:
A local Rotary Club donated $30,000 to purchase more PPE, Carvallo said, and the ivermectin and the carrageenan were donated by two drugmakers in Argentina.
BuzzFeed News was unable to reach one of the companies he cited, Laboratorios Panalab. But the other, Laboratorio Pablo Cassará, confirmed that it had provided both ivermectin and carrageenan to Carvallo’s study. According to company executive Roxana Pickenhayn, Carvallo had provided authorizations for his study; the main hospital where the research was done subsequently requested additional supplies. She noted that Cassará had also provided supplies for a copycat study in Argentina, which similarly found that the compounds were effective in preventing COVID.
I will note here that not only did Carvallo provide Buzzfeed with contacts to the suppliers, but the supplier Buzzfeed reached confirmed that they saw the authorization, provided the drugs, as well as resupplied the hospital directly. When it comes to the TOGETHER trial, not only do we not have similar confirmation, the investigators have never clarified the provenance of the drug they used. Also, there is some question as to the quality of the supply they used. After all, ivermectin effectiveness can vary based on the source, even if the pills claim to have the same amount of active substance.
The Buzzfeed article makes sure to highlight a statement that the trial “did not have approval from any accredited ethics committee or local health officials in that province,” according to members of the health ministry of Buenos Aires province. However, the context of the statement is unclear, and—taken at face value—it would appear to be false, given the ethics authorization from the Eurnekian hospital, which Buzzfeed didn’t question. The only other explanation is that some nuance was lost in translation.
Back to Scott:
I can’t tell if this is a different version of that study, a pilot study for that study, or a different study by the same guy.
It’s a different study.
Anyway, it’s too confusing to interpret, shows implausible results, and is by a known fraudster, so I feel okay about ignoring this one.
There we go again with the “F” word. Retired endocrinologist, professor of internal medicine, and former director of one of Buenos Aires’ largest hospitals? Nah. “Known fraudster,” should do just fine. As far as I’ve seen, what we have is evidence of a badly run and written-up study. The accusations of fraud have not been demonstrated—and if anything—there is substantial confirmation that the trial did take place as described.
The biggest sin of Carvallo, to my knowledge, is that he (1) didn’t keep decent records of what happened in his prophylaxis trial, instead keeping only summary statistics and (2) when pressed to show his data, he covered up the situation with a number of false and/or misleading statements. This isn’t good, and makes me not want to trust that study for anything. In fact, ivmmeta removed all of Carvallo’s studies from its exclusion analysis, along with many of the other more controversial ones. The results don’t change much.
To sum up, the story of Carvallo seems overblown, which is becoming a pattern if you’ve read this entire series. Not because the core of the accusations lacks validity, but because—in the frenzy to “take down” ivermectin that we all witnessed in the second half of 2021—the Buzzfeed article is embellished with a number of exaggerated and/or misleading claims. I mean, check out this quote:
“It’s one of the worst studies I’ve ever come across,” Meyerowitz-Katz told BuzzFeed News. He recalled wondering, Did this even happen?
For at least one hospital, the answer is no.
As with his other sources, Scott takes the raw material, already heavily slanted, bleaches it until any trace of nuance is washed away, and delivers to us the complete caricature that remains: “Hector Carvallo, known fraudster.” The only problem is that his claim isn’t actually true.
Mirror, Mirror, on the Wall, Who’s the Fraudiest of Them All?
Just for fun, or perhaps for some perspective, let’s compare what we know about the TOGETHER trial versus what we know about the Carvallo prophylaxis trial.
Here’s what they have in common:
Hugely influential study about ivermectin from South America ✅
The timeline of the studies is unclear ✅
Exactly how many people took part is a source of confusion ✅
Mistaken sums in data tables ✅
Refusal to share data despite repeated, persistent requests ✅
Contradictory and/or false statements to peers and the press ✅
Conflicting explanations for lack of data sharing ✅
Journal standing by the paper despite obvious issues ✅
Authorship changes late in the process ✅
Changes to published paper with no clear explanation ✅
Key elements of the trial depend on trusting the researchers ✅
Ethics board approval attained prior to the commencement of this study ❌
Independent oversight ❌
And here’s where they’re different:
Claimed as randomized but wasn't... (only TOGETHER)
Serious issues with blinding… (only TOGETHER)
Unclear sourcing of ivermectin and placebo... (only TOGETHER)
Well funded... (only TOGETHER)
Awarded “clinical trial of the year”… (only TOGETHER)
Gideon Meyerowitz-Katz and crew are suspicious... (only Carvallo)
Worldwide humiliation in the press... (only Carvallo)
While this is slightly tongue-in-cheek, I am able and willing to defend each bullet point in these lists.
The point here is not to claim Carvallo did a high-quality study, nor to deny that he reacted inappropriately when challenged. The point is that the sins of Carvallo are remarkably analogous to the sins of the TOGETHER trial, and yet their coverage in the press and beyond could not be more different.
Say what you will about Dr. Hector Carvallo: at least he didn’t respond to people asking about his trial the way TOGETHER trial principal investigator—one of the “bigshot researchers from a major Canadian university”—Dr. Ed Mills responded to a pastor from Quebec who persistently, but politely, asked him for updates on the trial he was running:
Fuck off. Pray to your stupid god for insights.
Or sign off other emails to them with:
We are grateful for your support of the trials.
The TOGETHER Trialists Collaboration
The largest placebo-controlled trial of COVID in the world.
Glory to Satan.
These emails (and more!) have been published in Pierre Kory’s Substack, but I can also vouch for them as I had been made aware of their existence through a different source before they were public.
One of the most important lessons I’ve learned in this pandemic is that while people like Ed Mills are seen as respectable and impartial scientists—often juxtaposed with someone far more volatile like Pierre Kory—Kory, in my personal experience, is at least the same in private and in public. People like Ed Mills, in contrast, are social chameleons, presenting a very different face depending on the context they are in.
TOGETHER Trial: One Year Anniversary
Today marks one year and one day since the publication of the early results of the TOGETHER trial in the form of a single slide. Some complained about the investigators engaging in science-by-press-release, but it didn’t matter: the press ran with it. It was almost seven months until we saw a published paper. Though promises were made that the data would be available “upon publication,” those data are still nowhere to be found.
In the Scientific Takeaway section of his essay, Scott writes:
This is going to require a social norm of always sharing data. Even better, journals should require the raw data before they publish anything, and should make it available on their website. People are going to fight hard against this, partly because it’s annoying and partly because of (imho exaggerated) patient privacy related concerns.
Here’s what Dr. Ed Mills had to say on April 3rd, 2022, in response to a request for the data shortly after the publication of the paper:
We are providing all data sets to ICODA. Applicants can then propose an analysis to ICODA and, if they approve it, can get access to the data. It’s not a minor issue to prepare CDISC data sets and we have been swamped with getting the FDA EUA submission on lambda such that our statisticians are occupied with that. The data sets will be available soon.
Four months later, we’re still waiting.
Somebody’s going to try make some kind of gated thing where you have to prove you have a PhD and a “legitimate cause” before you can access the data, and that person should be fought tooth and nail (some of the “data detectives” who figured out the ivermectin study didn’t have advanced degrees).
Uncanny! Here’s what the TOGETHER trial investigators wrote in their Fluvoxamine data sharing statement:
Data from the TOGETHER trial will be made available following publication of this manuscript to interested investigators through the International COVID-19 Data Alliance after accreditation and approval by the TOGETHER trial principal investigators (EJM and GR).
Not only does this exhibit precisely the kind of evasive behavior Scott described, but the investigators never actually made good on that half-hearted promise to deliver the data to ICODA:
After this email was circulated, the data sharing statement in the New England Journal of Medicine was updated (without the journal issuing any notice) to say that the data would be available upon publication at a platform called VIVLI:
Naturally, one should not be allowed to retroactively state that the data will be available immediately after publication at a platform that was only mentioned for the first time more than three months after the fact. I mean, they should not be allowed to say that the data will be available and not make it so, either, but the NEJM disagrees, I suppose.
Regardless, there is no indication that VIVLI has the data either, so the wait continues.
I want a world where “I did a study, but I can’t show you the data” should be taken as seriously as “I determined P = NP, but I can’t show you the proof.”
And yet, a year after the results of TOGETHER were published, and despite several broken commitments to release the data, the trial is being taken extremely seriously by most people, including Scott. Seriously enough to outweigh all the other studies.
As for the “data detectives” and “fraud hunters”… they’ve said nothing whatsoever about the conspicuous lack of data sharing. In fact, they’re collaborating with Dr. Ed Mills on new work. Almost like they’re not actually hunting for fraud, or detecting for data, or whatever it is they claim to be doing.
I feel the biggest hurdle in conveying the issue with TOGETHER is that many people fundamentally refuse to believe that big-name researchers would do such sloppy work or that a top-tier journal would publish a study this bad. Scott literally endorsed it sight-unseen. Even though he knows and understands all the different ways in which things could go wrong, he refuses to believe that they did, in fact, go wrong. The implications, after all, would be earth-shattering. Absurd!
It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines. I take no pleasure in this conclusion, which I reached slowly and reluctantly over my two decades as an editor of The New England Journal of Medicine.
That was written in 2009. I don’t think things have gotten better in the last 13 years.