TOGETHER Trial & The Negative Number of Metformin Patients
When your numbers come from a real dataset, you're not supposed to have a negative number of patients anywhere.
As I looked at the TOGETHER Metformin data over on ivmmeta.com, something stood out to me:
In the “intention-to-treat” (ITT) analysis, which includes all patients who were part of the study, metformin does better than placebo at reducing the risk of an ER visit, and also at reducing the risk of hospitalization. And yet, on the primary endpoint, a composite of “extended ER observation for over 6 hours OR hospitalization,” it actually does worse.
This is very strange, even though—in principle—there are some improbable cases in which such a result could arise. I thought it was interesting enough to dig into the numbers further.
Here are the relevant sources from the paper and supplementary appendix:
Let’s bring it all together. The Per-Protocol group (P-P) is the patients who took at least 80% of the doses they should have taken. I’ve added the “Not Per-Protocol” group (NPP), which is a simple subtraction of the P-P group from the ITT group.
Risk of ER visit:
ITT: treatment 8 of 216 (3.7%), control 11 of 205 (5.4%)
P-P: treatment 7 of 171 (4.1%), control 10 of 181 (5.5%)
NPP: treatment 1 of 45 (2.2%), control 1 of 24 (4.2%)
Risk of hospitalization:
ITT: treatment 24 of 215 (11.2%), control 24 of 203 (11.8%)
P-P: treatment 8 of 168 (4.8%), control 14 of 179 (7.8%)
NPP: treatment 16 of 47 (34%), control 10 of 24 (41.7%)
Risk of ER observation >6 hours or hospitalization:
ITT: treatment 34 of 215 (15.8%), control 28 of 203 (13.8%)
mITT: treatment 30 of 211 (14.2%), control 28 of 203 (13.8%)
P-P: treatment 14 of 168 (8.3%), control 17 of 179 (9.5%)
NPP: treatment 20 of 47 (42.6%), control 11 of 24 (45.8%)
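The NPP rows are just the ITT patients who didn’t make it into the per-protocol analysis, so they follow from a simple subtraction. Here is a minimal sketch in Python that reproduces them from the published ITT and P-P counts; the helper function and its name are mine, not anything from the paper:

```python
# Reconstruct the "Not Per-Protocol" (NPP) rows by subtracting the per-protocol
# (P-P) events and denominators from the intention-to-treat (ITT) ones.
# Counts are (events, total) pairs copied from the tables above.

def npp(itt, pp):
    """Subtract P-P counts from ITT counts to get the NPP group."""
    events = itt[0] - pp[0]
    total = itt[1] - pp[1]
    return events, total, round(100 * events / total, 1)

# ER visit endpoint, treatment then control
print(npp((8, 216), (7, 171)))    # -> (1, 45, 2.2)
print(npp((11, 205), (10, 181)))  # -> (1, 24, 4.2)

# Hospitalization endpoint, treatment then control
print(npp((24, 215), (8, 168)))   # -> (16, 47, 34.0)
print(npp((24, 203), (14, 179)))  # -> (10, 24, 41.7)
```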
How Many Participated and How Many Adhered?
The first problem we have is that the “risk of ER visit” table counts one more treatment patient and two more placebo patients than the paper’s ITT analysis (216 vs. 215, and 205 vs. 203), and three more treatment patients and two more placebo patients than the paper’s per-protocol analysis (171 vs. 168, and 181 vs. 179). This really isn’t supposed to happen.
If we look at the enrollment numbers as shown in the August 6th presentation to the NIH Grand Rounds event, we see yet another set of numbers:
For one, we have two odd metformin patients (circled in red) that seem to have been enrolled after the metformin arm was ended. We have to ignore these, but it is curious, and I don’t have a good explanation for why they show up there. If anyone has thoughts, please leave a comment.
Also, if we count the height of the dark blue bars until April 3, 2021—the end date of the trial—we come up with 217 patients. This is close to the 216 and 215 numbers we have in the paper, so it may be that some patient was disqualified for some reason. Though, in that case, it should have been noted in figure 1:
Still, looking here doesn’t really help us resolve the conflict in the paper. If anything, it complicates things further. It’s hard to know how to continue processing this paper given this incongruity, so we’ll have to note it and set it aside, assuming for the rest of the analysis that the differing totals are essentially a very bad typo.
Understanding the Composite Endpoint
Now, you may have noticed that we have a number of different terms in play here, so let’s clear them up as much as possible. First, the appendix gives us data for the “ER visit” endpoint, which I take to mean any ER visit. We also have data for the “hospitalization” endpoint, which should be self-explanatory. Finally, we have the “ER observation for >6h” endpoint. This must be a subset of ER visits where the patient was retained for more than 6 hours in an emergency setting. So when the composite endpoint is described as “ER observation for more than 6 hours or hospitalization,” that counts all the patients that were observed in ER for more than 6 hours, were hospitalized, or both. Here’s a visual representation of how all these terms interact:
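To make those set relationships concrete, here is a minimal sketch with made-up patient-level flags (not the trial’s data), showing that the composite counts each patient once whether they hit one component or both:

```python
# Hypothetical patient-level flags (made-up data, NOT the trial's records),
# just to illustrate how the composite endpoint counts patients.
patients = [
    {"ext_er_obs_6h": True,  "hospitalized": False},  # extended ER stay only
    {"ext_er_obs_6h": True,  "hospitalized": True},   # both (still one patient)
    {"ext_er_obs_6h": False, "hospitalized": True},   # hospitalized only
    {"ext_er_obs_6h": False, "hospitalized": False},  # neither
]

# Composite endpoint: extended ER observation OR hospitalization (the union).
composite = sum(p["ext_er_obs_6h"] or p["hospitalized"] for p in patients)
hospitalized = sum(p["hospitalized"] for p in patients)
ext_only = sum(p["ext_er_obs_6h"] and not p["hospitalized"] for p in patients)

print(composite, hospitalized, ext_only)  # 3 2 1
# The union splits cleanly into "hospitalized" and "extended observation only":
assert composite == hospitalized + ext_only
```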
The Mysterious Negative Patients
I wanted to know the smallest possible number of patients who were observed in the ER for more than 6 hours without being hospitalized, which is the orange area in the Venn diagram above. To find that number, I subtracted the number of patients who were hospitalized from the number of patients who met the composite endpoint (observed in the ER for more than 6 hours, or hospitalized):
Extended ER observation without hospitalization:
ITT: treatment 10 of 215 (4.7%), control 4 of 203 (2%)
P-P: treatment 6 of 168 (3.6%), control 3 of 179 (1.7%)
NPP: treatment 4 of 47 (8.5%), control 1 of 24 (4.2%)
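In code, that lower bound is simply the composite count minus the hospitalization count, under the charitable assumption that every hospitalization was captured by the composite endpoint. A minimal sketch with the published ITT counts (the helper name is mine):

```python
# Lower bound on "extended ER observation without hospitalization":
# composite events minus hospitalizations, assuming (charitably) that every
# hospitalization was captured by the composite endpoint.

def ext_er_only_lower_bound(composite, hospitalized, total):
    events = composite - hospitalized
    return events, round(100 * events / total, 1)

# ITT arm, treatment then control (published counts)
print(ext_er_only_lower_bound(34, 24, 215))  # -> (10, 4.7)
print(ext_er_only_lower_bound(28, 24, 203))  # -> (4, 2.0)
```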
Things here start to look odd: patients on treatment are roughly twice as likely as patients on placebo to be observed in the ER for an extended period without being hospitalized. This might be a fluke, or an artifact of what appears to be an aggressive way of dosing metformin: treatment patients may be visiting the ER more frequently to get their adverse events looked at, and, once the doctors are assured it’s not a case of COVID-19, being sent home.
However, that’s not all. On the basis of that table, I wanted to know how many patients, at most, could have visited the ER and either had a short stay or ended up hospitalized. I calculated that by subtracting those who had extended ER observation without being hospitalized from those who visited the ER at all. In terms of our Venn diagram, it is the area highlighted here:
This is where things got really odd:
Risk of short ER visit OR ER visit with hospitalization:
ITT: treatment -2 of 215 (-1%), control 7 of 203 (3.4%)
P-P: treatment 1 of 168 (0.6%), control 7 of 179 (3.9%)
NPP: treatment -3 of 47 (-6.4%), control 0 of 24 (0%)
For one, in the per-protocol analysis, the control group has roughly 6.5 times the event rate of the treatment group (3.9% vs. 0.6%).
Then, there are the negative numbers: -2 treatment patients in ITT and -3 in NPP. Obviously, there is no such thing as a negative number of patients. The count could be as low as zero, but no lower. Something is very, very wrong with the numbers TOGETHER provided for metformin. Keep in mind that these figures already rely on some extremely charitable assumptions:
We ignore the extra three patients in the ITT analysis and the extra five in the PP analysis. If extra patients were indeed enrolled and then removed for some reason, we assume that none of the reported events belong to them.
We assume that extended ER observations are maximally distinct from hospitalizations.
If any of these assumptions do not hold, the negative numbers in the final results get more extreme, and it’s highly unrealistic that both assumptions would hold fully. So the negative numbers we see are the least negative possible values for that variable. Since these numbers could not have been produced by a real dataset, something has gone very, very wrong.
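For completeness, here is a minimal sketch that chains both subtractions and reproduces the impossible values directly from the published counts; the function name is mine, and the charitable assumptions above are what make these the least negative values obtainable:

```python
# Chain the two subtractions: the result is the largest number of ER visits
# that could have been "short stay or ended in hospitalization", given the
# published counts and the charitable assumptions listed above.

def short_or_hospitalized_er(er_visits, composite, hospitalized, total):
    ext_only = composite - hospitalized   # minimum extended-ER-only patients
    bound = er_visits - ext_only          # remaining ER visits, at most
    return bound, round(100 * bound / total, 1)

# ITT treatment arm: 8 ER visits, composite 34, hospitalizations 24, n = 215
print(short_or_hospitalized_er(8, 34, 24, 215))  # -> (-2, -0.9), i.e. about -1%

# NPP treatment arm: 1 ER visit, composite 20, hospitalizations 16, n = 47
print(short_or_hospitalized_er(1, 20, 16, 47))   # -> (-3, -6.4)

# A count of patients can never be negative, so these inputs contradict each other.
```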
Conclusion
In the previous post, we found that the metformin arm was stopped early for futility, even though it most likely had not cleared the futility threshold. Here we see that something is very wrong with the published numbers in other ways as well:
The number of patients recruited for this trial is inconsistent across sources. If additional patients were recruited and later disqualified, that should have been noted.
The number of patients who visited the ER and either had a short stay or were hospitalized appears to be… negative. While that number could have been zero, it cannot be lower than that. As such, we know there is clearly an issue with the numbers that were published.
This requires explanation, and, as usual, the raw patient-level data would go a long way toward explaining what we’re looking at and which of these numbers is wrong. Depending on the error, the implications get increasingly serious. What we do know is that what was published is not sufficient for us to make sense of what happened in the trial.
Naturally, since this is a platform trial, what happened here may affect our understanding of the rest of the trial. But until these numbers are corrected, we cannot know in what way.