A new study appearing last month in the American Journal of Psychiatry concluded that “gender-affirming” surgery is associated with reduced demand for subsequent mental health treatment in a sample of persons diagnosed with “gender incongruence.” Predictably, such news received wide media interest and coverage. And yet even a cursory reading of the study itself tells a far less optimistic story than the media narratives—as well as the authors’ own inexplicable confidence—have offered. Indeed, the analyses would seem to suggest the benefit of a hormonal or surgical course does not outweigh the demonstrable physical and financial costs of such treatments. Future studies might suggest otherwise, but not this one.
The study’s shortcomings have nothing to do with the data or with the methods employed by its authors, public health researchers Richard Bränström and John Pachankis. The data come from the Swedish Total Population Register, a massive, longitudinal data collection effort that gathered information on over 9.7 million Swedes, or about 95 percent of the country. No complaints there. The analyses are high-quality: the authors tracked respondents over time and assessed their use of mental health treatment (for a mood or anxiety disorder) in 2015, as well as other related measures (such as hospitalization after a suicide attempt), as a function of time since gender-affirming hormone and surgical treatment. The measurement precision is excellent and would satisfy most methodological purists. So far, all good news. But then come the authors’ interpretations of the study’s results, which are remarkably out of step with the far more modest conclusions those results merit.
First, a word about the hormones: the study found no mental health benefits for hormonal interventions in this population. Time since initiating hormone treatment had no effect on the likelihood of subsequently receiving mental health treatment. Given the surge in interest in, demand for, and supply of hormonal therapies for self-identified transgender persons today, you would think that such therapy pays obvious benefits over time in reduced need for treatment of a mood or anxiety disorder, or for hospitalization after a suicide attempt. Yet there was no statistically significant effect. In fact, the confidence intervals reveal a nearly significant aggravating effect of hormonal treatment on subsequent mental health needs.
It is the surgical effect, however, that has grabbed all the attention. Bränström and Pachankis detected a statistically significant effect of time since last “gender-affirming” surgery on reduced mental health treatment. The adjusted (for controls) odds ratio for this was 0.92, meaning that, among respondents diagnosed with “gender incongruence” who then received gender-affirming surgical treatment, the odds of being treated for a mood or anxiety disorder (in 2015) were reduced by about 8 percent for each year since the last surgery. In other words, it would appear that the surgery—or more typically, the series of surgeries—benefited their mental health.
But the authors discuss a “linear decrease” in seeking subsequent mental health care that is simply not visible in the study’s graphs, where post-surgical mental health treatment hovers stably around 35 percent among those in their first nine years after surgery, and then drops to only 21 percent of those patients who are in their tenth (or higher) year since their last surgery. However, only 19 total respondents reported their last surgery as having been completed 10 or more years ago. By contrast, 574 (out of 1,018 total) reported their last surgery as having been conducted less than two years ago. (Surgical treatment is clearly surging.) This means that the apparently helpful overall effect of surgery is driven by this comparatively steep drop in mood/anxiety treatment among only 19 patients. By the math, that would seem to indicate that four out of these 19 Swedes (i.e., 21 percent) sought help in 2015 for mood/anxiety problems.
While the study reports the adjusted odds ratio of the overall effect of time since surgery (0.92), which I cannot replicate without having data access, you don’t need the data to calculate an unadjusted odds ratio from the information presented there. This can tell us the baseline effect of time since surgery on receiving mood and anxiety treatment, only without the controls (like age, income, etc.). Doing this reveals the fragility of the study’s key finding: if a mere three additional cases among these 19 had sought mental health treatment in 2015, there would appear to be no discernible overall effect of surgery on subsequent mental health. The study’s trumpeted conclusion may hinge on as few as three people in a data collection effort reaching 9.7 million Swedes, 2,679 of whom were diagnosed with gender incongruence and just over 1,000 of whom had gender-affirming surgery.
An increase of just three treated individuals (from 4 to 7 of 19) brings the overall effect to zero. On the other hand, a decrease of just three treated individuals (from 4 to 1 out of 19) yields an (unadjusted) odds ratio of 0.88, which would enable a claim of a 12-percent reduction in mental health assistance from getting the surgery. These large swings in estimates are due to very small adjustments in the data. As is often the case with small samples, tiny changes lead to large fluctuations in estimated effects. But, for this patient population, you are not going to find larger data collection projects than the Swedish data. This is as good as it gets when it comes to studying transgender medical experiences and outcomes.
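To make the arithmetic concrete, here is a minimal Python sketch. It uses only figures quoted above: 19 patients in their tenth or later year since surgery, 4 of them treated in 2015, and a treatment rate of roughly 35 percent in earlier years. The function and constant names are illustrative and do not come from the study. The sketch does not reproduce the unadjusted odds ratio, which depends on per-year counts not given here; it only shows how far the treated share moves when three patients change category.

```python
# Illustrative only: names are hypothetical, numbers are those quoted in the text.
COHORT_SIZE = 19           # patients 10+ years past their last surgery
EARLIER_YEARS_RATE = 0.35  # approximate treatment rate in years 1-9, per the text


def treated_share(treated: int, cohort: int = COHORT_SIZE) -> float:
    """Fraction of the cohort treated for a mood or anxiety disorder."""
    return treated / cohort


# Three fewer, the count implied by the text, and three more treated patients.
for treated in (1, 4, 7):
    share = treated_share(treated)
    gap = share - EARLIER_YEARS_RATE
    print(f"{treated:>2} of {COHORT_SIZE}: {share:.1%} "
          f"({gap:+.1%} relative to the earlier-years rate)")
```

With seven treated patients the share is about 37 percent, essentially the same as the roughly 35 percent seen in the earlier years, which is consistent with the overall effect disappearing.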
Another helpful statistic I calculated is the NNT, or “Number Needed to Treat.” It’s a measure of clinical impact. In this study, the NNT appears to be a staggering 49, meaning the beneficial effect of surgery is so small that a clinic may have to perform 49 gender-affirming surgeries before it could expect to prevent one additional person from seeking subsequent mental health assistance [1]. If no other treatment were available, or if the treatment were not invasive and the hazards were insignificant, clinics might consider surgery a low-risk but low-payoff approach. But none of those conditions applies here. Conducting 49 surgeries to secure one additional patient who benefits? Unheard of.
The authors are nevertheless quick to declare that “this study provides timely support for policies that ensure coverage of gender-affirming treatments.” I cannot see how such confidence is merited. Time since hormonal treatment yielded no discernible effect on subsequent use of mental health treatment, while the modest effect of surgery hinges on a handful of cases from an earlier era (10 or more years ago) when very few gender dysphoric patients pursued surgery at all. And it’s not a leap to wonder whether those who did so a decade ago are a different kind of group than those who do so today. Moreover, suicide—the threat that seems to prompt all the urgency in doing something radical to alleviate psychological distress in these patients—may well have claimed an unknown number of Swedes who had had gender-affirming surgery ten or more years ago. We just don’t know, because the study does not track completed suicides for this sample.
If this were a clinical trial seeking to establish the efficacy of a particularly invasive medical treatment in comparison with a non-invasive standard protocol, there is no way that these published results would favor the invasive treatment—in this case, “gender affirming” surgery—when the statistical difference in outcomes was so tiny and fragile. This is not, contrary to what Bränström told ABC News, an evidence-informed treatment. That the authors corrupted otherwise excellent data and analyses with a skewed interpretation signals an abandonment of scientific rigor and reason in favor of complicity with activist groups seeking to normalize infertility-inducing and permanently disfiguring surgeries.
Physicians should not be pushed to prescribe such profoundly consequential treatment by threat of call-out, malpractice suits, patient demand, or—in this case—the overreaching interpretations of quality data. Clinicians are being bullied into writing a radical prescription based on fear, not on sensible conclusions from empirical data.
But this reasonable position is getting more difficult to defend. Less than two months after another team of activist psychiatrists landed a weak study on “conversion” therapy in the journal JAMA Psychiatry, its lead author has commenced a movement aimed at a wholesale ban on such therapy. This is alarming, especially since the idea itself is nonsensical: “converting” from being convinced you were born in the wrong body to concluding that you can live with the body you have. There is no defined psychotherapeutic method for treating gender dysphoria that can be widely characterized and consistently identified as “conversion therapy” in order to be banned. Nor has there been a clinical trial evaluating specific psychotherapeutic methods of counseling for gender dysphoria that could potentially demonstrate whether such methods are helpful or harmful.
This is not how normal medical research works.
[1] In this case, the NNT = -1 * [1/[(OR-1)*UER] + OR/[(OR-1)*(1-UER)]]. Using the adjusted odds ratio (OR) of 0.92 and an unexposed event rate (UER) of 0.453, I get an NNT of 49 cases. The UER I employ here comes from the mood/anxiety treatment rate for “perioperative” individuals with gender incongruence, as reported in the study’s Figure 1. The term “perioperative” is vague and not defined by the authors; it could mean right before or right after surgery. The perioperative group is not, however, a control group (that is, Swedes diagnosed with gender incongruence who have not pursued surgical treatment). Even such a control group would have limitations, including selection bias (e.g., screening for co-occurring mental illness or a lower level of gender dysphoria). Indeed, a prospective randomized controlled trial is not possible with this population. Nevertheless, the number employed offers a reasonable baseline UER, given the available data. NNTs below 10 “usually denote a worthwhile difference when comparing one intervention with another,” notes Leslie Citrome in “Quantifying Clinical Relevance.”
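As a rough check on the arithmetic in the note above, the following Python sketch evaluates the stated formula with OR = 0.92 and UER = 0.453. The function names are illustrative assumptions, not part of the study or of any library. The second function reaches the same number by a different route: it converts the odds ratio into an implied treated-group event rate and takes the reciprocal of the difference in rates. Neither function says anything about whether the perioperative rate is the right baseline.

```python
import math


def nnt_from_odds_ratio(odds_ratio: float, uer: float) -> float:
    """NNT per the note: -1 * [1/((OR-1)*UER) + OR/((OR-1)*(1-UER))]."""
    if odds_ratio == 1:
        raise ValueError("NNT is undefined when the odds ratio equals 1")
    return -1 * (
        1 / ((odds_ratio - 1) * uer)
        + odds_ratio / ((odds_ratio - 1) * (1 - uer))
    )


def nnt_from_risk_difference(odds_ratio: float, uer: float) -> float:
    """Same quantity via the implied exposed event rate (EER)."""
    exposed_odds = odds_ratio * uer / (1 - uer)  # odds in the treated group
    eer = exposed_odds / (1 + exposed_odds)      # convert odds back to a rate
    return 1 / (uer - eer)                       # reciprocal of the risk reduction


nnt = nnt_from_odds_ratio(odds_ratio=0.92, uer=0.453)
print(round(nnt, 1))   # 48.6
print(math.ceil(nnt))  # 49, rounding up to whole patients
```

Because both routes are algebraically equivalent, they agree to floating-point precision; the implied absolute reduction in the treatment rate is about two percentage points per year of the odds ratio's effect.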