“Why do researchers get away with sloppy science? In part because, far too often, no one is watching and no one is there to stop them.” –“The Irreproducibility Crisis of Modern Science,” National Association of Scholars.

 

A major correction has been issued by the American Journal of Psychiatry. The authors and editors of an October 2019 study, titled “Reduction in mental health treatment utilization among transgender individuals after gender-affirming surgeries: a total population study,” have retracted its primary conclusion. Letters to the editor by twelve authors, including ourselves, led to a reanalysis of the data and a corrected conclusion stating that the data in fact showed no improvement after surgical treatment. What follows is the background to our published letter and a summary of our critical analysis of the study.

A Crisis of Irreproducibility in Psychology and Medicine

It has been an open secret for some time that there is a crisis of irreproducibility of scientific studies in medicine and other fields. No less a figure than the Director of the NIH, Dr. Francis Collins, wrote that “the checks and balances that once ensured scientific fidelity have been hobbled. This has compromised the ability of today’s researchers to reproduce others’ findings.” For example, the National Association of Scholars reports, “In 2012 the biotechnology firm Amgen tried to reproduce 53 ‘landmark’ studies in hematology and oncology, but could only replicate 6 (11%).” In 2015, Science published an attempt to replicate 100 studies that had appeared in three well-known psychology journals in 2008. Nearly all of the original studies had produced statistically significant results, whereas only a little over a third of the replications did.

Perhaps nowhere in medicine and psychology is this problem of irreproducibility worse than in studies of people who claim to have a mismatch between their sex and their internal sense of being male or female.

When we first analyzed the study last October, it was obvious that it had major shortcomings. Dr. Van Mol led our team, which includes endocrinologist Michael Laidlaw, child and adolescent psychiatrist Miriam Grossman, and Johns Hopkins professor of psychiatry Paul McHugh, in condensing our findings into a compact, 500-word letter to the editor. We were not the only clinicians to question the study’s legitimacy. A total of seven letters, all critical of the study, were published on August 1, including our own. The editors included a response from the original authors, and they explained why it took ten months to publish the letters.

Let’s look at the study and the shortfalls we found. The Swedish Total Population Register of 9.7 million people and national patient databases were used to assess the effectiveness of “gender-affirming hormone treatment” and “gender-affirming surgery” in affecting three endpoints: prescriptions for antidepressants and anti-anxiety medications, healthcare visits for mood or anxiety disorders, and post-suicide attempt hospitalizations. The study authors, Bränström and Pachankis, concluded that gender-affirming hormones offered no effect but that surgery did reduce mental health treatment. They further asserted the finding “provides timely support for policies that ensure coverage of gender-affirming treatments.”

The authors used an odd combination of retrospective data collected over an eleven-year period from 2005 to 2015 and limited psychiatric outcomes over a “prospective” one-year period during 2015, with no control group. To qualify, a person had to be alive in Sweden as of December 31, 2014, and have a diagnosis of gender incongruence. The first graphic in the study specified “time since last gender affirming surgery” and traced back ten years. That chart could easily be misinterpreted as a prospective ten-year follow-up.

Where the Study Falls Short

One problem leading to irreproducibility is loss to follow-up. This refers to patients who participated in a study but at some point are considered “lost”: they are either unwilling or unable to communicate, missing, or dead. Loss to follow-up is frequently seen in studies that validate the benefits of transition, and it was strongly implied in the Bränström study by several metrics. First, the authors reported that 2,679 Swedes were diagnosed with “gender incongruence.” Though seemingly large, that number is a full order of magnitude below what DSM-5 prevalence statistics would project. Where did the remainder go?

A paucity of gender-affirming surgeries also suggested loss to follow-up. Table 3 of their study showed that only 38 percent of people diagnosed with gender incongruence had any type of affirmative surgery, and only 53 percent of those (about 20 percent of the total) had surgery of the reproductive organs. Gender-affirming surgery is free in Sweden, so where are these patients? And for those whose last surgery was ten or more years earlier, how many completed suicide, died of other related causes, or emigrated from Sweden prior to the study timeline?
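The “about 20 percent of the total” figure follows from multiplying the two percentages quoted from Table 3. The short sketch below is only a reader’s check of that arithmetic; it uses the rounded percentages as quoted in this essay, the variable names are illustrative, and nothing in it draws on the study’s underlying data.

```python
# Rounded percentages as quoted in the text from Table 3 of the study.
share_any_surgery = 0.38            # diagnosed patients with any affirming surgery
share_reproductive_of_those = 0.53  # of those, surgery of the reproductive organs

# Share of all diagnosed patients who had reproductive-organ surgery.
share_reproductive_of_total = share_any_surgery * share_reproductive_of_those

print(f"{share_reproductive_of_total:.1%}")
```

Because the inputs are rounded, the product is itself approximate, which is why the text says “about” 20 percent.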

In terms of follow-up care, the authors measured only the three outcomes listed above. Overlooked were key data on completed suicides, as well as healthcare visits, prescriptions, and hospitalizations for the litany of other medical or psychological diagnoses potentially related to gender-affirming treatments. Such information was available through Sweden’s multiple registry databases, so why not use it? These omissions suggested cherry-picking data in order to obtain the desired results.

We concluded our letter by comparing this study to the one we consider perhaps the best of its kind, also from Sweden, the 2011 Dhejne study. The Dhejne team made extensive use of numerous, specified Swedish registries and examined data from 324 patients in Sweden over thirty years who underwent sex reassignment. They used population controls matched by birth year, birth sex, and reassigned sex. When followed out beyond ten years, the sex-reassigned group had nineteen times the rate of completed suicides and nearly three times the rate of all-cause mortality and inpatient psychiatric care, compared to the general population. These important findings could have easily been updated by Bränström and Pachankis to the more current time frame.

Which brings us back to the August AJP and why seven critical letters took ten months to see print. Along with the letters, the AJP editors published a correction that explained their need “to seek statistical consultations.” These consultants “concurred with many of the points raised.” The study’s authors were asked to reanalyze their data, and the results demonstrated “no advantage to surgery” for their three endpoints in the subject population. The authors noted in their response letter that their conclusion “was too strong.”

Unresolved Problems

The AJP correction is significant, but the study still suffers from numerous problems. The correction is a win for patients insofar as sex-reassignment surgery has been demoted from improving mental health to having no effect. The reanalysis, on the other hand, showed an increase in treatment for anxiety after surgery. Why was there not also an expected increase in post-surgical depression, as Drs. Malone and Roman argued in their letter to the editor? Increased post-surgical anxiety without an accompanying increased depression rate is a highly unusual finding. Were these subjects also lost to follow-up?

With respect to cross-sex hormones, it has been shown that 23 percent of patients on high-dose anabolic steroids like testosterone, which is prescribed to every female-to-male patient, meet criteria for a major mood syndrome, and 3 to 12 percent have developed psychotic symptoms. Why is this not reflected in the study or the reanalysis?

There remain major deficits in knowledge that the authors easily could have filled by examining the Swedish databases. One of the strengths of the 2011 Dhejne study is that an increase in mortality is clearly seen at around ten years. The current study fails to look at available data over a similar time course to assess whether mortality has been affected. Similarly, completed suicide information is missing from Bränström. How can one understand suicidality in relation to hormones and surgery by looking only at suicide attempts and not deaths? Likewise, if one wants to understand the full range of psychiatric disorders in this population by examining medication data, then all appropriate pharmaceuticals should be included, not only anti-anxiety agents and antidepressants. However, simply tabulating prescriptions for psychiatric medications provides a limited and inadequate measure of the degree of emotional distress in any population. Many distressed individuals decline to seek professional help or will refuse pharmaceuticals if they do. The effects of these gaps in knowledge are much like holes cut out of a portrait; the full picture is lost and distorted when the key facial features are removed.

Our co-author Dr. Paul McHugh ended sex reassignment surgeries at Johns Hopkins Medical School when a study from his department revealed that the mental and social health of patients undergoing sex reassignment surgery did not improve. He adds here that this paper, and even the correction, misdirects clinical thought in many ways. Most crucially it presumes an unproblematic future for these subjects, despite evidence that the psychological state of many will, after surgery, worsen with time. Our experience at Hopkins, when we first recognized that the psychological well-being of patients undergoing surgery did not improve, rested on relatively short-term assessments. The long-term Swedish study of Dhejne demonstrated that the serious fallouts including suicide emerged only after ten years. None of this clinical experience is reflected in this paper or its correction.

Now how will the thirteen-year-old girls who have had breast amputations and testosterone fare? Abigail Shrier writes in her excellent exposé Irreversible Damage that, “Nearly all of the detransitioners I spoke with are plagued with regret. . . . They possess a startlingly masculine voice that will not lift. . . . They live with slashes across their chests . . . and flaps of skin that don’t quite resemble nipples.”

How about children who are ultimately sterilized by puberty blockers followed by cross-sex hormones and even gonad removal? These unethical surgeries are receiving funding from the very NIH that claims to be working to correct problems of irreproducibility. These experiments are beyond reproducibility problems: they are ethical failures by which doctors cause long-term harm to children and adolescents, all based on political activism supported by faulty science.

The Bränström study reanalysis demonstrated that neither “gender-affirming hormone treatment” nor “gender-affirming surgery” reduced the need of transgender-identifying people for mental health services. We thank the editors, the study authors, and the other letter writers for carefully scrutinizing the study and publishing these findings. However, our team believes that many of the pro-transition studies we have read fare no better. Fad medicine is bad medicine, and gender-anxious people deserve better.