In a “study” that arrived to much media fanfare last week in the journal JAMA Psychiatry, researchers affiliated with Harvard University and Massachusetts General Hospital purported to offer convincing proof that “conversion therapy” predicts longstanding toxic outcomes among Americans who self-identify as transgender, including greater recent suicidality and more severe psychological distress in the past month. Its results, the authors state, “support the policy positions” of such medical professional organizations as the American Medical Association and American Academy of Pediatrics.

I am agnostic on the topic of “conversion,” though I suspect the subject is more diverse and complicated than political soundbites let on. But I’m not agnostic about the new JAMA Psychiatry study. There are at least four good reasons for being leery of the results appearing therein.

First, the study fails to define what it means by GICE (that is, gender identity conversion efforts), its key variable and a term the authors appear to have invented, or to distinguish among the practices the term might cover. The measure comes from a solitary question that respondents were asked:

“Did any professional (such as a psychologist, counselor, or religious advisor) try to make you identify only with your sex assigned at birth (in other words, try to stop you being trans)?”

That’s what the survey asked. The United States Transgender Survey, or USTS, posed hundreds of questions and items to its respondents. That it nonetheless lumps any scenario that does not involve unqualified affirmation (including “watchful waiting” for minors) into one imprecise, binary measure is, I hold, psychometrically irresponsible.

Psychiatrist and longtime gender identity expert Stephen Levine highlights the quandary facing professionals attempting to counsel transgender patients on the biological, social, and psychological risks posed by any treatment approach. Such risks are real and ought to be discussed. This is what ethical informed consent does. But in the USTS’s lingo, an ethical discussion of risk could be interpreted by the patient as “trying to stop you being trans.” In other words, obtaining informed consent may constitute GICE. That requires no stretch of the imagination. Levine sees it. He notes that while the World Professional Association for Transgender Health endorses informed consent, this principle remains at odds with its recommendation of providing hormones on demand.

But the authors of the JAMA Psychiatry study, following the USTS’s survey measurement, aren’t interested in subtleties. They paint an entire class of cautious therapeutic approaches as intrinsically harmful, sending a clear message to psychiatrists and psychotherapists alike. Scientifically, we learn nothing of the respondents’ motivations for interacting with the “professional” in the first place. It’s not hard to see that reality is far more complicated than the USTS data allow here. The question can’t distinguish between truly harmful approaches and potentially beneficial considerations.

Second, the data come from a nonrandom, opt-in survey, the USTS, that targeted only networked, self-identified transgender or nonbinary persons by advertising among “active transgender, LGBTQ, and allied organizations.” There’s nothing wrong with collecting data using a nonrandom approach like this; I’ve done it myself and will do it again. The problem arises when such data are delivered to the reader, as these were, in a way that suggests the conclusions hold for everyone who has identified as transgender or experienced gender identity disorder or dysphoria. The survey’s “United States” label further creates the impression that the data collection effort was a population-based random sample, sort of like the US Census. It is not. And you can’t extrapolate the results of a nonrandom sample to the population as a whole. (But you can hope that the media and readers will.)

When compared with a 2017 study of the demographic characteristics of transgender adults from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS), a genuinely population-based sample, the USTS respondents appear decidedly dissimilar. How different are they?

  1. Unemployment: 15% in the USTS vs. 8% in the BRFSS
  2. Sexual orientation: 47% of male-to-female identify as LGB in the USTS vs. 15% in the BRFSS; 24% of female-to-male identify as LGB in the USTS vs. 10% in the BRFSS
  3. Currently married: 18% in the USTS vs. 50% in the BRFSS
  4. Child in the household under 18: 14% in the USTS vs. 32% in the BRFSS
  5. General health rated as fair or poor: 22% in the USTS vs. 26% in the BRFSS

To be sure, some of the questions were posed differently, but the differences here are not cosmetic. The two samples are at odds with each other. The JAMA Psychiatry authors employed “sample weights” that the USTS data creators designed “to improve generalizability by addressing sampling biases around age, educational level, and race/ethnicity,” but weighting such data makes little sense. Weights can rebalance a sample only on the characteristics used to build them; they cannot correct for whatever else led people to opt in, so you cannot “generalize” an opt-in sample no matter what you do to it. The study treats the survey in the way its designers appear to desire: as if it were a population-based, representative sample of transgender Americans. But it isn’t. A simple acknowledgment of the sampling strategy and potential bias is small consolation, given the wide media coverage and attention the study has received.
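The weighting concern can be made concrete with a small sketch. Everything in it is an assumption for illustration: the population counts, the outcome rates, the opt-in rates, and the idea that being "networked" drives both participation and the outcome. None of these numbers comes from the USTS or the BRFSS. The sketch reweights a hypothetical opt-in sample so that its age mix matches the population, then checks whether the estimate moves toward the truth.

```python
# Hypothetical population, split by age group and by whether a person is
# connected to advocacy networks. All numbers are invented for illustration.
# cell: (age_group, networked) -> (population count, outcome rate, opt-in rate)
cells = {
    ("young", True):  (20_000, 0.40, 0.100),
    ("young", False): (30_000, 0.20, 0.010),
    ("old",   True):  (10_000, 0.40, 0.050),
    ("old",   False): (40_000, 0.20, 0.005),
}

def rate(counts):
    """Share with the outcome, given {cell: (people, people_with_outcome)}."""
    people = sum(n for n, _ in counts.values())
    with_outcome = sum(k for _, k in counts.values())
    return with_outcome / people

def age_share(counts, age):
    """Share of people in `counts` who belong to the given age group."""
    total = sum(n for n, _ in counts.values())
    return sum(n for (a, _), (n, _) in counts.items() if a == age) / total

# Expected counts in the full population and in the opt-in sample.
population = {c: (n, n * p) for c, (n, p, _) in cells.items()}
sample = {c: (n * r, n * r * p) for c, (n, p, r) in cells.items()}

# Post-stratification weight on age only: population share / sample share.
weights = {
    age: age_share(population, age) / age_share(sample, age)
    for age in ("young", "old")
}
weighted_sample = {
    c: (n * weights[c[0]], k * weights[c[0]]) for c, (n, k) in sample.items()
}

true_rate = rate(population)
unweighted_rate = rate(sample)
weighted_rate = rate(weighted_sample)

print(f"true rate:       {true_rate:.3f}")
print(f"opt-in estimate: {unweighted_rate:.3f}")
print(f"age-weighted:    {weighted_rate:.3f}")
```

Under these invented numbers the true rate is 0.26, the raw opt-in estimate is about 0.37, and the age-weighted estimate is still about 0.36. The weights fix the age mix but leave untouched the trait that actually drove participation, which is the general limitation at issue here.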

Third, building on the dubious perception of representativeness, the authors report “confidence intervals” for their statistical “estimates.” Why they do so is beyond me. It’s a charade. Those terms are only truly sensible and appropriate for probability samples. A confidence interval, after all, is constructed so that, across repeated random samples, a stated share of such intervals (conventionally 95 percent) would contain the “population parameter” (e.g., the true share of transgender persons reporting severe psychological distress). But if the sample you have isn’t representative of the population from which it was drawn (something the above comparison between the USTS and BRFSS strongly suggests), then it’s pointless to use these terms. Using them suggests you have statistical rigor when you do not. Hypothesis testing, using P values, seems an odd approach under these circumstances.
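A short simulation illustrates why interval language depends on how the sample was drawn. It is a sketch under stated assumptions, not a reanalysis of any real data: the population, the 30 percent "networked" share, the outcome rates, and the tenfold opt-in propensity are all invented. It draws many samples two ways, builds the textbook 95 percent interval for a proportion each time, and counts how often the interval contains the true population rate.

```python
import math
import random

random.seed(0)

# Hypothetical population: 30% are "networked"; the outcome is more common among them.
population = []
for _ in range(20_000):
    networked = random.random() < 0.30
    outcome = random.random() < (0.40 if networked else 0.20)
    population.append((networked, outcome))

true_rate = sum(outcome for _, outcome in population) / len(population)

def wald_interval(sample):
    """Textbook 95% interval for a proportion."""
    n = len(sample)
    p = sum(outcome for _, outcome in sample) / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def coverage(draw_sample, repetitions=300):
    """Share of repeated samples whose interval contains the true rate."""
    hits = 0
    for _ in range(repetitions):
        low, high = wald_interval(draw_sample())
        hits += low <= true_rate <= high
    return hits / repetitions

# Assumed opt-in mechanism: networked people are ten times as likely to respond.
opt_in_weights = [10 if networked else 1 for networked, _ in population]

random_coverage = coverage(lambda: random.sample(population, 400))
opt_in_coverage = coverage(
    lambda: random.choices(population, weights=opt_in_weights, k=400)
)

print(f"coverage with random samples: {random_coverage:.2f}")
print(f"coverage with opt-in samples: {opt_in_coverage:.2f}")
```

With random samples the intervals should capture the true rate close to 95 percent of the time. With the assumed opt-in mechanism the same formula yields intervals that are just as narrow but centered in the wrong place, so they rarely contain it.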

It is no small irony that just one month earlier JAMA Psychiatry published an opinion piece by Helena Chmura Kraemer, Stanford University biostatistics professor and fellow of the American Statistical Association, entitled, “Is it Time to Ban the P Value?” She makes a compelling case, in light of what the Harvard researchers are attempting to accomplish:

For more than 20 years, there have been rumbles about banning the P value, because it is so often misused, miscomputed, and, even when used and computed correctly, misinterpreted. Consequently, findings that affect medical decision-making, policy, and research are often misled by the very research that is supposed to provide their evidence base.

Well put.

Fourth, the authors seem largely uninterested in putting their implied causation—that past conversion attempts affect present mood and suicidality—to the test. Instead, a subtext of injustices committed against the respondents infuses the study, suggesting a decidedly external locus of control in the lives of transgender Americans. This narrative is only interrupted once, when to their rare credit the authors admit that it “is possible that those with worse mental health or internalized transphobia may have been more likely to seek out conversion therapy rather than non-GICE therapy, suggesting that conversion efforts themselves were not causative of these poor mental health outcomes.” I think the average reader would believe this is probable, not just possible.
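The alternative explanation the authors concede can also be sketched. The simulation below is hypothetical throughout: the distress share, the exposure probabilities, and the outcome probabilities are assumptions, and the exposure is deliberately given no effect on the outcome. It shows only that a cross-sectional gap between exposed and unexposed respondents is compatible with selection alone. It says nothing about which account is true of the actual survey respondents.

```python
import random

random.seed(1)

def simulate_person():
    # Assumed data-generating process in which the exposure has NO effect
    # on the later outcome; baseline distress drives both.
    high_distress = random.random() < 0.30
    exposed = random.random() < (0.30 if high_distress else 0.10)
    later_outcome = random.random() < (0.50 if high_distress else 0.10)
    return high_distress, exposed, later_outcome

people = [simulate_person() for _ in range(100_000)]

def outcome_rate(rows):
    """Share of rows with the later outcome."""
    return sum(outcome for _, _, outcome in rows) / len(rows)

def exposure_gap(rows):
    """Outcome rate among the exposed minus the rate among the unexposed."""
    exposed = [r for r in rows if r[1]]
    unexposed = [r for r in rows if not r[1]]
    return outcome_rate(exposed) - outcome_rate(unexposed)

# Gap in the pooled data, then within each level of baseline distress.
crude_gap = exposure_gap(people)
gap_high = exposure_gap([r for r in people if r[0]])
gap_low = exposure_gap([r for r in people if not r[0]])

print(f"crude gap:                  {crude_gap:+.3f}")
print(f"gap among high baseline:    {gap_high:+.3f}")
print(f"gap among low baseline:     {gap_low:+.3f}")
```

In the pooled data the exposed group shows a clearly higher outcome rate, yet within each baseline group the gap is close to zero. A survey that never measured baseline distress before the exposure could not tell these two situations apart.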

By way of comparison, the risk of attempting suicide among women currently or recently using hormonal contraceptives (a monumentally larger share of the overall population) is demonstrably higher than among those who are not using them. And yet no medical organization is calling for stripping access to the Pill. Instead, the authors of the study reporting those findings suggest that we should work to understand better why women seek contraception when they do, as well as explore why it may prompt the development of adverse mood reactions. That’s what curious investigators do. No similar quest appears to characterize the JAMA Psychiatry article, which never explores why a portion of the USTS respondents found themselves in an office listening to cautious therapeutic counsel, something now considered dubious if not outright banned. This lack of intellectual curiosity is unfortunate, the hallmark of an utterly politicized science whose bar for publishing studies on a topic now exploding in popularity is much too low.

When the study was released, Dr. Jack Turban—the lead author—was unabashed about its novelty. As NBC Out covered it, Turban claimed “it was the first study ‘to show that gender identity conversion efforts are associated with adverse mental health outcomes, including suicide attempts,’” and that “previous reports showing the negative effects of conversion therapy . . . have focused on efforts to change a person’s sexual orientation,” rather than gender identity. Turban then turned to its political and legal value:

“This is important because some experts continue to advocate for gender identity conversion efforts for young children,” Turban said in a statement. “We hope our findings contribute to ongoing legislative efforts to ban gender identity conversion efforts.”

Rarely have researchers been so explicit about the political aims of their research. If this study is really “the first study” to show “adverse mental health outcomes” related to “conversion therapy,” how can it be sufficient—even if it were high-quality—to justify government bans? And how could researchers have supported such bans prior to any study at all? Simple: it was never about science.

I do not wish to make light of the suffering of self-identified transgender persons. It is, from all discernible sources, significant. Nor am I claiming that various forms of therapy are helpful or unhelpful, ethical or unethical. My conclusion is more modest. Weak data are being used to make empirical—and then clinical and legal—truth claims while subsidized by nascent political will. Discerning a generalizable answer to ethical questions about “conversion” therapy from the JAMA Psychiatry study is simply not possible, despite what its authors confidently assert and imply. Theirs is what Professor Kraemer would probably call a “hypothesis-generating” study, not a “hypothesis-testing” one. But you wouldn’t know it from reading it.