AI Chatbots and Suicide Support: Experts Urge Caution
Researchers are warning that AI chatbots need careful fine-tuning to ensure their responses to suicide-related questions align with professional guidelines. A new study, titled “Evaluating the Agreement Between Large-Scale Language Models and Expert Clinicians in Assessing Suicide Risk,” finds that popular AI chatbots can give inconsistent answers to suicide-related queries.
Reliability is the cornerstone of any medical assessment, and inconsistencies in AI responses could have serious consequences, particularly in matters of life and death.
To investigate, the researchers, working alongside 13 clinical experts (three psychiatrists and ten clinical psychologists), tested three widely used chatbots powered by large language models (LLMs): ChatGPT, Claude, and Gemini. They examined whether each chatbot provided direct responses to suicide-related queries and how those responses aligned with the risk level clinicians assigned to each question.
As explained in the study, the research team developed a list of 30 hypothetical questions covering three areas. The first focused on policy and epidemiological information about suicide, including suicide rates, the most common methods of suicide attempts, the demographic characteristics of people affected by suicidality, and programs and policies to prevent suicide. The second addressed the process of attempting suicide, including information on lethality and on access to means that could enable suicide. The third covered therapeutic information, including guidelines and recommendations for managing suicidal thoughts.
The clinical experts rated each question across five risk levels: very high, high, medium, low, and very low. Each chatbot then answered every question 100 times, producing 9,000 responses in total (30 questions × 100 runs × 3 models).
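To make the design concrete, the sketch below shows the general shape of such an evaluation loop: each question is sent to each chatbot many times, and the share of direct answers is tallied per clinician-assigned risk level. The functions ask_chatbot and is_direct_answer are hypothetical placeholders standing in for a vendor API call and a response classifier; this is not the authors' actual harness.

```python
from collections import defaultdict
from typing import Callable

def evaluate(
    questions: list[dict],            # e.g. {"text": "...", "risk": "medium"}
    models: list[str],                # e.g. ["chatgpt", "claude", "gemini"]
    ask_chatbot: Callable[[str, str], str],   # (model, question) -> reply text
    is_direct_answer: Callable[[str], bool],  # classifies a reply as direct or not
    runs_per_question: int = 100,
) -> dict:
    """Return the direct-response rate for each (model, risk level) pair."""
    direct = defaultdict(int)
    total = defaultdict(int)
    for model in models:
        for q in questions:
            for _ in range(runs_per_question):
                reply = ask_chatbot(model, q["text"])
                key = (model, q["risk"])
                total[key] += 1
                direct[key] += is_direct_answer(reply)
    # Proportion of runs in which the chatbot answered the question directly
    return {key: direct[key] / total[key] for key in total}
```

With 30 questions, 3 models, and 100 runs per question, a loop like this yields the study's 9,000 responses and makes it straightforward to compare direct-response rates across risk tiers.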

The findings showed that ChatGPT and Claude consistently provided direct answers to very-low-risk questions, while none of the chatbots gave direct responses to very-high-risk questions. For medium-risk questions, however, the chatbots were far less consistent, sometimes answering directly and sometimes declining. In short, the models performed reliably at the extremes of risk but handled the middle range unevenly, a gap that experts say underscores the need for further refinement. The study also found that when the chatbots declined to answer directly, they usually gave generic advice to seek help from a friend, a professional, or a hotline. The quality of this guidance varied, however: ChatGPT, for example, often directed users to an outdated helpline instead of the current national 988 Suicide and Crisis Lifeline.
“In our study, we found that very-high-risk questions were never directly answered by any of the three chatbots we evaluated. Furthermore, in the context of these very-high-risk questions, all three chatbots recommended that the user call a mental health emergency hotline,” the scientists highlighted in the study.
Scientists note that, despite certain limitations, the study provides important insights into the current capabilities of chatbots in handling suicide-related queries. The chatbots performed consistently for questions at the extremes of risk, very low and very high, but showed more variability for questions at intermediate risk levels. The researchers, therefore, caution that further improvements are needed.
Ryan McBain, the study’s lead author and a senior policy researcher at RAND, a nonprofit research organization, answered a few questions about the study for Unknown Focus.
What legal measures should be in place to ensure the safety of users seeking mental health support through chatbots? Would international collaboration be useful in this regard, particularly to achieve better alignment and standards?
Ryan K. McBain: Right now, chatbots are being deployed into highly sensitive spaces without clear safety standards. At minimum, we need enforceable benchmarks that ensure these tools respond appropriately when users express distress—especially for children and adolescents. Independent audits, transparency requirements, and age-appropriate privacy protections are essential. And because these technologies are global, international collaboration would help establish consistent safeguards and avoid a patchwork of uneven protections.
What is the role of the media in this context, and how can we help?
Ryan K. McBain: The media plays an important role in setting expectations for parents, teens, and policymakers alike. By highlighting both the opportunities and risks, reporters can help shift the conversation from isolated tragedies toward systemic issues: for example, the absence of standardized benchmarks, the lack of oversight and transparency, and the speed of adoption. Responsible coverage can inform smarter regulations and help families make safer choices.
What surprised you the most, and what would you change immediately if you could?
Ryan K. McBain: In a recent study my colleagues and I conducted, we found that chatbots reliably refused to answer very high-risk suicide-related questions, which is encouraging. But they were inconsistent in handling more ambiguous scenarios: sometimes they offered appropriate referrals to emergency services, while other times they provided responses that could be unsafe. If I could change one thing immediately, it would be to require independent safety benchmarks and systematic stress-testing of chatbots before broad-scale deployment. That goes double for chatbots that self-advertise as supporting and addressing mental health needs. Right now, we’re effectively conducting a large, uncontrolled experiment on users.
According to WHO data, over 700,000 people die by suicide each year, making it the third leading cause of death among individuals aged 15 to 29. Moreover, for every suicide, many more people make suicide attempts. When it comes to mental health and the risk of suicide, even small inconsistencies can have consequences. It is important to remember that no problem is so difficult that it cannot be addressed, and that reaching out to others, such as trained professionals or crisis hotlines, may be a challenging step in the moment, but it is also the most important one.
Artificial intelligence is increasingly being integrated into society, yet we must not forget that we all have a role to play: seeking help when we need it is just as important as recognizing when someone else needs support. Above all, every act of support, however small, can make a difference in someone’s life.
Image: Mental health support/Ontario.ca

