Friday, June 19, 2026

When AI sounds right, but isn't: how to better train your AI for clinical care

The most dangerous thing about health advice from AI isn’t that it can be wrong. It’s that it’s almost always delivered like it’s right. It is confident, step-by-step, empathetic, tuned to how you like to read. Without training in the subject, someone seeking advice from an AI tool may find it impossible to recognize when and where the AI’s reasoning has gone sideways.

I’ve seen this in my everyday life as I try out new apps that use AI. When I review another AI chatbot’s answer to a health question, the gaps jump out at me immediately because I am a physician. But ask me to follow an AI-generated design for a DIY woodworking project and that instinct disappears. I cut, drill, mortise, and assemble until the project is done, and only then do I find that the piece is too big for the space or the door swings the wrong way. Whether the problem is my AI prompt or the woodworking knowledge trained into the underlying model, the result is the same: a costly, failed project.

The same confidence that has misled me in furniture-making is what patients can encounter in health advice, where the cost of being wrong is far greater.

What keeps me up at night, as a trained emergency medicine physician, is the patient who acted on plausible-sounding, AI-generated advice that was incomplete or wrong and never had a reason to doubt it. Or it’s someone navigating a complex, life-altering treatment decision who needed a comforting hand on their shoulder, not merely a polished AI-generated paragraph.

Physicians, health systems, and clinical researchers have spent years asking whether AI is fit for healthcare: purpose-built models, clinical workflows, safety guardrails.

Patients are asking a simpler question: Can I trust this answer? They can’t tell from tone alone.

What it will take to earn trust

I often ask myself: what will it take to feel confident that the people using the AI tools I help develop will be safe? Trust has to be earned by consistently delivering accurate recommendations that are relevant to the user, transparently stating the tool’s limitations, and explaining how safety and performance are monitored after launch, including when a human is brought in for oversight.

Recommend the way clinicians do

The answer starts with the same foundation clinicians rely on: access to underlying data and an understanding of the individual’s complete health picture. Human clinicians rarely recommend where to seek care, suggest possible diagnoses, offer treatment advice, or refill prescriptions without first consulting a medical record, asking questions to fill gaps, and accounting for context: social circumstances, health literacy, and what a patient may be leaving unsaid. We have also been trained to recognize when we don’t know something, and to say so.

Patient-facing AI should behave the same way. Recommendations need to be grounded in the highest-quality information available and explicit about where that information comes from and what might be missing. The danger isn’t only wrong answers. It’s confident, polished answers built on patient-reported inputs alone, with no acknowledgment that a critical signal may be absent, such as an unreported pregnancy, an escalating mental health crisis, or a misunderstanding of medication instructions. Any of those gaps can turn plausible guidance into unintentional harm.

Equally important is attempting to fill in knowledge gaps before delivering guidance. AI tools that offer health-related advice should be trained to recognize missing information in an individual’s health record or in their description of symptoms, much as clinicians fill in medical histories and update medication lists at clinic intake.
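
For readers who build these tools, the idea of checking for gaps before advising can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any Verily product works: the field names, the follow-up questions, and the `IntakeRecord`, `missing_information`, and `next_step` names are all hypothetical, and a real tool would have its required fields defined and reviewed by clinicians.

```python
from dataclasses import dataclass, field

# Hypothetical intake fields and follow-up questions (illustrative only;
# a real tool would have clinicians define and review these).
REQUIRED_FIELDS = {
    "current_medications": "What medications are you currently taking?",
    "pregnancy_status": "Is there any chance you could be pregnant?",
    "symptom_duration": "How long have you had these symptoms?",
}


@dataclass
class IntakeRecord:
    """What is known so far, from the health record or the conversation."""
    answers: dict = field(default_factory=dict)


def missing_information(record):
    """Return a follow-up question for every required field not yet answered."""
    return [
        question
        for name, question in REQUIRED_FIELDS.items()
        if record.answers.get(name) in (None, "")
    ]


def next_step(record):
    """Ask before advising: guidance is only offered once the gaps are filled."""
    questions = missing_information(record)
    if questions:
        return {"action": "ask", "questions": questions}
    return {"action": "advise", "questions": []}


if __name__ == "__main__":
    record = IntakeRecord({"symptom_duration": "3 days"})
    print(next_step(record)["action"])  # the tool asks before it advises
```

The design choice worth noting is the ordering: the check for missing information runs first, and advice is only reachable once nothing required is unanswered.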

At Verily, the patient-facing AI agents that power Violet in Verily Me are designed with that clinical posture in mind. We’ve described how that architecture works in practice — from structured assessments to multi-stage safety checks. My focus here is simpler: a confident tone is not a substitute for clinical context.

Be clear about monitoring — and about its limits

Clinicians don’t stop learning after training. We undergo continuing education, peer review, and performance oversight throughout our careers. AI that delivers health-related guidance needs the same discipline after launch — not only before it.

It is unrealistic to assume every patient interaction can be reviewed by a human in real time. What is realistic is transparency: any AI tool used for health advice should explain what gets monitored, how often monitoring occurs, who is notified when something goes wrong, how quickly issues are addressed, and how learnings feed back into product improvement. Patients and clinicians should be able to calibrate trust to risk, the way we already approach many aspects of care today.

For the Verily Me app, our clinical quality team has designed a monitoring framework with clear standards for performance oversight. The framework is meant to connect real-world safety and performance signals back into our formal product development lifecycle, so that iteration, escalation, and quality improvement are driven by evidence, not assumption. Models can improve over time; they can also drift. Monitoring has to account for both.

Know when only a physician will do

One of the most important aspects of designing AI in healthcare is ensuring that patients are directed to licensed medical providers when additional clinical evaluation or judgment is needed. Today, AI tools can help people prepare for visits, understand options, and navigate routine questions. But when symptoms are complex, evidence is uncertain, or decisions may significantly impact a patient’s health, the system must reliably recognize its limitations and direct patients to appropriate medical care immediately.

While AI can support many aspects of care navigation and decision support, it still cannot replace the patient-clinician relationship when evidence is uncertain and choices are life-altering. As AI continues to advance, defining — and then redefining — clear rules for when to redirect to a telehealth or in-person medical visit will remain as important as any model capability.

Our responsibility

At Verily Health, we put these principles into practice by ensuring our AI agents prioritize accuracy and compliance. We achieve this through strict data quality standards and automated guardrails that oversee the AI-generated recommendations. By clearly defining how our AI tools make recommendations using available health data — and how they are monitored — we are laying the foundation to build trust for all users. We owe patients more than confidence. We owe them clarity, context, and care that knows its limits.

Disclaimers

*The information contained in this page is intended to outline Verily’s general product direction and is not a commitment or legal obligation to deliver any functionality. Product capabilities, timeframes, and features are subject to change.*