Microsoft Says Its New AI System Diagnosed Patients 4 Times More Accurately Than Human Doctors

Microsoft has taken “a genuine step towards medical superintelligence,” says Mustafa Suleyman, CEO of the company’s artificial intelligence arm. The tech giant says its powerful new AI tool can diagnose disease four times more accurately and at significantly less cost than a panel of human physicians.

The experiment tested whether the tool could correctly diagnose a patient with an ailment, mimicking work typically done by a human doctor.

The Microsoft team used 304 case studies sourced from the New England Journal of Medicine to devise a test called the Sequential Diagnosis Benchmark (SDBench). A language model broke down each case into a step-by-step process that a doctor would perform in order to reach a diagnosis.

Microsoft’s researchers then built a system called the MAI Diagnostic Orchestrator (MAI-DxO) that queries several leading AI models—including OpenAI’s GPT, Google’s Gemini, Anthropic’s Claude, Meta’s Llama, and xAI’s Grok—in a way that loosely mimics several human experts working together.
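To make the idea of several models acting like a panel of experts concrete, here is a minimal sketch in Python. It is not Microsoft's implementation: the article does not describe how MAI-DxO combines model outputs, so the majority vote, the `panel_diagnosis` function, and the stub models below are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical stand-in for a call to one AI model:
# it takes case notes and returns a proposed diagnosis.
Model = Callable[[str], str]


def panel_diagnosis(models: Sequence[Model], case_notes: str) -> str:
    """Ask every model for a diagnosis and return the most common answer.

    Majority voting is an assumption made for this sketch; the real
    system's aggregation method is not described in the article.
    """
    votes = Counter(model(case_notes) for model in models)
    diagnosis, _count = votes.most_common(1)[0]
    return diagnosis


# Stub models used only for illustration; a real system would call model APIs.
def model_a(notes: str) -> str:
    return "pneumonia"


def model_b(notes: str) -> str:
    return "pneumonia"


def model_c(notes: str) -> str:
    return "bronchitis"


consensus = panel_diagnosis([model_a, model_b, model_c], "fever and cough")
print(consensus)
```

A simple vote is the bluntest way to combine opinions. The "chain-of-debate" approach that Suleyman describes presumably lets the models respond to one another before settling on an answer, which this sketch does not attempt.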

In their experiment, MAI-DxO outperformed human doctors, achieving an accuracy of 80 percent compared to the doctors’ 20 percent. It also reduced costs by 20 percent by selecting less expensive tests and procedures.

“This orchestration mechanism—multiple agents that work together in this chain-of-debate style—that’s what’s going to drive us closer to medical superintelligence,” Suleyman says.

The company poached several Google AI researchers to help with the effort—yet another sign of an intensifying war for top AI expertise in the tech industry. Suleyman was previously an executive at Google working on AI.

AI is already widely used in some parts of the US health care industry, including helping radiologists interpret scans. The latest multimodal AI models have the potential to act as more general diagnostic tools, though the use of AI in health care raises its own issues, particularly related to bias from training data that’s skewed toward particular demographics.

Microsoft has not yet decided if it will try to commercialize the technology, but a company executive, who spoke on the condition of anonymity, said the company could integrate it into Bing to help users diagnose ailments. The company could also develop tools to help medical experts improve or even automate patient care. “What you’ll see over the next couple of years is us doing more and more work proving these systems out in the real world,” Suleyman says.

The project is the latest in a growing body of research showing how AI models can diagnose disease. In the last few years, both Microsoft and Google have published papers showing that large language models can accurately diagnose an ailment when given access to medical records.

The new Microsoft research differs from previous work in that it more accurately replicates the way human physicians diagnose disease—by analyzing symptoms, ordering tests, and performing further analysis until a diagnosis is reached. Microsoft describes the way that it combined several frontier AI models as “a path to medical superintelligence,” in a blog post about the project today.
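The loop described above (review the findings, order a test, repeat until a diagnosis is reached) can be sketched roughly as follows. This is an illustrative toy, not SDBench or MAI-DxO: the `Case` and `Workup` classes, the cheapest-test-first policy, the example findings, and the dollar figures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Case:
    symptoms: list[str]
    test_results: dict[str, str]   # test name -> finding revealed when ordered
    test_costs: dict[str, float]   # test name -> cost in dollars (made-up values)


@dataclass
class Workup:
    findings: list[str] = field(default_factory=list)
    ordered: list[str] = field(default_factory=list)
    total_cost: float = 0.0


def cheapest_unordered(ordered: list[str], costs: dict[str, float]) -> Optional[str]:
    """Toy policy: pick the cheapest test that has not been ordered yet."""
    remaining = {test: cost for test, cost in costs.items() if test not in ordered}
    return min(remaining, key=remaining.get) if remaining else None


def run_sequential_diagnosis(
    case: Case,
    diagnose: Callable[[list[str]], Optional[str]],
    max_steps: int = 10,
) -> tuple[Optional[str], float]:
    """Alternate between attempting a diagnosis and ordering one more test."""
    workup = Workup(findings=list(case.symptoms))
    for _ in range(max_steps):
        diagnosis = diagnose(workup.findings)
        if diagnosis is not None:
            return diagnosis, workup.total_cost
        test = cheapest_unordered(workup.ordered, case.test_costs)
        if test is None:
            break  # nothing left to order
        workup.ordered.append(test)
        workup.findings.append(case.test_results[test])
        workup.total_cost += case.test_costs[test]
    return None, workup.total_cost


def toy_diagnose(findings: list[str]) -> Optional[str]:
    """Hypothetical rule: commit only once the x-ray finding is available."""
    return "pneumonia" if "lung infiltrate on x-ray" in findings else None


case = Case(
    symptoms=["fever", "cough"],
    test_results={
        "blood count": "elevated white cells",
        "chest x-ray": "lung infiltrate on x-ray",
    },
    test_costs={"blood count": 30.0, "chest x-ray": 120.0},
)

result = run_sequential_diagnosis(case, toy_diagnose)
print(result)
```

In this toy run the cheapest-first policy pays for a blood count that does not settle the case before ordering the x-ray that does. That illustrates why test selection matters for cost: a policy that ordered the x-ray first would reach the same diagnosis for less.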

The project also suggests that AI could help lower health care costs, a critical issue, particularly in the US. “Our model performs incredibly well, both getting to the diagnosis and getting to that diagnosis very cost effectively,” says Dominic King, a vice president at Microsoft who is involved with the project.

“It is quite exciting,” says David Sontag, a scientist at MIT and cofounder of Layer Health, a startup that builds medical AI tools. Sontag says the work is important not only because it more closely mirrors the way physicians operate but also because it is rigorous about addressing potential issues with the underlying methodology. “That’s what makes this paper strong,” he says.

But Sontag says that Microsoft’s findings should be treated with some caution because doctors in the study were asked not to use any additional tools to help with their diagnosis, which may not be a reflection of how they operate in real life. He adds that it remains to be seen whether the AI system would significantly reduce costs in practice. The doctors involved in the study may have taken into account factors that the AI could not, such as a patient’s tolerance for a procedure or the availability of a particular medical instrument.

“This is an impressive report because it tackles highly complex cases for diagnosis,” says Eric Topol, a scientist at the Scripps Research Institute. Showing that AI could, in theory, reduce the cost of medical care is novel, he adds.

Both Topol and Sontag say that the next step in validating the potential of Microsoft’s system ahead of general deployment would be to demonstrate the tool’s effectiveness in a clinical trial comparing its results with those of real doctors treating real patients. “Then you can get a very rigorous evaluation of cost,” Sontag says.
