Munjal Shah and seven co-founders have launched Palo Alto-based Hippocratic AI, a startup dedicated to providing AI-powered solutions for medical students, doctors and patients.
Munjal Shah envisions a future where everyone has access to a nutritionist, genetic counselor, and health insurance billing specialist at the push of a button. However, none of them will be human: they will all be voice or text chatbots. These bots, he says, will answer patients’ questions and provide guidance, with one important caveat: They won’t (at least not yet) diagnose medical conditions.
“We are projected to have an overall shortfall of three million healthcare workers over the next few years,” says Shah. “We believe that one of the biggest risks to the quality of healthcare in the United States is the staffing shortage. We have to fill that gap and use technology to help us.”
Munjal Shah
Shah and seven co-founders have raised a $50 million seed round from General Catalyst and Andreessen Horowitz to develop the large language model that will power all these different healthcare bots. They’ve named the Palo Alto-based startup Hippocratic AI, after the ethical code doctors embrace. That code, based on the writings of the ancient Greek physician Hippocrates, is often expressed as “do no harm.”
But generative AI models can’t be sworn to an ethical code and, as viral chatbot ChatGPT has shown, they can serve up false information in response to questions. Regulators have vowed to take a closer look at their use in healthcare, and FDA Commissioner Robert Califf told a conference earlier this month that he “views regulation of large language models as critical to our future.”
While the future regulatory landscape is unclear, Shah says Hippocratic AI is taking a three-pronged approach to testing its large language model for healthcare settings: passing certification exams, training with human feedback and testing what the company calls “bedside manner.” Rather than giving healthcare customers access to the entire model, Shah says, Hippocratic AI plans to offer access to distinct healthcare “roles,” each of which will be released only when it reaches a certain level on a set of performance and safety criteria.
FDA Commissioner Robert Califf
One important criterion is the licensing exam or certification a human would have to pass in order to work in that role.
That outlook is one reason Julie Yoo, a general partner at Andreessen Horowitz, decided to invest. “It takes a lot of hard work and heavy lifting on the product side to get it right, rather than building a prototype and throwing it over the fence as you would with a typical enterprise software company,” says Yoo. Shah’s former company, Health IQ, used AI to match senior citizens with Medicare plans based on their medical histories.
Future doctors spend years painstakingly preparing for a series of national medical licensing examinations that test the knowledge they’ve gained from books, lectures and practical experience. In April, Google claimed that its medical large language model Med-PaLM 2 achieved 85.4% accuracy on questions from the US medical licensing exam, while Microsoft and OpenAI claimed that GPT-4, which is trained on public internet data, scored 86.65%.
Julie Yoo, general partner at Andreessen Horowitz
Shah points out that each company ran a different subset of the test (and the models may not have been answering the same questions), so it’s hard to compare, but he says Hippocratic AI’s model beat GPT-4 on the text-based questions by 0.43% when they tried to approximate the same subset.
Along the same lines, the company claims it tested its model against GPT-4 on 114 different benchmarks, including exams and certifications used for doctors, nurses, dentists, pharmacists, audiologists and medical coders. Hippocratic says it beat GPT-4 on 105, tied on six and lost on three.
But this raises the larger question of what exactly is captured when a machine takes a test, and what the test suggests about human equivalence. Shah acknowledges that testing is “necessary but not sufficient” for deploying these models in the real world. He declined to name the specific healthcare datasets Hippocratic’s model was trained on.
“When humans take these kinds of exams, we’re making all kinds of assumptions,” says Curt Langlotz, professor of radiology and medical informatics and director of Stanford’s Center for Artificial Intelligence in Medicine and Imaging, namely that the person has been to college and medical school and has clinical training and experience. “These language models are a different kind of intelligence. They are both smarter than us and dumber than us,” he says. They are trained on huge amounts of data, but they also have a tendency to “hallucinate,” generating incorrect answers and making simple math errors.
Curt Langlotz, professor of radiology and medical informatics and director of Stanford’s Center for Artificial Intelligence in Medicine and Imaging
One of the guardrails Hippocratic AI is implementing is the use of real humans to refine the model’s responses, a technique known as reinforcement learning with human feedback. That means that for a given role, say a dietitian, Hippocratic AI will have human dietitians rank the model’s responses and adjust it accordingly. The company will also continue to develop its set of “bedside manner” benchmarks, which involve scoring AI models on metrics such as empathy and compassion.
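For readers curious about the mechanics, here is a minimal, hypothetical sketch of how expert rankings can steer a model’s behavior. This is not Hippocratic AI’s actual pipeline: the keyword-based reward, the weights and the sample answers are all stand-ins invented for illustration.

```python
# Toy sketch of learning from human preference rankings (the core idea
# behind RLHF). Hypothetical throughout; not Hippocratic AI's system.

def toy_reward(response: str, weights: dict) -> float:
    """Score a response by summing the weights of keyword features it contains."""
    text = response.lower()
    return sum(w for kw, w in weights.items() if kw in text)

def update_from_ranking(ranked: list, weights: dict, lr: float = 0.1) -> None:
    """Nudge feature weights so human-preferred (higher-ranked) responses
    score above lower-ranked ones."""
    for better, worse in zip(ranked, ranked[1:]):
        if toy_reward(better, weights) <= toy_reward(worse, weights):
            for kw in weights:
                if kw in better.lower():
                    weights[kw] += lr  # boost features of the preferred answer
                if kw in worse.lower():
                    weights[kw] -= lr  # penalize features of the rejected answer

# Hypothetical example: a human dietitian ranks two model answers, best first.
weights = {"whole grains": 0.0, "crash diet": 0.0}
ranked_by_dietitian = [
    "Try adding more vegetables and whole grains to your meals.",
    "A crash diet will get you results fastest.",
]
update_from_ranking(ranked_by_dietitian, weights)
print(weights)  # {'whole grains': 0.1, 'crash diet': -0.1}
```

In production RLHF systems, the expert rankings train a learned reward model that in turn fine-tunes the language model itself; the toy weight updates above stand in for that much heavier machinery.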
David Sontag, professor of electrical engineering and computer science at MIT, who is not affiliated with Hippocratic AI and is working on his own stealth startup, says, “The same techniques that are useful for improving the communication of information … are useful for recognizing when the model doesn’t know or shouldn’t respond.” He gives the example of a scenario in which the correct response would be telling the patient to call 911. Training the model on when not to respond is an important part of the reinforcement learning process, he says.
David Sontag, professor of electrical engineering and computer science at MIT
Hippocratic AI will use healthcare workers to train its models, and plans to work closely with healthcare customers during the development phase, since their patients will be the end users. While the company hasn’t announced any customers yet, Hemant Taneja, CEO and managing director of General Catalyst, says there is “a lot of interest” among the various health systems his firm works with. “Solving the labor shortage problem and unlocking that human potential at a larger scale means this can be rolled out to more people at a more affordable price,” he says. “I think it’s a huge health equity play.”