While the chatbot ChatGPT is hotly debated in schools, more and more teachers are putting the text robot to the test. The results show that the artificial intelligence is far from being a star pupil. In some cases, however, it at least manages not to fail.
The OpenAI software did remarkably well on an exam for prospective physicians in the USA. In the three theoretical parts of the United States Medical Licensing Exam (USMLE), ChatGPT repeatedly reached the required minimum score, albeit under certain conditions. This is reported by US researchers in the journal PLOS Digital Health. However, the team led by Victor Tseng of the Californian start-up AnsibleHealth left out a number of questions.
Simplified examination conditions
The USMLE is a standardized three-part exam that medical students must pass in order to practice medicine in the United States. It tests knowledge from various medical disciplines, from biochemistry to diagnostic reasoning to bioethics. The study's authors put 350 questions to ChatGPT that had been part of the official exam the previous summer.
When testing ChatGPT, however, they had to account for the fact that the OpenAI software accepts only text input, so they left out questions involving images. In addition, indeterminate answers were excluded. With these adjustments, ChatGPT achieved 52.4 to 75 percent of the attainable points across the three USMLE exam parts. The passing threshold is around 60 percent and varies slightly from year to year. If the indeterminate answers were counted in the result, ChatGPT achieved 36.1 to 61.5 percent of the possible points. According to the authors, ChatGPT outperformed PubMedGPT, a rival model trained exclusively on biomedical literature.
Problems with the German Abitur
In a German test conducted by Bayerischer Rundfunk together with teachers at Bavarian grammar schools, the results were significantly worse. Here the software was confronted with several Abitur exams from 2022. Ironically, ChatGPT failed in what is supposedly its showcase discipline, text analysis.
In the test, the program was asked to analyze a text by Miriam Meckel entitled »When an algorithm revolutionizes the entire history of literature: In the Machinocene, machines write better texts than people«. The devastating verdict of the evaluating teacher Patrick Dorn: »That's a lot of drivel«. His grade for the AI's work: three points, i.e. a 5. The system was also unconvincing in another German-language task. On the plus side, the program would have passed the math section with a grade of 4-.
The only bright spot in the test was history. The program's account of population development from the 15th to the 18th century was judged »very vague, simplified and general«, but it was still awarded nine points, i.e. a 3+. Computer science, of all things, was the weakest subject for this product of cutting-edge computer science: here there were only two points.
The text robot ChatGPT had previously held its own in other university subjects, even if it did not achieve top marks in the exams. According to professors at the University of Minnesota, the system passed exams in four law courses there, as well as an exam in another course at the Wharton School of Business at the University of Pennsylvania.