Experts from all over the world have issued an appeal demanding greater transparency in the evaluation of artificial intelligence systems.
“Recent advances in artificial intelligence based on systems that require large amounts of data and computation, such as GPT-4, have highlighted the difficulty of understanding the capabilities and weaknesses of these systems. We do not know where it is safe to use them or how they could be improved. And this is because of the way artificial intelligence is evaluated today, which urgently needs to change.”
Behind these words are 16 leading experts in artificial intelligence from around the world, including José Hernández-Orallo, Fernando Martínez-Plumed and Wout Schellaert, researchers at the VRAIN Institute of the Polytechnic University of Valencia (UPV) in Spain.
Coordinated by Professor Hernández-Orallo, the 16 researchers recently published a paper in the academic journal Science in which they argue for the need to “rethink” the evaluation of artificial intelligence tools, moving towards more transparent models so that their real effectiveness, what they can and cannot do, becomes known.
In their paper, the authors propose a roadmap for evaluating artificial intelligence models, in which results are presented in a more nuanced manner and evaluation outcomes are made publicly available on a case-by-case basis.
As Hernández-Orallo points out, the performance of an artificial intelligence model is typically measured as an aggregate over the test data. And this implies a risk: “Although a model may give the impression of good overall performance, it may hide lower reliability or usefulness in specific, minority cases, and yet it is presented as equally valid in all cases when in fact it is not.”
In the document, the signatories illustrate this with the case of artificial intelligence models that assist in clinical diagnosis, pointing out that these systems can run into problems when they analyze people from a specific ethnic or demographic group, because such cases make up only a small part of their training data.
“What we ask is that every time a result in artificial intelligence is published, it is broken down as much as possible, because if this is not done, it will not be possible to know its true usefulness or to reproduce the analysis. In the article published in Science, we also discuss an artificial intelligence facial recognition system that reported a 90% success rate; it was later found that for white men the success rate was 99.2%, but for black women it was only 65.5%. This is why, on some occasions, the advertised results on the utility of artificial intelligence tools are not completely transparent and reliable. If they do not give you the details, you think the model works very well, and this is not the reality. The absence of this breakdown, with all possible information on artificial intelligence models, means that their deployment can increase risk,” explains José Hernández-Orallo.
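The kind of breakdown the researchers call for can be illustrated with a short sketch. The code below is not from the paper; the group names and numbers are hypothetical, chosen only to show how a healthy-looking aggregate accuracy can conceal a large gap between subgroups:

```python
# Hypothetical illustration: overall accuracy vs. per-group accuracy.
# Records, group names, and error rates are invented for this sketch.
from collections import defaultdict

def disaggregated_accuracy(records):
    """records: iterable of (group, correct) pairs.
    Returns (overall_accuracy, {group: accuracy})."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group

# Invented test set: 900 samples from a majority group (95% correct)
# and 100 from a minority group (60% correct).
records = [("majority", i < 855) for i in range(900)]
records += [("minority", i < 60) for i in range(100)]

overall, per_group = disaggregated_accuracy(records)
print(f"overall: {overall:.1%}")   # 91.5% looks acceptable in isolation
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.1%}")   # the breakdown reveals the disparity
```

Reporting only the 91.5% aggregate hides that the minority group sees a much lower accuracy, which is exactly the failure mode the authors describe for the facial recognition example.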
The VRAIN UPV researchers highlight that the changes they propose could improve knowledge about the true effectiveness of artificial intelligence systems, and could temper the “fierce” competition currently existing among artificial intelligence labs to announce that their model improves on previous systems by a certain percentage.
“There are labs that want to go from 93% to 95% no matter what, and that goes against the ultimate applicability and reliability of artificial intelligence. In short, we want to help everyone better understand how artificial intelligence works and what the limits of each model are, in order to guarantee good use of this technology.”
The paper brings together researchers from the VRAIN Institute of the Polytechnic University of Valencia, the University of Cambridge, Harvard University, the Massachusetts Institute of Technology (MIT), Stanford University, Google, Imperial College London, the University of Leeds, the Alan Turing Institute in London, DeepMind, the US National Institute of Standards and Technology (NIST), the Santa Fe Institute, Tongji University in Shanghai and Shandong University in Jinan. (Source: UPV)