Monday, May 29, 2023

PaLM 2 is trained with far fewer parameters than its predecessor (according to CNBC), and it makes perfect sense

In recent years, the performance of large language models has been judged primarily by the number of parameters set during the training phase. By that logic, it was natural to assume that models got better at performing tasks and solving problems as more parameters were added.

But there are signs that we are witnessing a major paradigm shift, one in which the number of parameters is not as important as once believed. While an increasingly secretive competitive landscape keeps much of this information under lock and key, the paths that major players such as Google and OpenAI are taking make the shift clear.

It is worth pausing on the significance of this apparent change in trend. Giving a language model a huge number of parameters requires a large investment of time and money. If it is possible to build better models while saving on that front, we could see much faster and more significant progress across many areas of AI.

PaLM 2, Fewer Parameters, More Data

A week ago, Google introduced its PaLM 2 language model, which aims to compete with OpenAI’s GPT-4. It is an evolution of PaLM, which came out a year earlier to compete with what was then Sam Altman’s company’s most promising product, GPT-3. What does this reveal? That the Mountain View company is changing the way it trains its models.

Details about the technical features of Google’s latest model haven’t been released to the public, but internal documents seen by CNBC indicate that PaLM 2 has been trained with far fewer parameters than its predecessor, and still boasts better performance. Specifically, the new-generation model reportedly has 340 billion parameters, compared with the 540 billion of the previous one.

In a blog post, the search company acknowledged the use of a technique called “compute-optimal scaling” to make the model’s overall performance more efficient. It involves using fewer parameters, and consequently a lower cost of deployment, while growing the training data set. Google’s trick for PaLM 2 comes from that second part: much more data.
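The idea echoes the “compute-optimal” scaling results popularized by DeepMind’s Chinchilla work, which suggest roughly 20 training tokens per parameter. The following back-of-the-envelope sketch applies that published heuristic to the reported parameter counts; the 20:1 ratio is an illustration, not Google’s disclosed training recipe:

```python
# Rough compute-optimal sizing using the Chinchilla heuristic
# (~20 training tokens per parameter). Illustrative only; this is
# not Google's actual, undisclosed recipe for PaLM 2.

def chinchilla_optimal_tokens(params: float, ratio: float = 20.0) -> float:
    """Estimate the training-token budget for a compute-optimal model."""
    return params * ratio

palm2_params = 340e9  # reported parameter count for PaLM 2
palm_params = 540e9   # reported parameter count for the original PaLM

for name, p in [("PaLM 2", palm2_params), ("PaLM", palm_params)]:
    print(f"{name}: ~{chinchilla_optimal_tokens(p):.1e} tokens")
```

Under this heuristic, a smaller model trained on far more tokens can outperform a larger one trained on less data, which is consistent with the direction the leaked figures describe.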

Remember that datasets are made up of different types of information collected from web pages, scientific studies, and so on. In this regard, leaked information indicates that Google’s new model has been trained with five times more data than the PaLM introduced in 2022. This volume is expressed in tokens, the units that make up the dataset.

PaLM 2 would have been trained with 3.6 trillion tokens, while PaLM would only have 780 billion tokens. To put this in perspective, Meta’s LLaMA model was trained with 1.4 trillion tokens. The figure for GPT-4 is unknown, but the GPT-3 paper states that the model was trained on 300 billion tokens.
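To make the notion of a token concrete: production models use learned subword tokenizers (such as BPE or SentencePiece), but a naive whitespace split is enough to illustrate how a corpus gets measured in token counts. This is a simplified sketch, not any vendor’s actual tokenizer:

```python
# Tokens are the units a model trains on. Real models use learned
# subword tokenizers (e.g. BPE or SentencePiece); this naive
# whitespace split only illustrates the idea of counting tokens.

def count_tokens_naive(text: str) -> int:
    """Count tokens by splitting on whitespace (illustration only)."""
    return len(text.split())

corpus = [
    "PaLM 2 was trained on far more tokens than PaLM.",
    "More data, fewer parameters.",
]
total = sum(count_tokens_naive(doc) for doc in corpus)
print(total)  # total whitespace-delimited tokens in the toy corpus
```

A real subword tokenizer would typically produce more tokens than a whitespace split, since it breaks rare words into smaller pieces; the headline figures above are counted in those subword units.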

This paradigm of using fewer parameters to train a model is not unique to Google. OpenAI is also moving in this direction. For months, Altman has pointed out that the race to increase parameter counts reminds him of the late 1990s, when the hardware industry was obsessed with increasing processor clock speeds.

As our colleague at GenBeta points out, the head of the AI company argues that “gigahertz has faded into the background,” giving the example that most people do not know the processor speed of their iPhone, but they know it’s fast. “We really care about capabilities, and I think it’s important to focus on capabilities,” he says.

What are parameters?

Broadly speaking, parameters come into play during the training phase of an AI model. They are the internal values a model learns from data and then uses to make predictions. For example, if we train a model to estimate home prices, it will learn parameters that weight features such as size, location, or amenities.
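The home-price example above can be sketched in a few lines. Here a one-weight linear model learns its single parameter from synthetic data via gradient descent; all numbers are made up for illustration, and real language models learn billions of such weights the same basic way:

```python
# A toy illustration of "parameters": the weights a model learns
# during training. A one-weight linear model learns price ~= w * size
# from synthetic data via gradient descent. Numbers are hypothetical.

sizes = [50.0, 80.0, 120.0]      # square metres
prices = [100.0, 160.0, 240.0]   # synthetic data: price = 2 * size

w = 0.0        # the model's single parameter, learned from the data
lr = 1e-4      # learning rate
for _ in range(100):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * s - p) * s for s, p in zip(sizes, prices)) / len(sizes)
    w -= lr * grad

print(round(w, 2))  # -> 2.0, the slope encoded in the training data
```

Training nudges the parameter until the model’s predictions match the data; with billions of parameters instead of one, the same process lets a language model capture the patterns in trillions of tokens.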
