ChatGPT is great, but it has one small problem: we need to be connected to OpenAI's cloud service in order to use it. OpenAI relies on hundreds (thousands?) of very expensive professional GPUs to serve its users, which makes providing and using the service costly. A new race is now joining the battle of generative AI models: the one to run them locally (and even offline) on our smartphones, not in the cloud.
A MiniChatGPT for your mobile. Considering the massive resources ChatGPT consumes, one might think that being able to have a chatbot with this capability running natively on our mobile sounds unimaginable, but it is not. In fact, in recent weeks we’ve seen a number of projects that point towards that future.
Google and Gecko. One of them is Gecko, one of the variants in which the Mountain View company plans to deploy its new PaLM 2 LLM, which competes with OpenAI's GPT-4. According to Google, Gecko is small enough to run natively on a smartphone (they managed to get it working on a Samsung Galaxy, for example), and although they didn't demonstrate this capability, the statement of intent was compelling.
Hybrid AI. Some companies, like Qualcomm, are already starting to talk about hybrid artificial intelligence platforms, in which we use both cloud models like ChatGPT and on-device models like Gecko. The company's CEO, Cristiano Amon, told the Financial Times that relying only on a cloud model would be too expensive, and that combining it with an LLM capable of running on mobile will reduce costs. Qualcomm has already experimented with that option, managing to run Stable Diffusion locally on one of its SoCs.
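To make the hybrid idea concrete, here is a minimal sketch of how such a split might look in practice. Everything in it is hypothetical: the local_generate, cloud_generate and hybrid_generate functions and the length-based routing rule are placeholders for illustration, not a real Qualcomm or OpenAI API.

```python
# Sketch of the "hybrid AI" idea: answer what you can on-device,
# fall back to the cloud for the rest. All names are hypothetical.

def local_generate(prompt: str) -> str:
    """Stand-in for a small on-device LLM (e.g. a quantized 7B model)."""
    return f"[on-device answer to: {prompt!r}]"

def cloud_generate(prompt: str) -> str:
    """Stand-in for a large cloud-hosted model such as GPT-4."""
    return f"[cloud answer to: {prompt!r}]"

def hybrid_generate(prompt: str, max_local_words: int = 64) -> str:
    # Crude routing heuristic: short, simple prompts stay on the device,
    # long or complex ones go to the cloud. A real system would use a
    # learned router or a confidence estimate instead of a length check.
    if len(prompt.split()) <= max_local_words:
        return local_generate(prompt)
    return cloud_generate(prompt)

if __name__ == "__main__":
    print(hybrid_generate("What's the word for light rain?"))
    print(hybrid_generate("Summarize this long contract..." + " lorem" * 200))
```

The point of the design is cost: every query the router keeps on the device is one that never touches those expensive cloud GPUs.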
LLaMA. The trend toward "miniature" ChatGPTs gained momentum with the appearance of LLaMA, Meta's LLM. One variant of this model, among others, has 7 billion parameters ("7B"), small enough to port to mobile devices and run locally. That's exactly what a team at Stanford University did, creating a specific version that they managed to get working on the Google Pixel 6. It was slow, yes, but it worked. The same group also published Alpaca, a fine-tuned model based on LLaMA 7B that was capable of running on much more modest hardware than Meta's original.
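A back-of-the-envelope calculation shows why 7 billion parameters is the size that makes phones plausible at all. The figures below assume the common trick of quantizing weights from 16-bit floats down to roughly 4 bits per parameter; exact numbers vary by format and runtime.

```python
# Rough memory footprint of a 7B-parameter model at different weight precisions.

params = 7_000_000_000  # LLaMA "7B"

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB of weights")

# fp16: ~13.0 GiB -> far too big for a phone's RAM
# int8: ~6.5 GiB  -> borderline
# int4: ~3.3 GiB  -> plausible on a recent high-end handset
```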
And there are (many) more. Mobile-ready generative AI models are on the rise. A few days ago the open source MLC LLM project appeared with a clear objective: to deploy LLM models on different hardware platforms including, of course, mobile phones. The project can be installed on many MacBooks, but also on some iPads and on the iPhone 14 Pro. Performance is very modest: about 7.2 tokens/sec on the iPhone 14 Pro, roughly equivalent to ChatGPT writing its responses at 4-6 words per second.
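The words-per-second comparison follows from a simple conversion. The snippet below uses the common rule of thumb of about 0.75 English words per token, which is an assumption; the real ratio depends on the tokenizer and the text.

```python
# Rough conversion behind the "4-6 words per second" comparison.

tokens_per_sec = 7.2      # MLC LLM on an iPhone 14 Pro, per the project
words_per_token = 0.75    # rule-of-thumb assumption

print(f"~{tokens_per_sec * words_per_token:.1f} words/sec")  # ~5.4 words/sec
```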
Don’t let the rhythm stop. This explosion of open source projects has some in the AI field already talking about a kind of “Android moment”. At Madrona they pointed to promising projects like Dolly (Databricks), OpenChatKit (Together.XYZ), Cerebras-GPT (Cerebras) or HuggingFace. Apple gave a small nod to this segment these days with the announcement of a feature that lets you train the iPhone to read sentences with your voice, keeping everything running on the device. With the speed at which everything is moving in this area, it doesn’t seem out of the question that we will soon have something like ChatGPT working locally, directly on our phones, without connecting to the cloud.