Imagine “a robot couple having a fine meal with the Eiffel Tower in the background.” For us humans, it is easy to picture this in our minds. Of course, the more creative among us can even bring these words to life in artwork. Now Google’s AI model, called Imagen, is capable of doing something similar. In a new announcement, Google has shown that Imagen, a text-to-image diffusion model, can create images based on written text.
However, the most notable part is the accuracy and photorealism of the pictures the model creates. Google has displayed several images generated by Imagen that accurately depict the prompt in question. For example, one shows an Android mascot made of bamboo. Angry Birds makes an appearance in another. Yet another shows a chrome-plated duck with a golden beak arguing with an angry turtle in the woods.
Check out some of the artwork below.
Google says Imagen is built on its large transformer language models, which help the AI understand text. Imagen also led Google researchers to another important discovery: generic large language models are “astonishingly effective at encoding text for image synthesis.”
However, the company notes that the model has limitations, including “a number of ethical challenges facing widespread text-to-image research”. It recognizes that such models can affect “society in complex ways” and carry a risk of misuse. That is why Google is not releasing code or a public demo right now.
Google’s blog notes that “the data requirements of text-to-image models have led researchers to rely heavily on large, mostly unsupervised, Web-scraped datasets”. The problem with such datasets is that they often reflect “social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups,” according to the blog.
The post states that “a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language.” But Google also relied on the LAION-400M dataset, which “is known to contain a wide range of inappropriate content, including pornographic imagery, racist slurs and harmful social stereotypes,” the company notes.
Google acknowledges that “there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision not to release Imagen for public use without any safeguards.”
In the end, Imagen is still very limited when it comes to producing art that depicts people, and it mostly produces stereotypical results. Google says the model encodes “social bias and stereotypes, including an overall bias toward generating images of people with lighter skin.” Furthermore, when asked to depict different occupations, it shows a preference for Western gender stereotypes.