When the camera was invented, painters must have felt betrayed. Developing a photograph took far less time and skill than painting, and the result was an exact likeness of the subject, a mirrored reflection. Then, much later, digital art came into being, and anyone, regardless of skill or experience, could create art and photos.
Machine learning took things to another level. Programs could identify what was in a picture and generate an appropriate caption for it, making it easier for search engines to return the right result for every query. Going from picture to text required no human skill at all.
Around 2015, a group of researchers became curious about reversing the process. If machines could make out what a picture showed and produce a string of words to describe it, could a string of words be translated into a whole picture? That direction was obviously more difficult.
One approach would be to have the machine choose from a catalog of existing pictures, but the researchers were not interested in that. They wanted a program that could create original, novel pictures, ones that had never existed in the real world, a program that could conjure images on its own.
One computer model the researchers tested was a modest success. Given the prompt “a green school bus parked in a parking lot,” it generated a 32x32 pixel image of a small green blob. The result was blurry, but it already meant a lot to the researchers. They tried more prompts and got the same thing each time: a small blob, but in the appropriate colors, with the outline of the prompted object just discernible.
Within a year, text-to-image generation came into its own: the green school bus parked in a parking lot could no longer be mistaken for anything else. And in 2018, an AI-generated portrait sold at auction for over $400,000.
Hyperrealistic photos even became available to the masses as a form of entertainment, through face apps that let users see what they might look like when they get older.
The portrait and face models, however, were trained to do just that: portraits and faces. A general text-to-image model required a bigger and more complicated approach.
In 2021, an AI company announced a text-to-image AI called DALL-E, an allusion to the artist Salvador Dalí and the fictional robot WALL-E. It could generate images from a simple line of text as input; no camera, no canvas, and no code needed. Its successor, DALL-E 2, promised more realistic results and a wider range of outputs, but it was not initially made available to the general public.
Thanks to independent developers, free text-to-image generators also became available on the internet. One of the best-known names to emerge was Midjourney, which even offers a Discord bot that turns input text into an image in less than a minute.
The massive development of text-to-image technology in such a short time is a testament to the seemingly limitless and unstoppable power of the field. It makes one wonder what it will be capable of in the next two years or so.