Meet DALL-E, the AI that draws anything you describe

SAN FRANCISCO – At OpenAI, one of the world’s most ambitious artificial intelligence labs, researchers are building technology that lets you create digital images simply by describing what you want to see.

They call it DALL-E, a nod both to “WALL-E,” the 2008 animated movie about an autonomous robot, and to the surrealist painter Salvador Dalí.

OpenAI, backed by $1 billion in funding from Microsoft, has not yet shared the technology with the general public. But one recent afternoon, Alex Nichol, one of the researchers behind the system, demonstrated how it works.

When he typed “a teapot in the shape of an avocado” onto a largely blank computer screen, the system created 10 distinct images of a dark green avocado teapot, some with pits and some without. “DALL-E is good at avocados,” Mr. Nichol said.

When he typed “cats playing chess,” it placed two fluffy kittens on either side of a checkered game board, with 32 chess pieces lined up between them. When he typed “a teddy bear playing a trumpet underwater,” the image showed tiny air bubbles rising from the end of the bear’s trumpet toward the surface of the water.

DALL-E can also edit photos. When Mr. Nichol erased the teddy bear’s trumpet and asked for a guitar instead, a guitar appeared between the furry arms.

A team of seven researchers spent two years developing the technology, which OpenAI eventually plans to offer as a tool for people like graphic artists, providing new shortcuts and new ideas as they create and edit digital images. Computer programmers already use Copilot, a tool based on similar technology from OpenAI, to generate snippets of software code.

But for many experts, DALL-E is worrisome. As this kind of technology continues to improve, they say, it could help spread disinformation across the internet, feeding the kind of online campaigns that helped sway the 2016 presidential election.

“You could use it for good things, but certainly you could use it for all sorts of other crazy, worrying applications, and that includes deep fakes,” said Subbarao Kambhampati, a professor of computer science at Arizona State University.

A decade and a half ago, the world’s leading AI labs built systems that could identify objects in digital images and even generate images of their own, including flowers, dogs, cars and faces. A few years later, they built systems that could do much the same with written language, summarizing articles, answering questions, generating tweets and even writing blog posts.

Now, researchers are combining those technologies to create a new breed of AI in DALL-E. This is a notable step because it handles both language and images and, in some cases, grasps the relationship between the two.

“We can now use multiple, intersecting streams of information to create better and better technology,” said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, an AI lab in Seattle.

The technology is not perfect. When Mr. Nichol asked DALL-E to “put the Eiffel Tower on the moon,” it did not quite grasp the idea. It put the moon in the sky above the tower. When he asked for “a living room filled with sand,” it produced a scene that looked more like a construction site than a living room.

But when Mr. Nichol tweaked his requests a little, adding or subtracting a few words here or there, it delivered what he wanted. When he asked for “a piano in a living room filled with sand,” the image looked like a beach in a living room.

DALL-E is what artificial intelligence researchers call a neural network, a mathematical system loosely modeled on the network of neurons in the brain. It is the same technology that recognizes the commands spoken into smartphones and identifies pedestrians as self-driving cars navigate city streets.

A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images along with the text captions that describe what each image depicts. In this way, it learns to recognize the links between images and words.
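The idea of linking images and words can be sketched in code. In a minimal, illustrative form (all of the feature vectors and names below are invented for illustration; a real system like DALL-E learns these vectors from millions of examples), both an image and a caption are reduced to lists of numbers, and matching ones point in similar directions:

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 means the feature vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy, hand-made feature vectors standing in for what a trained network
# would produce. The three dimensions here are arbitrary.
image_features = {
    "photo_of_avocado": [0.9, 0.1, 0.0],
    "photo_of_trumpet": [0.1, 0.9, 0.2],
}
caption = [0.8, 0.2, 0.1]  # features for the caption "an avocado"

# The caption matches the image whose features point in the same direction.
best = max(image_features, key=lambda name: cosine(caption, image_features[name]))
print(best)  # photo_of_avocado
```

A trained network produces these vectors itself, so that captions land near the images they describe; the toy above only shows how the matching step works once such vectors exist.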

When someone describes an image for DALL-E, it generates a set of key features that the image might include. One feature might be the line along the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.

Then a second neural network, called a diffusion model, creates the image, generating the pixels needed to realize those features. The latest version of DALL-E, unveiled Wednesday alongside a new research paper describing the system, produces high-resolution images that in many cases look like photographs.
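The core loop of a diffusion model can be sketched in a few lines. This is a deliberately simplified illustration, not OpenAI’s implementation: the “image” is just four numbers, and the denoiser cheats by peeking at the target pattern, whereas a real diffusion model uses a trained neural network to predict the noise. What it shows is the reverse process itself, starting from pure noise and repeatedly removing predicted noise until a pattern emerges:

```python
import random

random.seed(0)

# A toy 1-D "image": in the real system this would be a grid of pixels,
# and the target would be implied by the text features, not hard-coded.
target = [0.2, 0.8, 0.5, 0.9]

def predicted_noise(x):
    """Stand-in for the trained denoising network. A real diffusion model
    predicts the noise with a neural net; here we compute it directly."""
    return [xi - ti for xi, ti in zip(x, target)]

# Reverse diffusion sketch: start from random noise, then repeatedly
# subtract a fraction of the predicted noise.
x = [random.gauss(0, 1) for _ in target]
for step in range(50):
    noise = predicted_noise(x)
    x = [xi - 0.2 * ni for xi, ni in zip(x, noise)]

print([round(v, 2) for v in x])  # settles close to the target pattern
```

Each pass removes only a fraction of the estimated noise, which is why diffusion models generate images over many small denoising steps rather than in one jump.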

Although DALL-E often fails to understand what someone has described, and sometimes mangles the images it creates, OpenAI’s technology continues to improve. Researchers can often hone the skills of a neural network by feeding it ever larger amounts of data.

They can also build more powerful systems by applying the same ideas to new kinds of data. The Allen Institute recently created a system that can analyze audio as well as images and text. After analyzing millions of YouTube videos, including their audio tracks and captions, it learned to identify particular moments in TV shows or movies, like a barking dog or a closing door.

Experts believe researchers will continue to hone such systems. Ultimately, those systems could help companies improve search engines, digital assistants and other everyday technologies, as well as automate new tasks for graphic artists, programmers and other professionals.

But there are caveats to that potential. AI systems can show bias against women and people of color, in part because they learn their skills from enormous pools of online text, images and other data that show bias. They could be used to generate pornography, hate speech and other offensive material. And many experts believe the technology will eventually make it so easy to create disinformation that people will have to be skeptical of nearly everything they see online.

“We can forge text. We can put text into someone’s voice. And we can forge images and videos,” Dr. Etzioni said. “There is already disinformation online, but the worry is that this scales disinformation to new levels.”

OpenAI is keeping a tight leash on DALL-E. It will not let outsiders use the system on their own. It puts a watermark in the corner of each image it generates. And though the lab plans to open the system to testers this week, the group will be small.

The system also includes filters that prevent users from generating inappropriate images. When asked for “a pig with the head of a sheep,” it declined to produce an image. The combination of the words “pig” and “head” most likely tripped OpenAI’s anti-bullying filters, the lab said.

“This is not a product,” said Mira Murati, OpenAI’s head of research. “The idea is to understand capabilities and limitations and give us the opportunity to build in mitigation.”

OpenAI can control the system’s behavior in some ways. But others around the world may soon create similar technology, putting much the same powers in the hands of just about anyone. Working from a research paper describing an early version of DALL-E, Boris Dayma, an independent researcher in Houston, has already built and released a simpler version of the technology.

“People need to know that the images they see may not be real,” he said.
