June 6, 2023

For more than a decade, Cade Metz has been writing about advances in artificial intelligence.

Ian Sansavera, a software architect at a New York-based start-up called Runway AI, typed a short description of the video he wanted to see. “A peaceful river in the forest,” he wrote.

Less than two minutes later, an experimental internet service generated a short clip of a peaceful river in the forest. Glittering in the sun, the river cut through the trees and ferns, turned a corner, and lapped gently on the rocks.

Runway, which plans to open its service to a small group of testers this week, is one of several companies building artificial intelligence technology that will soon let people generate videos simply by typing a few words into a box on a computer screen.

They represent the next stage in an industry race, one that includes giants such as Microsoft and Google as well as much smaller start-ups, to create new kinds of artificial intelligence systems that some believe could be the next big thing in technology, as important as the web browser or the iPhone.

New video-generating systems could speed the work of filmmakers and other digital artists, while also becoming a new and quick way to create hard-to-detect online misinformation, making it even harder to tell what is real on the internet.

These systems are examples of so-called generative AI, which can instantly create text, images and sounds. Another example is ChatGPT, an online chatbot developed by San Francisco startup OpenAI that shocked the tech industry late last year with its capabilities.

Google and Meta, the parent company of Facebook, launched their first video-generation systems last year, but did not share them with the public out of concern that the systems could eventually be used to spread disinformation with new speed and efficiency.

But Runway’s chief executive, Cris Valenzuela, said that despite the risks, he thinks the technology is too important to stay in a research lab. “This is one of the most impressive technologies we’ve created in the last hundred years,” he said. “You need to get people to actually use it.”

Of course, the ability to edit and manipulate movies and videos is nothing new. Filmmakers have been doing this for over a century. In recent years, researchers and digital artists have been using various AI techniques and software programs to create and edit videos commonly known as deepfake videos.

But systems like the one Runway has created could, in time, replace those editing skills with the push of a button.

Runway’s technology can generate a video from any short description. To start, you simply type a description, just like you type a quick note.

It works best if the scene has some action, but not too much, something like “a rainy day in the big city” or “a dog with a cellphone in the park.” Hit enter, and the system generates a video in a minute or two.

The technology can reproduce common images, like a cat sleeping on a rug. Or it can combine disparate concepts to generate strangely amusing videos, like a cow at a birthday party.

The videos are only four seconds long, and they are choppy and blurry if you look closely. At times the images are odd, distorted and disturbing. The system has a way of fusing animals like dogs and cats with inanimate objects like balls and cellphones. But given the right prompt, it produces videos that show where the technology is headed.

“At this point, if I see high-resolution video, I might believe it,” said Philip Isola, a professor at the Massachusetts Institute of Technology who specializes in artificial intelligence, “but that will change soon.”

Like other generative AI technologies, Runway's system learns by analyzing digital data, in this case photos, videos and captions describing what those images contain. By training this kind of technology on ever larger amounts of data, researchers believe they can rapidly improve and expand its skills. Experts believe these systems will soon generate professional-looking mini-movies, complete with music and dialogue.

It is hard to define what the system currently creates. It is not a photo. It is not a cartoon. It is a collection of many pixels blended together to create a realistic video. The company plans to offer its technology alongside other tools that it believes will speed the work of professional artists.

Last month, social media services were flooded with images of Pope Francis wearing a white Balenciaga puffer jacket, surprisingly stylish attire for an 86-year-old pontiff. But the images were not real: a 31-year-old construction worker from Chicago had created the viral sensation using a popular artificial intelligence tool called Midjourney.

Dr. Isola spent several years building and testing this kind of technology, first as a researcher at the University of California, Berkeley, and at OpenAI, and then as a professor at MIT. Even he was taken in by the fake images of Pope Francis.

“Once upon a time, people would post deepfakes, but they wouldn’t fool me because they were too wacky or unrealistic,” he said. “Right now, we can’t take any of the images we see on the internet at face value.”

Midjourney is one of many services that can generate realistic still images based on short prompts. Others include Stable Diffusion and DALL-E, an OpenAI technology that started this wave of photo generators when it launched a year ago.

Midjourney relies on neural networks, which learn skills by analyzing large amounts of data. It looks for patterns as it combs through millions of digital images along with text captions describing what each image depicts.

When someone describes an image for the system, it generates a list of features the image might contain. One feature might be the curve at the top of a dog's ear. Another might be the edge of a cellphone. Then a second neural network, called a diffusion model, creates the image, generating the pixels needed to realize those features and ultimately assembling the pixels into a coherent image.
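The two-stage process described here can be sketched as a toy program. This is purely illustrative and is not Midjourney's actual code: a stand-in “text encoder” hashes the prompt into a feature vector, and a loop of small denoising steps pulls random noise toward those features, which is the basic shape of what a diffusion model does at generation time.

```python
import math
import random

def encode_prompt(prompt: str, dim: int = 16) -> list[float]:
    """Toy stand-in for a text encoder: deterministically map a prompt
    to a feature vector (a real system uses a trained neural network)."""
    rng = random.Random(prompt)  # seeding with the string makes it repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def denoise_step(pixels: list[float], features: list[float],
                 step_size: float) -> list[float]:
    """One toy diffusion step: nudge the noisy 'pixels' toward the target
    features. A real diffusion model predicts this update with a network."""
    return [p + step_size * (f - p) for p, f in zip(pixels, features)]

def generate_image(prompt: str, steps: int = 50) -> list[float]:
    features = encode_prompt(prompt)
    rng = random.Random(0)
    pixels = [rng.gauss(0.0, 1.0) for _ in features]  # start from pure noise
    for i in range(steps):
        # simple schedule: later steps take larger strides toward the target
        pixels = denoise_step(pixels, features, 0.1 * (i + 1) / steps)
    return pixels

img = generate_image("a cat sleeping on a rug")
```

In a real system, both toy functions are replaced by large neural networks trained on millions of captioned images, and the “pixels” live in a far higher-dimensional space, but the pattern is the same: start from noise and repeatedly denoise toward what the prompt describes.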

Companies like Runway, which has roughly 40 employees and has raised $95.5 million, are using this technique to generate moving images. By analyzing thousands of videos, their technology can learn to string many still images together in a similarly coherent way.

“Video is just a series of frames – still images – put together in a way that gives the illusion of movement,” Mr. Valenzuela said. “The trick is to train a model that understands the relationship and coherence between each frame.”
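The frame-to-frame coherence Mr. Valenzuela describes can be illustrated with another toy sketch (again, not Runway's actual model): each frame is generated conditioned on the frame before it, so adjacent frames differ only slightly and the sequence reads as motion rather than as a jumble of unrelated images.

```python
import random

def next_frame(prev: list[float], rng: random.Random,
               drift: float = 0.05) -> list[float]:
    """Toy 'temporal model': each pixel moves only slightly from the
    previous frame, which is what keeps the clip coherent."""
    return [p + rng.gauss(0.0, drift) for p in prev]

def generate_video(num_frames: int = 8, num_pixels: int = 16,
                   seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    frame = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]  # first frame
    video = [frame]
    for _ in range(num_frames - 1):
        frame = next_frame(frame, rng)  # condition on the previous frame
        video.append(frame)
    return video

clip = generate_video()
```

A real video model learns this conditioning from thousands of clips instead of using a fixed drift, but the design choice is the same: generating each frame from the last is what separates video synthesis from producing a stack of independent still images.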

Like earlier versions of tools such as DALL-E and Midjourney, the technology sometimes combines concepts and images in strange ways. If you ask for a teddy bear playing basketball, it might give you a mutant stuffed animal with a basketball for a hand. If you ask for a dog with a cellphone at the park, it might give you a puppy holding a cellphone with an oddly human body.

But experts believe they can eliminate these flaws as they train their systems on more and more data. They believe the technology will eventually make video production as easy as writing a sentence.

“In the past, to do anything remotely like this, you had to have a camera. You had to have props. You had to have a location. You had to have permission. You had to have money,” Mr. Valenzuela said. “You don’t have to own anything now. You can sit back and imagine it.”

