Meta's AI can turn any text into a video.


Meta announced Make-A-Video, a tool that generates short video clips from text descriptions—an unsettling, albeit inevitable, next step for the world of AI image generation.

Generative AI research is pushing creative expression forward by giving people tools to quickly and easily create new content. With just a few words or lines of text, Make-A-Video can bring imagination to life and create one-of-a-kind videos full of vivid colors, characters, and landscapes. The system can also create videos from images or take existing videos and create new ones that are similar.

It's much harder to generate video than still images because, beyond correctly generating each pixel, the system also has to predict how those pixels will change over time.

Make-A-Video solves this by adding a layer of unsupervised learning that enables the system to understand motion in the physical world and apply it to traditional text-to-image generation.

The example videos on the Make-A-Video site show clips of “a dog wearing a Superhero outfit with red cape flying through the sky” and “a teddy bear painting a portrait.” The videos are clearly AI-generated, with the blurry, painterly quality typical of AI-generated images. Yet they nonetheless show the fast-moving progress of AI art systems, which only a few years ago were the stuff of memes and science fiction.

Meta seems aware of the dangers of AI art-generating systems, and claims it is “openly sharing this generative AI research and results with the community for their feedback, and will continue to use their responsible AI framework to refine and evolve their approach to this emerging technology.”

But according to the Make-A-Video research paper, the image models were trained using a subset of the LAION dataset, which is known for scraping unfiltered web data that produces biased results.

Within this dataset were images of ISIS executions, nonconsensual nudes, and photoshopped nudes of celebrities. Meta seems to address this issue by paring the original dataset of over 5.8 billion images down to 2.3 billion, with the paper’s authors claiming, “We filter out sample pairs with NSFW images, toxic words in the text, images with a watermark probability larger than 0.5.”
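To make that filtering step concrete, here is a minimal sketch of what such a pass over image-text pairs might look like. The field names (`nsfw`, `text`, `watermark_prob`), the blocklist, and the helper functions are assumptions for illustration, not Meta's actual pipeline:

```python
# Hypothetical sketch of the filtering criteria the paper describes:
# drop pairs flagged NSFW, containing toxic words, or with a watermark
# probability above 0.5. All names and fields here are illustrative.

TOXIC_WORDS = {"badword"}  # placeholder blocklist, not a real one
WATERMARK_THRESHOLD = 0.5

def keep_pair(sample: dict) -> bool:
    """Return True if an image-text pair passes all three filters."""
    if sample.get("nsfw", False):
        return False
    tokens = sample.get("text", "").lower().split()
    if any(word in TOXIC_WORDS for word in tokens):
        return False
    if sample.get("watermark_prob", 0.0) > WATERMARK_THRESHOLD:
        return False
    return True

def filter_dataset(samples: list[dict]) -> list[dict]:
    """Keep only the pairs that survive every filter."""
    return [s for s in samples if keep_pair(s)]

pairs = [
    {"text": "a teddy bear painting", "nsfw": False, "watermark_prob": 0.1},
    {"text": "badword caption", "nsfw": False, "watermark_prob": 0.0},
    {"text": "stock photo", "nsfw": False, "watermark_prob": 0.9},
]
print(len(filter_dataset(pairs)))  # prints 1
```

Filtering of this kind is coarse by design: it removes the most obviously problematic pairs but cannot address the subtler biases baked into web-scraped data, which is the concern raised below.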

Meanwhile, AI ethics researchers have pushed back against the use of these large models trained on web-scraped data, warning that their sheer size creates fundamental problems of harmful bias that cannot be easily solved. Even Facebook’s own researchers have admitted that their language models have a “high propensity” for producing racist and harmful results.
