Meta is currently showing off an AI video generation system called Make-A-Video on Twitter. As unnerving as it may seem, the volume of replies in just one day suggests the AI image generation fad may soon give way to AI video generation. It's a giant leap, too, as researchers push the boundaries of generative art — specifically, how little data is needed to bring images to life.
"With just a few words, this state-of-the-art AI system can generate high-quality videos from text prompts," Meta AI wrote in a tweet calling for prompt suggestions. The trick to stopping a flood of unregulated gore and pornography from being generated and posted on Twitter? Users send Meta the prompt, and Meta may post the results itself.
We're excited to launch Make-A-Video, our latest #GenerativeAI research! With just a few words, this state-of-the-art AI system can generate high-quality videos based on text prompts. Have an idea you'd like to see? Reply with your prompt using #MetaAI and we'll share more results. pic.twitter.com/q8zjiwLBjb — September 29, 2022
An alternative to waiting for the Meta AI team (possibly scarred for life by the replies) to pick your idea out of thousands is to head to the Make-A-Video site (opens in new tab) and register your interest in the tool via the Google Form (opens in new tab).
The accompanying research paper (PDF warning (opens in new tab)) describes the Make-A-Video process as "an efficient method for extending diffusion-based T2I models to T2V by decomposing diffusion models in space and time". That's a fancy way of saying they took an evolved version of a diffusion-based text-to-image (T2I) model and taught it to make its pictures move.
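To get a feel for what "decomposing in space and time" means, here is a toy NumPy sketch of the general idea — not Meta's actual architecture, just an illustration under the assumption that a video is processed by a spatial attention pass (the image-model part, applied per frame) followed by a separate temporal pass (applied across frames at each spatial location). All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (tokens, channels). Single-head attention with identity
    # projections, purely for illustration.
    scores = softmax(x @ x.T / np.sqrt(x.shape[-1]))
    return scores @ x

def factorized_spacetime_attention(video):
    # video: (T, S, C) — T frames, S spatial tokens per frame, C channels.
    T, S, C = video.shape
    # 1) Spatial pass: attend within each frame independently.
    #    This is the part a pretrained image model already knows how to do.
    spatial = np.stack([self_attention(video[t]) for t in range(T)])
    # 2) Temporal pass: attend across frames at each spatial location.
    #    This is the new piece that links the frames into motion.
    temporal = np.stack([self_attention(spatial[:, s]) for s in range(S)], axis=1)
    return temporal

out = factorized_spacetime_attention(np.random.rand(4, 16, 8))
print(out.shape)  # (4, 16, 8)
```

The payoff of this factorization is that the expensive spatial half can be trained on abundant image data, while only the lightweight temporal half needs video.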
"While significant progress has been made in T2I generation," the paper reads, "the progress in T2V generation is lagging behind for two main reasons: the lack of large-scale datasets with high-quality text-video pairs, and the complexity of modeling higher-dimensional video data."
Essentially, the dataset size and quality needed to train current text-to-video AI models has been out of reach.
The amazing thing about this approach, the paper states, is that it "does not require paired text-video data." That sets it apart from many video and image generators that rely on content libraries already paired with text descriptions. "This is a significant advantage over previous work," the paper explains, because the model isn't limited to pre-labeled content and needs far less video data to work.
There are several ways to use the tool: it can fill in motion between two images, add motion to a single image, or create new variants of an existing video. The results are fascinating — dreamy, psychedelic, and available in several different styles.
Sure, the results are a little scary, especially when you remember they're only going to get more realistic, but at Halloween a short hike through the uncanny valley never hurts.