Alibaba open-sources its video-generation AI model
Chinese cloud provider Alibaba has released four versions of its video-generation AI model as open source, allowing users to download and run them for free on capable PCs.
The Wan2.1 text-to-video model “excels at generating realistic visuals by accurately handling complex movements, enhancing pixel quality, adhering to physical principles, and optimizing the precision of instruction execution,” the company said in a blog post.
The model is a free alternative to OpenAI’s Sora video-generation model, which created waves when it was commercially released last year. Sora is part of the ChatGPT Plus plan, which costs $20 per month and limits users to 50 videos per month at 480p resolution, or fewer videos at 720p. Another option, Google’s Veo 2, is available only to select users.
The four Wan2.1 models “are designed to generate high-quality images and videos from text and image inputs,” Alibaba said.
The models range from 1.3 billion to 14 billion parameters and generate videos lasting a few seconds at resolutions up to 720p. It’s not clear whether the company plans to release a model capable of generating 1080p video.
Video-generation AI could be a useful productivity tool, but it has a steep learning curve, said Jack Gold, principal analyst at J. Gold Associates. “A lot of models are rudimentary,” he said. “You aren’t making three-hour movies out of it. It’s still early days.”
Gold likened video-generation AI models today to word processors in the 1980s, which got better over time. What’s different with AI is that users are feeding information to the model.
“From the perspective of an enterprise user, the question is — what am I giving away for free? A lot of these programs are going to learn from what you use them for,” Gold said.
Even so, the open-source text-to-video model gives enterprise users something they never had, said Karl Freund, founder and principal analyst at Cambrian AI Research.
“It’s going to be a huge market,” Freund said, with a lot of interest from creative, media, and enterprise users.
Freund said enterprises spend a lot of money on multimedia, with many text-to-image generation models from Adobe, OpenAI, Google, and X.AI already being used in the cloud. Video is the next step.
Chinese AI providers are already shaking up the market, with Alibaba’s Wan2.1 the latest to arrive. The DeepSeek chatbot tool, for example, demonstrated advances made by Chinese companies in AI, and Wan2.1 demonstrates progress in video models. Also in the mix: Microsoft and Amazon, which now offer DeepSeek R1 through their cloud services.
“We’ve always believed that no single model is right for every use case, and customers can expect all kinds of new options to emerge in the future,” Amazon Web Services CEO Matt Garman said in a LinkedIn post last month.
As they did with DeepSeek, cloud providers may take Wan2.1 and offer it through their own services to generate revenue, Freund said.
The analysts had mixed views on the security concerns the video-generation model could raise. Gold pointed out that the Wan2.1 model could be used maliciously to generate deepfakes.
“There’s bad and good with everything,” he said.
The model’s Chinese origins also concerned Gold, but the model is open for inspection, and open-source advocates will likely comb through it as they did with DeepSeek.
The models are available for download on ModelScope, Alibaba Cloud’s AI model community, and on Hugging Face, which also hosts public AI models such as Meta’s Llama, Microsoft’s Phi, and Google’s Gemma.
Source: Computerworld