
Wan 2.1 in the Cloud – Open-Source AI Video Generation – No Installation Required

What is Wan 2.1?

Wan 2.1 is an open-source AI video model from Alibaba that generates videos from text, image, and video inputs. It’s powerful, runs on consumer GPUs, and comes with no paywalls and no closed API.

GitHub: https://github.com/Wan-Video/Wan2.1

Download the workflow here.
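If you'd rather run Wan 2.1 locally instead of in the cloud, the GitHub repo documents a generate.py script for text-to-video. Here's a minimal sketch that wraps that command with Python's subprocess; the flag names and values are paraphrased from the repo README at the time of writing, so verify them against the repo before running.

```python
# Minimal local-run sketch for the Wan2.1 repo (https://github.com/Wan-Video/Wan2.1).
# Flag names are paraphrased from the repo README and may change; verify against the repo.
import subprocess

prompt = "A daring cat on a jet ski, chased by two dogs across open water."

subprocess.run(
    [
        "python", "generate.py",            # script from the cloned Wan2.1 repo
        "--task", "t2v-1.3B",               # smallest text-to-video model (~8.19 GB VRAM)
        "--size", "832*480",                # 480p output, the consumer-GPU-friendly setting
        "--ckpt_dir", "./Wan2.1-T2V-1.3B",  # path to the downloaded model weights
        "--prompt", prompt,
    ],
    check=True,
)
```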

Why Does Wan 2.1 Stand Out?

  • State-of-the-art quality – Outperforms most other open-source models and many closed-source alternatives.
  • Runs on consumer GPUs – An RTX 4090 can generate 5 seconds of 480p video in about 4 minutes.
  • More than just text-to-video – Supports image-to-video, video editing, AI text rendering, and sound generation.
  • Better motion & physics – Handles dancing, action scenes, and natural camera movement.

What You Can Do with Wan

🎥 Generate Cinematic Clips – From retro animations to realistic movie scenes.
🏃 Create Dynamic Motion – Think sports, parkour, fast camera angles.
🖌️ Edit Like a Pro – Inpainting, outpainting, structure & pose control.
🔤 Generate Video Text – Supports Chinese & English stylized fonts.
🎵 Auto-Generate Sound – Adds music & effects that actually match the visuals.


How to Use Wan 2.1 on ThinkDiffusion

Wan 2.1 is now in ComfyUI Beta on ThinkDiffusion.

We've preloaded everything – just drag and drop the workflow and start making incredible videos.

To start creating, just follow the steps below and get going:

  1. Download the workflow here or from the section below.
  2. Launch a ComfyUI Beta Ultra machine.
  3. Drag the JSON into the ComfyUI canvas (no need to install custom nodes or download any models).
  4. Enter your prompt and input image, then click generate.

That's all you need to start tinkering in the Wan playground!
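If you'd rather queue the workflow from a script instead of the UI, ComfyUI also exposes a small HTTP API. The sketch below is a minimal example assuming you've exported the workflow in API format (via Save (API Format)); the filename, node ID, and server address are placeholders, not values from the shared workflow.

```python
# Hedged sketch: queue a Wan 2.1 workflow through ComfyUI's HTTP API.
# Assumes the workflow was exported via "Save (API Format)" and that
# COMFY_URL points at your running ComfyUI instance (address is a placeholder).
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"          # replace with your machine's address

with open("wan21_workflow_api.json") as f:   # placeholder filename for the exported workflow
    workflow = json.load(f)

# Tweak inputs before queueing, e.g. the text prompt on a CLIP text-encode node.
# Node IDs depend on the specific workflow, so "6" here is only illustrative.
workflow["6"]["inputs"]["text"] = "A daring cat rides a jet ski across open water."

req = urllib.request.Request(
    f"{COMFY_URL}/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())              # returns a prompt_id you can poll for results
```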


Download Wan 2.1 Workflow & Examples

If you want to create AI-generated videos without breaking the bank or running closed-source tools, Wan 2.1 is the best open option right now.

Workflow

Example 1: daring cat and dog jet ski


Prompt:

A daring cat, wearing a safety jacket and goggles, speeds across the water on a jet ski, leaving a trail of splashes behind. Chasing close behind, two determined dogs, also in safety jackets, ride their own jet skis, their ears flapping in the wind. The cat grips the handlebars tightly, eyes wide with excitement, while the dogs bark playfully, closing the gap. The bright sun reflects off the waves, creating a thrilling high-speed chase on the open water.

Input image:

Example 2: girl playing with her dog


Example 3: cat flying her jet


Input image:

Prompt:

A fearless tabby cat, dressed in a vintage aviator outfit with leather goggles and a pilot's scarf, grips the controls of an old retro biplane soaring through the sky. The wind ruffles its fur as it confidently navigates the aircraft, with a spinning propeller and a trail of white smoke behind. Below, rolling green fields and a golden sunset create a dramatic backdrop. The cockpit is filled with classic dials and levers, giving the scene an adventurous, nostalgic feel. The cat's focused eyes and determined expression make it look like a true flying ace.


Detailed Guide on Why Wan 2.1 is the Best AI Video Model Right Now

Renowned for its ability to exceed the performance of other open-source models like Hunyuan and LTX, as well as numerous commercial alternatives, Wan 2.1 delivers truly incredible text2video and image2video generations with little effort. While it sacrifices generation speed compared to competitors, the quality of its video production is exceptional.



Frequently Asked Questions


1. What is Wan 2.1?
Wan 2.1 is an open-source AI model from Alibaba that generates videos from text and images. It’s powerful, runs on consumer GPUs, and supports video editing, text rendering, and sound generation.

2. Do I need a supercomputer?
No. An RTX 4090 can generate a 5-second 480p video in about 4 minutes. The smallest model runs in 8.19 GB of VRAM, which works on many consumer GPUs.

3. What kind of videos can I make?

  • Text to Video (describe a scene, get a video)
  • Image to Video (turn an image into an animation)
  • Video Editing (inpainting, outpainting, structure/pose control)
  • AI Text in Videos (auto-generated video titles, stylized effects)
  • Auto Sound Effects (syncs music & sound to video)

4. Can I use it for commercial projects?

Yes. Paid plans include a commercial license. If you're running it yourself, check the open-source license terms.

5. How long does it take to generate a video?

Depends on your GPU and settings.

  • 5s of 480p video → ~4 min on an RTX 4090
  • Higher resolutions or longer videos take more time

6. What input formats does it support?

  • Text (describe what you want)
  • Images (use a reference image)
  • Video (for editing)

7. Can it generate videos in different languages?

Yes. It supports English and Chinese text in videos.

8. Is there a max video length?

If you're using the hosted version, yes (depends on your plan). If you're running it locally, it's limited by your hardware.

9. How does it compare to closed-source AI models?

It competes with some commercial models and beats most open-source options. Performance is solid, especially for text-to-video and realistic motion.


ComfyUI on ThinkDiffusion


ComfyUI offers a flowchart/node-based user interface for advanced users, providing a powerful and intuitive tool for managing workflows.


ComfyUI – The Complete Guide to Node-Based Workflows
ComfyUI is a node-based interface for Stable Diffusion. Instead of using sliders and buttons like Automatic1111, you build visual workflows using nodes—modular blocks that perform specific tasks like loading a model, processing a prompt, or generating an image.
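For a sense of what those nodes look like once a graph is exported, here's a toy Python dict mirroring the API-format JSON that ComfyUI produces; the node IDs and values are illustrative, not taken from the Wan 2.1 workflow.

```python
# Illustrative only: the general shape of a ComfyUI workflow in API format.
# Node IDs, class names, and values are toy examples, not the actual Wan 2.1 workflow.
toy_workflow = {
    "1": {  # a loader node brings model weights into the graph
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "some_model.safetensors"},
    },
    "2": {  # a text-encode node turns the prompt into conditioning
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "a cat on a jet ski", "clip": ["1", 1]},  # ["1", 1] = output 1 of node "1"
    },
}
```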

Skip Installation with ThinkDiffusion cloud

ThinkDiffusion lets you spin up virtual machines loaded with your choice of open-source apps. Run models from anywhere—Hugging Face, Civitai, or your own—and install custom nodes for ComfyUI. Your virtual machine works just like a personal computer.

The best part? Pick hardware that fits your needs, from 16GB to 48GB VRAM (with an 80GB option coming soon).

Plus, ThinkDiffusion updates apps quickly and maintains multiple versions to keep your workflow running smoothly.


Learn About Other Open-Source AI Video Models

AI Video Speed: How LTX is Reshaping Video2Video as We Know It
LTX Video to Video delivers really powerful results at amazing speed – and that's the power of LTX: speed. With this video2video workflow we'll be transforming input videos into their AI counterparts with amazing efficiency.

LTX Video2Video

Unleashing Creativity: How Hunyuan Redefines Video Generation
Hey there, video enthusiasts! It's a thrill to see how quickly things are changing, especially in the way we create videos. Picture this: with just a few clicks, you can transform your existing clips…

Hunyuan in ComfyUI
