VidToMe: Video Token Merging for Zero-Shot Video Editing
Paper: [arXiv:2312.10656](https://arxiv.org/abs/2312.10656)
Edit videos instantly with just a prompt! 🎥
This is the Diffusers implementation of VidToMe, a diffusion-based pipeline for zero-shot video editing that improves temporal consistency and reduces memory usage by merging self-attention tokens across video frames. By aligning and compressing redundant tokens across frames, VidToMe produces smooth transitions and coherent output without any model fine-tuning, improving over prior zero-shot video editing methods. It follows the approach of the paper linked above.
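For intuition, here is a minimal, self-contained sketch of the kind of cross-frame token merging the method describes. It is not the pipeline's actual implementation, and every name in it (`merge_tokens_across_frames`, `ratio`, and so on) is hypothetical: each token of one frame is paired with its most similar token in a neighboring frame by cosine similarity, and the most redundant pairs are averaged, in the spirit of ToMe-style soft matching.

```python
import torch

def merge_tokens_across_frames(tokens_a, tokens_b, ratio=0.5):
    """Toy illustration of cross-frame token merging (hypothetical helper).

    tokens_a, tokens_b: (num_tokens, dim) self-attention tokens of two frames.
    ratio: fraction of tokens_b to merge into their nearest match in tokens_a.
    """
    # cosine similarity between every token in frame B and every token in frame A
    a = torch.nn.functional.normalize(tokens_a, dim=-1)
    b = torch.nn.functional.normalize(tokens_b, dim=-1)
    sim = b @ a.t()  # (Nb, Na)

    # for each token in frame B, find its best match in frame A
    best_sim, best_idx = sim.max(dim=-1)

    # merge the top-k most redundant frame-B tokens into their matches
    k = int(ratio * tokens_b.shape[0])
    merge_ids = best_sim.topk(k).indices
    keep_mask = torch.ones(tokens_b.shape[0], dtype=torch.bool)
    keep_mask[merge_ids] = False

    merged = tokens_a.clone()
    for i in merge_ids.tolist():
        merged[best_idx[i]] = (merged[best_idx[i]] + tokens_b[i]) / 2

    # keep the unmerged frame-B tokens alongside the merged set
    return torch.cat([merged, tokens_b[keep_mask]], dim=0)

# toy usage: 64 tokens of dimension 320 per frame
frame_a, frame_b = torch.randn(64, 320), torch.randn(64, 320)
print(merge_tokens_across_frames(frame_a, frame_b).shape)  # torch.Size([96, 320])
```

The actual pipeline performs this merging inside the diffusion model's self-attention layers, so using it only takes a few lines: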
```python
from diffusers import DiffusionPipeline

# load the pretrained model as a custom pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/VidToMe",
    sd_version="depth",
    device="cuda",
    float_precision="fp16"
)

# set prompts for inversion and generation
inversion_prompt = "flamingos standing in the water near a tree."
generation_prompt = {"origami": "rainbow-colored origami flamingos standing in the water near a tree."}

# additional control and parameters
control_type = "none"  # no extra control; use "depth" if needed
negative_prompt = ""

# run the video editing pipeline
generated_images = pipeline(
    video_path="path/to/video.mp4",  # path to the input video
    video_prompt=inversion_prompt,   # inversion prompt
    edit_prompt=generation_prompt,   # edit prompt for generation
    control_type=control_type       # control type (e.g., "none", "depth")
)
```
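The exact return value of the call above depends on the custom pipeline, so treat the following as a sketch under one assumption: that `generated_images` comes back as an ordered list of PIL images, one per input frame. Under that assumption, the edited frames could be written back out as a video with imageio (the MP4 writer requires the imageio-ffmpeg backend):

```python
import numpy as np
import imageio

# Assumption: `generated_images` is an ordered list of PIL.Image frames.
frames = [np.asarray(img) for img in generated_images]
imageio.mimsave("edited_video.mp4", frames, fps=8)  # fps=8 is an arbitrary choice
```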
For the model authors and more details, check the GitHub repo.