From 05d555f923306bf8458f34e79646fe8e0aff7599 Mon Sep 17 00:00:00 2001
From: YuxuanCai
Date: Mon, 21 Oct 2024 00:56:13 +0000
Subject: [PATCH 1/2] Update text_encoder/config.json

---
 text_encoder/config.json | 1 -
 1 file changed, 1 deletion(-)

diff --git a/text_encoder/config.json b/text_encoder/config.json
index 5f53503..87eddfc 100644
--- a/text_encoder/config.json
+++ b/text_encoder/config.json
@@ -1,5 +1,4 @@
 {
-  "_name_or_path": "/cpfs/data/user/larrytsai/Projects/Yi-VG/allegro/text_encoder",
   "architectures": [
     "T5EncoderModel"
   ],

From e522c750bf494bd426954ccf016940cf8753c58c Mon Sep 17 00:00:00 2001
From: RhymesAI
Date: Mon, 21 Oct 2024 02:30:04 +0000
Subject: [PATCH 2/2] Update README.md

---
 README.md | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index c56469c..560a87c 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,10 @@ library_name: diffusers
 
 # Key Feature
 
-Allegro is capable of producing high-quality, 6-second videos at 30 frames per second and 720p resolution from simple text prompts.
+- **High-Quality Output**: Generate detailed 6-second videos at 15 FPS with 720x1280 resolution, which can be interpolated to 30 FPS with EMA-VFI.
+- **Small and Efficient**: Features a 175M-parameter VAE and a 2.8B-parameter DiT model. Supports multiple precisions (FP32, BF16, FP16) and uses 9.3 GB of GPU memory in BF16 mode with CPU offloading.
+- **Extensive Context Length**: Handles up to 79.2k tokens, providing rich and comprehensive text-to-video generation capabilities.
+- **Versatile Content Creation**: Capable of generating a wide range of content, from close-ups of humans and animals to diverse dynamic scenes.
 
 # Model info
 
@@ -29,7 +32,7 @@ Allegro is capable of producing high-quality, 6-second videos at 30 frames per s
       Description
 
 
-      Text-to-Video Diffusion Transformer
+      Text-to-Video Generation Model
 
 
       Download
@@ -76,17 +79,14 @@ Allegro is capable of producing high-quality, 6-second videos at 30 frames per s
 
 You can quickly get started with Allegro using the Hugging Face Diffusers library. For more tutorials, see Allegro GitHub (link-tbd).
 
-Install necessary requirements:
-```python
-pip install diffusers transformers imageio
-```
-Inference on single gpu:
+1. Install necessary requirements. Please refer to [requirements.txt](https://github.com/rhymes-ai) on Allegro GitHub.
+2. Perform inference on a single GPU.
 ```python
 from diffusers import DiffusionPipeline
 import torch
 
 allegro_pipeline = DiffusionPipeline.from_pretrained(
-    "rhythms-ai/allegro", trust_remote_code=True, torch_dtype=torch.bfloat16
+    "rhymes-ai/Allegro", trust_remote_code=True, torch_dtype=torch.bfloat16
 ).to("cuda")
 
 allegro_pipeline.vae = allegro_pipeline.vae.to(torch.float32)
 
@@ -121,8 +121,10 @@ out_video = allegro_pipeline(
 ).video[0]
 
 imageio.mimwrite("test_video.mp4", out_video, fps=15, quality=8)
-
 ```
+Tips:
+- It is highly recommended to use a video frame interpolation model (such as EMA-VFI) to enhance the result to 30 FPS.
+- For more tutorials, see [Allegro GitHub](https://github.com/rhymes-ai).
 
 # License
 This repo is released under the Apache 2.0 License.
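
The Getting Started snippet updated by PATCH 2/2 is only partially visible above: the diff hunks skip the README lines that build the prompt and pass the sampling arguments to the pipeline. The sketch below assembles the visible fragments into one runnable script; the prompt string and every keyword argument inside the `allegro_pipeline(...)` call are placeholders assumed for illustration, not the values from the full README (see Allegro GitHub for the complete example).

```python
# Minimal sketch assembled from the fragments visible in PATCH 2/2.
# Assumption: the prompt text and the sampling keyword arguments below are
# placeholders; the real values live in README lines outside the diff hunks,
# and the remote pipeline class defines its own call signature.
import imageio
import torch
from diffusers import DiffusionPipeline

# Load the custom Allegro pipeline in BF16, as shown in the patched README.
allegro_pipeline = DiffusionPipeline.from_pretrained(
    "rhymes-ai/Allegro", trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda")

# The README keeps the VAE in FP32 while the rest of the pipeline runs in BF16.
allegro_pipeline.vae = allegro_pipeline.vae.to(torch.float32)

prompt = "A close-up of a sea otter floating on its back in calm water."  # placeholder

out_video = allegro_pipeline(
    prompt,
    negative_prompt="",          # assumed argument
    guidance_scale=7.5,          # assumed argument
    num_inference_steps=100,     # assumed argument
    generator=torch.Generator("cuda").manual_seed(42),  # assumed argument
).video[0]

# The model's native output is 15 FPS; per the README tip, interpolate to
# 30 FPS afterwards with a frame-interpolation model such as EMA-VFI.
imageio.mimwrite("test_video.mp4", out_video, fps=15, quality=8)
```

Per the Key Feature list added in this patch, running the pipeline in BF16 with CPU offloading is stated to use about 9.3 GB of GPU memory.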