Gradio application for spawning audio items from a RAVE V2 VAE using RAVE-Latent Diffusion models.

devstermarts/black-latents-latent-diffusion-app

Black Latents | Latent Diffusion

A Gradio application for spawning audio items from Black Latents, a RAVE V2 VAE trained on the Black Plastics series, using RAVE-Latent Diffusion models.

UI

A demo version of this application is available on Hugging Face.

Author: Martin Heinze | marts~.

Initial commit year: 2025


Setup app for local inference

Create and activate a new conda environment, e.g.:

conda create -n black-latents-latent-diffusion-app python=3.11
conda activate black-latents-latent-diffusion-app

Clone this repository and install the app and its dependencies with pip:

git clone https://github.com/devstermarts/black-latents-latent-diffusion-app.git
cd black-latents-latent-diffusion-app
pip install .

Start the app:

gradio app.py

Open the app in your browser (usually http://localhost:7860).

Request model access and download

To use the app, you need both the diffusion models and the Black Latents VAE from the model hub on Hugging Face. The hub is public, but you must request access.

After you've been granted access, create an access token on Hugging Face. Then add the token to your shell configuration file and reload it:

export HF_TOKEN="your_hugging_face_access_token_here" # Add this line to your shell configuration file (e.g. ~/.bashrc)
source ~/.bashrc # Reload the configuration (adjust the path for your shell)
echo "$HF_TOKEN" # Double-check that the token has been set
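
If you prefer to verify the token from Python before starting the app, a minimal sketch could look like the following. The helper name hf_token_is_set is hypothetical and not part of this repository; it only checks that the HF_TOKEN variable is present and non-empty, not that the token is valid or has been granted access.

```python
import os


def hf_token_is_set(env=os.environ):
    """Return True if HF_TOKEN is present and non-empty in the given mapping.

    Hypothetical helper for illustration; it does not contact Hugging Face
    and therefore cannot tell whether the token is actually valid.
    """
    return bool(env.get("HF_TOKEN", "").strip())


if __name__ == "__main__":
    if hf_token_is_set():
        # Deliberately do not print the token itself.
        print("HF_TOKEN is set.")
    else:
        print("HF_TOKEN is missing or empty; export it and reload your shell.")
```

Unlike echo, this check does not write the token to the terminal.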

The models (>13GB) are downloaded to your local machine on the app's first request, so expect some initial loading time.
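
Because the download is large, it can help to check free disk space first. The sketch below is an illustration only: the helper has_free_space is hypothetical, the 13 GB threshold is taken from the figure above, and checking the home directory assumes the models end up somewhere on the same volume, which this README does not state.

```python
import shutil
from pathlib import Path


def has_free_space(path, required_gb):
    """Return True if the volume containing `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    # Assumption: the download location is on the same volume as the home directory.
    target = Path.home()
    if has_free_space(target, 13):
        print(f"Enough free space on the volume containing {target}.")
    else:
        print(f"Less than 13 GiB free on the volume containing {target}.")
```

If your models are stored on a different volume, pass that path instead of the home directory.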

Credits

  • Diffusion models have been trained using RAVE-Latent Diffusion (Flex'Ed), a fork of the original repository RAVE-Latent-Diffusion by Moisés Horta Valenzuela.
  • RAVE is a variational autoencoder for fast and high-quality neural audio synthesis by Antoine Caillon and Philippe Esling (acids/IRCAM).

License

Black Latents | Latent Diffusion © 2025 by Martin Heinze is licensed under CC BY-NC-SA 4.0.
