Neural Vocoder (HiFi-GAN) with PyTorch

W&B report link: https://wandb.ai/aapetukhov-new-economic-school/Neural-Vocoder/reports/Neural-Vocoder-Updated---VmlldzoxMDY2ODUzOQ

About • Installation • How To Use • Credits • License

About

HiFi-GAN implemented from scratch in PyTorch.

This repository contains an implementation of the HiFi-GAN neural vocoder (a Generative Adversarial Network), with all scripts needed to train and evaluate the model. The model was trained on a single NVIDIA P100 GPU.
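As a vocoder, HiFi-GAN maps a mel spectrogram to a raw waveform through stacked transposed-convolution upsampling, so the product of the per-stage upsample rates must equal the STFT hop length used to compute the mels. A minimal sketch of this frame-to-sample bookkeeping, assuming the V1 rates from the original HiFi-GAN paper (this repo's configs may use different values):

```python
from math import prod

# Per-stage transposed-convolution strides of the HiFi-GAN V1 generator
# (values from the original paper; illustrative only).
upsample_rates = [8, 8, 2, 2]

# Their product must match the mel-spectrogram hop length.
hop_length = prod(upsample_rates)  # 256 samples per mel frame

def waveform_length(num_mel_frames: int) -> int:
    """Number of audio samples the generator produces for a given
    number of input mel frames."""
    return num_mel_frames * hop_length

print(waveform_length(100))  # 25600 samples, about 1.16 s at 22050 Hz
```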

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using venv (optionally with pyenv).

    # create env
    ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env
    
    # alternatively, using default python version
    python3 -m venv project_env
    
    # activate env
    source project_env/bin/activate
  2. Install all required packages:

    pip install -r requirements.txt
  3. Install pre-commit:

    pre-commit install

Hear with your ears

This is a sample of generated speech from text about Dmitry Shostakovich.

shostakovich.mp4

Hear more samples in the W&B report.

How To Train

To train a model, log in to wandb, clone the repository into your working directory (for example, on Kaggle), and run:

python train.py -cn=kaggle_big_grad

All configs live in src/configs; optional Hydra overrides can be appended to the command as additional arguments.
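For example, individual config values can be overridden straight from the command line. The keys below (trainer.n_epochs, writer.run_name) are hypothetical and should be checked against the actual configs in src/configs:

```shell
# Hypothetical Hydra overrides -- verify the key names in src/configs
python train.py -cn=kaggle_big_grad \
    trainer.n_epochs=50 \
    writer.run_name=hifigan-p100-run
```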

How To Evaluate

Download the pretrained model and place it in your working directory.

  1. To download from terminal:
# install gdown
pip install gdown

# download my model
gdown 1UH1s6hQSKoNIrokyk1_ZJc6Ud8b2HAaa

To run inference locally, your dataset must follow a strict structure; see the test_dataset folder for an example.

  1. To generate audio that speaks an arbitrary text <TEXT> (passed on the command line), run the command below. In my case, for example, <MODEL_PATH> is "checkpoint-epoch20.pth".
python synthesize.py -cn=infer_input2speech \
   '+datasets.test.index=[{text: "<TEXT>", audio_len: 0, path: "anything.txt"}]' \
   'inferencer.from_pretrained="<MODEL_PATH>"'
  2. To generate audio from given audio files, put your audios into <AUDIO_DIR> and their transcriptions into <TRANS_DIR> (see, for example, my text_dataset/audios and text_dataset/transcriptions).
python synthesize.py -cn=infer_speech2speech \
   '++datasets.test.audio_dir="<AUDIO_DIR>"' \
   '+datasets.test.transcription_dir="<TRANS_DIR>"' \
   'inferencer.from_pretrained="<MODEL_PATH>"'
  3. To generate audio from given texts in the form of a dataset, put your texts into <TEXT_DIR> (see, for example, my text_dataset/transcriptions) and run:
python synthesize.py -cn=infer_text2speech \
   '++datasets.test.data_dir="<TEXT_DIR>"' \
   'inferencer.from_pretrained="<MODEL_PATH>"'

After generation, check the saved_audios folder; you will find your generated audio files there.

Credits

This repository is based on a PyTorch Project Template.

License

See the LICENSE file in the repository.
