W&B report link: https://wandb.ai/aapetukhov-new-economic-school/Neural-Vocoder/reports/Neural-Vocoder-Updated---VmlldzoxMDY2ODUzOQ
About • Installation • How To Use • Credits • License
HiFi-GAN implemented from scratch in PyTorch.
This repository contains a project on the HiFi-GAN neural vocoder (a Generative Adversarial Network), with all the scripts needed to train and evaluate the model. The model was trained on a single P100 GPU.
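For intuition, below is a minimal sketch of the core HiFi-GAN generator idea, not this repository's exact architecture or hyperparameters: stacked transposed convolutions upsample a mel spectrogram to waveform sample rate. The full model additionally interleaves multi-receptive-field residual blocks between the upsampling stages and is trained against multi-period and multi-scale discriminators.

```python
# Minimal sketch of the HiFi-GAN generator idea, NOT this repo's exact model:
# transposed convolutions upsample mel frames to waveform samples.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    def __init__(self, n_mels=80, channels=128, upsample_factors=(8, 8, 4)):
        super().__init__()
        self.pre = nn.Conv1d(n_mels, channels, kernel_size=7, padding=3)
        layers, ch = [], channels
        for f in upsample_factors:  # each stage multiplies the time axis by f
            layers += [
                nn.LeakyReLU(0.1),
                nn.ConvTranspose1d(ch, ch // 2, kernel_size=2 * f, stride=f, padding=f // 2),
            ]
            ch //= 2
        self.ups = nn.Sequential(*layers)
        self.post = nn.Conv1d(ch, 1, kernel_size=7, padding=3)

    def forward(self, mel):  # mel: (batch, n_mels, frames)
        return torch.tanh(self.post(self.ups(self.pre(mel))))

wav = TinyGenerator()(torch.randn(1, 80, 32))
print(wav.shape)  # (1, 1, 32 * 8 * 8 * 4) = (1, 1, 8192)
```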
Follow these steps to install the project:
- (Optional) Create and activate a new environment using `venv` (+ `pyenv`):

  ```bash
  # create env
  ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

  # alternatively, using the default python version
  python3 -m venv project_env

  # activate env
  source project_env/bin/activate
  ```
- Install all required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Install `pre-commit`:

  ```bash
  pre-commit install
  ```
This is a sample of generated speech from text about Dmitry Shostakovich.
shostakovich.mp4
Hear more on wandb.
To train a model, log in to wandb, clone the repo on Kaggle into the working area, and run the following command:

```bash
python train.py -cn=kaggle_big_grad
```

where all configs are taken from `src/configs`; optional `HYDRA_CONFIG_ARGUMENTS` (Hydra overrides) can be appended to the command.
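For context, here is a hypothetical sketch of how a Hydra entry point like `train.py` is typically wired; the decorator arguments and the example override key are assumptions, not code copied from this repo. `-cn` selects a YAML file from `config_path`, and extra `key=value` arguments override fields of that config.

```python
# Hypothetical sketch of a Hydra entry point; config_path and config_name
# are assumptions about this repo's layout, not verified code.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="src/configs", config_name="kaggle_big_grad")
def main(config: DictConfig):
    # Any field is overridable from the CLI, e.g.:
    #   python train.py -cn=kaggle_big_grad trainer.n_epochs=50
    # ("trainer.n_epochs" is an example key, not necessarily in the configs)
    print(OmegaConf.to_yaml(config))

if __name__ == "__main__":
    main()
```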
Download the pretrained model from here and place it in your directory.
- To download it from the terminal:

  ```bash
  # install gdown
  pip install gdown
  # download my model
  gdown 1UH1s6hQSKoNIrokyk1_ZJc6Ud8b2HAaa
  ```

To run inference LOCALLY, your dataset must follow a strict structure. See, for example, the `test_dataset` folder.
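If you want to sanity-check the downloaded checkpoint before inference (the repo's inferencer does the real loading via `inferencer.from_pretrained`), a minimal inspection with plain PyTorch looks like this; the only assumption is that the file is a standard `torch.save()` checkpoint.

```python
# Minimal checkpoint inspection; the checkpoint's internal layout (its keys)
# is an assumption to verify, not documented behavior of this repo.
import torch

checkpoint = torch.load("checkpoint-epoch20.pth", map_location="cpu")
print(type(checkpoint))
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))  # e.g. model weights, optimizer state, epoch
```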
- To generate audio that speaks arbitrary text `<TEXT>` (which you type in the command line), run the command below. In my case, for example, `<MODEL_PATH>` is `"checkpoint-epoch20.pth"`.

  ```bash
  python synthesize.py -cn=infer_input2speech \
      '+datasets.test.index=[{text: "<TEXT>", audio_len: 0, path: "anything.txt"}]' \
      'inferencer.from_pretrained="<MODEL_PATH>"'
  ```

- To generate audio from given audio, put your audio files into `<AUDIO_DIR>` and their transcriptions into `<TRANS_DIR>` (see, for example, my `text_dataset/audios` and `text_dataset/transcriptions`):

  ```bash
  python synthesize.py -cn=infer_speech2speech \
      '++datasets.test.audio_dir="<AUDIO_DIR>"' \
      '+datasets.test.transcription_dir="<TRANS_DIR>"' \
      'inferencer.from_pretrained="<MODEL_PATH>"'
  ```

- To generate audio from given texts in the form of a dataset, put your texts into `<TEXT_DIR>` (see, for example, my `text_dataset/transcriptions`) and run:

  ```bash
  python synthesize.py -cn=infer_text2speech \
      '++datasets.test.data_dir="<TEXT_DIR>"' \
      'inferencer.from_pretrained="<MODEL_PATH>"'
  ```

After generation, check the `saved_audios` folder; you will find your audios there.
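To verify the outputs programmatically, a short sketch (it assumes the files are saved as `.wav` under `saved_audios` and that `torchaudio` is installed):

```python
# List generated audios and print their shapes and sample rates.
# Assumes .wav output; adjust the pattern if the repo saves another format.
from pathlib import Path

import torchaudio

for path in sorted(Path("saved_audios").rglob("*.wav")):
    wav, sr = torchaudio.load(str(path))
    print(path, wav.shape, sr)
```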
This repository is based on a PyTorch Project Template.