W&B report link: https://wandb.ai/aapetukhov-new-economic-school/Neural-Vocoder/reports/Neural-Vocoder-Updated---VmlldzoxMDY2ODUzOQ
About • Installation • How To Use • Credits • License
HiFi-GAN implemented from scratch in PyTorch.
This repository contains a project on the HiFi-GAN neural vocoder (a Generative Adversarial Network), with all the scripts needed to train and evaluate the model. The model was trained on a single P100 GPU.
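For intuition, below is a minimal sketch of the core HiFi-GAN generator idea, not this repository's exact architecture or hyperparameters: stacked transposed convolutions upsample a mel spectrogram to waveform sample rate. The full model additionally interleaves multi-receptive-field residual blocks between the upsampling stages and is trained against multi-period and multi-scale discriminators.

```python
# Minimal sketch of the HiFi-GAN generator idea, NOT this repo's exact model:
# transposed convolutions upsample mel frames to waveform samples.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    def __init__(self, n_mels=80, channels=128, upsample_factors=(8, 8, 4)):
        super().__init__()
        self.pre = nn.Conv1d(n_mels, channels, kernel_size=7, padding=3)
        layers, ch = [], channels
        for f in upsample_factors:  # each stage multiplies the time axis by f
            layers += [
                nn.LeakyReLU(0.1),
                nn.ConvTranspose1d(ch, ch // 2, kernel_size=2 * f, stride=f, padding=f // 2),
            ]
            ch //= 2
        self.ups = nn.Sequential(*layers)
        self.post = nn.Conv1d(ch, 1, kernel_size=7, padding=3)

    def forward(self, mel):  # mel: (batch, n_mels, frames)
        return torch.tanh(self.post(self.ups(self.pre(mel))))

wav = TinyGenerator()(torch.randn(1, 80, 32))
print(wav.shape)  # (1, 1, 32 * 8 * 8 * 4) = (1, 1, 8192)
```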
Follow these steps to install the project:
- (Optional) Create and activate a new environment using `venv` (+ `pyenv`):

  ```bash
  # create env
  ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

  # alternatively, using the default python version
  python3 -m venv project_env

  # activate env
  source project_env/bin/activate
  ```
- Install all required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Install `pre-commit`:

  ```bash
  pre-commit install
  ```
This is a sample of generated speech from text about Dmitry Shostakovich.
shostakovich.mp4
Hear more on wandb.
To train a model, log in to wandb, clone the repo on Kaggle into the working area, and run the following command:

```bash
python train.py -cn=kaggle_big_grad
```

where all configs are taken from `src/configs`; optional `HYDRA_CONFIG_ARGUMENTS` (Hydra overrides) can be appended to the command.
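For context, here is a hypothetical sketch of how a Hydra entry point like `train.py` is typically wired; the decorator arguments and the example override key are assumptions, not code copied from this repo. `-cn` selects a YAML file from `config_path`, and extra `key=value` arguments override fields of that config.

```python
# Hypothetical sketch of a Hydra entry point; config_path and config_name
# are assumptions about this repo's layout, not verified code.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="src/configs", config_name="kaggle_big_grad")
def main(config: DictConfig):
    # Any field is overridable from the CLI, e.g.:
    #   python train.py -cn=kaggle_big_grad trainer.n_epochs=50
    # ("trainer.n_epochs" is an example key, not necessarily in the configs)
    print(OmegaConf.to_yaml(config))

if __name__ == "__main__":
    main()
```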
Download the pretrained model from here and place it in your directory.
- To download it from the terminal:

  ```bash
  # install gdown
  pip install gdown
  # download my model
  gdown 1UH1s6hQSKoNIrokyk1_ZJc6Ud8b2HAaa
  ```

To run inference LOCALLY, your dataset must follow a strict structure. See, for example, the `test_dataset` folder.
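If you want to sanity-check the downloaded checkpoint before inference (the repo's inferencer does the real loading via `inferencer.from_pretrained`), a minimal inspection with plain PyTorch looks like this; the only assumption is that the file is a standard `torch.save()` checkpoint.

```python
# Minimal checkpoint inspection; the checkpoint's internal layout (its keys)
# is an assumption to verify, not documented behavior of this repo.
import torch

checkpoint = torch.load("checkpoint-epoch20.pth", map_location="cpu")
print(type(checkpoint))
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))  # e.g. model weights, optimizer state, epoch
```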
- To generate audio that speaks arbitrary text `<TEXT>` (which you type in the command line), run the command below. In my case, for example, `<MODEL_PATH>` is `"checkpoint-epoch20.pth"`.

  ```bash
  python synthesize.py -cn=infer_input2speech \
      '+datasets.test.index=[{text: "<TEXT>", audio_len: 0, path: "anything.txt"}]' \
      'inferencer.from_pretrained="<MODEL_PATH>"'
  ```

- To generate audio from given audio, put your audio files into `<AUDIO_DIR>` and their transcriptions into `<TRANS_DIR>` (see, for example, my `text_dataset/audios` and `text_dataset/transcriptions`):

  ```bash
  python synthesize.py -cn=infer_speech2speech \
      '++datasets.test.audio_dir="<AUDIO_DIR>"' \
      '+datasets.test.transcription_dir="<TRANS_DIR>"' \
      'inferencer.from_pretrained="<MODEL_PATH>"'
  ```

- To generate audio from given texts in the form of a dataset, put your texts into `<TEXT_DIR>` (see, for example, my `text_dataset/transcriptions`) and run:

  ```bash
  python synthesize.py -cn=infer_text2speech \
      '++datasets.test.data_dir="<TEXT_DIR>"' \
      'inferencer.from_pretrained="<MODEL_PATH>"'
  ```

After generation, check the `saved_audios` folder; you will find your audios there.
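To verify the outputs programmatically, a short sketch (it assumes the files are saved as `.wav` under `saved_audios` and that `torchaudio` is installed):

```python
# List generated audios and print their shapes and sample rates.
# Assumes .wav output; adjust the pattern if the repo saves another format.
from pathlib import Path

import torchaudio

for path in sorted(Path("saved_audios").rglob("*.wav")):
    wav, sr = torchaudio.load(str(path))
    print(path, wav.shape, sr)
```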
This repository is based on a PyTorch Project Template.